Download guide
This section provides guidelines on the necessary steps to download data from BigPicture.
Get access to a dataset
The first step to download data from Big Picture is to get access to the dataset the user is interested in. This can be done by visiting REMS here. Log in, choose the datasets of interest and add them to the cart. After that, the user can apply for access, fill in the form and send the application. The user needs to wait for approval from the data access committee (DAC) and after it has been approved the user can download the datasets.
Preparation for downloading data
Download configuration file
Before downloading the data, the user needs to download the configuration file by logging in here. Follow the dialogue to get authenticated and then click on Download inbox s3cmd credentials
to download the configuration file named s3cmd.conf
.
Install the sda-cli tool
Follow the guidelines here to install the sda-cli tool.
Generate the public and secret key
The initial step involves creating a crypt4gh keypair using the sda-cli:
./sda-cli createKey <keypair_name>
where <keypair_name>
is the base name of the key files. The above command will create two key files named keypair_name.pub.pem
and keypair_name.sec.pem
. The public key (pub
) will be used alongside with sda-cli
and will be used by the system for encryption of the files before downloading, while the private one (sec
) will be used by the requester for decrypting the files after downloading.
Check access
After the user has been granted access to the dataset, the user can check access to the dataset by listing the datasets and their files using the sda-cli
. For listing the datasets that the user has access to, the user needs to run:
./sda-cli list -config s3cmd.conf --datasets --url https://download.bp.nbis.se (--bytes)
For listing the files of a specific dataset, the user needs to run:
./sda-cli list -config s3cmd.conf -dataset <DatasetID> --url https://download.bp.nbis.se (--bytes)
where <DatasetID>
is the ID of the dataset for which the user wants to list the files. The dataset ID can be found by running the previous command.
Download data
After having acquired access to the datasets, the configuration file and the sda-cli tool, the user can download the encrypted data. The user needs to provide the public key that was generated earlier, as well as the configuration file.
To download the data:
./sda-cli download -config s3cmd.conf -pubkey <public-key-file> -dataset-id <DatasetID> --url https://download.bp.nbis.se -outdir </path/to/output/directory> <filepath_1_to_download> <filepath_2_to_download> ...
where:
<public-key-file>
is the public key file that was generated earlier (<keypair_name>.pub.pem
)<DatasetID>
is the ID of the dataset for which the user wants to download the files</path/to/output/directory>
is the path to the directory where the files will be downloaded<filepath_*_to_download>
are the file paths of the files which can be found by listing the files of the dataset as described above
Decrypt the data
After downloading the encrypted data, the user can decrypt the files using the private key that was generated earlier by running:
./sda-cli decrypt -key <keypair_name>.sec.pem </path/to/encrypted/file>
where </path/to/encrypted/file>
is the path to the encrypted file that the user wants to decrypt and <keypair_name>.sec.pem
is the private key file that was generated earlier.