Download guide
This section provides guidelines on the necessary steps to download data from BigPicture.
Get access to a dataset
The first step in downloading data from BigPicture is to get access to the dataset of interest. This can be done by visiting REMS here. Log in, choose the datasets of interest and add them to the cart. Then apply for access: fill in the form and send the application. The application must be approved by the data access committee (DAC); once it has been approved, the user can download the datasets.
Preparation for downloading data
Download configuration file
Before downloading the data, the user needs to download the configuration file by logging in here. Follow the dialogue to get authenticated and then click on Download inbox s3cmd credentials to download the configuration file named s3cmd.conf.
Install the sda-cli tool
Follow the guidelines here to install the latest version of the sda-cli tool. The following examples were tested with sda-cli v0.3.1.
Generate the public and secret key
The initial step involves creating a crypt4gh keypair using the sda-cli:
./sda-cli createKey <keypair_name>
where <keypair_name> is the base name of the key files. The command creates two key files, <keypair_name>.pub.pem and <keypair_name>.sec.pem. The public key (pub) is passed to sda-cli and used by the system to encrypt the files before download, while the private key (sec) is used by the requester to decrypt the files after download.
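Before moving on, it can help to confirm that both key files exist. The following is a minimal sketch, not part of sda-cli: the helper name check_keypair is hypothetical, and it only assumes the two file names described above.

```shell
# Hypothetical helper (not part of sda-cli): succeeds only if both key files
# expected from `sda-cli createKey <keypair_name>` exist and are non-empty.
check_keypair() {
    name="$1"
    for f in "$name.pub.pem" "$name.sec.pem"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty key file: $f" >&2
            return 1
        fi
    done
    echo "keypair $name looks complete"
}
```

For example, after running ./sda-cli createKey mykey, calling check_keypair mykey reports whether mykey.pub.pem and mykey.sec.pem are both in place.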
Check access
Once access to the dataset has been granted, the user can verify it by listing the datasets and their files with sda-cli.
Note: There are currently two URLs for the download service:
- https://download.bp.nbis.se: Points to the old storage bucket.
- https://download2.bp.nbis.se: Points to the new storage bucket.
We are developing a unified solution that will eventually require only a single URL, regardless of where the data is stored. Until then, please ensure you use the correct URL based on the location of your data files. For listing the files in a dataset, either URL can be used. For downloading, you must use the URL of the bucket where the files are stored.
If you use the incorrect URL during a download, the process will fail with an unexpected EOF error:
Error: failed to get file for download, reason: failed to get response, reason: Get "...": unexpected EOF
If you encounter this error, switch to the alternative URL and retry the download.
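The switch between the two URLs can be scripted. The sketch below is one possible approach under stated assumptions: the function name download_with_fallback and the SDA_CLI variable are hypothetical, it assumes sda-cli exits with a non-zero status when a download fails, and it retries on any failure rather than only on the unexpected EOF error.

```shell
# Path to the sda-cli binary; override it if the tool lives elsewhere.
SDA_CLI="${SDA_CLI:-./sda-cli}"

# Hypothetical wrapper: try the old bucket URL first, then the new one.
# Usage: download_with_fallback <config> <pubkey> <DatasetID> <outdir> <file>...
download_with_fallback() {
    config="$1"; pubkey="$2"; dataset_id="$3"; outdir="$4"
    shift 4
    for url in https://download.bp.nbis.se https://download2.bp.nbis.se; do
        # Assumption: a failed download makes sda-cli return a non-zero status.
        if "$SDA_CLI" --config "$config" download --pubkey "$pubkey" \
            --dataset-id "$dataset_id" --url "$url" --outdir "$outdir" "$@"; then
            echo "downloaded via $url"
            return 0
        fi
        echo "download via $url failed, trying the next URL" >&2
    done
    return 1
}
```

The function returns a non-zero status if neither URL works, so it can be used in larger scripts that need to stop on failure.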
List datasets and files
For listing the datasets that the user has access to, the user needs to run:
./sda-cli --config s3cmd.conf list --datasets --url https://download.bp.nbis.se [--bytes]
For listing the files of a specific dataset, the user needs to run:
./sda-cli --config s3cmd.conf list --dataset <DatasetID> --url https://download.bp.nbis.se [--bytes]
where <DatasetID> is the ID of the dataset whose files should be listed; it can be found by running the previous command. The --bytes flag is optional; when enabled, file sizes are shown in bytes instead of a human-readable format.
Download data
Download file(s)
After having acquired access to the datasets, the configuration file and the sda-cli tool, the user can download the encrypted data. The user needs to provide the public key that was generated earlier, as well as the configuration file.
To download the data:
./sda-cli --config s3cmd.conf download --pubkey <public-key-file> --dataset-id <DatasetID> --url https://download.bp.nbis.se --outdir </path/to/output/directory> <filepath_1_to_download> <filepath_2_to_download> ...
where:
- <public-key-file> is the public key file that was generated earlier (<keypair_name>.pub.pem)
- <DatasetID> is the ID of the dataset for which the user wants to download the files
- </path/to/output/directory> is the path to the directory where the files will be downloaded
- <filepath_*_to_download> are the file paths of the files, which can be found by listing the files of the dataset as described above
Download multiple files from a list
For bulk downloads, you can provide a list of file paths in a text file (e.g., filepaths.txt) instead of listing them individually.
./sda-cli --config s3cmd.conf download --pubkey <public-key-file> --dataset-id <DatasetID> --url https://download.bp.nbis.se --outdir </path/to/output/directory> --from-file filepaths.txt
Note: filepaths.txt should contain one relative file path per line.
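A hand-edited list can easily pick up blank lines, stray spaces or Windows line endings. The sketch below shows one way to tidy such a file before passing it to --from-file. The helper name clean_filepaths is hypothetical, and whether sda-cli tolerates such lines on its own is not covered here.

```shell
# Hypothetical helper: print a cleaned copy of a path list on standard output.
# It removes carriage returns, trims leading and trailing whitespace,
# and drops empty lines, leaving one path per line.
clean_filepaths() {
    tr -d '\r' < "$1" \
        | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | awk 'NF'
}
```

For example, clean_filepaths raw_list.txt > filepaths.txt writes the cleaned list to filepaths.txt, where raw_list.txt is a placeholder name for the original list.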
Download all files for a dataset
To download an entire dataset, follow this two-step process. First, list all files in the dataset and save their paths to a file by running the following command:
./sda-cli --config s3cmd.conf list --dataset <DatasetID> --url https://download.bp.nbis.se | awk '{hold=$4} NR>1{print prev} {prev=hold}' > filepaths.txt
Then download them using the file saved in the previous step:
./sda-cli --config s3cmd.conf download --pubkey <public-key-file> --dataset-id <DatasetID> --url https://download.bp.nbis.se --outdir </path/to/output/directory> --from-file filepaths.txt
Decrypt the data
After downloading the encrypted data, the user can decrypt the files using the private key that was generated earlier by running:
./sda-cli decrypt --key <keypair_name>.sec.pem </path/to/encrypted/file>
where </path/to/encrypted/file> is the path to the encrypted file that the user wants to decrypt and <keypair_name>.sec.pem is the private key file that was generated earlier.
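When many files have been downloaded, the decrypt command can be repeated in a loop. The following is a minimal sketch under stated assumptions: the function name decrypt_each and the SDA_CLI variable are hypothetical, and it assumes sda-cli returns a non-zero status when decryption fails. It calls the tool once per file, exactly as in the command above.

```shell
# Path to the sda-cli binary; override it if the tool lives elsewhere.
SDA_CLI="${SDA_CLI:-./sda-cli}"

# Hypothetical helper: decrypt every file given on the command line.
# Usage: decrypt_each <keypair_name>.sec.pem <encrypted-file>...
decrypt_each() {
    key="$1"
    shift
    failed=0
    # A plain for loop leaves standard input untouched, in case the tool
    # needs to read something (such as a key passphrase) from the terminal.
    for f in "$@"; do
        if ! "$SDA_CLI" decrypt --key "$key" "$f"; then
            echo "failed to decrypt: $f" >&2
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every file was decrypted.
    [ "$failed" -eq 0 ]
}
```

The loop continues after a failure so that one bad file does not block the rest, and the return status shows whether any file failed.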
Known problems
- The --dataset flag for downloading an entire dataset is currently not working. As a workaround, please use the two-step process described above to download all files within a dataset.
- The --recursive flag for downloading a given folder recursively is currently not functioning as intended. We are actively developing a fix, which will be included in an upcoming release.
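The two-step workaround for downloading a whole dataset can be wrapped in a small script. This is a sketch, not an official tool: the function names extract_paths and download_dataset and the SDA_CLI variable are hypothetical. The awk filter is the one from this guide: it prints the fourth whitespace-separated field of every line except the last. That only yields file paths if the listing places the path in the fourth column and ends with a non-file line, so check the real output of the list command before relying on it.

```shell
# Path to the sda-cli binary; override it if the tool lives elsewhere.
SDA_CLI="${SDA_CLI:-./sda-cli}"

# Keep the 4th whitespace-separated field of every input line but the last.
extract_paths() {
    awk '{hold=$4} NR>1{print prev} {prev=hold}'
}

# Hypothetical wrapper around the two-step process.
# Usage: download_dataset <config> <pubkey> <DatasetID> <url> <outdir>
download_dataset() {
    config="$1"; pubkey="$2"; dataset_id="$3"; url="$4"; outdir="$5"
    # Step 1: list the files of the dataset and save their paths.
    "$SDA_CLI" --config "$config" list --dataset "$dataset_id" --url "$url" \
        | extract_paths > filepaths.txt
    # Stop if the listing produced no paths (for example, no access).
    if [ ! -s filepaths.txt ]; then
        echo "no file paths found for dataset $dataset_id" >&2
        return 1
    fi
    # Step 2: download everything named in the saved list.
    "$SDA_CLI" --config "$config" download --pubkey "$pubkey" \
        --dataset-id "$dataset_id" --url "$url" --outdir "$outdir" \
        --from-file filepaths.txt
}
```

Inspecting filepaths.txt after the first step (for example with head filepaths.txt) is a cheap way to confirm that it contains file paths and nothing else.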