Download guide
This section describes the steps needed to download data from BigPicture.
Get access to a dataset
The first step to download data from BigPicture is to get access to the dataset of interest. Access is requested by visiting REMS here: log in, choose the datasets of interest and add them to the cart. Then apply for access by filling in the form and sending the application. The application must be approved by the data access committee (DAC); once it has been approved, the user can download the datasets.
Preparation for downloading data
Download configuration file
Before downloading the data, the user needs to download the configuration file by logging in here. Follow the dialogue to get authenticated and then click on Download credentials to access the Archive to download the configuration file named s3cmd-download.conf.
Install the sda-cli tool
Follow the guidelines here to install the latest version of the sda-cli tool. The examples below were tested with sda-cli v0.3.3.
Generate the public and secret keys
The initial step involves creating a Crypt4GH key pair using the sda-cli tool:
./sda-cli createKey <keypair_name>
where <keypair_name> is the base name of the key files. This command creates two key files, <keypair_name>.pub.pem and <keypair_name>.sec.pem. The public key (.pub.pem) is passed to sda-cli when downloading, and the system uses it to encrypt the files before they are downloaded. The secret (private) key (.sec.pem) is kept by the requester and used to decrypt the files after downloading.
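Before listing or downloading, it can help to confirm that everything prepared so far is in place. The following is a minimal shell sketch, assuming the commands are run from the directory that holds sda-cli, s3cmd-download.conf and the key files, as the ./sda-cli examples in this guide suggest. The function name preflight_check is hypothetical and not part of sda-cli.

```shell
# Hypothetical helper (not part of sda-cli): check that the files this guide
# relies on are present in the current directory before listing or downloading.
preflight_check() {
  # $1: the <keypair_name> that was passed to `sda-cli createKey`
  local name="$1" status=0 f
  for f in s3cmd-download.conf "${name}.pub.pem" "${name}.sec.pem"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      status=1
    fi
  done
  # The examples in this guide call the binary as ./sda-cli.
  if [ ! -x ./sda-cli ]; then
    echo "./sda-cli not found or not executable" >&2
    status=1
  fi
  return "$status"
}
```

For example, ./sda-cli createKey mykey && preflight_check mykey reports any missing piece by name and returns a non-zero status, so it can gate a download script.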
Check access
Once access has been granted, the user can verify it by listing the datasets and their files with sda-cli.
Note: There are currently two URLs for the download service:
- https://download.bp.nbis.se: Points to the old storage bucket.
- https://download2.bp.nbis.se: Points to the new storage bucket.
We are developing a unified solution that will eventually require only a single URL, regardless of where the data is stored. Until then, make sure to use the URL that matches the location of your data files: either URL can be used to list the files in a dataset, but downloads must use the URL of the bucket where the files are stored.
If you use the incorrect URL during a download, the process will fail with an unexpected EOF error:
Error: failed to get file for download, reason: failed to get response, reason: Get "...": unexpected EOF
If you encounter this error, switch to the alternative URL and retry the download.
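The URL switch described above can be automated. The shell sketch below tries the old-bucket URL first and retries with the new-bucket URL only when the error output contains "unexpected EOF"; any other error is returned unchanged. The function name and the SDA_CLI variable are hypothetical conveniences, and matching on the error text is an assumption based on the message quoted above.

```shell
# Hypothetical wrapper (not part of sda-cli): retry a download against the
# second URL when the first fails with the "unexpected EOF" wrong-bucket error.
SDA_CLI="${SDA_CLI:-./sda-cli}"

download_with_fallback() {
  # Arguments: the download options and file paths, without --url, e.g.
  #   --pubkey key.pub.pem --dataset-id <DatasetID> --outdir out path/to/file
  local errlog status=1 url
  errlog="$(mktemp)"
  for url in https://download.bp.nbis.se https://download2.bp.nbis.se; do
    # stderr is captured so it can be searched; it is printed after each attempt.
    "$SDA_CLI" --config s3cmd-download.conf download --url "$url" "$@" 2>"$errlog"
    status=$?
    cat "$errlog" >&2
    if [ "$status" -eq 0 ]; then
      echo "download succeeded using $url" >&2
      break
    fi
    # Any other error is not the wrong-URL symptom: stop and report it.
    if ! grep -q "unexpected EOF" "$errlog"; then
      break
    fi
  done
  rm -f "$errlog"
  return "$status"
}
```

Because the error output is buffered in a temporary file, progress messages appear only after each attempt finishes; this is the price of being able to inspect the error text.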
List datasets and files
To list the datasets the user has access to, run:
./sda-cli --config s3cmd-download.conf list --datasets --url https://download.bp.nbis.se [--bytes]
To list the files of a specific dataset, run:
./sda-cli --config s3cmd-download.conf list --dataset <DatasetID> --url https://download.bp.nbis.se [--bytes]
where <DatasetID> is the ID of the dataset whose files the user wants to list; dataset IDs are shown by the previous command. The --bytes flag is optional (omit the square brackets when using it); when enabled, file sizes are shown in bytes instead of a human-readable format.
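To check access to one particular dataset from a script, the dataset listing can be searched for its ID. This is a minimal sketch, assuming the dataset ID appears verbatim in the output of list --datasets; it is a plain substring match, so a short ID may also match a longer one. The function name and SDA_CLI variable are hypothetical.

```shell
# Hypothetical helper (not part of sda-cli): succeed if a dataset ID appears
# in the output of `list --datasets` (plain substring match).
SDA_CLI="${SDA_CLI:-./sda-cli}"

has_dataset_access() {
  # $1: dataset ID to look for, $2: listing URL (either URL works for listing)
  local dataset_id="$1" url="${2:-https://download.bp.nbis.se}"
  "$SDA_CLI" --config s3cmd-download.conf list --datasets --url "$url" \
    | grep -qF -- "$dataset_id"
}
```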
Download data
Download file(s)
Once the user has access to the dataset, the configuration file and the sda-cli tool, the encrypted data can be downloaded. The download command requires the public key generated earlier and the configuration file.
To download the data:
./sda-cli --config s3cmd-download.conf download --pubkey <public-key-file> --dataset-id <DatasetID> --url https://download.bp.nbis.se --outdir </path/to/output/directory> <filepath_1_to_download> <filepath_2_to_download> ...
where:
- <public-key-file> is the public key file generated earlier (<keypair_name>.pub.pem)
- <DatasetID> is the ID of the dataset whose files the user wants to download
- </path/to/output/directory> is the directory where the downloaded files will be saved
- <filepath_*_to_download> are the paths of the files to download, as shown when listing the files of the dataset (see above)
Download multiple files from a list
For bulk downloads, you can provide a list of file paths in a text file (e.g., filepaths.txt) instead of listing them individually.
./sda-cli --config s3cmd-download.conf download --pubkey <public-key-file> --dataset-id <DatasetID> --url https://download.bp.nbis.se --outdir </path/to/output/directory> --from-file filepaths.txt
Note: The filepaths.txt file should contain one relative file path per line.
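A file list in this format can be produced directly from the shell. The sketch below writes one path per line and, as a precaution (an assumption, not documented sda-cli behavior), strips Windows carriage returns and blank lines from a list edited elsewhere, so that every line is exactly one path. Both function names are hypothetical.

```shell
# Hypothetical helpers for preparing the --from-file list.

# write_filepaths OUTPUT PATH...: write each PATH on its own line.
write_filepaths() {
  local out="$1"
  shift
  printf '%s\n' "$@" > "$out"
}

# clean_filepaths FILE: remove carriage returns (CRLF line endings) and
# blank lines in place, so every remaining line is exactly one path.
clean_filepaths() {
  local list="$1"
  tr -d '\r' < "$list" | sed '/^[[:space:]]*$/d' > "$list.tmp" && mv "$list.tmp" "$list"
}
```

For example, write_filepaths filepaths.txt <filepath_1_to_download> <filepath_2_to_download> creates a two-line filepaths.txt ready for --from-file.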
Download all files of a dataset
To download all files associated with a specific dataset, use the --dataset flag.
./sda-cli --config s3cmd-download.conf download --pubkey <public-key-file> --dataset-id <DatasetID> --url https://download.bp.nbis.se --outdir </path/to/output/directory> --dataset
Skip existing files
Use the --continue flag to skip files already present in the output directory. This is ideal for resuming interrupted downloads or updating large datasets.
Note: This flag skips completed files only; partially downloaded files are downloaded again from the beginning rather than resumed from the point of interruption.
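Since --continue skips files that are already complete, rerunning the same command after a failure resumes a large download. The shell sketch below wraps a whole-dataset download in a retry loop under that assumption; the function name and the SDA_CLI, MAX_ATTEMPTS and RETRY_DELAY variables are hypothetical. If the failure is the "unexpected EOF" error caused by using the wrong URL, retrying the same URL will not help; pass the other URL as the fourth argument instead.

```shell
# Hypothetical wrapper (not part of sda-cli): rerun a whole-dataset download
# with --continue until it succeeds or MAX_ATTEMPTS is reached. Each rerun
# skips completed files; partial files restart from the beginning.
SDA_CLI="${SDA_CLI:-./sda-cli}"
MAX_ATTEMPTS="${MAX_ATTEMPTS:-3}"
RETRY_DELAY="${RETRY_DELAY:-10}"   # seconds to wait between attempts

download_dataset_with_retries() {
  # $1: public key file, $2: dataset ID, $3: output directory,
  # $4: download URL (defaults to the old-bucket URL)
  local pubkey="$1" dataset_id="$2" outdir="$3"
  local url="${4:-https://download.bp.nbis.se}" attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if "$SDA_CLI" --config s3cmd-download.conf download \
        --pubkey "$pubkey" --dataset-id "$dataset_id" --url "$url" \
        --outdir "$outdir" --dataset --continue; then
      return 0
    fi
    echo "attempt $attempt of $MAX_ATTEMPTS failed" >&2
    attempt=$((attempt + 1))
    if [ "$attempt" -le "$MAX_ATTEMPTS" ]; then
      sleep "$RETRY_DELAY"
    fi
  done
  return 1
}
```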
# Example 1: Resuming from a list of specific file paths
./sda-cli --config s3cmd-download.conf download --pubkey <public-key-file> --dataset-id <DatasetID> --url https://download.bp.nbis.se --outdir </path/to/output/directory> --from-file filepaths.txt --continue
# Example 2: Resuming an entire dataset download
./sda-cli --config s3cmd-download.conf download --pubkey <public-key-file> --dataset-id <DatasetID> --url https://download.bp.nbis.se --outdir </path/to/output/directory> --dataset --continue
Decrypt the data
After downloading the encrypted data, the user can decrypt the files using the private key that was generated earlier by running:
./sda-cli decrypt --key <keypair_name>.sec.pem </path/to/encrypted/file>
where </path/to/encrypted/file> is the path to the encrypted file that the user wants to decrypt and <keypair_name>.sec.pem is the private key file that was generated earlier.
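After a whole dataset has been downloaded, the decrypt command can be run over every encrypted file in the output directory. This shell sketch calls the command once per file and assumes the downloaded encrypted files carry the Crypt4GH .c4gh extension (adjust the -name pattern otherwise). The function name and SDA_CLI variable are hypothetical.

```shell
# Hypothetical helper (not part of sda-cli): decrypt every *.c4gh file found
# under a directory with the decrypt command, one file per call.
SDA_CLI="${SDA_CLI:-./sda-cli}"

decrypt_all() {
  # $1: private key file (<keypair_name>.sec.pem), $2: download directory
  local seckey="$1" dir="$2" list f status=0
  list="$(mktemp)"
  find "$dir" -type f -name '*.c4gh' > "$list"
  # Read the list on file descriptor 3 so that stdin stays free in case
  # the tool prompts for the private key's passphrase.
  while IFS= read -r f <&3; do
    if ! "$SDA_CLI" decrypt --key "$seckey" "$f"; then
      echo "decryption failed: $f" >&2
      status=1
    fi
  done 3< "$list"
  rm -f "$list"
  return "$status"
}
```

A failure on one file is reported and the loop continues with the rest; the function returns non-zero if any file failed.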
Report new problems
If you identify any issues with the downloaded dataset, please contact the Honest Broker Taskforce at submit@bigpicture.nl.