Preparation guide
Before you submit your data, please check the following:
- The root folder of the submission should be the dataset folder itself, which contains several subfolders. See the example folder structure below:
DATASET_{IDENTIFIER}*
|--- METADATA
| |--- dataset.xml (contains: Dataset)
| |--- policy.xml (contains: Policy)
| |--- image.xml (contains: Images)
| |--- annotation.xml (contains: Annotations)
| |--- observation.xml (contains: Observations)
| |--- observer.xml (contains: Observers)
| |--- sample.xml (contains: Biological Beings, Cases (if present), Specimens, Blocks and Slides)
| |--- staining.xml (contains: Stainings)
|--- IMAGES
| |--- IMAGE_{IDENTIFIER}**
| | |--- *.dcm files of an Image
| |--- IMAGE_{IDENTIFIER}**
| | |--- *.dcm files of an Image
|--- ANNOTATIONS+
| |--- *.geojson
|--- LANDING_PAGE***
| |--- landingpage.xml (contains: Landing Page)
| |--- THUMBNAILS
| | |--- *.jpg
|--- PRIVATE**** - not shared with users
| |--- rems.xml - not shared with users
| |--- organisation.xml - not shared with users
| |--- datacite.xml (contains: DataCite, optional) - not shared with users
* The root folder must be named "DATASET_{IDENTIFIER}", with
IDENTIFIER being either the accession ID of the Dataset generated by the
repository (when data is downloaded) or the ALIAS defined by the
submitter at dataset creation and submission.
** Folders containing WSI files (i.e. *.dcm) must be named
"IMAGE_{IDENTIFIER}", with IDENTIFIER being either the accession ID,
generated by the repository, of the Image the files relate to (when data
is downloaded) or the ALIAS defined by the submitter at dataset creation
or submission.
*** IMPORTANT: Anything in this folder should be expected to be visible to
the entire world.
+ If the dataset does not contain Annotations, the respective .xml files
or directory can be omitted.
**** This folder contains metadata that will not be shared with users who
have been granted access to a dataset.
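The naming rules in the tree above can be checked before upload. The following is a minimal sketch, not an official validator: `check_layout` and `has_identifier` are hypothetical helper names, and treating every top-level folder except ANNOTATIONS as mandatory is an assumption drawn from the example tree.

```python
from pathlib import Path

# Folders from the example tree. ANNOTATIONS is left out because it may be
# omitted; treating the remaining four as mandatory is an assumption.
EXPECTED_DIRS = ("METADATA", "IMAGES", "LANDING_PAGE", "PRIVATE")


def has_identifier(name: str, prefix: str) -> bool:
    """True if `name` is `prefix` followed by a non-empty identifier."""
    return name.startswith(prefix) and len(name) > len(prefix)


def check_layout(root: Path) -> list[str]:
    """Return a list of problems found in the dataset folder `root`."""
    problems = []
    if not has_identifier(root.name, "DATASET_"):
        problems.append(
            f"root folder {root.name!r} is not named DATASET_{{IDENTIFIER}}"
        )
    for name in EXPECTED_DIRS:
        if not (root / name).is_dir():
            problems.append(f"missing folder: {name}")
    images = root / "IMAGES"
    if images.is_dir():
        # Every entry under IMAGES should be a folder named IMAGE_{IDENTIFIER}.
        for entry in sorted(images.iterdir()):
            if not (entry.is_dir() and has_identifier(entry.name, "IMAGE_")):
                problems.append(f"unexpected entry in IMAGES: {entry.name}")
    return problems
```

Calling `check_layout(Path("DATASET_my_alias"))` returns an empty list when the folder follows the layout, and one message per deviation otherwise.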
- All files should be encrypted with crypt4gh and must carry the extension c4gh, e.g. image.xml.c4gh, image1.dcm.c4gh, etc.
- The metadata should be stored in two different subfolders: METADATA and PRIVATE.
- The only files that may exist in the METADATA folder are the following: dataset.xml, image.xml, observation.xml, observer.xml (optional), policy.xml, sample.xml, annotation.xml (optional) and staining.xml.
- The only files that may exist in the PRIVATE folder are the following: dac.xml and submission.xml.
- The file image.xml should include the full path of each DICOM image as well as the checksums of both the encrypted and the unencrypted file, e.g.:
<FILES>
<FILE filename="IMAGES/IMAGE_{IDENTIFIER}/*.dcm" checksum_method="SHA256" checksum="<encrypted_checksum>" unencrypted_checksum="<unencrypted_checksum>" filetype="dcm"/>
</FILES>
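The two checksums in such an entry could be computed as in the sketch below, which uses only the Python standard library. `sha256_of` and `file_element` are hypothetical helper names. The sketch assumes that `checksum` is taken over the encrypted .c4gh file, `unencrypted_checksum` over the original file, and that `filename` holds the path without the .c4gh suffix, as the example entry suggests.

```python
import hashlib
from pathlib import Path
from xml.sax.saxutils import quoteattr


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_element(filename: str, plain: Path, encrypted: Path) -> str:
    """Build one <FILE> entry for image.xml (attribute layout copied from the example)."""
    return (
        f"<FILE filename={quoteattr(filename)} "
        f'checksum_method="SHA256" '
        f'checksum="{sha256_of(encrypted)}" '           # digest of the .c4gh file
        f'unencrypted_checksum="{sha256_of(plain)}" '   # digest of the original file
        f'filetype="dcm"/>'
    )
```

For instance, `file_element("IMAGES/IMAGE_1/image1.dcm", Path("image1.dcm"), Path("image1.dcm.c4gh"))` yields one line that can be placed between the FILES tags; the paths here are placeholders.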