Preparation guide

Before you submit your data, please make sure that:

  1. The root folder of the submission is the dataset folder, which contains several subfolders. See the example folder structure below:
    DATASET_{IDENTIFIER}*
    |--- METADATA
    |    |--- dataset.xml (contains: Dataset)
    |    |--- policy.xml (contains: Policy)
    |    |--- image.xml (contains: Images)
    |    |--- annotation.xml+ (contains: Annotations)
    |    |--- observation.xml (contains: Observations)
    |    |--- observer.xml (contains: Observers)
    |    |--- sample.xml (contains: Biological Beings, Cases (if present), Specimens, Blocks and Slides)
    |    |--- staining.xml (contains: Stainings)
    |--- IMAGES
    |    |--- IMAGE_{IDENTIFIER}**
    |    |    |--- *.dcm files of an Image
    |    |--- IMAGE_{IDENTIFIER}**
    |    |    |--- *.dcm files of an Image
    |--- ANNOTATIONS+
    |    |--- *.geojson
    |--- LANDING_PAGE***
    |    |--- landingpage.xml (contains: Landing Page) 
    |    |--- THUMBNAILS
    |    |    |--- *.jpg
    |--- PRIVATE**** - not shared with users
    |    |--- rems.xml - not shared with users
    |    |--- organisation.xml - not shared with users
    |    |--- datacite.xml (contains: DataCite, optional) - not shared with users

    *    The root folder must be named "DATASET_{IDENTIFIER}", with IDENTIFIER
         being either the accession ID of the Dataset generated by the
         repository (when data is downloaded) or the ALIAS defined by the
         submitter at dataset creation and submission.
    **   Folders containing WSI files (i.e. *.dcm) must be named
         "IMAGE_{IDENTIFIER}", with IDENTIFIER being either the accession ID
         generated by the repository for the Image the files relate to (when
         data is downloaded) or the ALIAS defined by the submitter at dataset
         creation or submission.
    ***  IMPORTANT: Anything in this folder should be expected to be visible to
         the entire world.
    +    If the dataset does not contain Annotations, the respective .xml file
         and directory can be omitted.
    **** This folder contains metadata that will not be shared with users who
         have been granted access to a dataset.
  2. All files should be encrypted with crypt4gh and must carry the extension .c4gh, e.g. image.xml.c4gh, image1.dcm.c4gh.
  3. The metadata should be stored in two separate subfolders: METADATA and PRIVATE.
  4. The only files that may exist in the METADATA folder are: dataset.xml, image.xml, observation.xml, observer.xml (optional), policy.xml, sample.xml, annotation.xml (optional) and staining.xml.
  5. The only files that may exist in the PRIVATE folder are: dac.xml and submission.xml.
  6. The file image.xml should include the full path of each DICOM image file, together with the checksums of both the encrypted and the unencrypted file, e.g.:
    <FILES>
        <FILE filename="IMAGES/IMAGE_{IDENTIFIER}/*.dcm" checksum_method="SHA256" checksum="{ENCRYPTED_CHECKSUM}" unencrypted_checksum="{UNENCRYPTED_CHECKSUM}" filetype="dcm"/>
    </FILES>
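
The METADATA file list and the image.xml checksum requirement above can be checked with a small script before submission. The following Python sketch is illustrative only and is not part of any official tooling: the helper names (`sha256_of`, `file_entry`, `check_metadata_names`) are hypothetical, and it assumes that the `filename` attribute is the path relative to the dataset folder without the .c4gh extension (as in the example above), that checksums are written as lowercase hexadecimal digests, and that the files have already been encrypted with crypt4gh.

```python
import hashlib
import xml.etree.ElementTree as ET
from pathlib import Path

C4GH = ".c4gh"

# File names permitted in METADATA, copied from the list above.
ALLOWED_METADATA = {
    "dataset.xml", "image.xml", "observation.xml", "observer.xml",
    "policy.xml", "sample.xml", "annotation.xml", "staining.xml",
}
OPTIONAL_METADATA = {"observer.xml", "annotation.xml"}


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_entry(dataset_root, plain_file, encrypted_file):
    """Build one <FILE> element for image.xml (hypothetical helper).

    encrypted_file must live under dataset_root; plain_file is the local
    unencrypted copy and may live anywhere.
    """
    relative = Path(encrypted_file).relative_to(dataset_root).as_posix()
    if relative.endswith(C4GH):
        # Assumption: image.xml lists the path without the .c4gh extension.
        relative = relative[: -len(C4GH)]
    return ET.Element("FILE", {
        "filename": relative,
        "checksum_method": "SHA256",
        "checksum": sha256_of(encrypted_file),
        "unencrypted_checksum": sha256_of(plain_file),
        "filetype": Path(relative).suffix.lstrip("."),
    })


def check_metadata_names(names):
    """Compare the file names found in METADATA with the permitted list.

    Returns three sorted lists: names that are not permitted, required
    names that are absent, and names that lack the .c4gh extension.
    """
    not_encrypted = sorted(n for n in names if not n.endswith(C4GH))
    stripped = {n[: -len(C4GH)] if n.endswith(C4GH) else n for n in names}
    unexpected = sorted(stripped - ALLOWED_METADATA)
    missing = sorted(ALLOWED_METADATA - OPTIONAL_METADATA - stripped)
    return unexpected, missing, not_encrypted
```

Appending each element returned by `file_entry` to an `ET.Element("FILES")` parent and serialising it with `ET.tostring` yields a block shaped like the example above. `check_metadata_names` can be fed the names found in the METADATA folder, for instance `[p.name for p in Path("DATASET_x/METADATA").iterdir()]`, where `DATASET_x` is a placeholder.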