Checksums

To help developers who want to distribute or run immutable copies of tools and workflows, Dockstore (1.9+) can provide checksums for files and Docker images when certain conditions are met. Currently, checksums are not accessible through the UI, but can be fetched by using various TRS endpoints. Keep reading to learn what is supported and how to retrieve the checksum information.

File Descriptor Checksum Support

As of 1.9, Dockstore will calculate a SHA-1 checksum during a refresh for every container, descriptor, and test parameter file included in a tool or workflow. Once the refresh is done, you must publish your entry in order to access the information via our TRS V2 endpoints. More specifically, the endpoints that contain checksums for files are as follows:

The id parameter used in the endpoints above can be found on an entry’s public page; underneath the Info tab, look for the bolded words TRS.

After gathering the checksum using the above method you can verify a descriptor’s checksum using the shasum terminal application. This is done by requesting the PLAIN_WDL descriptor and piping the output to shasum.

trsid=%23workflow%2Fgithub.com%2Fdockstore-testing%2Fdockstore-workflow-md5sum-unified%2Fwdl
version=1.2.0
curl -s https://dockstore.org/api/ga4gh/trs/v2/tools/$trsid/versions/$version/PLAIN-WDL/descriptor | shasum

The resulting checksum should match what was provided by the API above.

If you use the Dockstore CLI client descriptor checksums are verified before being sent to the execution engine.

CLI Descriptor Validation Support

By default, when launching tools or workflows from the CLI, primary and secondary descriptors will be validated using their SHA-1 checksums. Checksums are not validated when launching local entries.

You can prevent checksum validation with the --ignore-checksums flag. For example, the following command will not validate descriptor checksums:

dockstore [tool/workflow] launch --ignore-checksums --entry <entryPath> --json <parameterFile>

Note that if there are no remote checksums stored for a descriptor (i.e. the entry has not been refreshed since the addition of checksum support in Dockstore 1.9), this will not be considered a fatal checksum mismatch, and the launch command will continue to execute.

Docker Image Checksum Support

Checksum support for Docker images is more nuanced than it is for files. For quick reference, the table below displays the languages and Docker image repositories currently available, and what action on Dockstore is required to collect this information.

Dockstore Entry Type Language Docker Image Repository Gathered On
Tool (non-hosted) CWL, WDL Quay.io, Docker Hub, GitLab Refresh
Workflow CWL, WDL, Nextflow Quay.io, Docker Hub Snapshot

Once you perform the required action, you must also publish your entry in order to see the checksum info via the TRS endpoints. Descriptions for the two endpoints of note are as follows:

  • To see all versions of an entry, use our toolsGet endpoint and fill out the id parameter
  • To see a single version of an entry, go here and fill out id and version_id

Just like the file endpoints, the id parameter used in the endpoints above can be found on an entry’s public page; underneath the Info tab, look for the bolded words TRS.

To verify a checksum as reported by the Dockstore API matches what you download from the Docker registry first find the checksum and image path using one of the above methods for the image you would like to verify. Then download the image using the Docker CLI client.

docker pull quay.io/briandoconnor/dockstore-tool-md5sum:1.0.4

When the download has completed a Digest is provided in the terminal output. This should match the checksum provided by the Dockstore API.

Verifying the image checksum can give you better guarantees the image has not changed since the workflow was published to Dockstore. However, in some cases the image checksum may diverge, for example, if the image was defined in a git branch that has since been updated. For best results, and to avoid your Docker image being deleted because of a registry’s retention policy, use Docker images referred to by a tagged version or digest. The verification features available may vary between execution engines.

For more information on Docker registry retention policies see posts from `Docker <https://www.docker.com/blog/scaling-dockers-business-to-serve-millions-more-developers-storage/`_, AWS, or Azure.

Tools

As noted in the table above, Docker image checksums are grabbed on refresh and should work as long as the image is from Quay.io, Docker Hub, or GitLab. It’s also important to note that this is done for the Docker image registered on the tool through Dockstore and not necessarily the one included within the descriptor file itself.

Workflows

For workflows, Docker image checksums are grabbed on snapshot. However, the Docker images we can retrieve from descriptor files are more limited compared to the other checksum support covered so far. Although we can generally provide checksum info for referenced Docker images for CWL, WDL, and NFL, there are some caveats. Most conditions are language specific, but for all workflow langagues, the images referenced must be from Quay.io or Docker Hub and they must include a version. The following are the known constraints for each language.

Common Workflow Language

  • Various fields can be used to reference a Docker image, but we only support “dockerPull” for now.
  • “$import” or “$include” can be used to reference a local or http(s) CWL descriptor, but we do not check for Docker image references made within files using http(s).

Workflow Descriptor Language

  • The WDL docker attribute can be evaluated as an expression, but we only support it when the attribute is set using a string.
runtime {
  # Unsupported
  # docker: "ubuntu:" + "18.04"

  # Unsupported
  # docker: "ubuntu:" + version

  # Supported
  docker: "ubuntu:18.04"
}

Nextflow

  • Similar to WDL, a container can be set equal to an expression in Nextflow. Dockstore again supports simple strings, but also the container being set to a variable defined in the params scope. However, we do not support other types of expressions.
// nextflow.config
params {
  container = 'ubuntu:18.04'
  versionName = '18.04'
}

// conf/base.config
process {
  // Unsupported
  container = "ubuntu:${params.versionName}"

  // Supported
  container = 'ubuntu:18.04'
  // Supported
  container = params.container
}
  • A Nextflow workflow can contain a “profiles” scope. Here, you can create different sets of configuration attributes. The workflow can then be run with whichever profiles are specified as a command line argument. If a Docker image is referenced within a profile, Dockstore will not recognize it.
// nextflow.config
params {
  container = 'ubuntu:18.04'
}

profiles {
  exampleProfile {
    // Unsupported
    container = 'ubuntu:18.04'
  }
}

// conf/base.config
process {
  // Supported
  container = params.container
}