Welcome to Dockstore Documentation!

Note

Our code lives on GitHub at dockstore/dockstore and dockstore/dockstore-ui2.

Dockstore is an open platform used by the GA4GH for sharing Docker-based tools described with either the Common Workflow Language (CWL), the Workflow Description Language (WDL), or Nextflow (NFL).

If this is your first time learning about Dockstore, we recommend starting with the Getting Started Guide. This will introduce you to the core concepts of Dockstore, leaving you with a good understanding of the platform. However, if you are simply looking to launch tools and workflows, we recommend going straight to the End User Topics or our quickstart guide.

Getting Started with Galaxy

Galaxy workflow support in Dockstore is new and is a preview of fuller-featured support to come.

Unlike WDL and CWL, Galaxy workflows in the near term are primarily created and modified from the Galaxy workflow editor (GUI), instead of a text editor.

Additionally, Galaxy workflows cannot be launched by the Dockstore CLI or via a ‘Launch with’ button on the workflow information page.

Tutorial Goals

  • Learn about Galaxy
  • Create and run a basic Galaxy workflow
  • Export the workflow to a file
  • Set up a GitHub account and repository
  • Push the workflow to GitHub
  • Make a GitHub release

Create a basic Galaxy workflow

Create and run your workflow in Galaxy. Here is a tutorial for Creating, Editing, Importing Galaxy Workflows.

Export the workflow to a file

In Galaxy:

  • Click on the Galaxy UI Workflow link at the top of the page.
  • Click on the workflow name to expose the drop down menu.
  • Click Download

The exported JSON file with a ‘.ga’ suffix describes the inputs, outputs, and Galaxy Tool Shed dependencies for your workflow.
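Because the exported file is plain JSON, it can be inspected before it is committed. The following is a minimal sketch, assuming the export has a top-level "steps" object whose entries carry "name", "type", and (for Tool Shed tools) "tool_shed_repository" fields. These key names are assumptions about the Galaxy export format and are not stated in this tutorial, so check them against your own file.

```python
import json


def summarize_galaxy_workflow(ga_text):
    """Return (workflow name, input step names, Tool Shed repositories).

    The key names used here ("steps", "type", "tool_shed_repository") are
    assumptions about the Galaxy export format, not taken from this tutorial.
    """
    workflow = json.loads(ga_text)
    inputs = []
    repositories = set()
    for step in workflow.get("steps", {}).values():
        if step.get("type") in ("data_input", "data_collection_input"):
            inputs.append(step.get("name"))
        repo = step.get("tool_shed_repository")
        if repo:
            repositories.add((repo.get("owner"), repo.get("name")))
    return workflow.get("name"), inputs, sorted(repositories)


# Hypothetical, heavily trimmed stand-in for an exported '.ga' file.
example = json.dumps({
    "name": "demo",
    "steps": {
        "0": {"name": "Input dataset", "type": "data_input"},
        "1": {"name": "FastQC", "type": "tool",
              "tool_shed_repository": {"owner": "devteam", "name": "fastqc"}},
    },
})
print(summarize_galaxy_workflow(example))
```

For a real file, read its contents with open(path).read() and pass the text to the same function.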

Download


Setting up GitHub

You will need to add the Galaxy workflow file you downloaded to a source code repository that Dockstore knows about. GitHub is a good choice, and if you are not familiar with GitHub you can use this tutorial to set up an account and repository.

Upload the workflow to GitHub

  • Go to your repository and click on the Upload Files menu item under Add Files
  • Click on the ‘choose your files’ link
  • Select your ‘.ga’ Galaxy workflow file
  • Click on ‘Commit changes’

These steps are outlined here.

Releasing on GitHub

Now that we’ve successfully created our workflow in Galaxy and tested it, the workflow is ready to share with others. Making a release on GitHub will tag your GitHub repository with a version tag so you can always get back to this particular release. Follow the steps outlined here to create a release.

Next Steps

Now that you have a git repository that includes a Galaxy workflow, and you have tested it and are satisfied that it works, the next step is to register it on Dockstore.

If you haven’t set up a Dockstore account, follow the next tutorial to create an account on Dockstore and link third-party services, including GitHub. Otherwise, follow the instructions for workflow registration.

Dockstore GitHub App

GitHub apps are a GitHub feature used to improve the interaction between external applications and GitHub. Users can grant a GitHub app specific permissions on the repos and/or organizations of their choosing.

The Dockstore GitHub App adds support for registering and automatically syncing workflows and services. Check out the following guides to learn more:

Migrating Your Existing Workflows to Use GitHub Apps

Dockstore 1.9.0 provides users with a way to keep their workflows automatically updated (instead of needing to manually refresh) by using GitHub apps. Here, we will go over how to migrate your existing Dockstore workflows to use GitHub apps. This tutorial assumes that you are familiar with the /.dockstore.yml file, having read our overview of GitHub apps and the Registration With GitHub Apps section in our workflow registration document.

GitHub App Installation

The first step to migrating a workflow is the same as adding a new workflow via GitHub apps: install our Dockstore GitHub app onto your repository or organization. You do this by navigating to /my-workflows, clicking the + button on the left hand sidebar, selecting Register using GitHub Apps, and then clicking + Manage Dockstore Installation on GitHub. You’ll then be redirected to GitHub where you can select which repositories can be accessed by the GitHub app.

_images/add-workflow-button.png _images/register-workflow-github-apps.png _images/gh-app-reg-1.png

Creating a .dockstore.yml File

Once the GitHub app is installed on the correct repo, the next step is to create a /.dockstore.yml file. We’ll cover a very straightforward example first, but depending on how you configured the workflow during registration and whether your GitHub repository houses multiple workflows published on Dockstore, there will be additional steps to writing your /.dockstore.yml file.

Let’s say we have the following CWL workflow registered on Dockstore that came from this repository and you would like to convert the master branch.

Workflow to Migrate

As noted in our other documentation, create a /.dockstore.yml file in the root directory of the branch you want to migrate (in this example, the master branch) in your repository. The file should look like the following:

version: 1.2
workflows:
   - subclass: CWL
     primaryDescriptorPath: /Dockstore.cwl
     testParameterFiles:
         - /test/dockstore.cwl.json

The information above was filled out using the following:

  • subclass is taken from the Descriptor Type
  • primaryDescriptorPath is from Workflow Path
  • testParameterFiles is from Test File Path
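The mapping above can also be written down as a small script. This is only a sketch: the function name and its parameters are hypothetical, and it simply reproduces the layout of the example file shown in this section (including the optional name field discussed later), without validating anything against the Dockstore schema.

```python
def render_dockstore_yml(descriptor_type, workflow_path, test_file_paths,
                         workflow_name=None):
    """Build the text of a single-workflow /.dockstore.yml.

    descriptor_type -> subclass, workflow_path -> primaryDescriptorPath,
    test_file_paths -> testParameterFiles, workflow_name -> name (optional).
    """
    lines = [
        "version: 1.2",
        "workflows:",
        f"   - subclass: {descriptor_type}",
        f"     primaryDescriptorPath: {workflow_path}",
    ]
    if test_file_paths:
        lines.append("     testParameterFiles:")
        lines.extend(f"         - {path}" for path in test_file_paths)
    if workflow_name:
        lines.append(f"     name: {workflow_name}")
    return "\n".join(lines) + "\n"


print(render_dockstore_yml("CWL", "/Dockstore.cwl", ["/test/dockstore.cwl.json"]))
```

Writing the returned text to .dockstore.yml in the root of the branch gives the same file as the hand-written example.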

During the original registration for your workflow, you may have filled out the Workflow Name field shown in the picture below.

Workflow to Migrate

This field is required when you want to register multiple workflows from the same repo, but you may have filled it out for other reasons. To check if the workflow you want to migrate has a workflow name set, select the workflow and look at the title on top as shown in the picture below.

The title consists of: <sourceControl>/<organization name>/<repository name>/<optional workflow name>:<version name>

If you see a workflow name inserted, you must include the name field in your /.dockstore.yml.

version: 1.2
workflows:
   - subclass: CWL
     primaryDescriptorPath: /Dockstore.cwl
     testParameterFiles:
         - /test/dockstore.cwl.json
     name: optional-name
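If you have many workflows to check, the title format described above can be split programmatically. The sketch below assumes the organization and repository names contain no slashes and that everything after the first colon is the version name; the example titles are hypothetical.

```python
def parse_workflow_title(title):
    """Split '<sourceControl>/<org>/<repo>/<optional name>:<version>' into parts."""
    path, _, version = title.partition(":")
    parts = path.split("/")
    if len(parts) < 3:
        raise ValueError(f"unexpected workflow title: {title!r}")
    source_control, organization, repository = parts[:3]
    # Anything after the repository name is the optional workflow name.
    workflow_name = "/".join(parts[3:]) or None
    return {
        "sourceControl": source_control,
        "organization": organization,
        "repository": repository,
        "workflowName": workflow_name,
        "version": version or None,
    }


# Hypothetical titles, with and without a workflow name.
print(parse_workflow_title("github.com/my-org/my-repo/optional-name:master")["workflowName"])
print(parse_workflow_title("github.com/my-org/my-repo:master")["workflowName"])
```

A non-empty workflowName means the name field is required in the /.dockstore.yml.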

If you have multiple workflows registered on Dockstore that stem from the same GitHub repo, a single /.dockstore.yml can be used to convert them. Again, you need to check whether the Workflow Name field is set, because it’s needed for multi-workflow repositories. If the name field in the /.dockstore.yml doesn’t match the Workflow Name field in Dockstore, the migration of your workflow will not go through and a new Dockstore entry will be created instead. Let’s say we want to convert these two workflows that come from this repository.

_images/github-apps-multiple-workflows.png _images/github-apps-multiple-workflows-with-name.png

Your /.dockstore.yml would look like the following:

version: 1.2
workflows:
   - subclass: CWL
     primaryDescriptorPath: /Dockstore.cwl
     testParameterFiles:
         - /test/dockstore.cwl.json
   - subclass: WDL
     primaryDescriptorPath: /Dockstore.wdl
     testParameterFiles:
         - /test/dockstore.wdl.json
     name: optional-name
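The name-matching rule described above can be sketched as a quick pre-flight check. This illustrates the rule as stated in this document and is not Dockstore’s implementation; the function name is hypothetical, and None stands for a workflow without a name.

```python
def predict_migration(yml_names, registered_names):
    """Map each name in /.dockstore.yml to the outcome this document describes.

    A name that matches a Workflow Name already registered on Dockstore
    migrates that entry; any other name results in a new entry.
    """
    registered = set(registered_names)
    return {
        name: ("migrates" if name in registered else "new entry")
        for name in yml_names
    }


# The two workflows from the example: one unnamed, one named 'optional-name'.
print(predict_migration([None, "optional-name"], [None, "optional-name"]))
# A typo in the name field would create a new entry instead.
print(predict_migration([None, "optional_name"], [None, "optional-name"]))
```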

Testing the Migration

Note

Push events will only be captured by Dockstore after installing the GitHub app onto the repo.

To test out your GitHub app integration, make a push to a branch. Navigate to or refresh your browser on the My Workflows page, and select the workflow you wanted to convert. You should see that the Workflow Information section looks a bit different.

_images/workflow-information-after-migration.png

It now lists the mode as DOCKSTORE_YML instead of FULL, and information about paths is no longer included. You are also no longer able to refresh or restub the workflow. Since you can’t refresh the entire workflow, new versions from GitHub (releases/branches) that you want to add to Dockstore must have a /.dockstore.yml file. However, you can still refresh already existing versions/branches on Dockstore that you haven’t converted by going to the Versions tab, clicking Actions, and selecting Refresh Version.

See also

Troubleshooting and FAQ - tips on resolving Dockstore GitHub App issues.

Troubleshooting and Frequently Asked Questions

Why should I migrate my existing workflows to use GitHub Apps and a .dockstore.yml?

Installing our Dockstore GitHub App onto your GitHub repository or organization will automatically sync your workflow on Dockstore whenever code is pushed to GitHub. This means less manual work for workflow developers, and less waiting for content to update.

This requires the addition of a /.dockstore.yml file to your repository on GitHub. This file contains workflow information such as workflow path, test parameter paths, etc. that Dockstore will use to set up the corresponding workflow on Dockstore. It’s important to note that you will need a /.dockstore.yml file on each branch of your GitHub repository if you want to sync multiple branches (versions) of your workflow.

You can read more about it at Automatic Syncing with GitHub Apps and .dockstore.yml.

How does this change my development flow?

Adding a /.dockstore.yml file to a template branch (e.g. master, develop, main) means that any new branches created from this template will be automatically added to and synced on Dockstore.

Therefore, as long as your workflow is already registered on Dockstore and your /.dockstore.yml is configured correctly, then updates to the workflow (including adding new versions) should happen continuously and automatically.

For this setup, if you do not want a new GitHub branch to generate a corresponding workflow-version on Dockstore, simply remove the /.dockstore.yml from the branch before it is pushed to the remote/origin repository.

Note: If you want to edit version information, such as workflow path, you will have to update the /.dockstore.yml file directly on the corresponding GitHub branch. You can no longer do this directly on Dockstore.

How do I check if the Dockstore GitHub App was installed?

If you don’t see changes, try waiting a couple of minutes and refreshing the browser on the My Workflows page again.

You can also verify that the GitHub app was given access to the right repository or organization. If access was given to the wrong organization or repository, you’ll need to push another commit after correcting it to activate the sync to Dockstore.

  • Go to your repo on GitHub, click the Settings tab, click Integrations on the left and verify our app is installed and configured correctly
_images/github-repo-settings.png
  • Double check the /.dockstore.yml file.

    • Is it in the root directory?
    • Is it on the right branch?
    • Are all indentation levels correct?
    • Does the name field match, if applicable?

The changes made to my GitHub repo aren’t appearing on Dockstore, but I’ve already installed the GitHub app and made the .dockstore.yml file. How can I figure out what’s going wrong?

If you’ve already tried waiting a couple of minutes and refreshing the browser on the My Workflows page, you can view GitHub App logs through Dockstore to see if there have been any errors. Navigate to the /my-workflows page and expand the GitHub Organization that the repository belongs to on the left hand side. Then click on the bottom where it says See GitHub App Logs.

_images/github-app-logs-button.png

Once loaded, the following window will be displayed.

_images/github-app-logs-window.png

Here you can view all the GitHub app events Dockstore is aware of and whether they failed or were successful. If there was a failure, you can expand that row and view the error message as shown below.

_images/github-app-logs-error-message.png

In the case shown above, the error message is from parsing the following /.dockstore.yml file.

version: 1.2
test:
workflows:
   - subclass: CWL
     primaryDescriptorPath: /Dockstore.cwl
     testParameterFiles:

It is saying that a key named test was found, but that key does not exist in our .dockstore.yml schema. It should be removed.
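A similar check can be run locally before pushing. The sketch below is not Dockstore’s parser: it only scans unindented "key:" lines, and its allowed set is an assumption limited to the top-level keys shown in the examples in this document (extend it, for instance, if your file describes a service).

```python
import re

# Assumption: only the top-level keys that appear in this document's examples.
ASSUMED_TOP_LEVEL_KEYS = {"version", "workflows"}


def unknown_top_level_keys(yml_text, allowed=ASSUMED_TOP_LEVEL_KEYS):
    """Return unindented keys in the YAML text that are not in `allowed`."""
    keys = re.findall(r"^([A-Za-z_][\w-]*):", yml_text, flags=re.MULTILINE)
    return [key for key in keys if key not in allowed]


broken = """version: 1.2
test:
workflows:
   - subclass: CWL
     primaryDescriptorPath: /Dockstore.cwl
     testParameterFiles:
"""
print(unknown_top_level_keys(broken))
```

Run against the file above, this reports the stray test key.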

If you’re having trouble finding the relevant logs, try searching for the name of your repository by using the filter on the upper left. You can also sort the rows by clicking on a column heading. For example, if you click the Success column heading once, it will list all the events that failed first.

Can I use GitHub Apps to register tools?

The Dockstore GitHub App currently only supports Workflows and Services.

Why was a new workflow registered instead of migrating my existing one?

During the original registration for your workflow, you may have filled out the Workflow Name field shown in the picture below. A new separate workflow will be registered if the original Workflow Name isn’t included or doesn’t match the name field in your /.dockstore.yml.

Workflow to Migrate

How can I convert my entire existing workflow at once?

Currently you cannot convert all existing branches/versions at once. You must add a /.dockstore.yml to each branch in order for the GitHub app to automatically detect and sync changes with the corresponding version on Dockstore.

If you have a /.dockstore.yml file in your master or develop branches on GitHub, any new branches you create from these as your template will have a /.dockstore.yml.

Note

The topics in this tutorial are experimental. We are working on improving support for rootless containers, but for now, some things may not be compatible.

Docker Alternatives

In some situations using Docker may be impractical because it requires all users to have root access. Several alternatives have been developed to make it possible to run rootless containers, including Singularity and rootless Docker. While Dockstore uses Docker by default, if necessary it may be possible to run your workflows with one of these alternatives. Because the call to Docker or an alternative is made by the workflow runner, usually cwltool or Cromwell, and not Dockstore directly, the difficulty of configuring a Docker alternative depends on the workflow type. Some Dockstore entries will run seamlessly without Docker, and some may be entirely incompatible in a rootless environment.

Singularity

Singularity is perhaps the most well-supported Docker alternative. Singularity can pull Docker images and build them into its own image format (.sif), but not all Docker features are compatible. For instance, Dockerfile USER commands are not compatible with Singularity. A common problem observed when running Dockstore entries with Singularity is that the process fails on singularity pull because the entry’s Dockerfile or its base image contains a USER root command. In many cases the use of root may be unnecessary. Whenever possible, Dockerfiles on Dockstore should avoid using root.

Note

A best practice when using Docker for workflows is not to rely on a specific user. This is doubly true for Singularity where it is not just best practice but necessary.

Singularity provides a fake root option that might circumvent the problems with using root in certain situations. There does not seem to be a way to use this option through cwltool. It can be used with Cromwell by editing the Singularity command format set in your Cromwell config file.

More information about compatibility of dockerfiles with Singularity can be found here.

Singularity can be installed following the instructions here. Note that the installation is relatively complicated and requires sudo. Neither the macOS version (Singularity Desktop) nor the Debian/Ubuntu package version currently available (2.6.1) is compatible. You will need to download a version >3.0.0 and build it from source.

cwltool

Singularity is available as a command line option for cwltool like this:

cwltool --singularity <workflow> <input json>

To set this option through Dockstore, add the following line to your ~/.dockstore/config:

cwltool-extra-parameters: --singularity

Cromwell

Cromwell can be configured to use Singularity instead of Docker as described here. This requires creating a Cromwell config file with a section describing the backend provider settings. Examples of this are available in the Cromwell GitHub here.

To tell Dockstore to run Cromwell with a custom configuration, such as the example config file linked above, add a line to your ~/.dockstore/config:

cromwell-vm-options: -Dconfig.file=<absolute path to your Cromwell conf>
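If you script your environment setup, the config lines in this section can be added or updated with a few lines of Python. This is a sketch that assumes ~/.dockstore/config consists of flat "key: value" lines, as in the snippets here; the helper name and the placeholder token line are hypothetical, and the function works on text so that you decide when to write the file back.

```python
def set_config_option(config_text, key, value):
    """Return config text with `key: value` replaced in place or appended."""
    new_line = f"{key}: {value}"
    lines = config_text.splitlines()
    for index, line in enumerate(lines):
        # Compare only the part before the first colon with the key.
        if line.split(":", 1)[0].strip() == key:
            lines[index] = new_line
            break
    else:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Hypothetical existing config content.
config = "token: placeholder\n"
config = set_config_option(config, "cwltool-extra-parameters", "--singularity")
print(config)
```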

Rootless Docker

Rootless Docker, a product of Docker, is very convenient because no configuration of Dockstore is required to use it. When it is installed, all docker commands are run in rootless mode without needing to set this as an option. Therefore, the normal Docker commands invoked by cwltool and Cromwell will be executed rootlessly.

Rootless Docker installation is simple and does not require root. Regular Docker must not already be installed. Just execute the installation script:

curl -sSL https://get.docker.com/rootless | sh

It may display a message that you need to add it to the PATH or do some other configuration. You can confirm that rootless Docker is working with docker info; under Security Options it should output rootless.
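The same check can be automated, for example in a setup script. The sketch below assumes docker is on the PATH and relies on the --format option of docker info to print the security options as JSON; the function names are illustrative.

```python
import json
import subprocess


def is_rootless(security_options):
    """True if a list such as ['name=seccomp,profile=default', 'name=rootless']
    contains the rootless entry."""
    return any(option.split(",")[0] == "name=rootless" for option in security_options)


def docker_security_options():
    """Ask the local Docker daemon for its security options."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .SecurityOptions}}"],
        capture_output=True, text=True, check=True,
    )
    # The daemon may report null when no options are set.
    return json.loads(result.stdout) or []
```

Calling is_rootless(docker_security_options()) on the machine in question should return True once rootless Docker is active.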

cwltool

cwltool documents support for some Docker alternatives but does not mention rootless Docker. In our testing, it seems the docker run command issued by cwltool is incompatible with rootless Docker and causes a permissions error with the volume mapping. cwltool with rootless Docker did not work for any tested workflows.

Cromwell

As rootless Docker does not require any change of configuration to use, it can be used with Cromwell through Dockstore despite the lack of a Cromwell config option.

Cromwell does not document support for rootless Docker, but they seem to be compatible. Most WDL workflows we tried worked smoothly with rootless Docker.

Cromwell supports most CWL features as well as WDL. You can use Cromwell instead of cwltool when running CWL files with Dockstore by adding the following line to your ~/.dockstore/config:

cwlrunner: cromwell

This may not work with all CWL entries, but it is a good workaround for the cwltool incompatibility described above.

Checksums

To help developers who want to distribute or run immutable copies of tools and workflows, Dockstore (1.9+) can provide checksums for files and Docker images when certain conditions are met. Currently, checksums are not accessible through the UI, but can be fetched by using various TRS endpoints. Keep reading to learn what is supported and how to retrieve the checksum information.

File Descriptor Checksum Support

As of 1.9, Dockstore will calculate a SHA-1 checksum during a refresh for every container, descriptor, and test parameter file included in a tool or workflow. Once the refresh is done, you must publish your entry in order to access the information via our TRS V2 endpoints. More specifically, the endpoints that contain checksums for files are as follows:

The id parameter used in the endpoints above can be found on an entry’s public page; underneath the Info tab, look for the bolded words TRS.

CLI Descriptor Validation Support

By default, when launching tools or workflows from the CLI, primary and secondary descriptors will be validated using their SHA-1 checksums. Checksums are not validated when launching local entries.

You can prevent checksum validation with the --ignore-checksums flag. For example, the following command will not validate descriptor checksums:

dockstore [tool/workflow] launch --ignore-checksums --entry <entryPath> --json <parameterFile>

Note that if there are no remote checksums stored for a descriptor (i.e. the entry has not been refreshed since the addition of checksum support in Dockstore 1.9), this will not be considered a fatal checksum mismatch, and the launch command will continue to execute.
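The behaviour described above can be illustrated with a short sketch. This is not the Dockstore CLI’s code; it shows the described logic using Python’s standard hashlib, with a hypothetical function name and return values.

```python
import hashlib


def check_descriptor(content, remote_sha1):
    """Compare a descriptor's SHA-1 with the checksum stored remotely.

    Returns 'skipped' when no remote checksum exists (not treated as fatal),
    otherwise 'match' or 'mismatch'.
    """
    if not remote_sha1:
        return "skipped"
    local_sha1 = hashlib.sha1(content).hexdigest()
    return "match" if local_sha1 == remote_sha1.lower() else "mismatch"


print(check_descriptor(b"abc", "a9993e364706816aba3e25717850c26c9cd0d89d"))
print(check_descriptor(b"abc", None))
```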

Docker Image Checksum Support

Checksum support for Docker images is more nuanced than it is for files. For quick reference, the table below displays the languages and Docker image repositories currently available, and what action on Dockstore is required to collect this information.

Dockstore Entry Type | Language           | Docker Image Repository     | Gathered On
Tool (non-hosted)    | CWL, WDL           | Quay.io, Docker Hub, GitLab | Refresh
Workflow             | CWL, WDL, Nextflow | Quay.io, Docker Hub         | Snapshot

Once you perform the required action, you must also publish your entry in order to see the checksum info via the TRS endpoints. Descriptions for the two endpoints of note are as follows:

  • To see all versions of an entry, use our toolsGet endpoint and fill out the id parameter
  • To see a single version of an entry, go here and fill out id and version_id

Just like the file endpoints, the id parameter used in the endpoints above can be found on an entry’s public page; underneath the Info tab, look for the bolded words TRS.
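When calling these endpoints from a script, remember that the id contains slashes and must be URL-encoded. The following sketch only builds the request URLs. The base URL and the /tools/{id} and /tools/{id}/versions/{version_id} path shapes are assumptions based on the GA4GH TRS v2 layout, so confirm them against the Swagger UI before relying on them.

```python
from urllib.parse import quote

# Assumption: base URL of Dockstore's TRS v2 API; verify in the Swagger UI.
ASSUMED_TRS_BASE = "https://dockstore.org/api/ga4gh/trs/v2"


def trs_tool_url(trs_id, version_id=None, base=ASSUMED_TRS_BASE):
    """Build the URL for an entry, or for one of its versions."""
    url = f"{base}/tools/{quote(trs_id, safe='')}"
    if version_id is not None:
        url += f"/versions/{quote(version_id, safe='')}"
    return url


print(trs_tool_url("quay.io/briandoconnor/dockstore-tool-md5sum"))
print(trs_tool_url("quay.io/briandoconnor/dockstore-tool-md5sum", "1.0.4"))
```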

Tools

As noted in the table above, Docker image checksums are grabbed on refresh and should work as long as the image is from Quay.io, Docker Hub, or GitLab. It’s also important to note that this is done for the Docker image registered on the tool through Dockstore and not necessarily the one included within the descriptor file itself.

Workflows

For workflows, Docker image checksums are grabbed on snapshot. However, the Docker images we can retrieve from descriptor files are more limited compared to the other checksum support covered so far. Although we can generally provide checksum info for referenced Docker images for CWL, WDL, and NFL, there are some caveats. Most conditions are language specific, but for all workflow languages, the images referenced must be from Quay.io or Docker Hub and they must include a version. The following are the known constraints for each language.

Common Workflow Language
  • Various fields can be used to reference a Docker image, but we only support “dockerPull” for now.
  • “$import” or “$include” can be used to reference a local or http(s) CWL descriptor, but we do not check for Docker image references made within files using http(s).
Workflow Description Language
  • The WDL docker attribute can be evaluated as an expression, but we only support it when the attribute is set using a string.
runtime {
  # Unsupported
  # docker: "ubuntu:" + "18.04"

  # Unsupported
  # docker: "ubuntu:" + version

  # Supported
  docker: "ubuntu:18.04"
}
Nextflow
  • Similar to WDL, a container can be set equal to an expression in Nextflow. Dockstore again supports simple strings, but also the container being set to a variable defined in the params scope. However, we do not support other types of expressions.
// nextflow.config
params {
  container = 'ubuntu:18.04'
  versionName = '18.04'
}

// conf/base.config
process {
  // Unsupported
  container = "ubuntu:${params.versionName}"

  // Supported
  container = 'ubuntu:18.04'
  // Supported
  container = params.container
}
  • A Nextflow workflow can contain a “profiles” scope. Here, you can create different sets of configuration attributes. The workflow can then be run with whichever profiles are specified as a command line argument. If a Docker image is referenced within a profile, Dockstore will not recognize it.
// nextflow.config
params {
  container = 'ubuntu:18.04'
}

profiles {
  exampleProfile {
    // Unsupported
    container = 'ubuntu:18.04'
  }
}

// conf/base.config
process {
  // Supported
  container = params.container
}
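The constraint shared by all workflow languages (an image from Quay.io or Docker Hub that includes a version) can be checked with a small helper. This is a sketch of that one rule only: it does not detect the unsupported expressions shown above, it treats a tag or a digest as the version, and it assumes the usual Docker convention that a first path component containing a dot or colon is a registry host, with docker.io standing for Docker Hub.

```python
# Assumed host names for the two registries named in this section.
SUPPORTED_REGISTRIES = {"quay.io", "docker.io"}


def split_image_reference(reference):
    """Return (registry, version) for an image reference; version is '' if absent."""
    name, _, digest = reference.partition("@")
    tag = ""
    colon = name.rfind(":")
    # A colon after the last slash separates the tag; earlier ones are ports.
    if colon > name.rfind("/"):
        name, tag = name[:colon], name[colon + 1:]
    first, slash, _ = name.partition("/")
    if slash and ("." in first or ":" in first or first == "localhost"):
        registry = first
    else:
        registry = "docker.io"  # no registry host given, so Docker Hub
    return registry, tag or digest


def is_checksum_candidate(reference):
    """True if the reference names a supported registry and includes a version."""
    registry, version = split_image_reference(reference)
    return registry in SUPPORTED_REGISTRIES and bool(version)


for ref in ("ubuntu:18.04", "ubuntu",
            "quay.io/briandoconnor/dockstore-tool-md5sum:1.0.4"):
    print(ref, is_checksum_candidate(ref))
```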

Verification

What is a verified tool/workflow?

A verified tool/workflow indicates that it was successfully run and verified by either:

Historically, the majority of tool validation has been done by the docktesters team currently headed by Miguel Vazquez and formerly headed by Francis Ouellette.

We also strive to use this to highlight tools that share a common set of recommended characteristics:

  • tools should include a description and an author
  • tools should include a README.md or similar in their source repo describing any other relevant information about the tool
  • tools should include at least one test parameter file indicating how to run the tool on some sample data
  • the Dockerfile should be helpful in reconstructing how a tool was built from source
  • tools and/or their reference data should be publicly available

Why would I want to verify?

There are several reasons why you would want a tool/workflow to be verified. If you’re a platform owner, verifying would indicate to others that your platform is compatible with many tools/workflows on Dockstore, so others will be more likely to use your platform. If you’re a tool/workflow developer, verifying would assure others that your tool/workflow is of high quality and is very likely to work for them.

How do I verify?

If you are an admin/curator, follow the Verification Process section in this document. If you are not an admin/curator, please send us a heads-up via our GitHub issues or Gitter!

How do I tell if a tool/workflow is verified?

There are 3 new indicators on Dockstore.org that indicate whether or not the tool/workflow is verified.

First, go to the page of the tool/workflow such as https://dockstore.org/containers/quay.io/briandoconnor/dockstore-tool-md5sum:1.0.4?tab=info. Since this tool/workflow is verified, 3 indicators can be seen:

Tool Page


  1. At the top left, the checkmark indicates that at least one of the tool/workflow’s versions has been verified. As a whole, this tool/workflow is considered verified.
  2. At the top right, the recent versions of the tool/workflow are listed. There is a checkmark if a specific version of the tool/workflow is verified.
  3. The bottom right shows whether the currently selected/viewed version is verified. The selected version is indicated in the URL as well as the title (e.g. quay.io/briandoconnor/dockstore-tool-md5sum:1.0.4). In this example, the version 1.0.4 is selected/viewed. This bottom right verification box contains more details such as the platform and verifier. This example shows that “Dockstore CLI” is the platform and “Phase 1 GA4GH Tool Execution Challenge” is the verifier.

Additional information for all verified versions can be viewed at a glance in the versions tab:

Versions Tab


Once again, the checkmarks indicate the version is verified. Platforms which the version was verified on are displayed to the right of it. In this case, it’s “Dockstore CLI”.

Verification Process

Note

Verifying is only available for admins and curators. Please contact one if you want your tool/workflow to be verified.

  1. Go to https://dockstore.org/api/static/swagger-ui/index.html#/extendedGA4GH/verifyTestParameterFilePost
  2. Click “Try it out”
  3. Provide a “type”. See the description for allowable values.
  4. Provide the TRS ID for the tool/workflow being verified. For example, the dockstore-tool-md5sum tool has the TRS ID: “quay.io/briandoconnor/dockstore-tool-md5sum” as shown in the “Info” tab with the label: “TRS CWL” or “TRS WDL”
  5. Provide the version_id of the tool/workflow to verify. It can be any version listed in the Version tab of the tool/workflow. dockstore-tool-md5sum has the following versions currently: 1.0.4, master, develop, 1.0.3, latest, 1.0.2, 1.0.1, and 1.0.0. It is recommended to only verify versions that are unlikely to change (tags).
  6. Provide the “relative_path” of the test parameter file being verified. The path is relative to the primary descriptor. It can be found using the files endpoint, or by viewing the Files tab of a tool/workflow (such as https://dockstore.org/containers/quay.io/briandoconnor/dockstore-tool-md5sum:1.0.4?tab=files), selecting the Test Parameter Files tab, and opening the right-most “File” dropdown. This relative path must point to a test parameter file; providing a descriptor will not work.
  7. Provide the “platform”. Some examples are: HCA, Cromwell, Arvados, etc.
  8. Select the “verified” status either as “true” or “false”. Use “true” to verify, “false” to “unverify”.
  9. Provide “metadata”. This is typically the verifier’s identity, which can be something like “GA4GH/DREAM Challenge”.
  10. Lastly, provide your Dockstore token using the lock icon at the top right of the endpoint

Below is a screenshot of someone verifying the “test.json” test parameter file of the “master” version of the “dockstore-tool-md5sum” tool.

Swagger-UI


The curl command results in something like:

curl -X POST "https://dockstore.org/api/api/ga4gh/v2/extended/quay.io%2Fbriandoconnor%2Fdockstore-tool-md5sum/versions/master/CWL/tests/test.json?platform=Dockstore%20CLI&verified=true&metadata=Phase%201%20GA4GH%20Tool%20Execution%20Challenge" -H  "accept: application/json" -H  "Authorization: Bearer iamafakebearertoken"

A successful response will result in something like:

{
  "Dockstore CLI": {
    "metadata": "Phase 1 GA4GH Tool Execution Challenge",
    "verified": true
  }
}
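When verifying several versions, it can be easier to build the request URL in a script than to fill out the form each time. The sketch below reproduces the URL from the curl command above using only the standard library; the function name is hypothetical, and sending the request (with your own token in the Authorization header) is left to the caller.

```python
from urllib.parse import quote, urlencode

# Base path taken from the curl command shown above.
BASE = "https://dockstore.org/api/api/ga4gh/v2/extended"


def verification_url(trs_id, version_id, descriptor_type, relative_path,
                     platform, verified, metadata):
    """Build the POST URL for verifying one test parameter file."""
    path = "/".join([
        quote(trs_id, safe=""), "versions", quote(version_id, safe=""),
        descriptor_type, "tests", quote(relative_path, safe=""),
    ])
    query = urlencode(
        {"platform": platform,
         "verified": "true" if verified else "false",
         "metadata": metadata},
        quote_via=quote,  # encode spaces as %20, as in the curl example
    )
    return f"{BASE}/{path}?{query}"


print(verification_url(
    "quay.io/briandoconnor/dockstore-tool-md5sum", "master", "CWL", "test.json",
    "Dockstore CLI", True, "Phase 1 GA4GH Tool Execution Challenge",
))
```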

Additional Verification Information

To see more verification information about a specific version, first select the version.

Then click “More Info” in the “Verification and Logs” panel in the bottom right.

A popup will appear:

Verification Information


It lists the platform it was verified on, the platform version, test parameter file that was used, and metadata (verifier). Below it, there may be an additional Logs section which contains information from Dockstore ToolTester.

Dockstore ToolTester

Dockstore ToolTester is a semi-automated process where Dockstore will attempt to launch certain verified tools/workflows through the latest Dockstore CLI. Typically this launching process occurs before a Dockstore CLI release and is done to ensure compatibility. The logs contain useful information, including:

  • Dockstore CLI version used
  • pip packages installed
  • version of the tool/workflow that was launched
  • time when launched
  • runner that was used (cromwell, cwltool, etc)
  • files used (which descriptor file, which test parameter file)

Dockstore Best Practices

This document is intended for tool and workflow developers and the users who register them on Dockstore. Following these best practices will help make your repositories more organized, findable, and usable for others on Dockstore.

Structuring and Organizing Your Git Repositories

Unless you’re writing your tool/workflow directly on Dockstore, your entry will be hosted on a Git repository (we currently support integration with GitHub, Bitbucket, and GitLab). Whichever repository service you use, we recommend that your tool and workflow repositories are put under a non-personal organization, if possible. This allows for better organization and collaboration, and also provides a fallback for others if you become inactive on the git repository site.

We generally advise against git repositories that contain multiple tools or workflows. But we recognize that it’s a common way to share code, so we do support it. However, there are two benefits for having only one tool or workflow per repository. First, your Dockstore entries can have shorter names because an extra name is required to distinguish between the entries during registration. The other plus is that there will be less clutter on the Versions tab of each entry. Let’s say you have a repo that contains three different workflows on separate branches and there are many tags/releases related to each one. Depending on how you register the workflows on Dockstore, each entry could have the other two branches and every single tag/release on its Versions tab. All of this clutter can be overwhelming or confusing for users to see.

One exception to this rule is when you describe the same tool or workflow with multiple descriptor languages. For example, we describe the same bamstats tool in our Getting Started with CWL and Getting Started with WDL tutorials. A single tool on Dockstore can have CWL and WDL descriptor files. Users will be able to see you’ve provided both and can easily switch between the descriptors in the UI. Unfortunately, you cannot have a single workflow entry on Dockstore contain multiple descriptor types. You will have to register them as separate workflows by using the Workflow Name field during registration.

Improving Your Entries On Dockstore

Making It Easier to Find

There are a few things you can do to make your tool/workflow easier to find on our site.

Let’s start with making your entry easier to find on our search page. One way is to add author and description metadata to your descriptor file. Adding an author will make it selectable on the Author facet and a description helps because the text search uses it as one of the fields to sift through. For more detailed information on these metadata fields, check out the following info for each language:

Note

In CWL descriptors, you can include information about your input and output files and our search will understand it. This information will be visible on the facets Input File Formats and Output File Formats. Read CWL’s guide on file formats to learn how.

You can also provide a description by writing a README.md file instead. If you do not provide description metadata in your descriptor, then we will try to pull the README.md file as a fallback.

Another tip was already mentioned above: host your repositories under a non-personal organization. Similarly, try to use a non-personal namespace to register your Docker images. Doing this will group your tools/workflows together under our Tool: Namespace and Workflow: Organization facets. This also helps by letting you add other developers that can manage your content on and off Dockstore if you ever become unavailable.

You should also consider adding labels to your entry, since Labels is another facet on the search page. You can do this by going to the My Tools or My Workflows page. On the right-hand side, above the tabs, you will see the text Manage labels.

Once you click the text, you’ll be able to add or remove labels for your entry. To get ideas for what to use as labels, look at what others are using on the search page. Also note that a label cannot contain spaces and must be all lowercase. If you want a label to be multiple words, separate them with hyphens instead of spaces.
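
As an illustration of the label rules above (all lowercase, no spaces, hyphens between words), here is a minimal Python sketch. The function name normalize_label is hypothetical and not part of Dockstore; it only shows one way to turn free text into a label that satisfies those rules.

```python
import re


def normalize_label(text: str) -> str:
    """Return the text as a lowercase, hyphen-separated label."""
    # Split on runs of whitespace or underscores and drop empty pieces.
    words = [word for word in re.split(r"[\s_]+", text.strip()) if word]
    return "-".join(words).lower()


print(normalize_label("RNA Seq Alignment"))
```

You would still type the resulting label into the Manage labels dialog yourself; the sketch only covers the formatting.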

You can also add a checker workflow to your tools and workflows to make use of our Has Checker Workflows facet. Checker workflows also guarantee that your entry, given some input, produces the expected output on a platform different from the one where you are developing. Read our Checker Workflows tutorial to learn more.

Another option for making your work more visible is using our Organization and Collection pages. An Organization is a landing page for your collaborations, institutions, consortiums, companies, etc. Here you can explain the work and goals your group has, and highlight your tools and workflows by adding them to a Collection.

Making It Easier to Understand

Once a user has found your entry, they need to understand what it does and how to use it! The most important thing you can do so others understand your work is to provide a thorough description. You can do this by filling out the metadata field as explained in the best practices tutorials linked above. If no description is found in the descriptor file, we will use the README.md file. Your description, using either method, can be formatted using markdown. Once registered, it will be parsed by Dockstore and made available on the Info tab of an entry. Because it will be one of the first things a user will see when looking at your entry, you should make it as detailed as possible. Here is a list of items to write about:

  • About Section.

    • What does your tool or workflow do?
    • Are you part of a bigger organization? What are some of their goals?
  • How to Use Section.

    • What are the system requirements? Minimum and recommended

    • Describe the input and output files (Can also be included in CWL descriptor files. See the Note above.)

      • What are their names?
      • What data do they contain?
      • What is the format?
    • Can you provide time and/or cloud cost estimations for running your tool/workflow with a given input?

    • If available, link to tutorials using your entry.

    • If available, link to a sample or complete dataset to use.

  • Related To Section

    • Does your tool/workflow work together with other entries? If so, describe how they can be used together and provide links.
    • Link out to other similar entries you think could be useful to others.
  • Contact Section (Can be included in other metadata fields as mentioned in the Making it Easier to Find section.)

  • Citations

    • Does your workflow employ packages that should be cited?

Making It Easier to Use

Although it’s not always possible, you should provide input data needed to run the entry. You can do this a few different ways:

  • Provide links to the data needed in your description.
  • Have your entry download the input files using a link. You can do this by putting them in a test parameter file (recommended) or directly in the descriptor files.
  • Have the files within the Docker image being used. If you do this, make sure you provide a description of the structure and expected files in the description above.

Note

You can learn more about test parameter files by reading any of the Testing Locally Sections for CWL, WDL, or Nextflow.
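
To make the recommended option concrete, the sketch below builds a small WDL-style test parameter file in Python. Everything specific in it is an assumption for illustration: the workflow name my_workflow, the input names, and the example.org URL are placeholders, not real Dockstore entries or data.

```python
import json

# Hypothetical workflow and input names; the URL is a placeholder, not real data.
test_params = {
    "my_workflow.input_bam": "https://example.org/data/sample.small.bam",
    "my_workflow.sample_name": "sample1",
}


def render_params(params: dict) -> str:
    """Serialize the parameters as pretty-printed JSON with a trailing newline."""
    return json.dumps(params, indent=2, sort_keys=True) + "\n"


print(render_params(test_params), end="")
```

Saving the printed text next to your descriptor (for example as test.json) keeps the data link out of the descriptor itself, so users can swap in their own inputs without editing the workflow.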

Best Practices for Secure and FAIR Workflows

This comprehensive document contains best practices for developing secure tools or workflows that also exemplify the FAIR (Findable, Accessible, Interoperable, Reusable) guiding principles.

Version Control Best Practices

  • Host your source code, workflow descriptor file, and Dockerfile in a git repository. Dockstore currently supports GitHub, Bitbucket, and GitLab. We recommend GitHub because the GitHub App integrates easily with Dockstore. If you are new to using version control, you can start with these introductory documents:

  • Create an organization on a git repository and have your collaborators publish their peer reviewed tools or workflows within the organization. (Here are instructions for GitHub).

    • Organizations can centralize your work and help to foster a culture of peer review through Pull Requests.
    • Submitting to an organization rather than hosting on an individual account provides a fallback for others if you become inactive on the git repository site.
  • Plan your repository structure

    • The repository should include the workflow language descriptor file(s), the Dockerfile used to create a custom container (if applicable), a license, and a thorough README.md.

    • Here are examples of nicely organized repositories for workflow development:

  • Use branches to separate the development of distinct features for your workflow.

    • There should always be at least one ‘main’ branch that points to the most stable copy of your workflow.

    • Any new development of features, optimizations, etc. should be created on a new branch/version that diverges from the main branch.

      • If developing multiple new features simultaneously or if multiple people are creating content, work should be split into separate branches.
      • It’s best to split into branches by independent feature units, ex: “add-QC-before-alignment”.
      • Once your feature is stable, create a pull request to merge the branch into your main branch. Once merged, you can delete the development branch if no longer needed.
  • Publish releases of your workflow to save your work at a stable version for publication and citation. On GitHub these are ‘tags’ (learn how to manage tags). Below, we discuss how such releases can become immutable when synced with the snapshots feature on Dockstore.

Image / Container Best Practices

  • Because anyone can publish an image in a public repository (Docker Hub, Quay, etc.), you should be cautious of third-party containers because they may contain malware or insecure software, or may have insecure settings. These may result in cryptojacking. See an example of a malicious image in this GitHub repo.

  • When creating custom images, we recommend starting with official images. This way you know that you are starting with a secure base since these images are maintained to remove vulnerabilities.

  • You may find helpful images from sources such as BioContainers, which maintains images for 1K+ bioinformatics tools. We cannot guarantee that BioContainers images are secure, so we recommend you scan all non-official images for vulnerabilities. Tools such as Snyk and Trivy scan containers for security concerns.

  • If you detect a vulnerability in a container you are interested in, we suggest you 1) contact the maintainer to update the image, or 2) if there is a Dockerfile, use it as a template to update the image yourself. Try inspecting the Dockerfile and only include those parts you feel are trustworthy. Consider upgrading versions of packages as they may be a source of vulnerabilities.

  • Use Dockerfiles to describe and configure images:

  • Keep images light:

    • More packages increase risk; try to avoid installing unnecessary packages in your images. That said, starting with a very bare image (such as Alpine) may lead to a long setup or difficulties in debugging.

    • Images tagged with “-slim” contain the minimum components needed to run, without being as strict as Alpine-based images. They can often provide a happy medium between a reduced size, enhanced security, and usability.

    • Some helpful starting images are suggested below:

    • A good rule of thumb is that each image should have a specific purpose. Avoid installing all of the software you need for an entire analysis in one container; instead, use multiple containers.

    • Don’t include test data inside the image. Recommendations for hosting test data alongside your workflow can be found in the section below titled Accessible.

  • Publish your pre-built image in an open source container registry (such as DockerHub or Quay.io):

    • Automate builds using an image registry that is configured to trigger a build whenever a change is pushed to the Dockerfile source control repository.
    • Similar to our suggestion to publish your workflow under a GitHub organization, publish your images in an organization on a container registry. Additionally, this may make it easier for your institute to pay for a group plan to ensure your images never expire.
  • Limitation on and expiration of images: At the time of writing this, DockerHub has announced some new policies around pull limits as well as their intention to expire DockerHub images from free accounts that haven’t been pulled for some defined period of time (update: this policy is delayed). For example, this could mean that a workflow that hasn’t been run in one year may no longer be reproducible if the image has been removed.

  • Alternative options include:

    • Using images from paid organizations on DockerHub
    • Paying for a DockerHub account (this may be more cost effective if you’re able to create an organization with multiple accounts)
    • DockerHub offers exceptions to some open source projects that you may be able to get depending on your use case
    • Hosting the image on a different repository such as Google Container Registry, Quay.io, GitHub Packages, AWS ECR, etc.
    • Migrating images to another repository to mitigate the impact of DockerHub pull limits (see example).

Tool / Workflow Best Practices

Findable

  • Once your workflow is ready to share with the community, publish it in Dockstore.

  • When publishing on Dockstore, include robust metadata. Dockstore parses metadata that enables search capabilities for finding your tool/workflow more easily. Metadata also helps your workflow be more reusable. Essential metadata fields include:

    • Naming:

      • Keep the workflow name short
      • Use all lowercase letters for compatibility with other platforms such as DockerHub
    • Authorship, contact information, and description:

      • You can add author and description metadata to your descriptor file. Adding an author will make it selectable on the Author facet in Dockstore’s search and a description helps because the text search uses it as one of the fields to sift through.
    • Include Dockstore labels to enhance searchability.

  • Above, we discussed the value of organization features in version control and container registries. You can also share your workflow in a Dockstore Organization and Collection. This feature can, for example, showcase workflows that group together to make a complete analysis.

Accessible

  • Publishing your tool or workflow in Dockstore promotes accessibility:

    • Dockstore does not require a user to sign in to search published content, which increases transparency and usability to a greater audience.
    • Dockstore implements its own REST API and also a standardized GA4GH API that can be used for sharing tools and workflows.
  • Use Dockstore’s snapshot feature to provide an immutable release of your workflow that can be verified.

    • Dockstore archives important metadata associated with a published and snapshotted version of a tool or workflow to ensure provenance.
    • See Dockstore’s best practices for snapshots, including adding a description and metadata to improve searchability and usability of your workflow.
  • Mint a snapshot of your workflow with a Digital Object Identifier (DOI).

    • Users can request a DOI (generated via Zenodo) for their workflow through Dockstore.

    • DOIs enhance reproducibility and make it easier to cite a specific version of your workflow in a publication.

Interoperable

  • Wrap your pipeline in one or more workflow languages supported by Dockstore:

  • Provide a parameter file (JSON or YAML) containing example parameters used for launching your workflow.

    • The parameter file is where you should link to open access test data for your tool or workflow (learn more in Reusable).
    • You can submit multiple parameter files so consider sharing one for a local run (you can use the Dockstore CLI to launch tools and workflows locally) as well as examples for a launch-with partner (such as BioData Catalyst or AnVIL).
  • Provide a checker workflow.

    • Checker workflows are additional workflows you can associate with a tool or workflow. Their purpose is to ensure that a tool or workflow, given some inputs, produces the expected outputs on a platform different from the one where you are developing.
    • Providing a checker workflow gives other researchers confidence that they can run the work on their system correctly.
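
The core comparison a checker performs can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dockstore's checker mechanism: the helper names sha256_of and output_matches are hypothetical, and a byte-for-byte checksum comparison only suits outputs that are fully deterministic (no embedded timestamps or random ordering).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def output_matches(path: Path, expected_sha256: str) -> bool:
    """Compare a produced output file against a known-good checksum."""
    return sha256_of(path) == expected_sha256.lower()
```

For outputs that are not deterministic, a checker would instead compare selected fields or summary statistics.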

Reusable

  • When referencing an image from an image repository, the best practice is to reference it by digest, which provides an immutable record of the image in the tool or workflow. Here is an example of a digest reference in a workflow task:
task digestDocker {
        command {
                echo "hello world"
        }
        runtime {
                docker: "pkrusche/hap.py@sha256:f63e020c4062e0be8d081a50de16562f2ba161166e896655868efdb5527a8640"
        }
}
  • The examples below show how not to reference a container in a workflow task. The image behind each of these reference formats can change, so the workflow may no longer be reproducible.

Do not reference parameterized images:

task parameterizedDocker {
        input {
                String docker_image
        }
        command {
                echo "hello world"
        }
        runtime {
                docker: docker_image
        }
}

Do not reference by version, e.g. “v1”.

task VersionDocker {
        command {
                echo "hello world"
        }
        runtime {
                docker: "pkrusche/hap.py:v1.0"
        }
}

Do not use untagged or “latest”.

task latestDocker {
        command {
                echo "hello world"
        }
        runtime {
                docker: "pkrusche/hap.py:latest"
        }
}
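
The distinction between the recommended digest reference and the discouraged forms above can be checked mechanically. The following Python sketch is an assumption-laden illustration, not a Dockstore feature: the function name is_digest_pinned is hypothetical, and it only recognizes sha256 digests written as 64 lowercase hex characters.

```python
import re

# A digest reference ends with "@sha256:" followed by 64 lowercase hex characters.
_DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def is_digest_pinned(image_ref: str) -> bool:
    """Return True when the image reference is pinned to a sha256 digest."""
    return _DIGEST_RE.search(image_ref) is not None


# Version tags and "latest" are mutable, so they are reported as not pinned.
for ref in ("pkrusche/hap.py:v1.0", "pkrusche/hap.py:latest"):
    print(ref, is_digest_pinned(ref))
```

A check like this could run over the docker values in your descriptors before a release. A parameterized image (a variable instead of a string) cannot be verified this way at all, which is part of why it is discouraged.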
  • Provide open access test data with your published workflow. Test data can be shared as inputs in a JSON parameter file.

    • As mentioned in Image / Container Best Practices, test data should be hosted outside of the container.

      • GitHub can host small files such as csv or tsv (for example: trait data)
      • Broad’s Terra platform hosts multiple genomic files in this open access Google bucket
    • Consider providing both a full sample run and a small down-sampled development test.

      • A small development dataset is necessary for checker workflows. It also helps others explore your workflow without incurring heavy resource/computational costs.
      • A full-sized sample is helpful for benchmarking your workflow and providing end-users with realistic compute and cost requirements.
  • Provide a permissive license such as the MIT License, or choose a license that best fits your needs. It can be a text file in the git repository where the workflow is published (see this example).

  • Provide a thorough README in the git repository. Here is an example of thorough documentation.

    • We suggest including the following sections:

      • An introductory description of the goal of the analysis.
      • A pipeline summary that includes the software packages used by the pipeline.
      • A quick start guide that includes inputs and outputs and specifies which inputs are required versus optional.
      • Relevant links to external resources, such as expanded documentation.
      • Contact information for the organization or individual pipeline maintainer.
      • Any available cost or benchmarking information.
      • How to cite the use of your workflow (including references for the original software authors).

Templates

Service Version 1.2 Template

# The .dockstore.yml for version 1.2
version: 1.2

# A required key named service
service:
  # The subclass is required, and can be DOCKER_COMPOSE, KUBERNETES, HELM, SWARM, or NOT_APPLICABLE.
  subclass:
  name:
  author:
  description:

  # `publish` is a service-wide setting and will affect ALL branches/tags; only set this as needed in a main branch.
  # `publish` may be set to true to publish an unpublished service, or false to unpublish a published service.
  # Omitting the publish setting leaves the publish-state unchanged (recommended for all non-primary branches).
  publish:

  # These are files the Dockstore will index. They will be directly downloadable from Dockstore. Wildcards are not supported.
  files:

  # The service launcher will execute the scripts in the following order. All steps other than start are optional,
  # and if they are missing or have no value specified, will be ignored.
  #
  # 1. preprovision -- Invoked before any data has been downloaded and some initialization is required. Not sure if we need this one.
  # 2. prestart -- Executed after data has been downloaded locally, but before service has started (see the data section).
  # 3. start -- Starts up the service.
  # 4. poststart -- Associated script will run after the service has started
  # 5. postprovision -- After the service has been started. This might be invoked multiple times, e.g., if the user decides to load multiple sets of data.

  # In addition, the following scripts, if present and with a value, are for use by the launcher:
  # 1. port - Which port the service is exposing. This provides a generic way for the tool to know which port is being exposed, e.g., to reverse proxy it.
  # 2. healthcheck - exit code of 0 if service is running normally, non-0 otherwise.
  # 3. stop - stops the service
  scripts:
    start:
    postprovision:
    port:
    stop:

  # These are environment variables that the launcher is responsible for passing to any scripts that it invokes.
  # The names must be valid environment variable names.
  # Users can specify the values of the parameters in the input parameter JSON (see below).
  # These variables are service-specific, i.e., the service creator decides what values, if any, to
  # expose as environment variables.
  # There are three parts to the environment variable
  #    - The name
  #    - An optional default value, which will be used if the user does not specify in the input file
  #    - An optional description, which can be used by the service launcher as a prompt
  environment:
    httpPort:
      default:
      description:

  # This section describes data that should be provisioned locally for use by
  # the service. The service launcher is responsible for provisioning the data.
  #
  # Each key in this section is the name of a dataset. Each dataset can have
  # 1 to n files.
  #
  # Each dataset has the following properties:
  #   - targetDirectory -- required, indicates where the files in the dataset should be downloaded to. Paths are relative.
  #   - files -- required, 1 to n files, where each file has the following attributes:
  #           - description -- a description of the file
  #           - targetDirectory -- optionally override the dataset's targetDirectory if this file needs to go in a different location.
  data:
    dataset_1:
      targetDirectory:
      files:
        tsv:
          description:
        metadata:
          description:

  # Filters allow specifying sets of branches and tags to include for the service.
  # If no filters are given, all branches and tags are included.
  # Branches and tags are arrays of pattern-strings.
  # Pattern-strings use Unix-style Glob syntax by default (Ex: `develop`, `myworkflow/**`)
  # https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/file/FileSystem.html#getPathMatcher(java.lang.String)
  # or RegEx when the string is surrounded by / (Ex: `/develop/`, `/myworkflow\/.*/`).
  filters:
    branches:
    tags:
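
To illustrate how the two pattern forms in the filters comments differ, here is a rough Python sketch. It is not Dockstore's implementation: the function name matches_filter is hypothetical, Python's fnmatch only approximates the Java glob syntax the template links to (for example, fnmatch lets a single * match across "/"), and requiring the regular expression to match the whole name is an assumption.

```python
import fnmatch
import re


def matches_filter(ref_name: str, patterns: list[str]) -> bool:
    """Return True if a branch or tag name matches any pattern in the list."""
    for pattern in patterns:
        if len(pattern) >= 2 and pattern.startswith("/") and pattern.endswith("/"):
            # Regex form: strip the surrounding slashes, then match the whole name.
            if re.fullmatch(pattern[1:-1], ref_name):
                return True
        elif fnmatch.fnmatchcase(ref_name, pattern):
            # Glob form, approximated with fnmatch (case-sensitive).
            return True
    return False


print(matches_filter("develop", ["develop"]))
print(matches_filter("myworkflow/v1.0", [r"/myworkflow\/.*/"]))
```

The sketch checks a single list of patterns; how Dockstore combines the branches and tags lists is not modeled here.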

Changelog

All stable releases of Dockstore have an associated changelog. You can view these on our GitHub repository page.

News and Events

This section is for keeping up to date with all news and events related to Dockstore. The items are listed in chronological order.

Dockstore 1.7.0

Highlighted new features include

  • Snapshotted versions and DOIs
    • Versions can be snapshotted to freeze them at a particular point in time
    • Zenodo integration allowing publishers to create DOIs for snapshotted versions
  • CWL launch with CGC (Seven Bridges)
  • Language support updates
    • Support for WDL 1.0 tools and workflows
    • Support for CWL 1.1 tools and workflows
    • Cromwell update from 36 to 44
  • Migrated documentation to readthedocs (same URL)
  • When not logged-in, the home page will better introduce new users to Dockstore
  • CLI now tested with (and recommending) Java 11 and Python 3
  • Closed beta feature

See a full list of our changes on GitHub

Breaking changes

Major

  • none intended

Minor

  • while workflow launching should be unaffected, metadata editing through the 1.6.0 CLI will no longer function till an upgrade to the 1.7.0 CLI
  • Improved parsing code and support for WDL 1.0 means that some previously invalid tools and workflows should now be valid (or vice versa); a refresh of the tools and workflows is required to redo validation
  • My Tools and My Workflows links are now in account dropdown

Dockstore 1.8.0

Highlighted new features include:

  • New launch-with partners
    • AnVIL
  • Redesigned logged-in homepage displays more information for returning users including
    • News and updates
    • Featured content such as new organizations and workflows
    • Recently modified tools, workflows and orgs
  • Notification system for news such as TOS updates, pending system updates
  • Security updates
  • Performance improvements
  • Usability improvements
  • Can source workflow descriptions from README.md if not present in workflow itself
  • DOI landing pages hosted at Zenodo will now link to exact versions of workflows
  • Preview feature
  • Services are now reachable from many parts of the UI
  • Behind the scenes work on CI/CD, infrastructure improvements, more frequent releases
  • Also some behind the scenes work leading up to the implementation of the final v2 GA4GH TRS standard

As always, see our full list of changes on GitHub

Breaking changes

Major

None intended

Minor

  • If upgrading the Dockstore CLI, our Dockstore script has changed and should be downloaded anew from the onboarding wizard

Dockstore 1.9.0

Highlighted new features include:

  • Preview feature for Galaxy workflow support
  • Performance improvements for a large variety of endpoints
  • GitHub app support in preview to allow for automated update of workflow content
  • Implementation of the TRS v2 final standard
    • TRS v2 beta standard support is deprecated, but still present
  • Capture of file and docker image checksums from GitHub, Quay.io, GitLab, and Docker Hub on workflow snapshot to support immutable workflows (see breakdown for details here)
  • Support for linking ORCID profiles to Dockstore user profiles
    • Currently displayed in organization and stargazer views
  • A large number of usability improvements and fixes to the user interface
  • Partial API migration from swagger 2.0 to openapi 3.0 for the Dockstore API
  • Improved language plugin support so it’s easier for Dockstore to support additional languages (like Galaxy)
  • A large variety of security updates and bug fixes

As always, see our full list of changes on GitHub

Breaking changes

Major

None intended

Minor

  • If upgrading the Dockstore CLI, our Dockstore script has changed and should be downloaded anew from the onboarding wizard
  • The CLI refresh command will be broken until you update to CLI version 1.9.
