Note
This tutorial is a continuation of Getting Started With Docker. Please complete that tutorial prior to doing this one.
Getting Started with WDL
Tutorial Goals
- Learn about the Workflow Description Language (WDL)
- Create a basic WDL Tool which uses a Docker image
- Run the Tool locally
- Describe a sample parameterization of the Tool
- Push the Tool onto GitHub
Describe Your Tool in WDL
Besides CWL, you can also describe tools in WDL. Unlike CWL, WDL does not have the concept of a Tool built into the language. Instead, we define a tool as a one-task WDL workflow, where the task has an associated Docker image.
We provide a hello world example as follows:
version 1.0
task hello {
  input {
    String name
  }
  command {
    echo 'hello ${name}!'
  }
  output {
    File response = stdout()
  }
  runtime {
    docker: 'ubuntu:latest'
  }
}
workflow test {
  call hello
}
The runtime section of a task lets you specify the Docker image in which the task runs. In this example we use the basic Ubuntu image. This image should match the Dockerfile that you register on Dockstore alongside your WDL descriptor files.
Again, we provide an example from the dockstore-tool-bamstats repository:
version 1.0
task bamstats {
  input {
    File bam_input
    Int mem_gb
  }
  command {
    bash /usr/local/bin/bamstats ${mem_gb} ${bam_input}
  }
  output {
    File bamstats_report = "bamstats_report.zip"
  }
  runtime {
    docker: "quay.io/collaboratory/dockstore-tool-bamstats:1.25-6_1.0"
    memory: mem_gb + "GB"
  }
  meta {
    author: "Andrew Duncan"
  }
}
workflow bamstatsWorkflow {
  input {
    File bam_input
    Int mem_gb
  }
  call bamstats { input: bam_input=bam_input, mem_gb=mem_gb }
}
Let us break it down piece by piece.
You’ll notice that there are two main sections of the file. First is a task section, where we define the task-level inputs and outputs of a given step along with its runtime. Next is a workflow section, where we define workflow-level inputs and outputs (this example declares only inputs) and call the task.
At the top of the task section we define two inputs: the input BAM file and the amount of memory, in GB, to use when running the task. This looks very similar to variable declarations in most programming languages.
input {
  File bam_input
  Int mem_gb
}
Next is the command section. This specifies the command we want to run on the command line. We can also pass parameters to the command based on the inputs described above. Here we pass the amount of memory to use and the input BAM file to a script from the quay.io/collaboratory/dockstore-tool-bamstats:1.25-6_1.0 Docker image.
command {
  bash /usr/local/bin/bamstats ${mem_gb} ${bam_input}
}
The output section defines the expected output for the task. Here the output is a ZIP file containing the results of the script.
output {
  File bamstats_report = "bamstats_report.zip"
}
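The runtime section is very important to Dockstore. This is where we define which Docker image the task runs in. We also define how much memory the Docker container should use. Note that support for runtime attributes depends on the backend: in the local run shown later in this tutorial, Cromwell warns that the memory key is not supported and leaves it out of the job execution.
runtime {
  docker: "quay.io/collaboratory/dockstore-tool-bamstats:1.25-6_1.0"
  memory: mem_gb + "GB"
}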
Finally, we have a metadata section where we can store key-value pairs. It is free-form, so we could put anything here. Dockstore is able to pick up author, email, and description if they are defined here. Each metadata value must be a single-line string.
The description field can be used to add documentation. Dockstore treats the string as Markdown and renders it accordingly. When a Markdown description requires newlines, specify them with \n.
Note
If no description is defined in the descriptor file, the README from the corresponding Git repository is used.
Below we show an example metadata section and how it will display on your workflow’s landing page:
meta {
  author: "Andrew Duncan"
  email: "andrew@foobar.com"
  description: "## Bamstats \n This is the Bamstats workflow.\n\n Adding documentation improves clarity."
}

wdl_metadata
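Because the description must fit on one line, it can be convenient to write the Markdown normally and convert it afterwards. The following Python sketch is one possible way to do that; the helper name to_wdl_meta_string is hypothetical and not part of Dockstore or WDL tooling, and it assumes that backslash-escaping quotes and backslashes is acceptable in your WDL string.

```python
def to_wdl_meta_string(markdown: str) -> str:
    """Collapse multi-line Markdown into one double-quoted, single-line string."""
    escaped = (
        markdown.replace("\\", "\\\\")  # escape existing backslashes first
        .replace('"', '\\"')            # escape double quotes
        .replace("\n", "\\n")           # real newline -> the two characters \ and n
    )
    return f'"{escaped}"'


description = (
    "## Bamstats\n"
    "This is the Bamstats workflow.\n"
    "\n"
    "Adding documentation improves clarity."
)

# Prints a line that can be pasted into the meta section.
print(f"description: {to_wdl_meta_string(description)}")
```

The printed line contains no literal line breaks, so it satisfies the single-line requirement while still rendering as separate Markdown paragraphs.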
The workflow section here consists of two main parts. The first is an input section, where we define the input BAM file and the memory to use.
input {
  File bam_input
  Int mem_gb
}
Finally, there is the call section, where we actually call the task. Without this section our tool will not do anything. Here we call the bamstats task and pass it the two required parameters.
call bamstats { input: bam_input=bam_input, mem_gb=mem_gb }
Testing Locally
So at this point, you’ve created a Docker-based tool and have described how to call that tool using WDL. Let’s test running BAMStats using the Dockstore command line and descriptor rather than calling it directly via Docker. This will test that the WDL correctly describes how to run your tool.
The first thing I’ll do is set up the Dockstore CLI locally. This will have me install all of the dependencies needed to run the Dockstore CLI on my local machine.
The next thing I’ll do is create a completely local dataset and JSON parameterization file:
$> wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase3/data/NA12878/alignment/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam
This downloads to my current directory. I could choose another location; it really doesn’t matter. For the parameterization I’m using a sample file that I already checked in, test.wdl.json:
{
  "bamstatsWorkflow.bam_input": "NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam",
  "bamstatsWorkflow.mem_gb": "4"
}
Tip
The Dockstore CLI can handle inputs with HTTPS, FTP, GS, and S3 URLs but that’s beyond the scope of this tutorial.
You can see in the above that I give the relative path to the input under bam_input and the memory in GB that I want to use for the task.
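Each key in the JSON file is the workflow name, a dot, and then the input name. As a rough illustration, the Python sketch below builds the same structure programmatically; build_inputs is a hypothetical helper written for this example and is not part of the Dockstore CLI.

```python
import json


def build_inputs(workflow_name: str, **values) -> dict:
    """Prefix each input name with the workflow name, as the JSON file expects."""
    return {f"{workflow_name}.{name}": value for name, value in values.items()}


inputs = build_inputs(
    "bamstatsWorkflow",
    bam_input="NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam",
    mem_gb="4",
)

# Redirect this output to test.wdl.json to reproduce the file shown above.
print(json.dumps(inputs, indent=2))
```

Generating the file this way is mainly useful when you want several parameterizations that differ only in one or two values.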
At this point, let’s run the tool with our local inputs and outputs via the JSON config file:
$> dockstore tool launch --local-entry Dockstore.wdl --json test.wdl.json
Creating directories for run of Dockstore launcher in current working directory: /home/aduncan/Documents/dockstore-tool-bamstats
Provisioning your input files to your local machine
Downloading: bamstatsWorkflow.bam_input from NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam to: /home/aduncan/Documents/dockstore-tool-bamstats/cromwell-input/aca839a6-92c4-4234-bc6d-460bcfe6f4d6/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam
Calling out to Cromwell to run your workflow
java -jar /home/aduncan/.dockstore/libraries/cromwell-30.2.jar run /home/aduncan/Documents/dockstore-tool-bamstats/Dockstore.wdl --inputs /tmp/foo7282099563694004806json
Cromwell exit code: 0
Cromwell stdout:
[2018-08-30 14:23:40,47] [info] Running with database db.url = jdbc:hsqldb:mem:93932e57-4451-41b9-8d64-c550c1f8afc6;shutdown=false;hsqldb.tx=mvcc
[2018-08-30 14:23:43,78] [info] Running migration RenameWorkflowOptionsInMetadata with a read batch size of 100000 and a write batch size of 100000
[2018-08-30 14:23:43,78] [info] [RenameWorkflowOptionsInMetadata] 100%
[2018-08-30 14:23:43,84] [info] Running with database db.url = jdbc:hsqldb:mem:0424c7e8-e6fd-41dc-a21a-5daa245e038c;shutdown=false;hsqldb.tx=mvcc
[2018-08-30 14:23:44,05] [info] Slf4jLogger started
[2018-08-30 14:23:44,15] [info] Metadata summary refreshing every 2 seconds.
[2018-08-30 14:23:44,16] [info] Starting health monitor with the following checks: DockerHub, Engine Database
[2018-08-30 14:23:44,17] [info] WriteMetadataActor configured to write to the database with batch size 200 and flush rate 5 seconds.
[2018-08-30 14:23:44,18] [info] CallCacheWriteActor configured to write to the database with batch size 100 and flush rate 3 seconds.
[2018-08-30 14:23:44,64] [info] SingleWorkflowRunnerActor: Submitting workflow
[2018-08-30 14:23:44,67] [info] Workflow 4d24ebd1-5151-4b07-82d7-272b184fd0eb submitted.
[2018-08-30 14:23:44,67] [info] SingleWorkflowRunnerActor: Workflow submitted 4d24ebd1-5151-4b07-82d7-272b184fd0eb
[2018-08-30 14:23:44,67] [info] 1 new workflows fetched
[2018-08-30 14:23:44,67] [info] WorkflowManagerActor Starting workflow 4d24ebd1-5151-4b07-82d7-272b184fd0eb
[2018-08-30 14:23:44,68] [info] WorkflowManagerActor Successfully started WorkflowActor-4d24ebd1-5151-4b07-82d7-272b184fd0eb
[2018-08-30 14:23:44,68] [info] Retrieved 1 workflows from the WorkflowStoreActor
[2018-08-30 14:23:45,18] [info] MaterializeWorkflowDescriptorActor [4d24ebd1]: Call-to-Backend assignments: bamstatsWorkflow.bamstats -> Local
[2018-08-30 14:23:45,25] [warn] Local [4d24ebd1]: Key/s [memory] is/are not supported by backend. Unsupported attributes will not be part of job executions.
[2018-08-30 14:23:45,25] [warn] Couldn't find a suitable DSN, defaulting to a Noop one.
[2018-08-30 14:23:45,26] [info] Using noop to send events.
[2018-08-30 14:23:47,30] [info] WorkflowExecutionActor-4d24ebd1-5151-4b07-82d7-272b184fd0eb [4d24ebd1]: Starting calls: bamstatsWorkflow.bamstats:NA:1
[2018-08-30 14:23:47,88] [warn] BackgroundConfigAsyncJobExecutionActor [4d24ebd1bamstatsWorkflow.bamstats:NA:1]: Unrecognized runtime attribute keys: memory
[2018-08-30 14:23:47,92] [info] BackgroundConfigAsyncJobExecutionActor [4d24ebd1bamstatsWorkflow.bamstats:NA:1]: bash /usr/local/bin/bamstats 4 /cromwell-executions/bamstatsWorkflow/4d24ebd1-5151-4b07-82d7-272b184fd0eb/call-bamstats/inputs/home/aduncan/Documents/dockstore-tool-bamstats/cromwell-input/aca839a6-92c4-4234-bc6d-460bcfe6f4d6/NA12878.chrom20.ILLUMINA.bwa.CEU.low_coverage.20121211.bam
[2018-08-30 14:23:47,93] [info] BackgroundConfigAsyncJobExecutionActor [4d24ebd1bamstatsWorkflow.bamstats:NA:1]: executing: docker run \
  --cidfile /home/aduncan/Documents/dockstore-tool-bamstats/cromwell-executions/bamstatsWorkflow/4d24ebd1-5151-4b07-82d7-272b184fd0eb/call-bamstats/execution/docker_cid \
  --rm -i \
  \
  --entrypoint /bin/bash \
  -v /home/aduncan/Documents/dockstore-tool-bamstats/cromwell-executions/bamstatsWorkflow/4d24ebd1-5151-4b07-82d7-272b184fd0eb/call-bamstats:/cromwell-executions/bamstatsWorkflow/4d24ebd1-5151-4b07-82d7-272b184fd0eb/call-bamstats \
  quay.io/collaboratory/dockstore-tool-bamstats@sha256:8472101666cda2a29be9abe8184ec2c7cae4360b75e712706921476b6b537679 /cromwell-executions/bamstatsWorkflow/4d24ebd1-5151-4b07-82d7-272b184fd0eb/call-bamstats/execution/script
[2018-08-30 14:23:47,95] [info] BackgroundConfigAsyncJobExecutionActor [4d24ebd1bamstatsWorkflow.bamstats:NA:1]: job id: 27953
[2018-08-30 14:23:47,95] [info] BackgroundConfigAsyncJobExecutionActor [4d24ebd1bamstatsWorkflow.bamstats:NA:1]: Status change from - to WaitingForReturnCodeFile
[2018-08-30 14:25:08,39] [info] BackgroundConfigAsyncJobExecutionActor [4d24ebd1bamstatsWorkflow.bamstats:NA:1]: Status change from WaitingForReturnCodeFile to Done
[2018-08-30 14:25:10,30] [info] WorkflowExecutionActor-4d24ebd1-5151-4b07-82d7-272b184fd0eb [4d24ebd1]: Workflow bamstatsWorkflow complete. Final Outputs: {
  "bamstatsWorkflow.bamstats.bamstats_report": "/home/aduncan/Documents/dockstore-tool-bamstats/cromwell-executions/bamstatsWorkflow/4d24ebd1-5151-4b07-82d7-272b184fd0eb/call-bamstats/execution/bamstats_report.zip"
}
[2018-08-30 14:25:10,32] [info] WorkflowManagerActor WorkflowActor-4d24ebd1-5151-4b07-82d7-272b184fd0eb is in a terminal state: WorkflowSucceededState
[2018-08-30 14:25:27,76] [info] SingleWorkflowRunnerActor workflow finished with status 'Succeeded'.
{
  "outputs": {
    "bamstatsWorkflow.bamstats.bamstats_report": "/home/aduncan/Documents/dockstore-tool-bamstats/cromwell-executions/bamstatsWorkflow/4d24ebd1-5151-4b07-82d7-272b184fd0eb/call-bamstats/execution/bamstats_report.zip"
  },
  "id": "4d24ebd1-5151-4b07-82d7-272b184fd0eb"
}
[2018-08-30 14:25:27,81] [info] Message [cromwell.core.actor.StreamActorHelper$StreamFailed] without sender to Actor[akka://cromwell-system/deadLetters] was not delivered. [1] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[2018-08-30 14:25:27,81] [info] Message [cromwell.core.actor.StreamActorHelper$StreamFailed] without sender to Actor[akka://cromwell-system/deadLetters] was not delivered. [2] dead letters encountered. This logging can be turned off or adjusted with configuration settings 'akka.log-dead-letters' and 'akka.log-dead-letters-during-shutdown'.
[2018-08-30 14:25:27,84] [info] Automatic shutdown of the async connection
[2018-08-30 14:25:27,84] [info] Gracefully shutdown sentry threads.
[2018-08-30 14:25:27,84] [info] Shutdown finished.
Cromwell stderr:
Saving copy of Cromwell stdout to: /home/aduncan/Documents/dockstore-tool-bamstats/Cromwell.stdout.txt
Saving copy of Cromwell stderr to: /home/aduncan/Documents/dockstore-tool-bamstats/Cromwell.stderr.txt
Output files left in place
So that’s a lot of information, but you can see the process was a success. The output is kind of hard to parse, but look for the following text:
Workflow bamstatsWorkflow complete. Final Outputs: {
  "bamstatsWorkflow.bamstats.bamstats_report": "/home/aduncan/Documents/dockstore-tool-bamstats/cromwell-executions/bamstatsWorkflow/4d24ebd1-5151-4b07-82d7-272b184fd0eb/call-bamstats/execution/bamstats_report.zip"
}
The final output can be found at /home/aduncan/Documents/dockstore-tool-bamstats/cromwell-executions/bamstatsWorkflow/4d24ebd1-5151-4b07-82d7-272b184fd0eb/call-bamstats/execution/bamstats_report.zip.
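Since the Dockstore CLI saves a copy of the Cromwell stdout to Cromwell.stdout.txt, the output paths can also be pulled out of the log with a small script instead of by eye. The Python sketch below is one way to do it, assuming the log contains the "Final Outputs:" marker followed by a JSON object as in the run above; the function name final_outputs and the shortened sample log (with a made-up path) are illustrative only.

```python
import json

MARKER = "Final Outputs:"


def final_outputs(cromwell_stdout: str) -> dict:
    """Return the JSON object that follows the first 'Final Outputs:' marker."""
    start = cromwell_stdout.index(MARKER) + len(MARKER)  # ValueError if absent
    remainder = cromwell_stdout[start:].lstrip()
    # raw_decode parses one JSON value and ignores whatever log text follows it.
    outputs, _end = json.JSONDecoder().raw_decode(remainder)
    return outputs


# Abbreviated stand-in for the real log; the path is a made-up example.
sample_log = (
    "[info] Workflow bamstatsWorkflow complete. Final Outputs: {\n"
    '  "bamstatsWorkflow.bamstats.bamstats_report": "/tmp/example/bamstats_report.zip"\n'
    "}\n"
    "[info] WorkflowManagerActor is in a terminal state: WorkflowSucceededState\n"
)

for name, path in final_outputs(sample_log).items():
    print(f"{name} -> {path}")
```

To use it on a real run, read Cromwell.stdout.txt into a string and pass that string to final_outputs.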
So what’s going on here? What’s the Dockstore CLI doing? It can best be summed up with this image:

Lifecycle
The command line first provisions input files. In our case, the files were local so no provisioning was needed. But as the tip above mentioned, these can be various URLs pointing to remote files. After provisioning, the Docker image is pulled and run via the Cromwell command line. This uses the Dockstore.wdl and parameterization JSON file (test.wdl.json) to construct the underlying docker run command. Finally, the Dockstore CLI provisions the output files back.
Tip
You can use --debug to get much more information during this run, including the actual call to Cromwell (which can be super helpful in debugging). The following command is an example of how the Dockstore CLI calls out to Cromwell:
java -jar /home/aduncan/.dockstore/libraries/cromwell-30.2.jar run /home/aduncan/Documents/dockstore-tool-bamstats/Dockstore.wdl --inputs /tmp/foo7282099563694004806json
Tip
The dockstore CLI automatically creates a datastore directory in the current working directory where you execute the command and uses it for inputs/outputs. It can get quite large depending on the tool/inputs/outputs being used. Plan accordingly, e.g., execute the dockstore CLI in a directory located on a partition with sufficient storage.
Adding a Test Parameter File
We are able to register the above input parameterization of the tool into Dockstore so that users can see and test an example with our tool. Users can manually add test parameter files for a given tool tag or workflow version through both the command line and the versions tab in the UI.
Tip
Make sure that any required input files are given as publicly accessible URLs so that a user can run the example successfully.
Releasing on GitHub
At this point, we’ve successfully created our tool in Docker, tested it, written a workflow language descriptor that describes how to run it, and tested running this via the Dockstore command line. All of this work has been done locally, so if we encounter problems along the way, it is fast to perform debug cycles. At this point, we’re confident that the tool is bug-free and ready to share with others. It’s time to release 1.25-6_1.1.
Releasing will tag your GitHub repository with a version tag so you can always get back to this particular release. I’m going to use the tag 1.25-6_1.1, which means I’ll need to update the Docker image tag in my CWL/WDL/Nextflow file to match. Note that if you’re following the tutorial using a forked version of the bamstats repo, your organization name should be different. GitHub makes it very easy to release:

Release
I click on “releases” in my forked version of the GitHub project page and then follow the directions to create a new release. Simple as that!
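Before creating the release, it can help to confirm that the Docker image tag in the descriptor matches the tag you are about to publish. The Python sketch below shows one rough way to check this. The helper docker_image_tags is hypothetical, it only understands literal image:tag strings (not digests or expressions), and the inline WDL fragment stands in for the contents of your Dockstore.wdl.

```python
import re

# Matches lines such as: docker: "quay.io/org/image:tag"
DOCKER_ATTR = re.compile(r"""docker:\s*["']([^"']+)["']""")


def docker_image_tags(wdl_text: str) -> list:
    """Return the tag of every literal docker image string in a WDL document."""
    tags = []
    for image in DOCKER_ATTR.findall(wdl_text):
        last_segment = image.rsplit("/", 1)[-1]  # ignore any registry host:port
        _name, sep, tag = last_segment.partition(":")
        tags.append(tag if sep else "latest")
    return tags


# Stand-in for the runtime section of the descriptor shown earlier.
wdl_text = '''
runtime {
  docker: "quay.io/collaboratory/dockstore-tool-bamstats:1.25-6_1.0"
}
'''
release_tag = "1.25-6_1.1"

# Any tag that differs from the release tag still needs to be updated.
stale = [tag for tag in docker_image_tags(wdl_text) if tag != release_tag]
if stale:
    print("descriptor still references:", stale)
else:
    print("descriptor matches release", release_tag)
```

With the descriptor from this tutorial, the check flags the older image tag, which is the reminder to update it before tagging the release.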
Tip
HubFlow is an excellent way to manage the lifecycle of releases on GitHub. Take a look!
Building on Quay.io
Now that you’ve perfected the Dockerfile, built the image on your local host, tested running the Docker container and the tool packaged inside, and released this version on GitHub, it’s time to push the image to a place where others can use it. For this you can use Docker Hub or GitLab, but we prefer Quay.io since it integrates really nicely with Dockstore.
You can manually docker push the image you have already built, but the most reliable and transparent thing you can do is link your GitHub repository (and the Dockerfile contained within) to Quay.io. This will cause Quay to automatically build the Docker image every time there is a change.
Log onto Quay.io now and set up a new repository (click the “+” icon).

New Quay Repo
For your sanity, you should match the name to what you were using previously. So in this case, it’s my username followed by the same repo name as in GitHub: denis-yuen/dockstore-tool-bamstats. Also, Dockstore currently only works with Public repositories. Notice I’m selecting “Link to a GitHub Repository Push.” This is because we want Quay to automatically build our Docker image every time we update the repository on GitHub. Very slick!

Build Trigger
Click through to select the organization and repo that will act as the source for your image. Here I select the GitHub repo for denis-yuen/dockstore-tool-bamstats, but this should be the username or organization you used in your tutorial run-through.
It will then ask if there are particular branches you want to build; I typically just let it build everything.
So every time you commit to your GitHub repo, Quay automatically builds and tags a Docker image. If this is overkill for you, consider setting up a regular expression at this step to limit which branches trigger builds.

Build Trigger
It will also ask you where your Dockerfile is located and where your build context is (normally the root).
At this point, you can confirm your settings and “Create Trigger” followed by “Run Trigger Now” to actually perform the initial build of the Docker images. You’ll need to click on the little gear icon next to your build trigger to accomplish this.

Manual Trigger
Manually trigger it with a version name of 1.25-6_1.1 for this tutorial. Normally, I let the build trigger build a new tag for each new release on GitHub. “latest” on Quay.io is built any time I check in on any branch. This can be useful for development but is discouraged in favour of a tagged version number for formal releases of your tool.
In my example, I should see a 1.25-6_1.1 tag listed for this Quay.io Docker repository:

Build Tags
And I do, so this Docker image has been built successfully by Quay and is ready for sharing with the community.
Next Steps
Follow the next tutorial to create an account on Dockstore and link third party services.