Dockstore tools and workflows can also be run through a number of online services that we’re going to loosely call “commercial batch services.” These services share the following characteristics: they spin up the underlying infrastructure and run commands, often in Docker containers, while freeing you from running the batch computing software yourself. Although these services have no understanding of CWL, they can be used naively to run tools and workflows or, in a more sophisticated way, to implement a CWL-compatible workflow engine.
Google Genomics Pipelines and DataBiosphere dsub are also worth a look. In particular, both provide tutorials on how to run (Dockstore!) tools if you have some knowledge of how to construct the command line for a tool yourself.
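As a rough illustration of what "constructing the command line yourself" involves, the sketch below assembles a `docker run` invocation for a containerized tool. The image name, tool name, arguments, and paths are all hypothetical placeholders, and the helper `build_docker_command` is not part of Dockstore, dsub, or any Google API; each service has its own way of accepting the resulting command.

```python
import shlex


def build_docker_command(image, tool_args, mounts):
    """Assemble a `docker run` invocation as a list of arguments.

    image:     container image reference (hypothetical in this example)
    tool_args: the tool's own command line, as a list of strings
    mounts:    (host_dir, container_dir) pairs exposed with `-v`
    """
    command = ["docker", "run", "--rm"]
    for host_dir, container_dir in mounts:
        command += ["-v", f"{host_dir}:{container_dir}"]
    command.append(image)
    command += tool_args
    return command


# Hypothetical tool, image, and paths, for illustration only.
command = build_docker_command(
    image="quay.io/example/hypothetical-tool:1.0",
    tool_args=[
        "hypothetical-tool",
        "--input", "/data/in/sample.bam",
        "--output", "/data/out/result.txt",
    ],
    mounts=[("/home/user/in", "/data/in"), ("/home/user/out", "/data/out")],
)

# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

Building the command as a list and quoting it only for display avoids shell-escaping mistakes when file names contain spaces or special characters.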
Consonance pre-dates Dockstore and was the framework used to run much of the data analysis for the PCAWG project by running SeqWare workflows. Documentation for this incarnation of Consonance can be found at Working with PanCancer Data on AWS and ICGC on AWS.
Consonance has subsequently been updated to run Dockstore tools and has also been adopted at the UCSC Genomics Institute for this purpose. Using cwltool under the hood to provide CWL compatibility, Consonance offers DIY open-source support for provisioning AWS VMs and starting CWL tasks. We recommend having some knowledge of AWS EC2 before attempting this route.
Consonance’s strategy is to provision either on-demand VMs or spot-priced VMs, depending on cost, and to delegate runs of CWL tools to these provisioned VMs, with one tool executing per VM. A Java-based web service and RabbitMQ handle communication between the workers and the launcher, while an Ansible playbook is used to set up workers for execution.
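The provisioning strategy described above can be sketched as follows. This is not Consonance's code: the decision rule (pick whichever pricing model is cheaper, preferring on-demand on a tie), the names `VmRequest`, `choose_pricing`, and `plan_vms`, and the prices are all assumptions made for illustration. The one property taken directly from the description is that each tool gets its own VM.

```python
from dataclasses import dataclass


@dataclass
class VmRequest:
    """A request to provision one VM for exactly one tool run."""
    tool: str
    pricing: str  # "spot" or "on-demand"
    hourly_price: float


def choose_pricing(on_demand_price, spot_price):
    """Pick the cheaper pricing model (assumed rule, not Consonance's).

    Ties go to on-demand, on the assumption that a VM which cannot be
    reclaimed is preferable when it costs the same.
    """
    if spot_price < on_demand_price:
        return "spot", spot_price
    return "on-demand", on_demand_price


def plan_vms(tools, on_demand_price, spot_price):
    """Create one VM request per tool, so tools never share a VM."""
    pricing, price = choose_pricing(on_demand_price, spot_price)
    return [VmRequest(tool=t, pricing=pricing, hourly_price=price) for t in tools]


# Hypothetical tool names and hourly prices, for illustration only.
plan = plan_vms(["tool-a", "tool-b", "tool-c"], on_demand_price=0.40, spot_price=0.12)
for request in plan:
    print(request)
```

In a real deployment, each request would be handed to a launcher that provisions the VM, configures it, and dispatches the tool run over a message queue; those steps are omitted here.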