Dockstore Dictionary
.dockstore.yml
This file is part of GitHub App registration. It indexes workflows or tools within a repository, including their optional test parameter files, and the author(s) of said workflows or tools.
Further reading: /assets/templates/template
absolute path
A path that starts with the character / and contains the full set of directories necessary to resolve a file, starting from the root directory of the repository or filesystem. For example: /Dockstore.cwl or /bin/sh
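As a small illustration of the definition above, the following Python sketch checks whether a POSIX-style path is absolute. The helper name is_absolute and the relative example paths are made up for this example; only /Dockstore.cwl and /bin/sh come from the definition.

```python
from pathlib import PurePosixPath


def is_absolute(path: str) -> bool:
    """Return True if the POSIX-style path starts from the root directory."""
    return PurePosixPath(path).is_absolute()


# The first two paths come from the definition; the last two are
# hypothetical relative paths added for contrast.
for candidate in ["/Dockstore.cwl", "/bin/sh", "workflows/Dockstore.cwl", "./Dockstore.cwl"]:
    print(candidate, is_absolute(candidate))
```

PurePosixPath is used so that the check behaves the same way regardless of the operating system the script runs on.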
AGC
abbreviation for Amazon Genomics CLI
Amazon Genomics CLI
A CLI-based tool that supports launching bioinformatics-related workflows on AWS cloud infrastructure. The Dockstore CLI can launch workflows on AWS using Amazon Genomics CLI’s WES implementation.
see also AGC
Further reading: https://aws.amazon.com/blogs/industries/announcing-amazon-genomics-cli-preview/
AnVIL Project
abbreviation for Analysis Visualization and Informatics Labspace
A federated cloud platform funded by NHGRI designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. Sometimes referred to as just “the AnVIL” or “AnVIL”.
Further reading: https://anvilproject.org/
API
abbreviation for Application Programming Interface
A software connection or interface used to exchange data, often between two different platforms. Communication between different cloud platforms is mediated by various APIs, such as TES.
AWS
abbreviation for Amazon Web Services
A provider of cloud services, most notably cloud computing and cloud storage, available on-demand and hosted by Amazon. Netflix and Airbnb are examples of systems that are powered by AWS. Some bioinformatics systems such as Seven Bridges can leverage AWS by launching workflows on EC2 instances.
see also GCP
Further reading: https://docs.aws.amazon.com/index.html?nc2=h_ql_doc_do
BD Catalyst
abbreviation for BioData Catalyst
BDC
[pronounced “bee-dee-see”]
abbreviation for BioData Catalyst
BioData Catalyst
A cloud-based platform funded by NHLBI to provide tools, applications, and workflows in secure workspaces to expand research in heart, lung, blood, and sleep health.
Further reading: https://dockstore.org/organizations/bdcatalyst
Cancer Genomics Cloud
A cloud platform by Seven Bridges and funded by NCI for bioinformatics analysis.
categories
A group of workflows or tools curated by Dockstore with a similar scientific purpose.
CGC
abbreviation for Cancer Genomics Cloud
CLI
abbreviation for Command Line Interface
A program that can be interacted with on the command line, usually via “Terminal” on macOS and Linux or “cmd”/Command Prompt on Windows. CLI programs generally do not have a graphical user interface.
Further reading: https://en.wikipedia.org/wiki/Command-line_interface
cloud computing
Doing computational tasks on a remote machine that is made available on-demand without the user having to manage all aspects of it. Generally implies that the user is essentially renting computational resources from someone else. Well-known cloud providers include GCP, AWS, Microsoft Azure, and Alibaba Cloud.
Further reading: https://en.wikipedia.org/wiki/Cloud_computing
collection
A group of at least one entry on Dockstore that the members of an organization found useful, created themselves, or considered interesting. Each collection has a description, which you can read to see why the organization compiled those workflows/tools into a collection.
Note
This term as we define it here is associated with Dockstore and may have different definitions in other contexts.
Common Workflow Language
A workflow language that describes how to run command-line tools. WDL and CWL are relatively similar in principle, and code written in one language can often be translated into the other with some workarounds, but they are two different standards and each has unique features. For example, CWL has the ability to use JavaScript expressions within its own commands. CWL makes a distinction between a tool and a workflow.
Further reading: https://www.commonwl.org/user_guide/
container
An emulated computer system that contains programs and their prerequisites, but does not contain the entire operating system. Unlike a VM, a container shares the same kernel as the host OS. A well-known type of container is a Docker container.
Further reading: https://en.wikipedia.org/wiki/OS-level_virtualization
Cromwell
An open-source WDL executor managed by the Broad Institute. Cromwell is the default WDL executor for the Dockstore CLI and is the executor used by Terra.
Note
This term as we define it here is associated with Broad Institute and may have different definitions in other contexts.
see also miniwdl
Further reading: https://cromwell.readthedocs.io/en/stable/
CWL
abbreviation for Common Workflow Language
cwltool
An open-source CWL executor which serves as the official reference implementation of Common Workflow Language. It is used by the Dockstore CLI to run CWL tools and workflows.
Further reading: https://github.com/common-workflow-language/cwltool
DAG
abbreviation for Directed Acyclic Graph
A directional graph like a flowchart that does not have any loops. On Dockstore we use DAGs to show the steps that a workflow takes.
Further reading: https://cran.r-project.org/web/packages/ggdag/vignettes/intro-to-dags.html
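To make the idea concrete, here is a minimal Python sketch that stores a workflow's steps as a DAG and derives a valid execution order from it. The step names and the helpers execution_order and has_cycle are hypothetical; this is not how Dockstore builds or renders its DAGs.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
steps = {
    "align": {"fetch_reads"},
    "call_variants": {"align"},
    "qc_report": {"align"},
    "summarize": {"call_variants", "qc_report"},
}


def execution_order(graph):
    """Return the steps in an order where every dependency comes first."""
    return list(TopologicalSorter(graph).static_order())


def has_cycle(graph):
    """Return True if the graph contains a loop, meaning it is not a DAG."""
    try:
        execution_order(graph)
    except CycleError:
        return True
    return False


order = execution_order(steps)
print(order)
print(has_cycle(steps))
print(has_cycle({"a": {"b"}, "b": {"a"}}))
```

Because the graph has no loops, an ordering always exists; the two-step graph at the end depends on itself and is rejected. The graphlib module requires Python 3.9 or newer.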
descriptor file
Docker
[pronounced “daw-ker”, rhymes with walker]
A program that can create “images” which are somewhat similar to virtual machines, as well as run those images. In the context of bioinformatics, this technology has two main benefits: First, a Docker image bundles up everything a given piece of software needs to run, meaning that someone who wants to run (for example) samtools via Docker only needs to install Docker, not samtools. Second, an instance of a Docker image is a relatively standardized environment even when running on different backends, meaning that two people running the same software in the same Docker image on two different computers are likely to get the exact same results. In other words, Docker is good for reproducibility and ease of use.
Further reading: https://docker-curriculum.com/
Docker container
To use the software inside a Docker image, you run the image with the docker run command. The Docker program then creates a writable layer on top of the image, which results in a Docker container. You can think of a Docker image as an unchanging template, and a Docker container as a writable instance generated from that template. A Docker image can exist on its own, but a Docker container requires a Docker image.
Further reading: https://www.docker.com/resources/what-container/
Docker image
A read-only file that represents a filesystem that contains some sort of code and that code’s dependencies. A Docker image can be created using the docker build command in conjunction with a Dockerfile. If a workflow language references a Docker image, then the workflow executor will download that Docker image (unless it was already downloaded previously) and add a writable layer onto the Docker image, which results in the creation of a Docker container.
Dockerfile
A file describing the creation of a Docker image by running commands that each form a layer.
Further reading: https://docs.docker.com/engine/reference/builder/
Dockstore CLI
abbreviation for Dockstore Command Line Interface
A command-line program developed by Dockstore. It is not required to use Dockstore, but it has many features to make running and developing workflows easier.
see also CLI
Further reading: /advanced-topics/dockstore-cli/dockstore-cli-faq
Dockstore GitHub App
The GitHub App that allows Dockstore to automatically sync changes made in a GitHub repository with an entry in Dockstore.
see also GitHub App registration
Further reading: /getting-started/github-apps/github-apps-landing-page
DOI
abbreviation for Digital Object Identifier
An identifier that provides a long-lasting link to some sort of immutable digital object. On Dockstore, you can use Zenodo to mint a DOI for your workflows and tools to increase reproducibility.
DRS
[pronounced “derse”, rhymes with verse]
abbreviation for Data Repository Service
Note
This term as we define it here is associated with GA4GH and may have different definitions in other contexts.
Further reading: https://github.com/ga4gh/data-repository-service-schemas
DS-I Africa
abbreviation for Data Science for health discovery and Innovation in Africa
An NIH initiative to leverage data science to address the African continent’s public health needs.
Further reading: https://commonfund.nih.gov/africadata
EC2
abbreviation for Elastic Compute Cloud
The cloud computing side of AWS. You can make use of Amazon’s Spot Instance feature, which may reduce the cost of running workflows, when using EC2 instances.
Further reading: https://docs.aws.amazon.com/ec2/index.html
egress
[pronounced “ee-gress”, rhymes with aggress]
The action of leaving a place. In the context of cloud computing, data egress refers to data being moved from one location to another, such as from the cloud to a local machine, between cloud providers, and between locations of a single cloud provider. Data egress often results in the charge of fees (usually called egress charges). Data egress can be one of the most expensive cloud costs incurred. Sometimes, the person hosting the file is charged for data egress. Other times, the person downloading the file is charged (such as when downloading files from a Google bucket that has the requester-pays option enabled).
Note
This term as we define it here is associated with cloud computing and may have different definitions in other contexts.
eLwazi
[pronounced “el-woz-ee”, derived from the Xhosa word for knowledge (uLwazi) and the Luganda word for rock symbolizing robustness (Olwazi)]
An African-led open data science platform funded as part of the NIH’s DS-I Africa program.
Further reading: https://elwazi.org/
entry
Note
This term as we define it here is associated with Dockstore and may have different definitions in other contexts.
environment variable
A variable that affects how processes run on a computer. For example, cwltool references the environment variable $TMPDIR when deciding where to place files.
Further reading: https://en.wikipedia.org/wiki/Environment_variable
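The following Python sketch shows one way a program might consult $TMPDIR when choosing where to place scratch files. The helper scratch_dir and the path /scratch/job42 are hypothetical; this is an illustration of the general pattern, not a description of how cwltool is implemented.

```python
import os
import tempfile


def scratch_dir(env=None):
    """Pick a scratch directory: $TMPDIR if set and non-empty, else the system default."""
    # Accepting a mapping makes the function easy to exercise without
    # modifying the real process environment.
    env = os.environ if env is None else env
    return env.get("TMPDIR") or tempfile.gettempdir()


# Hypothetical value standing in for a user-configured scratch location.
print(scratch_dir({"TMPDIR": "/scratch/job42"}))

# With no TMPDIR in the mapping, fall back to Python's own default.
print(scratch_dir({}))
```

Changing the variable before launching a program (for example, export TMPDIR=/scratch/job42 in a shell) changes the program's behavior without changing its code, which is the point of the definition above.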
faceted search
A type of search which allows users to narrow down their results based upon certain aspects of the things being searched. On Dockstore, our faceted search at <https://dockstore.org/search> allows users to narrow down their search to a particular workflow language, author, and/or other fields.
Further reading: https://en.wikipedia.org/wiki/Faceted_search
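The narrowing behavior can be sketched in a few lines of Python. The entries, field names, and the faceted_search helper below are all invented for illustration and do not reflect how Dockstore's search is implemented.

```python
# Hypothetical entries; the names, languages, and authors are made up.
entries = [
    {"name": "align-reads", "language": "WDL", "author": "alice"},
    {"name": "call-variants", "language": "CWL", "author": "alice"},
    {"name": "qc-report", "language": "WDL", "author": "bob"},
]


def faceted_search(items, **facets):
    """Keep only the items whose fields match every requested facet value."""
    return [
        item
        for item in items
        if all(item.get(field) == value for field, value in facets.items())
    ]


# Each additional facet narrows the result set further.
print([e["name"] for e in faceted_search(entries, language="WDL")])
print([e["name"] for e in faceted_search(entries, language="WDL", author="bob")])
```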
FAIR
[pronounced “fair”, rhymes with pear]
abbreviation for Findable, Accessible, Interoperable, and Reusable
A set of guidelines to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets. This concept is often applied to data, but can be applied to other assets such as workflows.
Further reading: https://www.go-fair.org/fair-principles/
GA4GH
abbreviation for Global Alliance For Genomics and Health
A network of public and private institutions which aims to accelerate progress in genomic research and human health by cultivating a common framework of standards and harmonized approaches for effective and responsible genomic and health-related data sharing.
Further reading: https://www.ga4gh.org/
Galaxy
An open-source platform that uses FAIR principles, most well-known for its web-based UI used to create and run a variety of bioinformatics tools. A Galaxy instance is a running Galaxy interface/server that can be used to create and execute tools and workflows.
Further reading: https://galaxyproject.org/
Galaxy workflow
Further reading: https://galaxyproject.org/learn/advanced-workflow/
GCP
abbreviation for Google Cloud Platform
A system used for cloud computing and cloud storage hosted by Google. Well-known users of GCP include LinkedIn and Verizon, but GCP can also power bioinformatics. Terra is an example of a bioinformatics system that runs on a GCP backend. When running workflows on GCP backends, make sure to account for the storage needed for your workflow, as GCP compute backends do not automatically scale their storage size at runtime. GCP backends allow you to make use of Google’s preemptible feature, which may reduce the cost of running workflows.
see also EC2
Further reading: https://cloud.google.com/gcp
Gen3
A data science platform affiliated with the University of Chicago. Hosts phenotypic and genotypic data for the BD Catalyst, AnVIL Project, Kids First, and eLwazi grants.
Further reading: https://gen3.org/
GitHub App registration
The recommended way to register a tool or workflow on Dockstore. This involves creating a .dockstore.yml file on the GitHub repository (other source-control methods are not supported) that hosts the tool or workflow, as well as installing the Dockstore GitHub App. This allows a Dockstore entry to remain in sync with the source-control repository automatically, including new branches, tagged commits, and releases created on GitHub after registration of the entry.
Note
This term as we define it here is associated with Dockstore and may have different definitions in other contexts.
Further reading: /getting-started/github-apps/github-apps-landing-page
GitHub App tool
A tool registered using the Dockstore GitHub App.
Note
This term as we define it here is associated with Dockstore and may have different definitions in other contexts.
see also GitHub App registration
GitHub App workflow
A workflow registered with the Dockstore GitHub App.
Note
This term as we define it here is associated with Dockstore and may have different definitions in other contexts.
see also GitHub App registration
immutable
Unchanging, unable to be modified. Immutability implies that an object cannot be updated.
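As a rough analogy in code, the Python sketch below defines an immutable record. The class name Snapshot and its fields are hypothetical. Attempting to modify the object fails, so an update has to be expressed as a new object instead.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Snapshot:
    """A hypothetical immutable record of a published workflow version."""

    name: str
    version: str


snap = Snapshot("my-workflow", "1.0.0")

try:
    snap.version = "1.0.1"  # rejected: frozen instances cannot be modified
    modified = True
except FrozenInstanceError:
    modified = False

# An "update" produces a separate object; the original is untouched.
newer = replace(snap, version="1.0.1")
print(modified, snap.version, newer.version)
```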
interoperable
Describes data or tools from multiple resources that can effectively integrate data, or operate processes, across all systems with a moderate degree of effort.
JSON
[pronounced “jason”]
abbreviation for JavaScript Object Notation
A human-readable file format that originated in JavaScript, but is now used by a variety of applications. Dockstore supports the inclusion of JSON and YAML files in entries to provide sample inputs for workflow and tool entries. Some workflow executors, such as Cromwell, can use these files to configure their inputs, so that you do not have to manually list every input when calling the workflow on the command line.
see also YAML
Further reading: https://www.json.org/json-en.html
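A minimal sketch of what a JSON file of sample inputs might look like, written and read back with Python's standard library. The workflow name hello and its two inputs are hypothetical, chosen only to show the shape of such a file.

```python
import json

# Hypothetical inputs for an imagined workflow named "hello".
params = {"hello.name": "world", "hello.repeat": 3}

# Serialize to the text that would be saved in a JSON parameter file.
text = json.dumps(params, indent=2, sort_keys=True)
print(text)

# Parsing the text recovers the same values, with types preserved.
loaded = json.loads(text)
print(loaded["hello.repeat"] + 1)
```

Keeping inputs in a file like this means the same set of values can be reused across runs and shared alongside the workflow.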
Jupyter
[pronounced “Jupiter” like the planet]
A project focused on developing “notebooks” for programming languages, most famously Python, because the project started as a spin-off of IPython in the early 2010s. Other languages such as R are also supported. Jupyter notebooks allow for blocks of code to be nestled between markdown text, allowing for easy documentation of the code blocks and reproducibility of analysis.
Further reading: https://jupyter.org/
kernel
An operating system’s core program that is always loaded in memory, and modulates interactions between software and physical hardware, including but not limited to managing memory access for any program currently in RAM.
Further reading: https://en.wikipedia.org/wiki/Kernel_(operating_system)
Kids First
abbreviation for Gabriella Miller Kids First Program
An NIH program, supported by the NIH Common Fund, relating to the influence of genomics on pediatric health, with a focus on pediatric cancer and structural birth abnormalities (such as cleft palate).
Further reading: https://commonfund.nih.gov/kidsfirst/highlights
labels
On Dockstore, we use labels to “tag” Dockstore entries with information about them. Workflow or tool developers can add labels to a Dockstore entry page that they have edit access to. An entry’s labels will appear in search results.
Note
This term as we define it here is associated with Dockstore and may have different definitions in other contexts.
launch with
On Dockstore, this refers to the functionality of exporting a workflow to one of our cloud execution partners.
layer
In the context of Docker, a layer is a component of a Docker image. Each RUN, COPY, and ADD instruction in a Dockerfile will lead to the creation of a layer.
Further reading: https://docs.docker.com/storage/storagedriver/#images-and-layers
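Following the rule stated above, this Python sketch counts the RUN, COPY, and ADD instructions in a Dockerfile's text. It is a deliberately simplified parser written for illustration (it handles backslash line continuations but not every Dockerfile feature), and the sample Dockerfile, including run.sh, is hypothetical.

```python
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}


def count_layer_instructions(dockerfile_text):
    """Count RUN, COPY, and ADD instructions in Dockerfile text."""
    count = 0
    continuing = False  # True while inside a backslash-continued instruction
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are not instructions
        if not continuing:
            keyword = line.split(maxsplit=1)[0].upper()
            if keyword in LAYER_INSTRUCTIONS:
                count += 1
        continuing = line.endswith("\\")
    return count


# Hypothetical Dockerfile; the raw string keeps the backslash as written.
sample = r"""
FROM ubuntu:22.04
# Install the tool
RUN apt-get update && \
    apt-get install -y samtools
COPY run.sh /usr/local/bin/run.sh
CMD ["run.sh"]
"""

print(count_layer_instructions(sample))
```

In the sample, the multi-line RUN counts once and COPY counts once, while FROM and CMD are not counted by this sketch.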
legacy registration
One of the two main ways of registering a tool or workflow. Legacy methods support a variety of source-control repositories, but new changes to the tool or workflow after registration will not be reflected on Dockstore until the maintainer of the Dockstore entry manually refreshes the tool or workflow in Dockstore’s UI. For this reason, we generally recommend people use GitHub App registration instead.
Note
This term as we define it here is associated with Dockstore and may have different definitions in other contexts.
legacy tool
On Dockstore, we use this term to refer to a tool that is registered using a legacy registration method. Legacy tools are not automatically synchronized with their source control repository, but can be updated manually by the tool maintainer. Additionally, legacy tools require a Dockerfile to be registered, and are versioned based on the tags of their associated Docker image. A legacy tool can be converted into a GitHub App tool via the method described here.
Note
This term as we define it here is associated with Dockstore and may have different definitions in other contexts.
legacy workflow
On Dockstore, we use this term to refer to a workflow that is registered using a legacy registration method. Legacy workflows are not automatically synchronized with their source control repository, but can be updated manually by the workflow maintainer. A legacy workflow can be converted into a GitHub App workflow via the method described here.
Note
This term as we define it here is associated with Dockstore and may have different definitions in other contexts.
miniwdl
A Python-based WDL executor managed by the Chan Zuckerberg Initiative.
Note
This term as we define it here is associated with Chan Zuckerberg Initiative and may have different definitions in other contexts.
see also Cromwell
Further reading: https://github.com/chanzuckerberg/miniwdl
NCI
abbreviation for National Cancer Institute
A division of the NIH focused on cancer research.
Note
This term as we define it here is associated with NIH and may have different definitions in other contexts.
Further reading: https://www.nih.gov/about-nih/what-we-do/nih-almanac/national-cancer-institute-nci
NCPI
abbreviation for NIH Cloud Platform Interoperability
An effort to connect five NIH cloud projects and ensure they are interoperable. The five projects covered under this are the AnVIL Project, BioData Catalyst, Cancer Research Data Commons, Kids First, and the National Center for Biotechnology Information.
Note
This term as we define it here is associated with NIH and may have different definitions in other contexts.
Further reading: https://datascience.nih.gov/nih-cloud-platform-interoperability-effort
Nextflow
A Java-based computational workflow engine. Dockstore supports the hosting of Nextflow workflows.
Further reading: https://www.nextflow.io/
NFL
abbreviation for Nextflow
NHGRI
abbreviation for National Human Genome Research Institute
A division of the NIH that focuses on genomics research. Funds the AnVIL Project.
Note
This term as we define it here is associated with NIH and may have different definitions in other contexts.
Further reading: https://www.genome.gov/
NHLBI
abbreviation for National Heart, Lung, and Blood Institute
A division of the NIH that focuses on heart, lung, blood, and sleep health. Funds the BioData Catalyst platform.
Note
This term as we define it here is associated with NIH and may have different definitions in other contexts.
Further reading: https://www.nhlbi.nih.gov/
NIH
abbreviation for National Institutes of Health
An American government institution, part of the Department of Health and Human Services (HHS), that engages in medical research.
Further reading: https://www.nih.gov/
notebook
An interactive document, made up of “cells” containing code, text, and images, authored and executed in a browser-based programming environment. Project Jupyter popularized Python-based notebooks and maintains related specifications and software.
Further reading: /getting-started/getting-started-with-notebooks
OICR
abbreviation for Ontario Institute for Cancer Research
A non-profit research institute based in Toronto that is focused on cancer detection and treatment. One of the two institutes involved in the development of Dockstore, the other being UCSC.
Further reading: https://oicr.on.ca/
ORCID
[pronounced “or-kid”, rhymes with kid]
abbreviation for Open Researcher and Contributor ID
A unique ID used to identify researchers and their work in a way that doesn’t solely rely on names.
Further reading: https://info.orcid.org/what-is-orcid/
organization
In the context of Dockstore, an organization is a representation of some sort of institute, grant, project, or company. Organizations are approved by Dockstore admins, but any user who has at least two external accounts linked to their Dockstore account (and who has the authority to speak for the institute, grant, etc. in a technical manner) can request the creation of an organization on Dockstore.
Further reading: https://dockstore.org/organizations
parameter file
parent image
A Docker image which acts as the base upon which another Docker image is built. For example, including
FROM ubuntu:22.04
in a Dockerfile means that the resulting image will include everything inside ubuntu:22.04, plus any changes made by other commands in the Dockerfile. Parent images are sometimes called base images, but strictly speaking a base image is different (see further reading).
Further reading: https://docs.docker.com/glossary/#parent-image
preemptible
A type of GCP VM which may have its running jobs interrupted at any given time, and will be shut down if running for more than 24 hours. A preemptible machine is significantly cheaper than a standard VM, at the cost of possibly stopping before your computational work is finished. You can use preemptible machines when running workflows on GCP backends to save on compute costs.
Note
This term as we define it here is associated with Google and may have different definitions in other contexts.
see also Spot Instance
Further reading: https://cloud.google.com/compute/docs/instances/preemptible
primary descriptor file
The descriptor file that provides the overall description of a workflow or tool, which Dockstore processes first when the workflow or tool is registered.
secondary descriptor file
An ancillary descriptor file, referenced by the primary descriptor file or another secondary descriptor file, that describes part of a workflow or tool.
Seven Bridges
A cloud-based workflow execution platform developed by Seven Bridges Genomics. Seven Bridges supports the execution of CWL workflows and features a graph-based GUI to make workflow development easier. The computational backend of a Seven Bridges workspace can be selected by the user, with both GCP and AWS being supported. Dockstore supports directly importing CWL workflows into a Seven Bridges workspace. Seven Bridges is part of the BioData Catalyst consortium.
see also Terra
Further reading: https://www.sevenbridges.com/platform/
Spot Instance
A type of EC2 instance which is usually much cheaper than the typical on-demand EC2 cost. A spot instance is not guaranteed to be available at any given time, as it is based upon currently unused EC2 availability.
Note
This term as we define it here is associated with Amazon and may have different definitions in other contexts.
see also preemptible
Further reading: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html
Task Execution Service
Further reading: https://ga4gh.github.io/task-execution-schemas/docs/
Terra
A cloud-based workflow execution platform developed by the Broad Institute. Terra supports the execution of WDL workflows, Jupyter/R notebooks, and integrated apps. The computational backend of a Terra workspace is based upon Google, allowing Google-specific features such as preemptible machines to be used in workflows. Dockstore supports directly importing WDL workflows into a Terra workspace. Terra is part of the BioData Catalyst, AnVIL Project, and eLwazi consortia.
see also Seven Bridges
Further reading: https://terra.bio
TES
abbreviation for Task Execution Service
tool
see also workflow
Further reading: /getting-started/intro-to-dockstore-tools-and-workflows
topic
A short text description of an entry, collection, or organization. You can specify an entry’s topic in .dockstore.yml.
TRS
[pronounced “terse”, rhymes with verse]
abbreviation for Tool Registry Service
Note
This term as we define it here is associated with GA4GH and may have different definitions in other contexts.
Further reading: https://ga4gh.github.io/tool-registry-service-schemas/
UCSC
abbreviation for University of California, Santa Cruz
A public university located in Santa Cruz that is focused on undergraduate and graduate education and research. The Genomics Institute, a branch of UCSC’s engineering department, is one of the two institutes involved in the development of Dockstore, the other being OICR.
Further reading: https://www.ucsc.edu
VM
abbreviation for virtual machine
An emulated computer system that runs on another computer system. Usually implies that one or more entire operating systems (the guest OS) are being run on top of another operating system (the host OS) via the host’s hypervisor. The hypervisor manages the execution of processes of the guest operating system. This is in contrast to a container, which neither involves a hypervisor nor runs an entire guest operating system.
see also container
WDL
[pronounced “widdle”, rhymes with riddle]
abbreviation for Workflow Description Language
WES
[pronounced “wes”, rhymes with mess]
abbreviation for Workflow Execution Service
workflow
A command line program wrapped in a descriptor language, which usually has multiple steps. In CWL, a workflow is usually made up of multiple tools. Other languages consider a workflow to be the basic unit.
see also tool
Further reading: /getting-started/intro-to-dockstore-tools-and-workflows
Workflow Description Language
A workflow language managed by the Open WDL Project that is designed to describe command line tools. Usually written as WDL. WDL and CWL are relatively similar in principle, and code written in one language can often be translated into the other with some workarounds, but they are two different standards and each has unique features. Unlike CWL, WDL does not have an official reference implementation, but Cromwell and miniwdl are popular implementations.
Further reading: https://openwdl.org/
Workflow Execution Service
A standardized API developed by GA4GH that defines a programmatic way to run and manage workflows. This standard is also known as WES. Workflows can be launched through a WES implementation using the Dockstore CLI, as described in this Dockstore blog post: <https://medium.com/dockstore/dockstore-partners-with-aws-agc-to-make-launching-workflows-quick-and-easy-7213510dabd8>
Further reading: https://ga4gh.github.io/workflow-execution-service-schemas/
YAML
[pronounced “yah-mul”, rhymes with camel]
abbreviation for YAML Ain’t Markup Language
A human-readable data-serialization language. Commonly used for configuration files.
see also JSON
Further reading: https://yaml.org/