Dockstore Dictionary

.dockstore.yml

This file is part of GitHub App registration. It indexes workflows or tools within a repository, including their optional test parameter files, and the author(s) of said workflows or tools.

Further reading: /assets/templates/template

absolute path

A path that starts with the character / and contains the full set of directories necessary to resolve a file, starting from the root directory of the repository or filesystem. For example, /Dockstore.cwl or /bin/sh.
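The sketch below uses Python's standard pathlib module to tell an absolute path from a relative one, and to anchor a relative path at a root. It assumes POSIX-style paths like the ones in this entry. The relative path workflows/Dockstore.cwl is a hypothetical example.

```python
from pathlib import PurePosixPath

def is_absolute(path: str) -> bool:
    """Return True if the POSIX-style path starts at the root directory (/)."""
    return PurePosixPath(path).is_absolute()

def to_absolute(path: str, root: str = "/") -> str:
    """Anchor a relative path at `root`; an absolute path is returned unchanged."""
    # Joining onto an absolute path makes pathlib discard the root argument.
    return str(PurePosixPath(root) / path)

# Paths from this entry, plus a hypothetical relative path for contrast.
for p in ["/Dockstore.cwl", "/bin/sh", "workflows/Dockstore.cwl"]:
    print(p, is_absolute(p), to_absolute(p))
```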

AGC

abbreviation for Amazon Genomics CLI

Amazon Genomics CLI

A CLI-based tool that supports launching bioinformatics-related workflows on AWS cloud infrastructure. The Dockstore CLI can launch workflows on AWS using Amazon Genomics CLI’s WES implementation.

see also AGC

Further reading: https://aws.amazon.com/blogs/industries/announcing-amazon-genomics-cli-preview/

AnVIL Project

abbreviation for Analysis, Visualization, and Informatics Labspace

A federated cloud platform funded by NHGRI designed to manage and store genomics and related data, enable population-scale analysis, and facilitate collaboration through the sharing of data, code, and analysis results. Sometimes referred to as just “the AnVIL” or “AnVIL”.

Further reading: https://anvilproject.org/

API

abbreviation for Application Programming Interface

A software connection or interface used to exchange data, often between two different platforms. Communication between different cloud platforms is mediated by various APIs, such as TES.

AWS

abbreviation for Amazon Web Services

A provider of on-demand cloud services hosted by Amazon, most notably cloud computing and cloud storage. Netflix and Airbnb are examples of services powered by AWS. Some bioinformatics platforms, such as Seven Bridges, can leverage AWS by launching workflows on EC2 instances.

see also GCP

Further reading: https://docs.aws.amazon.com/index.html?nc2=h_ql_doc_do

BD Catalyst

abbreviation for BioData Catalyst

BDC

[pronounced “bee-dee-see”]

abbreviation for BioData Catalyst

BioData Catalyst

A cloud-based platform funded by NHLBI to provide tools, applications, and workflows in secure workspaces to expand research in heart, lung, blood, and sleep health.

Further reading: https://biodatacatalyst.nhlbi.nih.gov/

Cancer Genomics Cloud

A cloud platform for bioinformatics analysis, developed by Seven Bridges and funded by NCI.

categories

A group of workflows or tools, curated by Dockstore, that share a similar scientific purpose.

CGC

abbreviation for Cancer Genomics Cloud

CLI

abbreviation for Command Line Interface

A program that is interacted with on the command line, usually via “Terminal” on macOS and Linux or “cmd” (Command Prompt) on Windows. CLI programs generally do not have a graphical user interface.

Further reading: https://en.wikipedia.org/wiki/Command-line_interface

cloud computing

Doing computational tasks on a remote machine that is made available on-demand without the user having to manage all aspects of it. Generally implies that the user is essentially renting computational resources from someone else. Well-known cloud providers include GCP, AWS, Microsoft Azure, and Alibaba Cloud.

Further reading: https://en.wikipedia.org/wiki/Cloud_computing

collection

A group of one or more entries on Dockstore that members of an organization found useful, created themselves, or considered interesting. Each collection has a description explaining why the organization grouped those workflows or tools together.

Note

This term as we define it here is associated with Dockstore and may have different definitions in other contexts.

Common Workflow Language

A workflow language that describes how to run command-line tools. WDL and CWL are similar in principle, and code written in one language can often be translated into the other with some workarounds. However, they are two different standards and each has unique features. For example, CWL can use JavaScript expressions within its tool and workflow descriptions.

see also CWL, WDL

Further reading: https://www.commonwl.org/user_guide/

container

An isolated environment that contains programs and their prerequisites but not an entire operating system. Unlike a VM, a container shares the kernel of the host OS. A well-known type of container is a Docker container.

Further reading: https://en.wikipedia.org/wiki/OS-level_virtualization

Cromwell

An open-source WDL executor managed by the Broad Institute. Cromwell is the default WDL executor for the Dockstore CLI and is the executor used by Terra.

Note

This term as we define it here is associated with Broad Institute and may have different definitions in other contexts.

Further reading: https://cromwell.readthedocs.io/en/stable/

CWL

abbreviation for Common Workflow Language

DAG

abbreviation for Directed Acyclic Graph

A directed graph, similar to a flowchart, that contains no cycles (loops). On Dockstore, we use DAGs to show the steps that a workflow takes.
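A DAG's lack of cycles is what guarantees that a workflow's steps can be put into a valid run order. The sketch below uses Python's standard graphlib module (Python 3.9+) to find such an order and to reject a graph that contains a loop. The step names are hypothetical and are not taken from any real workflow.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
steps = {
    "fetch_reads": set(),
    "fetch_reference": set(),
    "align": {"fetch_reads"},
    "sort": {"align"},
    "call_variants": {"sort", "fetch_reference"},
}

def execution_order(graph):
    """Return one valid order in which the steps can run.

    Raises graphlib.CycleError if the graph has a loop, i.e. is not a DAG.
    """
    return list(TopologicalSorter(graph).static_order())

order = execution_order(steps)
print(order)

# Making fetch_reads depend on sort creates a loop: align -> sort -> fetch_reads -> align.
cyclic = dict(steps, fetch_reads={"sort"})
try:
    execution_order(cyclic)
except CycleError:
    print("not a DAG")
```

Independent steps such as fetch_reads and fetch_reference may appear in either order. An executor is free to run them in parallel.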

Further reading: https://cran.r-project.org/web/packages/ggdag/vignettes/intro-to-dags.html

descriptor file

A file used to programmatically describe a tool or workflow. This file represents the instructions that will actually be executed. On Dockstore, we support .ga, .cwl, .wdl, and .nfl file extensions for Galaxy, CWL, WDL, and Nextflow respectively.

Docker

[pronounced “daw-ker”, rhymes with walker]

A program that can create “images” which are somewhat similar to virtual machines, as well as run those images. In the context of bioinformatics, this technology has two main benefits: First, a Docker image bundles up everything a given piece of software needs to run, meaning that someone who wants to run (for example) samtools via Docker only needs to install Docker, not samtools. Second, an instance of a Docker image is a relatively standardized environment even when running on different backends, meaning that two people running the same software in the same Docker image on two different computers are likely to get the exact same results. In other words, Docker is good for reproducibility and ease of use.

Further reading: https://docker-curriculum.com/

Docker container

In order to actually use the software inside a Docker image using the docker run command, the Docker program creates a writable layer on top of the image, which leads to the creation of a Docker container. You can think of a Docker image as an unchanging template, and a Docker container as a writable instance generated from that template. A Docker image can exist on its own, but a Docker container requires a Docker image.

Further reading: https://www.docker.com/resources/what-container/

Docker image

A read-only file that represents a filesystem containing some code and that code’s dependencies. A Docker image can be created using the docker build command together with a Dockerfile. If a workflow references a Docker image, the workflow executor will download that image (unless it was already downloaded) and add a writable layer on top of it, which results in the creation of a Docker container.

Dockerfile

A file describing the creation of a Docker image by running commands that each form a layer.

Further reading: https://docs.docker.com/engine/reference/builder/

Dockstore CLI

abbreviation for Dockstore Command Line Interface

A command-line program developed by Dockstore. It is not required to use Dockstore, but it has many features to make running and developing workflows easier.

see also CLI

Further reading: /advanced-topics/dockstore-cli/dockstore-cli-faq

Dockstore GitHub App

The GitHub App that allows Dockstore to automatically sync changes made in a GitHub repository with an entry in Dockstore.

see also GitHub App registration

Further reading: /getting-started/github-apps/github-apps-landing-page

DOI

abbreviation for Digital Object Identifier

An identifier that provides a long-lasting link to an immutable digital object. On Dockstore, you can use Zenodo to mint DOIs for your workflows and tools to increase reproducibility.

DRS

[pronounced “derse”, rhymes with verse]

abbreviation for Data Repository Service

A standardized API, created by the GA4GH Cloud Work Stream, that provides portable access to repositories of data resources.
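The DRS specification defines hostname-based DRS URIs of the form drs://<host>/<id>. A client resolves these by requesting https://<host>/ga4gh/drs/v1/objects/<id>. The Python sketch below performs that mapping with the standard library. It deliberately rejects compact-identifier URIs (drs://<prefix>:<accession>), because resolving those requires an external identifier registry. The host and object ID in the example are hypothetical.

```python
from urllib.parse import quote, urlparse

def drs_uri_to_url(drs_uri: str) -> str:
    """Map a hostname-based DRS URI (drs://<host>/<id>) to its HTTPS object endpoint.

    Compact-identifier URIs (drs://<prefix>:<accession>) have no path and are
    rejected, since resolving them needs an external identifier registry.
    """
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri}")
    # Percent-encode the ID so characters such as '/' stay inside one path segment.
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{quote(object_id, safe='')}"

# Hypothetical host and object ID, for illustration only.
print(drs_uri_to_url("drs://drs.example.org/314159"))
# → https://drs.example.org/ga4gh/drs/v1/objects/314159
```

A GET on the resulting URL returns the object's metadata, including its size, checksums, and the access methods that can be used to fetch the bytes.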

Note

This term as we define it here is associated with GA4GH and may have different definitions in other contexts.

Further reading: https://github.com/ga4gh/data-repository-service-schemas

DS-I Africa

abbreviation for Data Science for health discovery and Innovation in Africa

An NIH initiative to leverage data science to address the African continent’s public health needs.

Further reading: https://commonfund.nih.gov/africadata

EC2

abbreviation for Elastic Compute Cloud

The cloud computing service of AWS. When using EC2 instances, you can take advantage of Amazon’s Spot Instance feature, which may reduce the cost of running workflows.

Further reading: https://docs.aws.amazon.com/ec2/index.html

egress

[pronounced “ee-gress”, rhymes with aggress]

The action of leaving a place. In the context of cloud computing, data egress refers to data moving out of a location. Examples include moving data from the cloud to a local machine, between cloud providers, or between locations of a single cloud provider. Data egress often incurs fees (usually called egress charges) and can be one of the most expensive cloud costs. Sometimes the person hosting the file is charged for data egress. Other times the person downloading the file is charged, for example when downloading files from a Google Cloud Storage bucket that has the requester-pays option enabled.

Note

This term as we define it here is associated with cloud computing and may have different definitions in other contexts.

eLwazi

[pronounced “el-woz-ee”, derived from the Xhosa word for knowledge (uLwazi) and the Luganda word for rock symbolizing robustness (Olwazi)]

An African-led open data science platform funded as part of the NIH’s DS-I Africa program.

Further reading: https://elwazi.org/

entry

Shorthand for a tool or workflow that has been registered on Dockstore.

Note

This term as we define it here is associated with Dockstore and may have different definitions in other contexts.

FAIR

[pronounced “fair”, rhymes with pear]

abbreviation for Findable, Accessible, Interoperable, and Reusable

A set of guidelines to improve the Findability, Accessibility, Interoperability, and Reuse of digital assets. This concept is often applied to data, but can be applied to other assets such as workflows.

Further reading: https://www.go-fair.org/fair-principles/

GA4GH

abbreviation for Global Alliance for Genomics and Health

A network of public and private institutions which aims to accelerate progress in genomic research and human health by cultivating a common framework of standards and harmonized approaches for effective and responsible genomic and health-related data sharing.

Further reading: https://www.ga4gh.org/

Galaxy

An open-source platform that follows FAIR principles, best known for its web-based UI for creating and running a variety of bioinformatics tools. A Galaxy instance is a running Galaxy interface/server that can be used to create and execute tools and workflows.

Further reading: https://galaxyproject.org/

Galaxy workflow

A type of workflow that follows the standards of the Galaxy execution system. Dockstore supports registering Galaxy workflows, which use the .ga file extension.

Further reading: https://galaxyproject.org/learn/advanced-workflow/

GCP

abbreviation for Google Cloud Platform

A system used for cloud computing and cloud storage hosted by Google. Well-known users of GCP include LinkedIn and Verizon, but GCP can also power bioinformatics. Terra is an example of a bioinformatics system that runs on a GCP backend. When running workflows on GCP backends, make sure to account for the storage needed for your workflow, as GCP compute backends do not automatically scale their storage size at runtime. GCP backends allow you to make use of Google’s preemptible feature, which may reduce the cost of running workflows.

see also EC2

Further reading: https://cloud.google.com/gcp

Gen3

A data science platform affiliated with the University of Chicago. Hosts phenotypic and genotypic data for the BD Catalyst, AnVIL Project, Kids First, and eLwazi grants.

Further reading: https://gen3.org/

GitHub App registration

The recommended way to register a tool or workflow on Dockstore. This involves creating a .dockstore.yml file on the GitHub repository (other source-control methods are not supported) that hosts the tool or workflow, as well as installing the Dockstore GitHub App. This allows a Dockstore entry to remain in sync with the source-control repository automatically, including new branches, tagged commits, and releases created on GitHub after registration of the entry.

Note

This term as we define it here is associated with Dockstore and may have different definitions in other contexts.

Further reading: /getting-started/github-apps/github-apps-landing-page

GitHub App tool

A tool registered using the Dockstore GitHub App.

Note

This term as we define it here is associated with Dockstore and may have different definitions in other contexts.

see also GitHub App registration

GitHub App workflow

A workflow registered with the Dockstore GitHub App.

Note

This term as we define it here is associated with Dockstore and may have different definitions in other contexts.

see also GitHub App registration

immutable

Unchanging, unable to be modified. Immutability implies that an object cannot be updated.

interoperable

The ability of data or tools from multiple resources to effectively integrate data, or operate processes, across all systems with a moderate degree of effort.

JSON

[pronounced “jason”]

abbreviation for JavaScript Object Notation

A human-readable file format that originated in JavaScript but is now used by a wide variety of applications. Dockstore supports the inclusion of JSON and YAML files in entries to provide sample inputs for workflow and tool entries. Some workflow executors, such as Cromwell, can read their inputs from these files rather than requiring every input to be listed manually on the command line.
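As a minimal sketch, the Python code below writes a Cromwell-style inputs file for a WDL workflow. Cromwell expects each key to be the input name qualified by the workflow name. The workflow name HelloWorld, its inputs, and the file name hello_inputs.json are hypothetical.

```python
import json

# Hypothetical WDL workflow "HelloWorld" with two inputs; Cromwell keys
# each input as "<workflow name>.<input name>".
inputs = {
    "HelloWorld.name": "Dockstore",
    "HelloWorld.repeat": 3,
}

text = json.dumps(inputs, indent=2)
with open("hello_inputs.json", "w", encoding="utf-8") as handle:
    handle.write(text)
print(text)
```

With Cromwell's run mode, this file would then be passed as java -jar cromwell.jar run hello.wdl --inputs hello_inputs.json, where hello.wdl is again a hypothetical file name.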

see also YAML

Further reading: https://www.json.org/json-en.html

Jupyter

[pronounced “Jupiter” like the planet]

A project focused on developing “notebooks” for programming languages. It is most famously associated with Python, because the project began as a spin-off of IPython in the early 2010s. Other languages such as R are also supported. Jupyter notebooks interleave blocks of code with Markdown text, making it easy to document code and reproduce analyses.

Further reading: https://jupyter.org/

kernel

An operating system’s core program, which is always loaded in memory and mediates interactions between software and physical hardware. This includes, but is not limited to, managing memory access for every program currently in RAM.

Further reading: https://en.wikipedia.org/wiki/Kernel_(operating_system)

Kids First

abbreviation for Gabriella Miller Kids First Program

An NIH program, supported by the NIH Common Fund, relating to the influence of genomics on pediatric health, with a focus on pediatric cancer and structural birth abnormalities (such as cleft palate).

Further reading: https://commonfund.nih.gov/kidsfirst/highlights

labels

On Dockstore, we use labels to “tag” Dockstore entries with information about them. Workflow or tool developers can add labels to a Dockstore entry page that they have edit access to. An entry’s labels will appear in search results.

Note

This term as we define it here is associated with Dockstore and may have different definitions in other contexts.

launch with

On Dockstore, this refers to the functionality of exporting a workflow to one of our cloud execution partners.

layer

In the context of Docker, a layer is a component of a Docker image. Each RUN, COPY, and ADD instruction in a Dockerfile will lead to the creation of a layer.
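To make the rule above concrete, the Python sketch below lists the RUN, COPY, and ADD instructions in a Dockerfile, since those are the instructions this entry says create layers. It is a simplified parser. It joins lines continued with a trailing backslash and skips comments and blank lines between instructions, but it does not handle every Dockerfile feature (for example, heredocs or parser directives). The sample Dockerfile is hypothetical.

```python
LAYER_INSTRUCTIONS = {"RUN", "COPY", "ADD"}

def layer_instructions(dockerfile_text: str) -> list[str]:
    """Return the logical Dockerfile instructions that create filesystem layers."""
    logical, current = [], ""
    for raw in dockerfile_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments that sit between instructions.
        if not current and (not line or line.startswith("#")):
            continue
        # A trailing backslash continues the instruction on the next line.
        if line.endswith("\\"):
            current += line[:-1] + " "
            continue
        logical.append(current + line)
        current = ""
    return [i for i in logical if i.split(maxsplit=1)[0].upper() in LAYER_INSTRUCTIONS]

# Hypothetical Dockerfile for illustration.
example = """\
FROM ubuntu:22.04
# Install samtools
RUN apt-get update && \\
    apt-get install -y samtools
COPY run.sh /usr/local/bin/run.sh
ENTRYPOINT ["run.sh"]
"""
print(len(layer_instructions(example)))  # → 2
```

Chaining commands in a single RUN instruction, as above, is a common way to keep the number of layers small.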

Further reading: https://docs.docker.com/storage/storagedriver/#images-and-layers

legacy registration

One of the two main ways of registering a tool or workflow. Legacy methods support a variety of source-control repositories, but new changes to the tool or workflow after registration will not be reflected on Dockstore until the maintainer of the Dockstore entry manually refreshes the tool or workflow in Dockstore’s UI. For this reason, we generally recommend people use GitHub App registration instead.

Note

This term as we define it here is associated with Dockstore and may have different definitions in other contexts.

legacy tool

On Dockstore, we use this term to refer to a tool that is registered using a legacy registration method. Legacy tools are not automatically synchronized with their source control repository, but can be updated manually by the tool maintainer. Additionally, legacy tools require a Dockerfile to be registered, and are versioned based on the tags of their associated Docker image. A legacy tool can be converted into a GitHub App tool via the method described here.

Note

This term as we define it here is associated with Dockstore and may have different definitions in other contexts.

legacy workflow

On Dockstore, we use this term to refer to a workflow that is registered using a legacy registration method. Legacy workflows are not automatically synchronized with their source control repository, but can be updated manually by the workflow maintainer. A legacy workflow can be converted into a GitHub App workflow via the method described here.

Note

This term as we define it here is associated with Dockstore and may have different definitions in other contexts.

NCI

abbreviation for National Cancer Institute

A division of the NIH focused on cancer research.

Further reading: https://www.nih.gov/about-nih/what-we-do/nih-almanac/national-cancer-institute-nci

NCPI

abbreviation for NIH Cloud Platform Interoperability

An effort to connect five NIH cloud projects and ensure they are interoperable. The five projects covered under this are the AnVIL Project, BioData Catalyst, Cancer Research Data Commons, Kids First, and the National Center for Biotechnology Information.

Note

This term as we define it here is associated with NIH and may have different definitions in other contexts.

Further reading: https://datascience.nih.gov/nih-cloud-platform-interoperability-effort

Nextflow

A Java-based computational workflow engine. Dockstore supports the hosting of Nextflow workflows.

Further reading: https://www.nextflow.io/

NFL

abbreviation for Nextflow

An uncommon acronym for Nextflow. This abbreviation is not used as frequently as CWL or WDL, but does see usage occasionally.

NHGRI

abbreviation for National Human Genome Research Institute

A division of the NIH that focuses on genomics research. Funds the AnVIL Project.

Further reading: https://www.genome.gov/

NHLBI

abbreviation for National Heart, Lung, and Blood Institute

A division of the NIH that focuses on heart, lung, blood, and sleep health. Funds the BioData Catalyst platform.

Further reading: https://www.nhlbi.nih.gov/

NIH

abbreviation for National Institutes of Health

An American government institution, part of the Department of Health and Human Services (HHS), that engages in medical research.

Further reading: https://www.nih.gov/

OICR

abbreviation for Ontario Institute for Cancer Research

A non-profit research institute based in Toronto that is focused on cancer detection and treatment. One of the two institutes involved in the development of Dockstore, the other being UCSC.

Further reading: https://oicr.on.ca/

ORCID

[pronounced “or-kid”, rhymes with kid]

abbreviation for Open Researcher and Contributor ID

A unique ID used to identify researchers and their work in a way that doesn’t solely rely on names.
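An ORCID iD ends in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can usually be caught before it is linked to an entry. The Python sketch below implements that checksum. Its format check is simplified and ignores hyphen placement. The iD 0000-0002-1825-0097 is the sample iD that ORCID uses in its own documentation.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits of an ORCID iD."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as "X".
    return "X" if result == 10 else str(result)

def is_valid_orcid(orcid: str) -> bool:
    """Check the length and check character of an iD such as 0000-0002-1825-0097."""
    digits = orcid.replace("-", "")
    if len(digits) != 16 or not digits[:15].isdigit():
        return False
    return orcid_check_digit(digits[:15]) == digits[15].upper()

print(is_valid_orcid("0000-0002-1825-0097"))  # → True
print(is_valid_orcid("0000-0002-1825-0098"))  # → False
```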

Further reading: https://info.orcid.org/what-is-orcid/

organization

In the context of Dockstore, an organization is a representation of some sort of institute, grant, project, or company. Organizations are approved by Dockstore admins. Any user can request the creation of an organization on Dockstore if they have at least two external accounts linked to their Dockstore account and the authority to speak for the institute, grant, project, or company on technical matters.

Further reading: https://dockstore.org/organizations

parameter file

A JSON or YAML file that describes the inputs to a workflow, such as runtime parameters or links to cloud data.
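As an illustration, the Python sketch below builds a parameter file for a hypothetical CWL workflow with a file input named reads and an integer input named threads. In CWL parameter files, a file input is written as an object with "class": "File" and a path or location. The example.org URL is a placeholder. Whether a given executor can fetch a remote location like this depends on the executor.

```python
import json

# Hypothetical CWL inputs: "reads" is a File, "threads" is an int.
cwl_params = {
    "reads": {
        "class": "File",
        "location": "https://example.org/data/sample.bam",  # placeholder URL
    },
    "threads": 4,
}

text = json.dumps(cwl_params, indent=2)
print(text)
```

Because YAML is a superset of JSON, the same structure could also be saved in a .yml parameter file.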

preemptible

A type of GCP VM which may have its running jobs interrupted at any given time, and will be shut down if running for more than 24 hours. A preemptible machine is significantly cheaper than a standard VM, at the cost of possibly stopping before your computational work is finished. You can use preemptible machines when running workflows on GCP backends to save on compute costs.

Note

This term as we define it here is associated with Google and may have different definitions in other contexts.

see also Spot Instance

Further reading: https://cloud.google.com/compute/docs/instances/preemptible

primary descriptor file

The descriptor file that provides the overall description of a workflow or tool, which Dockstore processes first when the workflow or tool is registered.

secondary descriptor file

An ancillary descriptor file, referenced by the primary descriptor file or another secondary descriptor file, that describes part of a workflow or tool.

Seven Bridges

A cloud-based workflow execution platform developed by Seven Bridges Genomics. Seven Bridges supports the execution of CWL workflows and features a graph-based GUI to make workflow development easier. The computational backend of a Seven Bridges workspace can be selected by the user, with both GCP and AWS being supported. Dockstore supports directly importing CWL workflows into a Seven Bridges workspace. Seven Bridges is part of the BioData Catalyst consortium.

see also Terra

Further reading: https://www.sevenbridges.com/platform/

Spot Instance

A type of EC2 instance that is usually much cheaper than the standard on-demand EC2 price. A Spot Instance is not guaranteed to be available at any given time, because it draws on spare EC2 capacity that is currently unused.

Note

This term as we define it here is associated with Amazon and may have different definitions in other contexts.

see also preemptible

Further reading: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-spot-instances.html

Task Execution Service

A standardized API developed by GA4GH for describing and executing batch tasks.
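The Python sketch below builds a TES task description and prepares, but does not send, the request to create it. It uses only the standard library. The task runs one executor, a command inside a Docker image, with requested CPU and memory. The server URL is hypothetical, and the sketch assumes the server exposes the TES v1 API under /ga4gh/tes/v1.

```python
import json
import urllib.request

def build_tes_task(name, image, command, cpu_cores=1, ram_gb=2.0):
    """Build a minimal TES task: one executor running `command` in a Docker `image`."""
    return {
        "name": name,
        "resources": {"cpu_cores": cpu_cores, "ram_gb": ram_gb},
        "executors": [{"image": image, "command": command}],
    }

def make_create_request(server, task):
    """Prepare (but do not send) a POST to the TES create-task endpoint."""
    return urllib.request.Request(
        f"{server}/ga4gh/tes/v1/tasks",
        data=json.dumps(task).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical server; sending this would require a running TES service.
task = build_tes_task("hello", "ubuntu:22.04", ["echo", "hello from TES"])
request = make_create_request("https://tes.example.org", task)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would submit it. A TES server responds with an ID that can then be used to poll the task's state.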

Further reading: https://ga4gh.github.io/task-execution-schemas/docs/

Terra

A cloud-based workflow execution platform developed by the Broad Institute. Terra supports the execution of WDL workflows, Jupyter/R notebooks, and integrated apps. The computational backend of a Terra workspace is based upon Google, allowing Google-specific features such as preemptible machines to be used in workflows. Dockstore supports directly importing WDL workflows into a Terra workspace. Terra is part of the BioData Catalyst, AnVIL Project, and eLwazi consortia.

see also Seven Bridges

Further reading: https://terra.bio

TES

abbreviation for Task Execution Service

tool

A single command line program wrapped in a descriptor language. Languages that formally describe tools (such as CWL) may chain them together into a workflow.

see also workflow

Further reading: /getting-started/intro-to-dockstore-tools-and-workflows

TRS

[pronounced “terse”, rhymes with verse]

abbreviation for Tool Registry Service

A standardized API, created by the GA4GH Cloud Work Stream, that provides portable access to a registry of tools, workflows, and associated files. Every resource in a TRS registry has a public ID that can be used to retrieve it. Dockstore provides a TRS interface.
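As a sketch of how a TRS ID is used, the Python code below builds the TRS v2 URL for an entry's descriptor. The URL has the form /tools/{id}/versions/{version_id}/{type}/descriptor. The ID and version must be percent-encoded, because Dockstore TRS IDs contain characters such as # and /. Two values here are assumptions: the base URL, which should be checked against Dockstore's API documentation, and the workflow ID, which is hypothetical.

```python
from urllib.parse import quote

# Assumed base URL of Dockstore's TRS v2 API; verify before use.
TRS_BASE = "https://dockstore.org/api/ga4gh/trs/v2"

def descriptor_url(trs_id: str, version: str, descriptor_type: str) -> str:
    """Build the TRS v2 URL for the primary descriptor of one version of an entry.

    The ID and version are percent-encoded because they may contain '#' and '/'.
    """
    return (
        f"{TRS_BASE}/tools/{quote(trs_id, safe='')}"
        f"/versions/{quote(version, safe='')}/{descriptor_type}/descriptor"
    )

# Hypothetical workflow registered from github.com/example-org/example-repo.
print(descriptor_url("#workflow/github.com/example-org/example-repo", "main", "PLAIN_WDL"))
```

In TRS, the PLAIN_ descriptor types (such as PLAIN_WDL) request the raw descriptor file. The plain types (such as WDL) return it wrapped in a JSON object.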

Note

This term as we define it here is associated with GA4GH and may have different definitions in other contexts.

Further reading: https://ga4gh.github.io/tool-registry-service-schemas/

UCSC

abbreviation for University of California, Santa Cruz

A public university located in Santa Cruz that is focused on undergraduate and graduate education and research. The Genomics Institute, a branch of UCSC’s engineering department, is one of the two institutes involved in the development of Dockstore, the other being OICR.

Further reading: https://ucsc.edu

VM

abbreviation for virtual machine

An emulated computer system that runs on another computer system. Usually implies that an entire operating system (the guest OS) is being run on top of another operating system (the host OS) via a hypervisor, which manages the execution of the guest operating system’s processes. This is in contrast to containers, which neither involve hypervisors nor run entire guest operating systems.

see also container

WDL

[pronounced “widdle”, rhymes with riddle]

abbreviation for Workflow Description Language

WES

[pronounced “wes”, rhymes with mess]

abbreviation for Workflow Execution Service

workflow

A command line program wrapped in a descriptor language, which usually has multiple steps. In CWL, a workflow is usually made up of multiple tools. Other languages consider a workflow to be the basic unit.

see also tool

Further reading: /getting-started/intro-to-dockstore-tools-and-workflows

Workflow Description Language

A workflow language managed by the OpenWDL project that is designed to describe command-line tools. Usually abbreviated as WDL. WDL and CWL are similar in principle, and code written in one language can often be translated into the other with some workarounds. However, they are two different standards and each has unique features.

see also WDL, CWL

Further reading: https://openwdl.org/

Workflow Execution Service

A standardized API developed by GA4GH that provides a standard programmatic way to run and manage workflows. The Dockstore CLI can launch workflows on WES-compatible services, as described in this Dockstore blog post: <https://medium.com/dockstore/dockstore-partners-with-aws-agc-to-make-launching-workflows-quick-and-easy-7213510dabd8>
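The Python sketch below assembles the form fields of a WES v1 run request, which a client would POST as multipart/form-data to {server}/ga4gh/wes/v1/runs. It shows the request's shape only and does not send anything. The workflow URL, input names, and the WDL version 1.0 default are hypothetical choices for illustration.

```python
import json

def wes_run_fields(workflow_url, params, workflow_type="WDL", type_version="1.0"):
    """Assemble form fields for a WES v1 run request (POST {server}/ga4gh/wes/v1/runs).

    WES expects multipart/form-data; workflow_params is sent as a JSON-encoded string.
    """
    return {
        "workflow_url": workflow_url,
        "workflow_type": workflow_type,
        "workflow_type_version": type_version,
        "workflow_params": json.dumps(params),
    }

# Hypothetical workflow URL and inputs, for illustration only.
fields = wes_run_fields(
    "https://example.org/workflows/hello.wdl",
    {"HelloWorld.name": "Dockstore"},
)
for key, value in fields.items():
    print(f"{key}: {value}")
```

A WES server answers a successful submission with a run_id, which is then used with the same API to check the run's status and retrieve its logs.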

Further reading: https://ga4gh.github.io/workflow-execution-service-schemas/

YAML

[pronounced “yah-mul”, rhymes with camel]

abbreviation for YAML Ain’t Markup Language

A human-readable data-serialization language, commonly used for configuration files.

see also JSON

Further reading: https://yaml.org/