How to create Docker and Singularity images
General organization
- All Docker and Singularity images should be built from a `Dockerfile` that lives in its own public GitHub repository in the rasilab GitHub organization account.
- All Docker images should be publicly available in our lab's GitHub container registry.
- For routine analyses and writing, use our lab's default R, Python, and LaTeX images. These images contain the libraries commonly used in our lab, as you can see in the `Dockerfile` of each image.
- If you require additional R, Python, or LaTeX libraries that are not present in the default images, build child images based on them. For example, see the MaGeCK package built off our default Python package.
- If you use an R, Python, or LaTeX library often and you notice that others in the lab also use it, incorporate it into the lab's default image and update the repository and package tag.
- If you need standalone bioinformatics packages, derive them from the Miniconda3 4.12.0 base image. For example, see the Samtools package in our lab's package repository. You can also use these images as the basis for other packages. For example, see the Bowtie2 package in our lab's package repository.
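As a rough illustration of the child-image pattern described above, the following shell sketch writes a minimal `Dockerfile` that starts from a lab base image and adds one library. The base image tag (`ghcr.io/rasilab/python:1.0.0`), the folder name, the availability of `pip` in the base image, and the installed library are all assumptions made for illustration; check the lab's package repository for the actual names and tags.

```shell
# Hypothetical folder for the new child image repository
pkg_dir="example_child_image"
mkdir -p "$pkg_dir"

# Write a minimal Dockerfile. The base image tag and the added library
# are placeholders, not the lab's actual values.
cat > "$pkg_dir/Dockerfile" <<'EOF'
# Start from the lab's default Python image (name and tag are assumptions)
FROM ghcr.io/rasilab/python:1.0.0

# Add only the extra libraries that the base image lacks
# (assumes pip is available in the base image)
RUN pip install --no-cache-dir biopython
EOF

# Show the result for review before building
cat "$pkg_dir/Dockerfile"
```

Keeping the child `Dockerfile` this small means most layers come from the shared base image, so builds and pushes stay fast.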
Steps to build and tag Docker images
To create or update a `Dockerfile` and the associated image, follow these steps:
- Edit the `Dockerfile` in VS Code to add or remove software. Always make modifications at the end of the `Dockerfile` so that only a few intermediate images need to be re-created. This decreases the space and time requirements for building and pushing the image.
- If you are creating a new GitHub repository for a new package, use the `Dockerfile` and README from a previous package GitHub repository, e.g. MaGeCK, as templates. Make sure to set the repository name and description appropriately, and update its visibility to public.
- Build the Docker image from the folder that contains the `Dockerfile`. Give it a specific tag (see below for the tag convention).
- Assuming the build is successful, first commit and tag the `Dockerfile`. If you created a new GitHub repository, you might have to set the remote using `git remote set-url origin <url>`. GitHub will prompt you for this when you create the repository.
- We use the same tag for the GitHub repository containing the `Dockerfile` and for the corresponding image. Both tags follow the semantic versioning convention. In practice, this means:
    - Use tags of the form `X.Y.Z`. Start with `0` for `X`, `1` for `Y`, and `0` for `Z`.
    - If your image is for a single software package (e.g. MaGeCK), use the same version tag as the software package version you are installing. For example, see our lab's MaGeCK image.
    - If you fix an error in a Dockerfile/image that contains several different packages (e.g. our lab's Python image), increment the third digit `Z`.
    - If you add an extra package that will not break compatibility of the Docker image with previous versions, increment the second digit `Y`.
    - If you make any incompatible changes, increment the first digit `X` (avoid this as much as possible).
- Push the Docker image to the GitHub container registry. The first time you do this from a computer, you will have to set up a Personal Access Token (classic version) as explained here. Make sure to give the `write:packages` permission when you create the token.
- You can pull these images as Singularity images using the following command (used on the Fred Hutch cluster):
- If you are pulling Singularity images that are not public (avoid this as much as possible), you need to authenticate with GitHub Packages using a personal access token. To do this, create a GitHub Personal Access Token and save it in your `~/.bash_profile` as follows:
- Note that Docker is not allowed on the Fred Hutch cluster. You can use Singularity images instead.
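The steps above can be sketched as a single shell function. This is a hedged outline, not the lab's exact commands: the helper names `run` and `release_image`, the package name `mypackage`, the tag `0.1.0`, and the commit message are hypothetical, and the `ghcr.io/rasilab/` prefix is inferred from the image reference used elsewhere in this document. By default the sketch only prints each command; set `DRY_RUN=0` to execute them.

```shell
run() {
    # Print the command, then execute it unless this is a dry run (the default)
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

release_image() {
    local name="$1"   # package name, e.g. the hypothetical "mypackage"
    local tag="$2"    # semantic version tag of the form X.Y.Z
    local image="ghcr.io/rasilab/${name}:${tag}"

    # Build from the folder that contains the Dockerfile
    run docker build -t "$image" .

    # Commit and tag the Dockerfile with the same tag as the image
    run git add Dockerfile
    run git commit -m "Update Dockerfile for ${tag}"
    run git tag "$tag"
    run git push origin HEAD
    run git push origin "$tag"

    # Push the image to the GitHub container registry
    # (needs a prior `docker login ghcr.io` with a write:packages token)
    run docker push "$image"

    # On the cluster, where Docker is not allowed, pull the image with Singularity
    run singularity pull "docker://${image}"
}

release_image mypackage 0.1.0
```

For images that are not public, Singularity can read registry credentials from the `SINGULARITY_DOCKER_USERNAME` and `SINGULARITY_DOCKER_PASSWORD` environment variables; whether exporting these from `~/.bash_profile` matches the lab's exact setup is an assumption.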
How to use images locally
- You can pull images to your local machine by:
- You can create a named container from the above image by:
- You can use the container to run a command. For example,
- You can also use the containers in a VS Code `.devcontainer.json` file (e.g. to run Jupyter notebooks):
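The local steps above might look like the following sketch. The image reference `ghcr.io/rasilab/r:1.0.0` is the one used later in this document; the container name `r_1.0.0`, the helper names `run` and `use_image_locally`, and the choice of `R --version` as the example command are assumptions. Commands are only printed unless `DRY_RUN=0` is set.

```shell
# Image reference from this document; the container name is hypothetical
image="ghcr.io/rasilab/r:1.0.0"
container="r_1.0.0"

run() {
    # Print the command, then execute it unless this is a dry run (the default)
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

use_image_locally() {
    # Pull the image to the local machine
    run docker pull "$image"
    # Create and start a named container in the background
    # (assumes the image's default command keeps running with a terminal attached)
    run docker run -d -it --name "$container" "$image"
    # Run a command inside the running container
    run docker exec "$container" R --version
}

use_image_locally
```

In a `.devcontainer.json` file, the same image reference would go in the `image` property.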
How to use our lab's Singularity containers in Snakemake workflows on the Fred Hutch cluster
We often use our lab's standard Docker containers for running specific Snakemake rules on the Fred Hutch cluster. If you use a specification of the form `singularity: "docker://ghcr.io/rasilab/r:1.0.0"` within a Snakemake rule, then this Docker container will be pulled to your local directory, converted to a Singularity container, and stored at `.snakemake/singularity` in your current working directory. However, it takes a long time to download and convert each Docker container, and each container also uses up several GB of space. To avoid this, Rasi has stored a copy of all commonly used Singularity containers in our lab at `/fh/scratch/delete90/subramaniam_a/user/rasi/singularity`. You can symbolically link this folder to your workflow by executing the following from the folder that contains your Snakemake file:
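A minimal sketch of such a link, assuming the link should sit at Snakemake's container location `.snakemake/singularity` described above. The guard against an existing real folder is an added precaution, not part of the lab's documented command.

```shell
# Shared folder with pre-built Singularity containers (path from the text above)
shared_dir="/fh/scratch/delete90/subramaniam_a/user/rasi/singularity"
# Location where Snakemake stores converted containers for this workflow
link_path=".snakemake/singularity"

mkdir -p .snakemake
if [ -e "$link_path" ] && [ ! -L "$link_path" ]; then
    # Do not clobber a real folder of previously pulled containers
    echo "$link_path already exists and is not a symlink; move it aside first" >&2
else
    # -s: symbolic link, -f: replace an existing link, -n: do not follow it
    ln -sfn "$shared_dir" "$link_path"
fi
```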
This makes all common Singularity containers (files ending with `.simg`) immediately available to your Snakemake workflow. Note that Snakemake names the Singularity container files by a hash ID rather than by image name, so the file names are not human-readable.