How to create Docker and Singularity images

General organization

All Docker and Singularity images should be built from a Dockerfile that lives in its own public GitHub repository in the rasilab GitHub organization account.
All Docker images should be publicly available in our lab's GitHub container repository.
For routine analyses and writing, use our lab's default R, Python, and LaTeX images. These images should contain the libraries commonly used in our lab, as you can see in the Dockerfile of each image.
If you require additional R, Python, or LaTeX libraries not present in the above default images, build child images based on those default images. For example, see the MaGeCK package built off our default Python package.
If you use an R, Python, or LaTeX library often and notice that others in the lab also use it, incorporate it into the lab's default image, and update the repository and package tag accordingly.
If you need standalone bioinformatic packages, derive them from the Miniconda3 4.12.0 base image. For example, see the Samtools package in our lab's package repository. You can also use these images as a basis for other packages. For example, see the Bowtie2 package in our lab's package repository.
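
The child-image pattern described above can be sketched as follows. This snippet writes a minimal Dockerfile that starts from the lab's default Python image and adds one library. The `1.2.0` tag is borrowed from the push example later in this document. The package name `some-extra-library`, the use of `pip`, and the temporary working directory are assumptions for illustration only, not the contents of any existing lab repository.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical working directory standing in for a new image repository.
workdir="$(mktemp -d)"

# Write a minimal child Dockerfile.
# Assumptions: the base image tag exists and provides pip;
# "some-extra-library" is a placeholder package name.
cat > "${workdir}/Dockerfile" <<'EOF'
FROM ghcr.io/rasilab/python:1.2.0

# Extra library that is not in the default image (placeholder name).
RUN pip install --no-cache-dir some-extra-library
EOF

# Show the generated file.
cat "${workdir}/Dockerfile"
```

In a real repository you would edit the Dockerfile directly instead of generating it. The heredoc is used here only to keep the example self-contained.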

Steps to build and tag Docker images

To create or update a Dockerfile and the associated image, follow these steps:

Edit the Dockerfile in VS Code to add or remove software. Always make modifications at the end of the Dockerfile so that only a few intermediate images need to be re-created. This reduces the time and space needed to build and push the image.
If you are creating a new GitHub repository for a new package, use a Dockerfile and README template from a previous package GitHub repository, for example MaGeCK. Make sure to set the repository name and description appropriately, and update its visibility to public.
Build the Docker image from the folder that contains the Dockerfile. Give a specific tag (see below for tag convention).
```
docker build -t <image_name:tag> .
# example
docker build -t ghcr.io/rasilab/mageck:0.5.9 .
```
Assuming the above build command is successful, commit and tag the Dockerfile. If you created a new GitHub repository, you might first have to set the remote using git remote set-url origin <url>. GitHub will prompt you for this when you create the repository.
```
git add Dockerfile
git commit -m "Update Dockerfile with <software>"
git tag -a <tag> -m "Update Dockerfile with <software>"
git push origin <tag>
git push --all
```
We use the same tag for the GitHub repository containing the Dockerfile and the corresponding image. Both tags follow the semantic versioning convention. In practice, this means:
1. Use tags of the form X.Y.Z. For a new image, start at 0.1.0 (X = 0, Y = 1, Z = 0).
2. If your image is for a single software package (e.g. MaGeCK), use the version of the software you are installing as the tag. For example, see our lab's MaGeCK image.
3. If you fix an error in a Dockerfile/image that contains several different packages (eg. our lab's Python image), increment the third digit Z.
4. If you add an extra package that will not break compatibility of the Docker image with previous versions, increment the second digit Y.
5. If you make any incompatible changes, increment the first digit X (avoid this as much as possible).
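
The tag convention above can be sketched as a small shell function. `bump_version` is a hypothetical helper, not part of any lab repository. It assumes plain numeric X.Y.Z tags and, following standard semantic versioning, resets the lower fields to 0 when a higher field is incremented (the list above does not state this explicitly).

```shell
#!/usr/bin/env bash
set -euo pipefail

# bump_version <X.Y.Z> <major|minor|patch>
# Prints the next tag according to the convention above.
# Hypothetical helper for illustration only.
bump_version() {
  local version="$1" part="$2"
  local major minor patch
  # Split X.Y.Z on dots into three fields.
  IFS='.' read -r major minor patch <<< "${version}"
  case "${part}" in
    major) major=$((major + 1)); minor=0; patch=0 ;;  # incompatible change
    minor) minor=$((minor + 1)); patch=0 ;;           # compatible addition
    patch) patch=$((patch + 1)) ;;                    # error fix
    *) echo "unknown part: ${part}" >&2; return 1 ;;
  esac
  echo "${major}.${minor}.${patch}"
}

bump_version 0.1.0 patch   # prints 0.1.1
bump_version 0.1.1 minor   # prints 0.2.0
bump_version 0.2.0 major   # prints 1.0.0
```

This applies only to multi-package images. For single-software images, the tag simply mirrors the upstream software version as described in item 2.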
Push the Docker image to the GitHub container registry. The first time you do this from a computer, you will have to set up a Personal Access Token (classic version) as explained here. Make sure to give the write:packages permission when you create the token.
```
docker push <image_name:tag>
# example
docker push ghcr.io/rasilab/python:1.2.0
```
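
The build, commit, tag, and push steps above can be chained in one script. The following is a minimal sketch under stated assumptions: `release_image`, `run`, and the `DRY_RUN` variable are hypothetical names introduced here, and the script defaults to a dry run that only prints the commands, so nothing is built or pushed unless you set `DRY_RUN=0`.

```shell
#!/usr/bin/env bash
set -euo pipefail

# With DRY_RUN=1 (the default in this sketch) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

# run <command> [args...]: print the command in dry-run mode, else execute it.
run() {
  if [ "${DRY_RUN}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# release_image <image_name> <tag> <commit message>
# Hypothetical helper; must be called from the folder containing the Dockerfile.
release_image() {
  local image="$1" tag="$2" message="$3"
  run docker build -t "${image}:${tag}" .
  run git add Dockerfile
  run git commit -m "${message}"
  run git tag -a "${tag}" -m "${message}"
  run git push origin "${tag}"
  run git push --all
  run docker push "${image}:${tag}"
}

release_image ghcr.io/rasilab/mageck 0.5.9 "Update Dockerfile with MaGeCK"
```

Because of `set -e`, a real run (`DRY_RUN=0`) stops at the first failing command, so the Dockerfile is committed and tagged only after a successful build.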

You can pull these images as Singularity images using the following commands (as used on the Fred Hutch cluster):

module load Singularity
singularity pull docker://<image_name:tag>
# example
singularity pull docker://ghcr.io/rasilab/python:1.0.0

If you are pulling Singularity images that are not public (avoid this as much as possible), then you need to authenticate with GitHub packages using a personal access token. To do this, create a GitHub Personal Access Token and save it in your ~/.bash_profile as follows:
```
export SINGULARITY_DOCKER_USERNAME=<your_github_username>
export SINGULARITY_DOCKER_PASSWORD=<your_personal_access_token>
```
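
Before pulling a private image, it can help to confirm that both variables are actually set in the current shell. The check below is a minimal sketch. `have_registry_credentials` is a hypothetical helper name, and the check only tests that the variables are non-empty, not that the token is valid.

```shell
#!/usr/bin/env bash
set -euo pipefail

# have_registry_credentials: succeeds only if both variables from the
# ~/.bash_profile snippet above are set and non-empty.
# Hypothetical helper for illustration only.
have_registry_credentials() {
  [ -n "${SINGULARITY_DOCKER_USERNAME:-}" ] && [ -n "${SINGULARITY_DOCKER_PASSWORD:-}" ]
}

if have_registry_credentials; then
  echo "credentials found in environment"
else
  echo "credentials missing: only public images can be pulled"
fi
```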
Note that Docker is not allowed on the Fred Hutch cluster. You can use Singularity images instead.

How to use images locally

You can pull images to your local machine by:

docker pull ghcr.io/rasilab/PACKAGE_NAME:X.Y.Z

You can create a named container from the above image by:

docker run -i -d --name pandoc-latex -v $HOME:$HOME ghcr.io/rasilab/pandoc-latex:1.1.0

You can use the container to run a command. For example,

docker exec -w $(pwd) pandoc-latex pandoc manuscript.md --citeproc --template=template.tex --metadata-file=pandoc-options.yaml --pdf-engine=xelatex -o manuscript.pdf --filter=pandoc-svg.py

You can also use the containers in a VS Code .devcontainer.json file (for example, to run Jupyter Notebooks):
```
{ "image": "ghcr.io/rasilab/r:1.0.0" }
```

How to use our lab's Singularity containers in Snakemake workflows on the Fred Hutch cluster

We often use our lab's standard Docker containers for running specific Snakemake rules on the Fred Hutch cluster. If you use a specification of the form singularity: "docker://ghcr.io/rasilab/r:1.0.0" within a Snakemake rule, then this Docker container will be pulled, converted to a Singularity container, and stored at .snakemake/singularity in your current working directory. However, downloading and converting each Docker container takes a long time, and each container also uses up several GB of space. To avoid this, Rasi has stored a copy of all Singularity containers commonly used in our lab at /fh/scratch/delete90/subramaniam_a/user/rasi/singularity. You can symbolically link this folder to your workflow by executing the following from the folder that contains your Snakemake file:

mkdir -p .snakemake
cd .snakemake
ln -s /fh/scratch/delete90/subramaniam_a/user/rasi/singularity .

This will make all common Singularity containers (files ending with .simg) immediately available to your Snakemake workflow. Note that Snakemake names the Singularity containers based on their SHA IDs.
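
The linking step can also be written so that it is safe to run repeatedly. The function below is a minimal sketch. `link_singularity_cache` is a hypothetical helper name, and it deliberately leaves any existing .snakemake/singularity folder or link untouched instead of replacing it.

```shell
#!/usr/bin/env bash
set -euo pipefail

# link_singularity_cache <cache_dir> <workflow_dir>
# Links <cache_dir> to <workflow_dir>/.snakemake/singularity unless
# something already exists at that path. Hypothetical helper.
link_singularity_cache() {
  local cache_dir="$1" workflow_dir="$2"
  local target="${workflow_dir}/.snakemake/singularity"
  mkdir -p "${workflow_dir}/.snakemake"
  if [ -e "${target}" ] || [ -L "${target}" ]; then
    # Do not overwrite an existing folder or (possibly broken) link.
    echo "already present: ${target}"
  else
    ln -s "${cache_dir}" "${target}"
    echo "linked: ${target} -> ${cache_dir}"
  fi
}

# Example call on the cluster, run from the folder with your Snakemake file
# (path taken from the text above):
# link_singularity_cache /fh/scratch/delete90/subramaniam_a/user/rasi/singularity .
```

If Snakemake has already created its own .snakemake/singularity folder in a workflow, the function reports it and does nothing. In that case you would need to move or remove that folder yourself before linking.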