
How to create Docker and Singularity images

General organization

  • All Docker and Singularity images should be built from a Dockerfile that lives in its own public GitHub repository in the rasilab GitHub organization.

  • All Docker images should be publicly available in our lab's GitHub container repository.

  • For routine analyses and writing, use our lab's default R, Python, and LaTeX images. These images contain libraries commonly used in our lab, as you can see in the Dockerfile of each image.

  • If you require additional R, Python, or LaTeX libraries not present in the default images, build child images based on the base images above; see the Dockerfile sketch after this list. For example, see the MaGeCK image built off our default Python image.

  • If you use an R, Python, or LaTeX library often and notice that others in the lab are also using it, incorporate it into the lab's default image and update the repository and package tag.

  • If you need standalone bioinformatic packages, derive them from the Miniconda3 4.12.0 base image. For example, see the Samtools package in our lab's package repository. You can also use these images as the basis for other packages. For example, see the Bowtie2 package in our lab's package repository.
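
  As an illustration of the child-image pattern mentioned above, here is a minimal Dockerfile sketch for a single-package image built on the lab's default Python image. The package name, versions, and conda channels are illustrative assumptions; it also presumes the base image ships conda, consistent with the /opt/conda paths referenced later in this document:

    # Child image built on the lab's default Python image (hypothetical versions)
    FROM ghcr.io/rasilab/python:1.2.0

    # Install a single bioinformatics package from conda channels (illustrative)
    RUN conda install -y -c bioconda -c conda-forge mageck=0.5.9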

Steps to build and tag Docker images

To create or update a Dockerfile and the associated image, follow these steps:

  • Edit the Dockerfile in VSCode to add or remove software. Always make modifications at the end of the Dockerfile so that only a few intermediate layers need to be rebuilt. This decreases the space and time requirements for building and pushing the image.

  • If you are creating a new GitHub repository for a new package, use a Dockerfile and README template from a previous package GitHub repository, e.g., MaGeCK. Make sure to set the repository name and description appropriately, and make its visibility public.
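
    If you prefer the command line, one way to create such a repository is with the GitHub CLI (gh), sketched below; the repository name and description are placeholders:

    # create a new public repository in the rasilab organization (placeholder name/description)
    gh repo create rasilab/<package> --public --description "<short description>"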

  • Build the Docker image from the folder that contains the Dockerfile. Give a specific tag (see below for tag convention).

    docker build -t <image_name:tag> .
    # example
    docker build -t ghcr.io/rasilab/mageck:0.5.9 .
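
    Before tagging, you can optionally smoke-test the freshly built image. The exact command depends on the software inside the image; the mageck --version call below is an illustrative assumption:

    docker run --rm ghcr.io/rasilab/mageck:0.5.9 mageck --version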
    
  • Assuming the above build command is successful, first commit and tag the Dockerfile. If you created a new GitHub repository, you might have to set the remote using git remote set-url origin <url>; GitHub displays the remote URL when you create the repository.

    git add Dockerfile
    git commit -m "Update Dockerfile with <software>"
    git tag -a <tag> -m "Update Dockerfile with <software>"
    git push origin <tag>
    git push --all
    
  • We use the same tag for the GitHub repository containing the Dockerfile and the corresponding image. Both tags follow the semantic versioning convention. In practice, this means:

    1. Use tags of the form X.Y.Z, starting at 0.1.0.

    2. If your image wraps a single software package (e.g., MaGeCK), use the same version tag as the software version you are installing. For example, see our lab's MaGeCK image.

    3. If you fix an error in a Dockerfile/image that contains several different packages (e.g., our lab's Python image), increment the third digit Z.

    4. If you add an extra package that will not break compatibility of the Docker image with previous versions, increment the second digit Y.

    5. If you make any incompatible changes, increment the first digit X (avoid this as much as possible).
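
    As a concrete (hypothetical) tag history following these rules:

    0.1.0   # initial image
    0.1.1   # fixed an error in the Dockerfile
    0.2.0   # added a backwards-compatible package
    1.0.0   # incompatible change (avoid when possible)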

  • Push the Docker image to the GitHub container registry. The first time you do this from a computer, you will have to set up a Personal Access Token (classic) as explained in GitHub's container registry documentation. Make sure to give the token the write:packages scope when you create it.

    docker push <image_name:tag>
    # example
    docker push ghcr.io/rasilab/python:1.2.0
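
    The first push also requires logging in to the registry with your token. The environment variable name below is an assumption, but the login command follows GitHub's documented pattern:

    echo $GITHUB_PAT | docker login ghcr.io -u <your_github_username> --password-stdin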
    
  • You can pull these images as Singularity images using the following commands (shown here for the Fred Hutch cluster):

    module load Singularity
    singularity pull docker://<image_name:tag>
    # example
    singularity pull docker://ghcr.io/rasilab/python:1.0.0
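
    By default, singularity pull names the resulting file after the image and tag (for example, python_1.0.0.sif). You can then run software from the image; a minimal sketch:

    singularity exec python_1.0.0.sif python --version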
    
  • If you are pulling Singularity images that are not public (avoid this as much as possible), you need to authenticate with GitHub Packages using a personal access token. To do this, create a GitHub Personal Access Token and save it in your ~/.bash_profile as follows:

    export SINGULARITY_DOCKER_USERNAME=<your_github_username>
    export SINGULARITY_DOCKER_PASSWORD=<your_personal_access_token>
    
  • Note that Docker is not allowed on the Fred Hutch cluster. You can use Singularity images instead.

How to use images locally

  • You can pull images to your local machine with:

    docker pull ghcr.io/rasilab/PACKAGE_NAME:X.Y.Z
    
  • You can create a named container from the above image with:

    docker run -i -d --name pandoc-latex -v $HOME:$HOME ghcr.io/rasilab/pandoc-latex:1.1.0
    
  • You can use the container to run a command. For example,

    docker exec -w $(pwd) pandoc-latex pandoc manuscript.md --citeproc --template=template.tex --metadata-file=pandoc-options.yaml --pdf-engine=xelatex -o manuscript.pdf --filter=pandoc-svg.py
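
    When you no longer need the container, stop and remove it by name:

    docker stop pandoc-latex
    docker rm pandoc-latex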
    
  • You can also use the containers in a VSCode .devcontainer.json file (e.g., to run Jupyter notebooks):

    "image": "ghcr.io/rasilab/r:1.0.0",
    

How to use our lab's Singularity containers in Snakemake workflows on the Fred Hutch cluster

We often use our lab's standard Docker containers for running specific Snakemake rules on the Fred Hutch cluster. If you use a specification of the form singularity: "docker://ghcr.io/rasilab/r:1.0.0" within a Snakemake rule, the Docker container will be pulled to your local directory, converted to a Singularity container, and stored at .snakemake/singularity in your current working directory. However, it takes a long time to download and convert each Docker container, and each container also uses up several gigabytes of space. To avoid this, Rasi has stored a copy of all commonly used Singularity containers in our lab at /fh/scratch/delete90/subramaniam_a/user/rasi/singularity. You can symbolically link this folder to your workflow by executing the following from the folder that contains your Snakemake file:

mkdir .snakemake
cd .snakemake
ln -s /fh/scratch/delete90/subramaniam_a/user/rasi/singularity .

This will make all common Singularity containers (files ending with .simg) immediately available to your Snakemake workflow. Note that Snakemake names the Singularity containers based on their SHA IDs.
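
To have Snakemake actually run rules inside these containers, invoke it with Singularity support enabled. A sketch (the bind path and core count here are assumptions; adjust them to your setup):

snakemake --use-singularity --singularity-args "--bind /fh" --cores 4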

How to use the Singularity container for interactive data analysis in R and Python

Steps on the remote machine (for example, the Fred Hutch rhino cluster):

  • On a terminal, log in to rhino:

ssh rhino02
  • Make sure that any conda initialization is commented out in your .bashrc or .bash_profile on the remote machine. This step is important; otherwise, VSCode will not recognize the conda environments within the Singularity container.

  • Do the remote operations below from within a tmux session so that you can detach and log out of your remote session while keeping the container running.

  • Start a tmux session:

tmux new -s tunnel
  • Make Singularity available:

module load Singularity
  • Pull the Singularity container from the Subramaniam lab GitHub Packages repo (or use the copy in the lab's shared folder, as below) and start a shell inside it:

cd /fh/scratch/delete90/subramaniam_a/user/rasi/singularity/
singularity exec -B /fh -B /hpc r_python_1.3.0.sif /bin/bash

  • Start a VSCode CLI tunnel from within the container:

./code tunnel
If you are doing the above for the first time, you will have to log in to GitHub using the displayed code and also name the tunnel.
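
The ./code binary above is the standalone VSCode CLI. If it is not already available on the remote machine, one way to fetch it (a sketch using Microsoft's published download link for the Linux x64 CLI build) is:

curl -Lk 'https://code.visualstudio.com/sha/download?build=stable&os=cli-alpine-x64' --output vscode_cli.tar.gz
tar -xf vscode_cli.tar.gz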

Steps on the local machine (for example, your lab desktop computer)

  • In VSCode, open a remote window:

    • Search for the command Remote-Tunnels: Connect to Tunnel.
    • Alternatively, click the >< symbol in the bottom left-hand corner and select Open Remote Window.
  • Use your GitHub account to authenticate and start the tunnel.

  • Select the active tunnel

  • You can now open any folder on the remote machine and create a Jupyter notebook.

  • You should be able to pick the Python interpreter at /opt/conda/bin/python or the Jupyter R kernel at /opt/conda/envs/R/lib/R/bin/R to run your Python or R notebook.