# How to create Docker and Singularity images

## General organization
- All Docker and Singularity images should be built off a `Dockerfile` that lives in its own public GitHub repository in the rasilab GitHub organization account.
- All Docker images should be publicly available in our lab's GitHub container registry.
- For routine analyses and writing, use our lab's default R, Python, and LaTeX images. These images contain commonly used libraries in our lab, as you can see in the `Dockerfile` of each image.
- If you require additional R, Python, or LaTeX libraries not present in the above default images, build child images based off the above base images. For example, see the MaGeCK package built off our default Python image; a minimal sketch of such a child `Dockerfile` follows this list.
- If you use an R, Python, or LaTeX library often and notice that others in the lab are also using it, incorporate it into the lab's default image while updating the repository and package tag.
- If you need standalone bioinformatics packages, derive them from the Miniconda3 4.12.0 base image. For example, see the Samtools package in our lab's package repository. You can also use these images as the basis for other packages; for example, see the Bowtie2 package in our lab's package repository.
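As an illustration, a child image can be as small as a `FROM` line plus one install step. The sketch below is hypothetical: the base image tag and the installed package/version are assumptions, not the actual MaGeCK `Dockerfile`.

```dockerfile
# Hypothetical child image built on the lab's default Python image
FROM ghcr.io/rasilab/python:1.0.0

# Add one extra tool on top of the base image (example package/version)
RUN conda install -y -c bioconda mageck=0.5.9
```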
## Steps to build and tag Docker images
To create or update a `Dockerfile` and the associated image, follow these steps:
- Edit the `Dockerfile` in VSCode to add or remove software. Always make modifications at the end of the `Dockerfile` so that only a few intermediate images need to be re-created. This decreases the space and time requirements for building and pushing the image.

- If you are creating a new GitHub repository for a new package, use a `Dockerfile` and README template from a previous package GitHub repository, e.g. MaGeCK. Make sure to set the repository name and description appropriately, and update its visibility to public.
- Build the Docker image from the folder that contains the `Dockerfile`. Give it a specific tag (see below for the tag convention); an example build command is shown after this list.

- Assuming the above build command is successful, first commit and tag the `Dockerfile` (see the example after this list). You might have to set the remote if you created a new GitHub repository, using `git remote set-url origin <url>`; GitHub will prompt you for this when you create the repository.
- We use the same tag for the GitHub repository containing the `Dockerfile` and the corresponding image. Both tags follow the semantic versioning convention. In practice, this means:
    - Use tags of the form `X.Y.Z`. Start with `0` for `X`, `1` for `Y`, and `0` for `Z`.
    - If your image contains a single software package (e.g. MaGeCK), use the same version tag as the software version you are installing. For example, see our lab's MaGeCK image.
    - If you fix an error in a Dockerfile/image that contains several different packages (e.g. our lab's Python image), increment the third digit `Z`.
    - If you add an extra package that will not break compatibility of the Docker image with previous versions, increment the second digit `Y`.
    - If you make any incompatible changes, increment the first digit `X` (avoid this as much as possible).
- Push the Docker image to the GitHub container registry (see the example after this list). The first time you do this from a computer, you will have to set up a Personal Access Token (classic version) as explained here. Make sure to give the token the `write:packages` permission when you create it.
- You can pull these images as Singularity images on the Fred Hutch cluster using a command of the form shown after this list.
- If you are pulling Singularity images that are not public (avoid this as much as possible), you need to authenticate with GitHub Packages using a personal access token. To do this, create a GitHub Personal Access Token and save it in your `~/.bash_profile` as shown after this list.
- Note that Docker is not allowed on the Fred Hutch cluster. You can use Singularity images instead.
## How to use images locally
- You can pull images to your local machine.
- You can create a named container from the pulled image.
- You can use the container to run a command.
- You can also use the containers in a VSCode `.devcontainer.json` file (e.g. to run Jupyter notebooks).

Sketches of each of these steps are shown below.
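Pulling an image locally; the image name and tag are examples:

```bash
docker pull ghcr.io/rasilab/r:1.0.0
```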
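Creating and starting a named container from that image; the container name and the interactive shell are assumptions:

```bash
# Create an interactive container named r_analysis, then start it
docker create -it --name r_analysis ghcr.io/rasilab/r:1.0.0 /bin/bash
docker start r_analysis
```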
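Running a command inside the started container, assuming `R` is on the container's PATH:

```bash
docker exec r_analysis R --version
```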
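A minimal `.devcontainer.json` sketch that points VSCode at one of these images; the image tag and the extension list are illustrative, not a lab-mandated configuration:

```json
{
    "image": "ghcr.io/rasilab/python:1.0.0",
    "customizations": {
        "vscode": {
            "extensions": ["ms-python.python", "ms-toolsai.jupyter"]
        }
    }
}
```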
## How to use our lab's Singularity containers in Snakemake workflows on the Fred Hutch cluster
We often use our lab's standard Docker containers for running specific Snakemake rules on the Fred Hutch cluster. If you use a specification of the form `singularity: "docker://ghcr.io/rasilab/r:1.0.0"` within a Snakemake rule, then this Docker container will be pulled to your local directory, converted to a Singularity container, and stored at `.snakemake/singularity` in your current working directory. However, it takes a long time to download and convert each Docker container, and each container also uses up several GB of space. To avoid this, Rasi has stored a copy of all commonly used Singularity containers in our lab at `/fh/scratch/delete90/subramaniam_a/user/rasi/singularity`. You can symbolically link this folder to your workflow by executing the following from the folder that contains your Snakemake file:
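A sketch of the symlink commands; they assume the `.snakemake/singularity` location described above is where Snakemake looks for converted containers:

```bash
# Create the .snakemake directory if it does not exist yet,
# then link the shared container folder in place of .snakemake/singularity
mkdir -p .snakemake
ln -s /fh/scratch/delete90/subramaniam_a/user/rasi/singularity .snakemake/singularity
```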
This will make all common Singularity containers (files ending with `.simg`) immediately available to your Snakemake workflow. Note that Snakemake names the Singularity containers based on their SHA IDs.
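For reference, a minimal sketch of a rule using the `singularity:` directive; the rule name, files, image tag, and shell command are all placeholders:

```python
rule sort_bam:
    input: "aligned.bam"
    output: "sorted.bam"
    singularity: "docker://ghcr.io/rasilab/samtools:1.16"
    shell: "samtools sort {input} -o {output}"
```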
## How to use the Singularity container for interactive data analysis in R and Python
### Steps on the remote machine (for example, the Fred Hutch `rhino` cluster)
- On a terminal, log in to `rhino`.
- Make sure that any `conda` initialization is commented out in your `.bashrc` or `.bash_profile` file on the remote machine. This step is important; otherwise, VSCode will not recognize the `conda` environments within the Singularity container.
- Do the remote operations below from within a `tmux` session so that you can detach and log out of your remote session while keeping the container running.
- Start a `tmux` session (see the first sketch after this list).
- Make Singularity available (see the sketches after this list).
- Pull the Singularity container from the Subramaniam lab GitHub Packages Repo and start a shell inside it:

  ```bash
  cd /fh/scratch/delete90/subramaniam_a/user/rasi/singularity/
  # the pull command below is an assumed reconstruction; it creates r_python_1.3.0.sif
  singularity pull docker://ghcr.io/rasilab/r_python:1.3.0
  singularity exec -B /fh -B /hpc r_python_1.3.0.sif /bin/bash
  ```
- Start a VSCode CLI tunnel from within the container (see the last sketch after this list).
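Starting a named `tmux` session; the session name is arbitrary:

```bash
tmux new -s container_analysis
```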
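Making Singularity available, assuming the cluster provides it as an environment module (the exact module name may differ):

```bash
module load Singularity
```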
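Starting the VSCode CLI tunnel from the shell running inside the container, assuming the `code` CLI is available there:

```bash
code tunnel
```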
### Steps on local machine (for example, your lab desktop computer)
- On VSCode, open a remote window:
    - Search for the command `Remote-Tunnels: Connect to Tunnel`.
    - Alternatively, click the `><` symbol in the bottom left-hand corner for `Open Remote Window`.
- Use your GitHub account to start the tunnel.
- Select the active tunnel.
- You can now open any folder on the remote machine and create a Jupyter notebook.
- You should be able to pick the Python interpreter at `/opt/conda/bin/python` or the Jupyter R kernel at `/opt/conda/envs/R/lib/R/bin/R` to run your Python or R notebook.