# How to run jobs in containers
Because looper uses divvy for computing configuration, running jobs in containers is easy! Divvy can use the same template system to do either cluster computing or to run jobs in Linux containers (for example, using docker or singularity). The way to do this is similar to how you would run jobs on a cluster, but you use templates that run jobs in containers. You can even run jobs in a container on a cluster.
## Learning objectives

- How does looper interact with Linux containers?
- How do I use looper to run a job in a container?
## 1. Get your container image
This could be a docker image (hosted on dockerhub), which you would download via `docker pull`, or it could be a singularity image you have saved in a local folder. This is pipeline-specific; you'll need to download the image recommended by the authors of the pipeline or pipelines you want to run.
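For example, here is a sketch of fetching each kind of image (the image names are placeholders matching the examples later on this page; use whatever your pipeline's documentation recommends):

```bash
# Docker: pull the recommended image from dockerhub (placeholder name)
docker pull databio/myimage

# Singularity (3.x syntax): build a local image from the same docker image,
# saved wherever your $SIMAGES folder points (placeholder path)
singularity pull ${SIMAGES}myimage docker://databio/myimage
```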
## 2. Specify the image in your pipeline_interface

The `pipeline_interface.yaml` file will need a `compute` section for each pipeline that can be run in a container, specifying the image. For example:
```yaml
compute:
  singularity_image: ${SIMAGES}myimage
  docker_image: databio/myimage
```
For singularity images, you just need to make sure that the images indicated in the `pipeline_interface` are available at those locations on your system. For docker, make sure you have pulled the docker images.
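A quick way to verify both, using the placeholder names from the example above:

```bash
# Singularity: confirm the image file exists where pipeline_interface points
ls -l ${SIMAGES}myimage

# Docker: confirm the image is in the local image cache
docker image inspect databio/myimage > /dev/null && echo "docker image present"
```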
## 3. Add a new compute package for containerized jobs
You will just need to add a new compute package that uses a containerized submission template. Here's an example of how to add one for using singularity in a SLURM environment:
```yaml
compute_packages:
  default:
    submission_template: divvy_templates/local_template.sub
    submission_command: sh
  singularity_slurm:
    submission_template: templates/slurm_singularity_template.sub
    submission_command: sbatch
    singularity_args: -B /sfs/lustre:/sfs/lustre,/nm/t1:/nm/t1
```
In `singularity_args` you'll need to pass any mounts or other settings that should be passed to singularity. The actual `slurm_singularity_template.sub` file looks something like this:
```bash
#!/bin/bash
#SBATCH --job-name='{JOBNAME}'
#SBATCH --output='{LOGFILE}'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='{PARTITION}'
#SBATCH -m block
#SBATCH --ntasks=1

echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`

singularity instance.start {SINGULARITY_ARGS} {SINGULARITY_IMAGE} {JOBNAME}_image
srun singularity exec instance://{JOBNAME}_image {CODE}
singularity instance.stop {JOBNAME}_image
```
Notice how these values will be used to populate a template that runs the pipeline in a container. Now, to submit a job that uses singularity, activate this compute package with `looper run --package singularity_slurm`.
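For example (the project config filename here is a placeholder for your own PEP configuration):

```bash
# Submit jobs via SLURM, each wrapped in the singularity container
looper run project_config.yaml --package singularity_slurm
```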
Here's another example divvy config file that adds a compute package to run jobs in docker on a local computer:
```yaml
compute_packages:
  default:
    submission_template: templates/localhost_template.sub
    submission_command: sh
  singularity:
    submission_template: templates/localhost_singularity_template.sub
    submission_command: sh
    singularity_args: --bind /ext:/ext
  docker:
    submission_template: templates/localhost_docker_template.sub
    submission_command: sh
    docker_args: |
      --user=$(id -u) \
      --env="DISPLAY" \
      --volume ${HOME}:${HOME} \
      --volume="/etc/group:/etc/group:ro" \
      --volume="/etc/passwd:/etc/passwd:ro" \
      --volume="/etc/shadow:/etc/shadow:ro" \
      --volume="/etc/sudoers.d:/etc/sudoers.d:ro" \
      --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
      --workdir="`pwd`"
```
Notice the `--volume` arguments, which mount disk volumes from the host into the container. This should work out of the box for most docker users.
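For completeness, the matching `localhost_docker_template.sub` could be as simple as this sketch (the template body here is an assumption modeled on the singularity template above, not copied from divvy):

```bash
#!/bin/bash
# Run the pipeline command inside a docker container. {DOCKER_ARGS} is filled
# from docker_args in the compute package; {CODE} is the command to run.
docker run --rm {DOCKER_ARGS} {DOCKER_IMAGE} {CODE}
```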
## Looper and bulker
Many other pipeline frameworks wrap containers into the pipeline itself. Looper avoids this approach because it tightly couples the computing environment to the pipeline. Instead, looper's approach is to keep these things separate. Specifying a container in a submission template, as described here, works well for pipelines that use monolithic containers, where there is one giant container and the whole pipeline runs inside it. An alternative approach is to have the pipeline submit a container job for each individual task. This allows you to use smaller, lightweight containers with only one tool each, which makes them more reusable. If you want to use this modular approach, you might be interested in bulker, the computing environment manager in PEPkit.
## Summary
- To looper, containers are just another computing environment setting. You can use the submission template to wrap the command in whatever boilerplate you want, whether for cluster submission or for running in containers.