Using divvy
Learning objectives
- What is divvy?
- How can I use it in the CLI?
- How can I use divvy in the Python API?
- How can I use divvy to submit one-off jobs?
Previously you may have read about using divvy for running jobs in containers or on clusters.
Let's recap what divvy is and why it exists.
What is divvy?
Divvy is a submodule of looper, and it allows you to populate job submission scripts by integrating job-specific settings with separately configured computing environment settings. Divvy makes software portable, so users can easily toggle among computing resources (laptop, cluster, cloud).
What makes divvy better?
Tools often require a particular compute setup. For example, one pipeline requires SLURM, another requires AWS, and yet another runs directly on your laptop. This makes it difficult to move a workflow between environments, and tools that can run in multiple environments must be configured separately for each one.
Instead, divvy-compatible tools can run on any computing resource. Users configure their computing environment once, and all divvy-compatible tools will use this same configuration.
Divvy reads a standard configuration file describing available compute resources and then uses a simple template system to write custom job submission scripts. Computing resources are organized as compute packages, which users select and populate with values to build submission scripts for compute jobs.
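To give a sense of the shape (this is an illustrative sketch, not a verbatim copy of looper's bundled file), a compute package pairs a submission template with a submission command, roughly like this:
# Illustrative divvy configuration sketch; template paths and package names are placeholders
compute_packages:
  local:
    submission_template: divvy_templates/localhost_template.sub
    submission_command: .
  slurm:
    submission_template: divvy_templates/slurm_template.sub
    submission_command: sbatch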
Quick start with CLI
divvy comes installed with looper, so ensure you have the newest version of looper installed: pip install looper
Let's first use divvy list to show our available computing packages:
divvy list
Using default divvy config. You may specify in env var: ['DIVCFG']
Using divvy config: /home/drc/GITHUB/looper/master/looper/venv/lib/python3.10/site-packages/looper/default_config/divvy_config.yaml
Available compute packages:
local
slurm
singularity
bulker_local
default
singularity_slurm
docker
Note that divvy will default to a divvy_config.yaml if none is provided via the DIVCFG environment variable. This default config is provided by looper.
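If you maintain your own divvy configuration, you can point divvy at it by setting DIVCFG before running any commands (the path below is just a placeholder):
# Use a custom divvy configuration (placeholder path)
export DIVCFG=/path/to/your/divvy_config.yaml
divvy list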
We can take a look at one of the above templates:
cat /home/drc/GITHUB/looper/master/looper/venv/lib/python3.10/site-packages/looper/default_config/divvy_templates/slurm_template.sub
#!/bin/bash
#SBATCH --job-name='{JOBNAME}'
#SBATCH --output='{LOGFILE}'
#SBATCH --mem='{MEM}'
#SBATCH --cpus-per-task='{CORES}'
#SBATCH --time='{TIME}'
#SBATCH --partition='{PARTITION}'
#SBATCH -m block
#SBATCH --ntasks=1
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
{CODE}
Remember that compute packages use nested templates to wrap the pipeline {CODE} provided by looper's pipeline interface. More can be read here: advanced computing
Divvy allows you to write your own submission script via the CLI. However, you will need a way to tell divvy the compute settings. One way is to use an input settings file.
Let's create one now:
touch my_job_settings.yaml
Add these settings:
time: 4-0-0
logfile: results.log
cores: 6
partition: large_mem
mem: 16G
Now that you have this settings file, you can write a new submission script using divvy write:
divvy write --package slurm --settings my_job_settings.yaml --compute sample=sample1 --outfile submit_script.sub
This will produce the submit_script.sub file.
cat submit_script.sub
#!/bin/bash
#SBATCH --job-name='{JOBNAME}'
#SBATCH --output='results.log'
#SBATCH --mem='16G'
#SBATCH --cpus-per-task='6'
#SBATCH --time='4-0-0'
#SBATCH --partition='large_mem'
#SBATCH -m block
#SBATCH --ntasks=1
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
{CODE}
We populated several variables, like {LOGFILE} and {TIME}, from the my_job_settings.yaml file. However, the {CODE} and {JOBNAME} variables are still unpopulated, so this submission script is incomplete. To remedy this, we'll use divvy's command-line variable passing: any key=value pairs passed to --compute are treated as variables to populate the template. These command-line variables have the highest priority, so they override values from any other source. For example:
divvy write --package slurm --settings my_job_settings.yaml --compute sample=sample1 code=run-this-cmd jobname=12345 --outfile submit_script.sub
Now if we look at the file:
#!/bin/bash
#SBATCH --job-name='12345'
#SBATCH --output='results.log'
#SBATCH --mem='16G'
#SBATCH --cpus-per-task='6'
#SBATCH --time='4-0-0'
#SBATCH --partition='large_mem'
#SBATCH -m block
#SBATCH --ntasks=1
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
run-this-cmd
Now we have a complete script, which we could run with sbatch submit_script.sub. Notice that jobname and code came from the command line, while values like time still come from the my_job_settings.yaml file. If the same variable were supplied in both places, the command-line value would win, because the CLI has the highest priority.
Variables can come from these three sources, in order of increasing priority:
- the compute package, defined in the divvy configuration file and selected with the -p or --package argument;
- the settings file, passed with -s or --settings;
- any additional variables passed on the command line as key=value pairs to -c or --compute.
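To see this priority in action, you could pass the same variable in both places; here a hypothetical time=2-0-0 on the command line would override the time: 4-0-0 value from my_job_settings.yaml:
divvy write --package slurm --settings my_job_settings.yaml --compute sample=sample1 code=run-this-cmd jobname=12345 time=2-0-0 --outfile submit_script.sub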
Finally, you can submit these jobs using divvy submit:
divvy submit --package slurm --settings my_job_settings.yaml --compute sample=sample1 code=run-this-cmd jobname=12345 --outfile submit_script.sub
Note that slurm requires sbatch to work, which will be found on HPCs and not on my local machine. However, I can use the local template to run jobs locally!
divvy submit --package local --settings my_job_settings.yaml --compute sample=sample1 code=run-this-cmd jobname=12345 --outfile submit_script.sub
Using the Python API for divvy
You can also call divvy from a Python script. You'll need to provide a path to a divvy config file:
import looper.divvy as divvy
config_path = "/home/drc/GITHUB/looper/master/looper/venv/lib/python3.10/site-packages/looper/default_config/divvy_config.yaml"
# Now create a ComputingConfiguration object
dcc = divvy.ComputingConfiguration.from_yaml_file(filepath=config_path)
Once a ComputingConfiguration object has been created, you can see the default compute package as well as the associated submission script:
# Print default compute package to terminal
print(dcc.compute)
# Access the submission template associated with the compute package.
with open(dcc.compute['submission_template']) as f:
    print(f.read())
submission_template: /home/drc/PythonProjects/testing_looper_docs_tutorial/looper_tutorial/.venv/lib/python3.10/site-packages/looper/default_config/divvy_templates/localhost_template.sub
submission_command: .
#!/bin/bash
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
{
{CODE}
} | tee {LOGFILE}
You can also populate the submission script via the API:
# Populate and create a local submission script
dcc.write_script("test_local.sub", {"code": "run-this-command", "logfile": "logfile.txt"})
# Take a look at the newly populated submission template
with open("test_local.sub") as f:
print(f.read())
Writing script to /home/drc/PythonProjects/testing_looper_docs_tutorial/looper_tutorial/test_local.sub
#!/bin/bash
echo 'Compute node:' `hostname`
echo 'Start time:' `date +'%Y-%m-%d %T'`
{
run-this-command
} | tee logfile.txt
You can also activate a computing package via the API:
# You can also activate a package
print(dcc.activate_package("slurm"))
Activating compute package 'slurm'
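With the slurm package active, write_script renders the slurm template instead of the local one. Here is a minimal sketch, reusing the variable names from the settings file earlier in this tutorial; the values and the test_slurm.sub filename are illustrative:
# Render a SLURM submission script with the now-active 'slurm' package.
# Variable names mirror the placeholders in slurm_template.sub; values are illustrative.
dcc.write_script("test_slurm.sub", {"code": "run-this-command", "logfile": "results.log", "jobname": "12345", "cores": "6", "mem": "16G", "time": "4-0-0", "partition": "large_mem"})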
Finally, you can also submit jobs:
dcc.submit(output_path="test_local.sub")
Here is the final completed script used for this tutorial:
import looper.divvy as divvy
config_path = "/home/drc/GITHUB/looper/master/looper/venv/lib/python3.10/site-packages/looper/default_config/divvy_config.yaml"
dcc = divvy.ComputingConfiguration.from_yaml_file(filepath=config_path)
# Print default compute package to terminal
print(dcc.compute)
# Access the submission template associated with the compute package.
with open(dcc.compute['submission_template']) as f:
    print(f.read())
# Populate and create a local submission script
dcc.write_script("test_local.sub", {"code": "run-this-command", "logfile": "logfile.txt"})
# Take a look at the newly populated submission template
with open("test_local.sub") as f:
print(f.read())
# You can also activate a package
dcc.activate_package("slurm")
# You can also submit jobs
dcc.submit(output_path="test_local.sub")