Hello World! example for looper

This tutorial demonstrates how to install looper and use it to run a pipeline on a PEP project.

1. Install the latest version of looper:

pip install --user --upgrade looper

2. Download and unzip the hello_looper repository

The hello looper repository contains a basic functional example project (in /project) and a looper-compatible pipeline (in /pipeline) that can run on that project. Let's download and unzip it:

!wget https://github.com/pepkit/hello_looper/archive/refs/heads/master.zip

--2024-06-05 09:33:27--  https://github.com/pepkit/hello_looper/archive/refs/heads/master.zip
Resolving github.com (github.com)... 140.82.112.3
Connecting to github.com (github.com)|140.82.112.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/pepkit/hello_looper/zip/refs/heads/master [following]
--2024-06-05 09:33:27--  https://codeload.github.com/pepkit/hello_looper/zip/refs/heads/master
Resolving codeload.github.com (codeload.github.com)... 140.82.112.10
Connecting to codeload.github.com (codeload.github.com)|140.82.112.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘master.zip’

master.zip              [ <=>                ]  22.79K  --.-KB/s    in 0.003s

2024-06-05 09:33:28 (6.66 MB/s) - ‘master.zip’ saved [23340]

!unzip master.zip

Archive:  master.zip
95c790f9b17e66a24e3e579c0ebc05a955b5c1d0
   creating: hello_looper-master/
  inflating: hello_looper-master/.gitignore  
  inflating: hello_looper-master/.looper.yaml  
  inflating: hello_looper-master/README.md  
   creating: hello_looper-master/advanced/
  inflating: hello_looper-master/advanced/.looper.yaml  
  inflating: hello_looper-master/advanced/.looper_advanced_pipestat.yaml  
   creating: hello_looper-master/advanced/pipeline/
  inflating: hello_looper-master/advanced/pipeline/output_schema.yaml  
  inflating: hello_looper-master/advanced/pipeline/pipeline_interface1_project.yaml  
  inflating: hello_looper-master/advanced/pipeline/pipeline_interface1_sample.yaml  
  inflating: hello_looper-master/advanced/pipeline/pipeline_interface2_project.yaml  
  inflating: hello_looper-master/advanced/pipeline/pipeline_interface2_sample.yaml  
  inflating: hello_looper-master/advanced/pipeline/pipestat_output_schema.yaml  
  inflating: hello_looper-master/advanced/pipeline/pipestat_pipeline_interface1_sample.yaml  
  inflating: hello_looper-master/advanced/pipeline/pipestat_pipeline_interface2_sample.yaml  
  inflating: hello_looper-master/advanced/pipeline/readData.R  
  inflating: hello_looper-master/advanced/pipeline/resources-project.tsv  
  inflating: hello_looper-master/advanced/pipeline/resources-sample.tsv  
   creating: hello_looper-master/advanced/project/
  inflating: hello_looper-master/advanced/project/annotation_sheet.csv  
  inflating: hello_looper-master/advanced/project/project_config.yaml  
   creating: hello_looper-master/csv/
  inflating: hello_looper-master/csv/.looper.yaml  
   creating: hello_looper-master/csv/data/
  inflating: hello_looper-master/csv/data/frog1_data.txt  
  inflating: hello_looper-master/csv/data/frog2_data.txt  
   creating: hello_looper-master/csv/pipeline/
  inflating: hello_looper-master/csv/pipeline/count_lines.sh  
  inflating: hello_looper-master/csv/pipeline/pipeline_interface.yaml  
  inflating: hello_looper-master/csv/pipeline/pipeline_interface_project.yaml  
   creating: hello_looper-master/csv/project/
  inflating: hello_looper-master/csv/project/sample_annotation.csv  
   creating: hello_looper-master/intermediate/
  inflating: hello_looper-master/intermediate/.looper.yaml  
  inflating: hello_looper-master/intermediate/.looper_project.yaml  
   creating: hello_looper-master/intermediate/data/
  inflating: hello_looper-master/intermediate/data/frog_1.txt  
  inflating: hello_looper-master/intermediate/data/frog_2.txt  
   creating: hello_looper-master/intermediate/pipeline/
  inflating: hello_looper-master/intermediate/pipeline/count_lines.sh  
  inflating: hello_looper-master/intermediate/pipeline/pipeline_interface.yaml  
  inflating: hello_looper-master/intermediate/pipeline/pipeline_interface_project.yaml  
   creating: hello_looper-master/intermediate/project/
  inflating: hello_looper-master/intermediate/project/project_config.yaml  
  inflating: hello_looper-master/intermediate/project/sample_annotation.csv  
   creating: hello_looper-master/minimal/
  inflating: hello_looper-master/minimal/.looper.yaml  
   creating: hello_looper-master/minimal/data/
  inflating: hello_looper-master/minimal/data/frog_1.txt  
  inflating: hello_looper-master/minimal/data/frog_2.txt  
   creating: hello_looper-master/minimal/pipeline/
  inflating: hello_looper-master/minimal/pipeline/count_lines.sh  
  inflating: hello_looper-master/minimal/pipeline/pipeline_interface.yaml  
   creating: hello_looper-master/minimal/project/
  inflating: hello_looper-master/minimal/project/project_config.yaml  
  inflating: hello_looper-master/minimal/project/sample_annotation.csv  
   creating: hello_looper-master/pephub/
  inflating: hello_looper-master/pephub/.looper.yaml  
   creating: hello_looper-master/pephub/data/
  inflating: hello_looper-master/pephub/data/frog1_data.txt  
  inflating: hello_looper-master/pephub/data/frog2_data.txt  
   creating: hello_looper-master/pephub/pipeline/
  inflating: hello_looper-master/pephub/pipeline/count_lines.sh  
  inflating: hello_looper-master/pephub/pipeline/pipeline_interface.yaml  
  inflating: hello_looper-master/pephub/pipeline/pipeline_interface_project.yaml  
   creating: hello_looper-master/pipestat/
  inflating: hello_looper-master/pipestat/.looper.yaml  
  inflating: hello_looper-master/pipestat/.looper_pipestat_shell.yaml  
   creating: hello_looper-master/pipestat/data/
  inflating: hello_looper-master/pipestat/data/frog_1.txt  
  inflating: hello_looper-master/pipestat/data/frog_2.txt  
   creating: hello_looper-master/pipestat/pipeline_pipestat/
  inflating: hello_looper-master/pipestat/pipeline_pipestat/count_lines.py  
  inflating: hello_looper-master/pipestat/pipeline_pipestat/count_lines_pipestat.sh  
  inflating: hello_looper-master/pipestat/pipeline_pipestat/pipeline_interface.yaml  
  inflating: hello_looper-master/pipestat/pipeline_pipestat/pipeline_interface_project.yaml  
  inflating: hello_looper-master/pipestat/pipeline_pipestat/pipeline_interface_shell.yaml  
  inflating: hello_looper-master/pipestat/pipeline_pipestat/pipestat_output_schema.yaml  
   creating: hello_looper-master/pipestat/project/
  inflating: hello_looper-master/pipestat/project/project_config.yaml  
  inflating: hello_looper-master/pipestat/project/sample_annotation.csv

3. Run it

Run it by changing to the directory and then invoking looper run on the project configuration file.

cd hello_looper-master/minimal

/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal


/home/drc/GITHUB/pepspec/.venv/lib/python3.10/site-packages/IPython/core/magics/osm.py:417: UserWarning: This is now an optional IPython functionality, setting dhist requires you to install the `pickleshare` library.
  self.shell.db['dhist'] = compress_dhist(dhist)[-100:]

!looper run --looper-config .looper.yaml

Looper version: 1.8.1a1
Command: run
Using default divvy config. You may specify in env var: ['DIVCFG']
[36m## [1 of 2] sample: frog_1; pipeline: count_lines[0m
Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_1.sub
Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_1.sub
Compute node: databio
Start time: 2024-06-05 09:37:34
Number of lines: 4
[36m## [2 of 2] sample: frog_2; pipeline: count_lines[0m
Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_2.sub
Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/minimal/results/submission/count_lines_frog_2.sub
Compute node: databio
Start time: 2024-06-05 09:37:34
Number of lines: 7

Looper finished
Samples valid for job generation: 2 of 2
{'Pipestat compatible': False, 'Commands submitted': '2 of 2', 'Jobs submitted': 2}
[0m

Voila! You've run your very first pipeline across multiple samples using looper!

Exploring the results

Now, let's inspect the hello_looper repository you downloaded. It has 3 components, each in a subfolder:

!tree *

[01;34mdata[0m
├── frog_1.txt
└── frog_2.txt
[01;34mpipeline[0m
├── [01;32mcount_lines.sh[0m
└── pipeline_interface.yaml
[01;34mproject[0m
├── project_config.yaml
└── sample_annotation.csv
[01;34mresults[0m
└── [01;34msubmission[0m
    ├── count_lines_frog_1.log
    ├── count_lines_frog_1.sub
    ├── count_lines_frog_2.log
    └── count_lines_frog_2.sub

1 directory, 4 files

These are:

/data -- contains 2 data files for 2 samples. These input files were each passed to the pipeline.
/pipeline -- contains the script we want to run on each sample in our project. Our pipeline is a very simple shell script named count_lines.sh, which (duh!) counts the number of lines in an input file.
/project -- contains 2 files that describe metadata for the project (project_config.yaml) and the samples (sample_annotation.csv). This particular project describes just two samples listed in the annotation file. These files together make up a PEP-formatted project, and can therefore be read by any PEP-compatible tool, including looper.

When we invoke looper from the command line we told it to run project/project_config.yaml. looper reads the project/project_config.yaml file, which points to a few things:

the project/sample_annotation.csv file, which specifies a few samples, their type, and path to data file
the output_dir, which is where looper results are saved.
the pipeline_interface.yaml file, (pipeline/pipeline_interface.yaml), which tells looper how to connect to the pipeline (pipeline/count_lines.sh).

The 3 folders (data, project, and pipeline) are modular; there is no need for these to live in any predetermined folder structure. For this example, the data and pipeline are included locally, but in practice, they are usually in a separate folder; you can point to anything (so data, pipelines, and projects may reside in distinct spaces on disk). You may also include more than one pipeline interface in your project_config.yaml, so in a looper project, many-to-many relationships are possible.

Looper config

The looper config contains paths to the project config, the output_dir as well as any defined pipeline interfaces.

!cat .looper.yaml

pep_config: project/project_config.yaml # local path to pep config
# pep_config: pepkit/hello_looper:default  # you can also use a pephub registry path
output_dir: "results"
pipeline_interfaces:
  sample: pipeline/pipeline_interface.yaml

Project Config

The project config file contains the PEP version and sample annotation sheet. (see defining a project).

!cat project/project_config.yaml

pep_version: 2.0.0
sample_table: sample_annotation.csv

Pipeline Interface

The pipeline interface shows the pipeline_name, pipeline_type, as well as the var_templates and command_templates used for this pipeline.

!cat pipeline/pipeline_interface.yaml

pipeline_name: count_lines
pipeline_type: sample
var_templates:
  pipeline: '{looper.piface_dir}/count_lines.sh'
command_template: >
  {pipeline.var_templates.pipeline} {sample.file}

Alright, next let's explore what this pipeline stuck into our output_dir:

!tree results/

[01;34mresults/[0m
└── [01;34msubmission[0m
    ├── count_lines_frog_1.log
    ├── count_lines_frog_1.sub
    ├── count_lines_frog_2.log
    └── count_lines_frog_2.sub

1 directory, 4 files

Inside of an output_dir there will be one to two directories:

submissions - which holds a YAML representation of each sample and a log file for each submitted job
results_pipeline - a directory with output of the pipeline(s) (if applicable), for each sample/pipeline combination (often one per sample). In this minimal example, the output was simply printed to terminal instead of producing files.

From here to running hundreds of samples of various sample types is virtually the same effort!

Running PEPs from PEPHub

Looper also supports running a PEP from PEPHub!

cd ..

/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master

cd pephub

/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub

!cat .looper.yaml

pep_config: pepkit/hello_looper:default # pephub registry path or local path
output_dir: results
pipeline_interfaces:
  sample: pipeline/pipeline_interface.yaml

!looper run --looper-config .looper.yaml

Looper version: 1.8.1a1
Command: run
Using default divvy config. You may specify in env var: ['DIVCFG']
No config key in Project, or reading project from dict
[36m## [1 of 2] sample: frog_1; pipeline: count_lines[0m
Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_1.sub
Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_1.sub
Compute node: databio
Start time: 2024-06-05 09:46:04
Number of lines: 4
[36m## [2 of 2] sample: frog_2; pipeline: count_lines[0m
Writing script to /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_2.sub
Job script (n=1; 0.00Gb): /home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pephub/results/submission/count_lines_frog_2.sub
Compute node: databio
Start time: 2024-06-05 09:46:04
Number of lines: 7

Looper finished
Samples valid for job generation: 2 of 2
{'Pipestat compatible': False, 'Commands submitted': '2 of 2', 'Jobs submitted': 2}
[0m

Pipestat compatible configurations

Looper can also be used in tandem with pipestat to report pipeline results.

cd ..

/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master

cd pipestat

/home/drc/GITHUB/pepspec/docs/looper/notebooks/hello_looper-master/pipestat

!cat .looper.yaml

pep_config: ./project/project_config.yaml # pephub registry path or local path
output_dir: ./results
pipeline_interfaces:
  sample:  ./pipeline_pipestat/pipeline_interface.yaml
  project: ./pipeline_pipestat/pipeline_interface_project.yaml
pipestat:
  results_file_path: results.yaml
  flag_file_dir: results/flags

A few more basic looper options

Looper also provides a few other simple arguments that let you adjust what it does. You can find a complete reference of usage in the docs. Here are a few of the more common options:

For looper run:

-d: Dry run mode (creates submission scripts, but does not execute them)
--limit: Only run a few samples
--lumpn: Run several commands together as a single job. This is useful when you have a quick pipeline to run on many samples and want to group them.

There are also other commands:

looper check: checks on the status (running, failed, completed) of your jobs (requires pipestat)
looper summarize: produces an output file that summarizes your project results (requires pipestat)
looper destroy: completely erases all results so you can restart
looper rerun: rerun only jobs that have failed.

On your own

To use looper on your own, you will need to prepare 2 things: a project (metadata that define what you want to process), and pipelines (how to process data). To link your project to looper, you will need to define a project. You will want to either use pre-made looper-compatible pipelines or link your own custom-built pipelines. These docs will also show you how to connect your pipeline to your project.