What is looper?
Looper is a job submission engine. Looper deploys arbitrary shell commands for each sample in a standard PEP project. You can think of looper as providing a single user interface for running, monitoring, and managing all of your sample-intensive research projects the same way, regardless of data type or pipeline used.
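To make the "one command per sample" idea concrete, here is a minimal sketch in plain Python. It is not looper's implementation. The sample table, the `count_lines.sh` script, and the `build_commands` helper are all hypothetical and exist only for illustration.

```python
# Hypothetical sample table; in a real PEP project this would come from
# the project's sample metadata rather than a hard-coded list.
samples = [
    {"sample_name": "frog_1", "file": "data/frog_1.fq.gz"},
    {"sample_name": "frog_2", "file": "data/frog_2.fq.gz"},
]

# Hypothetical command template; any shell command could stand in here.
COMMAND_TEMPLATE = "count_lines.sh {file} > results/{sample_name}.txt"


def build_commands(samples, template):
    """Return one shell command string per sample."""
    return [template.format(**sample) for sample in samples]


commands = build_commands(samples, COMMAND_TEMPLATE)
for command in commands:
    print(command)
```

The pipeline only has to accept a shell command. Everything about iterating over samples lives outside of it.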
What makes looper better?
Looper decouples job handling from the pipeline process. In a typical pipeline, job handling (managing how individual jobs are submitted to a cluster) is delicately intertwined with actual pipeline commands (running the actual code for a single compute job). In contrast, the looper approach is modular: looper only manages job submission. This approach leads to several advantages compared with the traditional integrated approach:
- pipelines do not need to independently re-implement job handling code, because it is shared across pipelines.
- every project uses a universal structure, so datasets can move from one pipeline to another.
- users must learn only a single interface that works with any project for any pipeline.
- running just one or two samples/jobs is simpler, and does not require a distributed compute environment.
Features at-a-glance
Modular approach to job handling. Looper completely divides job handling from pipeline processing. Now, pipelines no longer need to worry about sample metadata parsing.
The power of standard PEP format. Looper inherits the benefits of the PEP format: for example, you only need to learn one way to format your project metadata, and it will work with any pipeline. PEP also provides subprojects, programmatic modifiers, and loading in R with pepr or in Python with peppy.
Universal parallelization implementation. Looper's sample-level parallelization applies to all pipelines, so individual pipelines do not have to re-implement parallelization by sample. Looper also handles cluster submission on any cluster resource manager (SLURM, SGE, etc.), so your pipeline doesn't need to manage that, either.
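One way to picture resource-manager independence is to wrap the same pipeline command in different submission script templates. The sketch below assumes two hypothetical templates (`local` and `slurm`) and a hypothetical `render_script` helper. It is not looper's own template format, and a real cluster will likely need site-specific options.

```python
from string import Template

# Hypothetical submission templates, one per backend.
TEMPLATES = {
    "local": Template("#!/bin/bash\n$command\n"),
    "slurm": Template(
        "#!/bin/bash\n"
        "#SBATCH --job-name=$job_name\n"
        "#SBATCH --cpus-per-task=$cores\n"
        "#SBATCH --mem=$mem\n"
        "#SBATCH --time=$time\n"
        "$command\n"
    ),
}


def render_script(backend, command, job_name, cores=1, mem="4G", time="01:00:00"):
    """Fill the chosen backend's template; the pipeline command itself is unchanged."""
    return TEMPLATES[backend].substitute(
        command=command, job_name=job_name, cores=cores, mem=mem, time=time
    )


command = "count_lines.sh data/frog_1.fq.gz"  # hypothetical pipeline command
print(render_script("local", command, "frog_1"))
print(render_script("slurm", command, "frog_1", cores=4, mem="16G"))
```

Switching from a laptop to a cluster then means choosing a different template, not editing the pipeline.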
Flexible pipelines. Use looper with any pipeline, any library, in any domain. As long as you can run your pipeline with a shell command, looper can run it for you.
Job completion monitoring. Looper is job-aware and will not submit new jobs for samples that are already running or finished, making it easy to add new samples to existing projects, or re-run failed samples.
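The skip logic described above can be sketched with status flag files. The `<sample>_<status>.flag` naming and the `samples_to_submit` helper are assumptions made for this example. They are not a statement of how looper tracks job status internally.

```python
import tempfile
from pathlib import Path

# Assumed convention: a job leaves "<sample>_<status>.flag" in the results folder.
SKIP_STATUSES = ("running", "completed")


def has_skip_flag(sample_name, results_dir):
    """True if the sample has a flag file marking it as running or completed."""
    results_dir = Path(results_dir)
    return any(
        (results_dir / f"{sample_name}_{status}.flag").exists()
        for status in SKIP_STATUSES
    )


def samples_to_submit(sample_names, results_dir):
    """Keep only samples that are neither running nor completed."""
    return [name for name in sample_names if not has_skip_flag(name, results_dir)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "frog_1_completed.flag").touch()  # finished: skipped
        (Path(tmp) / "frog_2_failed.flag").touch()     # failed: resubmitted
        print(samples_to_submit(["frog_1", "frog_2", "frog_3"], tmp))
```

Under this convention, a newly added sample (no flag) and a failed sample are both picked up on the next run, while finished work is left alone.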
Flexible resources. Looper has an easy-to-use resource requesting scheme. With a few lines to define CPU, memory, clock time, or anything else, pipeline authors can specify different computational resources depending on the size of the input sample and pipeline to run. Or, just use a default if you don't want to mess with setup.
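As a rough illustration of size-dependent resources, the sketch below picks the smallest resource package whose size limit covers the input. The thresholds, the resource values, and the `select_resources` helper are hypothetical. Looper's actual configuration format may differ.

```python
# Hypothetical resource packages: (maximum input size in GB, resources to request).
RESOURCE_PACKAGES = [
    (1.0, {"cores": 1, "mem": "4G", "time": "01:00:00"}),
    (10.0, {"cores": 4, "mem": "16G", "time": "08:00:00"}),
    (float("inf"), {"cores": 8, "mem": "64G", "time": "2-00:00:00"}),
]


def select_resources(input_size_gb, packages=RESOURCE_PACKAGES):
    """Return the resources of the smallest package that fits the input size."""
    for max_size_gb, resources in sorted(packages, key=lambda pkg: pkg[0]):
        if input_size_gb <= max_size_gb:
            return resources
    raise ValueError(f"No resource package covers {input_size_gb} GB")


for size_gb in (0.5, 5, 500):
    print(size_gb, select_resources(size_gb))
```

The final package with an infinite limit acts as a catch-all, which is one way to provide the "default" mentioned above.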
Command line interface. Looper uses a command-line interface so you have total power at your fingertips.
Beautiful linked result reports. Looper automatically creates an internally linked, portable HTML report highlighting all results, for every pipeline. For an example HTML report, see the PEPATAC Gold Summary.