Looper Python API
The looper Python API provides classes for managing pipeline submissions and compute configurations.
Project
The main class for working with looper projects. Extends peppy's Project with pipeline submission capabilities.
Project
Project(cfg=None, amendments=None, divcfg_path=None, **kwargs)
Bases: peppy.Project
Looper-specific Project.
:param str cfg: path to configuration file with data from which Project is to be built
:param Iterable[str] amendments: names of amendments to activate, optional
:param str divcfg_path: path to an environment configuration YAML file specifying compute settings
:param bool permissive: whether an error should be thrown if a sample's input file(s) do not exist or cannot be opened
:param str compute_env_file: environment configuration YAML file specifying compute settings
cli_pifaces
property
cli_pifaces
Collection of pipeline interface sources specified in object constructor
:return list[str]: collection of pipeline interface sources
output_dir
property
output_dir
Output directory for the project, specified in object constructor
:return str: path to the output directory
pipeline_interfaces
cached
property
pipeline_interfaces
Flat list of all valid interface objects associated with this Project
Note that only valid pipeline interfaces will show up in the result (ones that exist on disk/remotely and validate successfully against the schema)
:return list[looper.PipelineInterface]: list of pipeline interfaces
results_folder
property
results_folder
Path to the results folder for the project
:return str: path to the results folder in the output folder
selected_compute_package
property
selected_compute_package
Compute package name specified in object constructor
:return str: compute package name
submission_folder
property
submission_folder
Path to the submission folder for the project
:return str: path to the submission folder in the output folder
get_sample_piface
get_sample_piface(sample_name)
Get a list of pipeline interfaces associated with the specified sample.
Note that only valid pipeline interfaces will show up in the result (ones that exist on disk/remotely and validate successfully against the schema)
:param str sample_name: name of the sample to retrieve the list of pipeline interfaces for
:return list[looper.PipelineInterface]: collection of valid pipeline interfaces associated with the selected sample
get_schemas
staticmethod
get_schemas(pifaces, schema_key=INPUT_SCHEMA_KEY)
Get the list of unique schema paths for a list of pipeline interfaces
:param str | Iterable[str] pifaces: pipeline interfaces to search schemas for
:param str schema_key: where to look for schemas in the piface
:return Iterable[str]: unique list of schema file paths
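The behavior can be sketched as collecting schema paths from each interface and de-duplicating them while preserving order. This is a minimal illustration, assuming each pipeline interface is a plain dict with an `input_schema` key; the real method operates on `looper.PipelineInterface` objects.

```python
# Illustrative sketch of unique-schema collection; not looper's
# actual implementation.

def get_schemas(pifaces, schema_key="input_schema"):
    """Return the unique schema paths found in the given interfaces."""
    if isinstance(pifaces, dict):
        pifaces = [pifaces]
    schemas = []
    for piface in pifaces:
        path = piface.get(schema_key)
        if path is not None and path not in schemas:
            schemas.append(path)
    return schemas

pifaces = [
    {"input_schema": "schemas/input.yaml"},
    {"input_schema": "schemas/input.yaml"},  # duplicate is dropped
    {"input_schema": "schemas/other.yaml"},
]
print(get_schemas(pifaces))  # ['schemas/input.yaml', 'schemas/other.yaml']
```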
make_project_dirs
make_project_dirs()
Create project directory structure if it doesn't exist.
populate_pipeline_outputs
populate_pipeline_outputs()
Populate project and sample output attributes based on output schemas that pipeline interfaces point to.
set_sample_piface
set_sample_piface(sample_piface)
Add sample pipeline interfaces variable to object
:param list | str sample_piface: sample pipeline interface
PipelineInterface
Parses and holds information from a pipeline interface YAML file, including resource specifications and command templates.
PipelineInterface
PipelineInterface(config, pipeline_type=None)
Bases: YAMLConfigManager
This class parses, holds, and returns information from a YAML file that specifies how to interact with an individual pipeline. This includes both resources to request for cluster job submission and arguments to be passed from the sample annotation metadata to the pipeline.
:param str | Mapping config: path to file from which to parse configuration data, or pre-parsed configuration data
:param str pipeline_type: type of the pipeline, must be either 'sample' or 'project'
pipeline_name
property
pipeline_name
choose_resource_package
choose_resource_package(namespaces, file_size)
Select resource bundle for given input file size to given pipeline.
:param Mapping[Mapping[str]] namespaces: namespaced variables to pass as a context for fluid attributes command rendering
:param float file_size: size of input data (in gigabytes)
:return MutableMapping: resource bundle appropriate for the given pipeline, for the given input file size
:raises ValueError: if the indicated file size is negative, or if the file size value specified for any resource package is negative
:raises InvalidResourceSpecificationException: if no default resource package specification is provided
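The selection logic can be sketched as follows: each resource package carries a file-size threshold, and the smallest package that can accommodate the input is chosen, falling back to a required default. The `max_file_size` key and package names here are illustrative assumptions, not looper's actual schema.

```python
# Toy sketch of size-based resource selection. Each non-default
# package declares a "max_file_size" threshold (in GB); the smallest
# package whose threshold covers the input wins, else "default".

def choose_resource_package(packages, file_size):
    if file_size < 0:
        raise ValueError("File size must be non-negative")
    if "default" not in packages:
        raise KeyError("A 'default' resource package is required")
    eligible = [
        (spec["max_file_size"], name)
        for name, spec in packages.items()
        if name != "default" and spec["max_file_size"] >= file_size
    ]
    name = min(eligible)[1] if eligible else "default"
    return packages[name]

packages = {
    "default": {"cores": 32, "mem": "64G"},
    "small": {"max_file_size": 1, "cores": 1, "mem": "4G"},
    "medium": {"max_file_size": 10, "cores": 8, "mem": "16G"},
}
print(choose_resource_package(packages, 0.5))  # the "small" bundle
```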
get_pipeline_schemas
get_pipeline_schemas(schema_key=INPUT_SCHEMA_KEY)
Get path to the pipeline schema.
:param str schema_key: where to look for schemas in the pipeline iface :return str: absolute path to the pipeline schema file
render_var_templates
render_var_templates(namespaces)
Render path templates under 'var_templates' in this pipeline interface.
:param dict namespaces: namespaces to use for rendering
SubmissionConductor
Collects and submits pipeline jobs. Manages job pooling based on file size or command count limits.
SubmissionConductor
SubmissionConductor(pipeline_interface, prj, delay=0, extra_args=None, extra_args_override=None, ignore_flags=False, compute_variables=None, max_cmds=None, max_size=None, max_jobs=None, automatic=True, collate=False)
Bases: object
Collects and then submits pipeline jobs.
This class holds a 'pool' of commands to submit as a single cluster job. Eager to submit a job, each instance's collection of commands expands until the 'pool' has been filled, at which point it's time to submit the job. The pool fills as soon as a fill criterion has been reached, which can be either total input file size or the number of individual commands.
Create a job submission manager.
The most critical inputs are the pipeline interface and the pipeline key, which provide critical pipeline information (like resource allocation packages) and determine which pipeline will be overseen by this instance, respectively.
:param PipelineInterface pipeline_interface: collection of important data for one or more pipelines, like resource allocation packages and option/argument specifications
:param prj: Project with which each sample being considered is associated (what generated each sample)
:param float delay: time (in seconds) to wait before submitting a job once it's ready
:param str extra_args: string to pass to each job generated, for example additional pipeline arguments
:param str extra_args_override: string to pass to each job generated, for example additional pipeline arguments. This deactivates the 'extra' functionality that appends strings defined in Sample.command_extra and Project.looper.command_extra to the command template.
:param bool ignore_flags: whether to ignore flag files present in the sample folder for each sample considered for submission
:param dict[str] compute_variables: a dict with variables that will be made available to the compute package. For example, this should include the name of the cluster partition to which the job or jobs will be submitted.
:param int | NoneType max_cmds: upper bound on the number of commands to include in a single job script
:param int | float | NoneType max_size: upper bound on the total file size of inputs used by the commands lumped into a single job script
:param int | float | NoneType max_jobs: upper bound on the total number of jobs to group samples for submission
:param bool automatic: whether the submission should be automatic once the pool reaches capacity
:param bool collate: whether a collate job is to be submitted (runs on the project level, rather than on the sample level)
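The pooling behavior described above can be sketched with a toy conductor: commands accumulate until either the command-count cap or the cumulative input-size cap is hit, at which point the pool is flushed as one "job". Names and structure are illustrative, not looper's internals.

```python
# Minimal sketch of command pooling, assuming two fill criteria:
# number of commands (max_cmds) and total input size (max_size).

class PoolingConductor:
    def __init__(self, max_cmds=None, max_size=None):
        self.max_cmds = max_cmds
        self.max_size = max_size
        self._pool = []
        self._size = 0.0
        self.submitted_jobs = []

    def _is_full(self):
        if self.max_cmds is not None and len(self._pool) >= self.max_cmds:
            return True
        if self.max_size is not None and self._size >= self.max_size:
            return True
        return False

    def add(self, command, input_size_gb):
        """Grow the pool; flush it as a job once a fill criterion is met."""
        self._pool.append(command)
        self._size += input_size_gb
        if self._is_full():
            self.submit(force=True)

    def submit(self, force=False):
        """Submit the pool iff forced or full; return whether a job went out."""
        if not self._pool or not (force or self._is_full()):
            return False
        self.submitted_jobs.append(list(self._pool))
        self._pool, self._size = [], 0.0
        return True

conductor = PoolingConductor(max_cmds=2)
for cmd in ["pipeline a", "pipeline b", "pipeline c"]:
    conductor.add(cmd, input_size_gb=0.1)
print(conductor.submitted_jobs)  # [['pipeline a', 'pipeline b']]
```

A final `submit(force=True)` flushes any leftover commands that never filled a pool, which mirrors why the real class exposes a `force` flag on `submit`.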
failed_samples
property
failed_samples
num_cmd_submissions
property
num_cmd_submissions
Return the number of commands that this conductor has submitted.
:return int: Number of commands submitted so far.
num_job_submissions
property
num_job_submissions
Return the number of jobs that this conductor has submitted.
:return int: Number of jobs submitted so far.
add_sample
add_sample(sample, rerun=False)
Add a sample for submission to this conductor.
:param peppy.Sample sample: sample to be included with this conductor's currently growing collection of command submissions
:param bool rerun: whether the given sample is being rerun rather than run for the first time
:return bool: indication of whether the given sample was added to the current 'pool'
:raise TypeError: if a sample subtype is provided but does not extend the base Sample class
is_project_submittable
is_project_submittable(force=False)
Check whether the current project has already been submitted
:param bool force: whether to force the project submission (ignore status/flags)
submit
submit(force=False)
Submit one or more commands as a job.
This call will submit the commands corresponding to the current pool of samples if and only if 'force' evaluates to a true value or the pool of samples is full.
:param bool force: whether submission should be done/simulated even if this conductor's pool isn't full
:return bool: whether a job was submitted (or would've been if not for dry run)
write_script
write_script(pool, size)
Create the script for job submission.
:param Iterable[peppy.Sample] pool: collection of sample instances
:param float size: cumulative size of the given pool
:return str: path to the job submission script created
ComputingConfiguration
Manages compute environment settings from divvy configuration files. Handles resource packages and submission templates.
ComputingConfiguration
ComputingConfiguration(entries=None, wait_max=None, strict_ro_locks=False, schema_source=None, validate_on_write=False)
Bases: FutureYAMLConfigManager
Represents computing configuration objects.
The ComputingConfiguration class provides a computing configuration object that is an in-memory representation of a divvy computing configuration file. This object has various functions to allow a user to activate, modify, and retrieve computing configuration files, and to use these values to populate job submission script templates.
:param str | Iterable[(str, object)] | Mapping[str, object] entries: config collection of key-value pairs
:param str filepath: YAML file specifying computing package data (the DIVCFG file)
default_config_file
property
default_config_file
Path to default compute environment settings file.
:return str: Path to default compute settings file
templates_folder
property
templates_folder
Path to folder with default submission templates.
:return str: path to folder with default submission templates
activate_package
activate_package(package_name)
Activates a compute package.
This copies the computing attributes from the configuration file into the compute attribute, where the class stores current compute settings.
:param str package_name: name for a non-resource compute bundle, the name of a subsection in an environment configuration file
:return bool: success flag for the attempt to establish compute settings
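The activation step can be illustrated as copying a named package's settings into an "active" mapping. The dict layout and key names below are assumptions for the sketch, not divvy's actual data structures.

```python
# Simplified sketch of package activation: the named package's
# settings are copied into a mapping that holds the currently
# active compute values.

def activate_package(config, compute, package_name):
    """Copy a named compute package into the active settings; return success."""
    package = config.get("compute_packages", {}).get(package_name)
    if package is None:
        return False
    compute.update(package)
    return True

config = {
    "compute_packages": {
        "default": {"submission_template": "templates/local.sub",
                    "submission_command": "sh"},
        "slurm": {"submission_template": "templates/slurm.sub",
                  "submission_command": "sbatch"},
    }
}
active = {}
activate_package(config, active, "slurm")
print(active["submission_command"])  # sbatch
```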
clean_start
clean_start(package_name)
Clear current active settings and then activate the given package.
:param str package_name: name of the resource package to activate
:return bool: success flag
get_active_package
get_active_package()
Returns settings for the currently active compute package
:return YAMLConfigManager: data defining the active compute package
get_adapters
get_adapters()
Get current adapters, if defined.
Adapters are sourced from the 'adapters' section in the root of the divvy configuration file and updated with an active compute package-specific set of adapters, if any defined in 'adapters' section under currently active compute package.
:return YAMLConfigManager: current adapters mapping
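The adapter resolution described above can be sketched as a two-level merge: root-level adapters are taken first, then overridden by any adapters defined under the active compute package. The config layout here is an illustrative assumption.

```python
# Sketch of adapter resolution: package-level adapters override
# root-level ones for the currently active package.

def get_adapters(config, active_package):
    adapters = dict(config.get("adapters", {}))
    package = config.get("compute_packages", {}).get(active_package, {})
    adapters.update(package.get("adapters", {}))
    return adapters

config = {
    "adapters": {"CODE": "looper.command", "LOGFILE": "looper.log_file"},
    "compute_packages": {
        "slurm": {"adapters": {"LOGFILE": "sample.log_file"}},
    },
}
print(get_adapters(config, "slurm"))
```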
list_compute_packages
list_compute_packages()
Returns a list of available compute packages.
:return set[str]: names of available compute packages
reset_active_settings
reset_active_settings()
Clear out current compute settings.
:return bool: success flag
template
template()
Get the currently active submission template.
:return str: submission script content template for current state
update_packages
update_packages(config_file)
Parse data from divvy configuration file.
Given a divvy configuration file, this function will update (not overwrite) the existing compute packages with the values it contains. It does not affect any currently active settings.
:param str config_file: path to file with new divvy configuration data
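The update-not-overwrite semantics can be sketched as a per-package merge: new packages are added, and keys of existing packages are updated without discarding the rest. The structure below is illustrative.

```python
# Sketch of merging new divvy packages into an existing mapping
# without discarding keys the new data doesn't mention.

def update_packages(existing, new):
    for name, spec in new.items():
        existing.setdefault(name, {}).update(spec)
    return existing

packages = {"default": {"submission_command": "sh",
                        "submission_template": "local.sub"}}
incoming = {
    "default": {"submission_command": "bash"},   # updates one key only
    "slurm": {"submission_command": "sbatch"},   # adds a new package
}
update_packages(packages, incoming)
print(packages["default"])
```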
write_script
write_script(output_path, extra_vars=None)
Given the currently active settings, populate the active template to write a submission script. Additionally, use the current adapters to adjust the selection of the provided variables.
:param str output_path: path to file to write as submission script
:param Iterable[Mapping] extra_vars: a list of dict objects with key-value pairs with which to populate template fields. These will override any values in the currently active compute package.
:return str: path to the submission script file
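Template population can be sketched as substituting `{VARIABLE}` placeholders (the style divvy submission templates use) from the active compute package, with the extra variable mappings taking precedence. This is a minimal sketch, not divvy's actual rendering code.

```python
# Simplified template population: extra_vars mappings override the
# active compute package, then placeholders are substituted.

def populate_template(template, compute, extra_vars=None):
    variables = dict(compute)
    for mapping in extra_vars or []:
        variables.update(mapping)
    for key, value in variables.items():
        template = template.replace("{" + key + "}", str(value))
    return template

template = "#!/bin/bash\n#SBATCH --mem={MEM}\n{CODE}\n"
compute = {"MEM": "8G"}
script = populate_template(
    template, compute, extra_vars=[{"CODE": "pipeline.py --sample s1"}]
)
print(script)
```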
Utility Functions
select_divvy_config
select_divvy_config
select_divvy_config(filepath)
Selects the divvy config file path to load.
This uses a priority ordering: a config file path is chosen first if one is given; otherwise, a priority list of environment variables is searched and the first available file path is returned. If none of these options succeed, the default config path is returned.
:param str | NoneType filepath: direct file path specification
:return str: path to the config file to read
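The priority ordering can be sketched as below. The environment-variable names and default filename are assumptions for illustration; the actual names looper consults may differ.

```python
# Sketch of priority-ordered config selection: explicit path first,
# then a list of environment variables, then a default.
import os

def select_divvy_config(filepath=None,
                        env_vars=("DIVCFG", "PEPENV"),
                        default="divvy_default.yaml"):
    if filepath is not None:
        return filepath
    for var in env_vars:
        candidate = os.environ.get(var)
        if candidate and os.path.isfile(candidate):
            return candidate
    return default

print(select_divvy_config("custom.yaml"))  # custom.yaml
```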