Looper Python API
The looper Python API provides classes for managing pipeline submissions and compute configurations.
Project
The main class for working with looper projects. Extends peppy's Project with pipeline submission capabilities.
Project
Project(cfg=None, amendments=None, divcfg_path=None, **kwargs)
Bases: peppy.Project
Looper-specific Project.
:param str cfg: path to configuration file with data from which Project is to be built
:param Iterable[str] amendments: names of amendments to activate, optional
:param str divcfg_path: path to an environment configuration YAML file specifying compute settings
:param bool permissive: whether an error should be thrown if a sample's input file(s) do not exist or cannot be opened
:param str compute_env_file: environment configuration YAML file specifying compute settings
cli_pifaces
property
cli_pifaces
Collection of pipeline interface sources specified in object constructor
:return list[str]: collection of pipeline interface sources
output_dir
property
output_dir
Output directory for the project, specified in object constructor
:return str: path to the output directory
pipeline_interfaces
cached
property
pipeline_interfaces
Flat list of all valid interface objects associated with this Project
Note that only valid pipeline interfaces will show up in the result (ones that exist on disk/remotely and validate successfully against the schema)
:return list[looper.PipelineInterface]: list of pipeline interfaces
results_folder
property
results_folder
Path to the results folder for the project
:return str: path to the results folder in the output folder
selected_compute_package
property
selected_compute_package
Compute package name specified in object constructor
:return str: compute package name
submission_folder
property
submission_folder
Path to the submission folder for the project
:return str: path to the submission folder in the output folder
get_sample_piface
get_sample_piface(sample_name)
Get a list of pipeline interfaces associated with the specified sample.
Note that only valid pipeline interfaces will show up in the result (ones that exist on disk/remotely and validate successfully against the schema)
:param str sample_name: name of the sample to retrieve the list of pipeline interfaces for
:return list[looper.PipelineInterface]: collection of valid pipeline interfaces associated with the selected sample
get_schemas
staticmethod
get_schemas(pifaces, schema_key=INPUT_SCHEMA_KEY)
Get the list of unique schema paths for a list of pipeline interfaces
:param str | Iterable[str] pifaces: pipeline interfaces to search schemas for
:param str schema_key: where to look for schemas in the piface
:return Iterable[str]: unique list of schema file paths
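The behavior can be sketched as collecting schema paths from each interface and de-duplicating them while preserving order. This is a minimal illustration, assuming each pipeline interface is a plain dict with an `input_schema` key; the real method operates on `looper.PipelineInterface` objects.

```python
# Illustrative sketch of unique-schema collection; not looper's
# actual implementation.

def get_schemas(pifaces, schema_key="input_schema"):
    """Return the unique schema paths found in the given interfaces."""
    if isinstance(pifaces, dict):
        pifaces = [pifaces]
    schemas = []
    for piface in pifaces:
        path = piface.get(schema_key)
        if path is not None and path not in schemas:
            schemas.append(path)
    return schemas

pifaces = [
    {"input_schema": "schemas/input.yaml"},
    {"input_schema": "schemas/input.yaml"},  # duplicate is dropped
    {"input_schema": "schemas/other.yaml"},
]
print(get_schemas(pifaces))  # ['schemas/input.yaml', 'schemas/other.yaml']
```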
make_project_dirs
make_project_dirs()
Create project directory structure if it doesn't exist.
populate_pipeline_outputs
populate_pipeline_outputs()
Populate project and sample output attributes based on output schemas that pipeline interfaces point to.
set_sample_piface
set_sample_piface(sample_piface)
Add sample pipeline interfaces variable to object
:param list | str sample_piface: sample pipeline interface
PipelineInterface
Parses and holds information from a pipeline interface YAML file, including resource specifications and command templates.
PipelineInterface
PipelineInterface(config, pipeline_type=None)
Bases: YAMLConfigManager
This class parses, holds, and returns information from a YAML file that specifies how to interact with an individual pipeline. This includes both resources to request for cluster job submission and arguments to be passed from the sample annotation metadata to the pipeline.
:param str | Mapping config: path to file from which to parse configuration data, or pre-parsed configuration data
:param str pipeline_type: type of the pipeline, must be either 'sample' or 'project'
pipeline_name
property
pipeline_name
choose_resource_package
choose_resource_package(namespaces, file_size)
Select resource bundle for given input file size to given pipeline.
:param Mapping[Mapping[str]] namespaces: namespaced variables to pass as a context for fluid attributes command rendering
:param float file_size: size of input data (in gigabytes)
:return MutableMapping: resource bundle appropriate for the given pipeline, for the given input file size
:raises ValueError: if the indicated file size is negative, or if the file size value specified for any resource package is negative
:raises InvalidResourceSpecificationException: if no default resource package specification is provided
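The selection logic can be sketched as follows: each resource package carries a file-size threshold, and the smallest package that can accommodate the input is chosen, falling back to a required default. The `max_file_size` key and package names here are illustrative assumptions, not looper's actual schema.

```python
# Toy sketch of size-based resource selection. Each non-default
# package declares a "max_file_size" threshold (in GB); the smallest
# package whose threshold covers the input wins, else "default".

def choose_resource_package(packages, file_size):
    if file_size < 0:
        raise ValueError("File size must be non-negative")
    if "default" not in packages:
        raise KeyError("A 'default' resource package is required")
    eligible = [
        (spec["max_file_size"], name)
        for name, spec in packages.items()
        if name != "default" and spec["max_file_size"] >= file_size
    ]
    name = min(eligible)[1] if eligible else "default"
    return packages[name]

packages = {
    "default": {"cores": 32, "mem": "64G"},
    "small": {"max_file_size": 1, "cores": 1, "mem": "4G"},
    "medium": {"max_file_size": 10, "cores": 8, "mem": "16G"},
}
print(choose_resource_package(packages, 0.5))  # the "small" bundle
```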
get_pipeline_schemas
get_pipeline_schemas(schema_key=INPUT_SCHEMA_KEY)
Get path to the pipeline schema.
:param str schema_key: where to look for schemas in the pipeline iface :return str: absolute path to the pipeline schema file
render_var_templates
render_var_templates(namespaces)
Render path templates under 'var_templates' in this pipeline interface.
:param dict namespaces: namespaces to use for rendering
SubmissionConductor
Collects and submits pipeline jobs. Manages job pooling based on file size or command count limits.
SubmissionConductor
SubmissionConductor(pipeline_interface, prj, delay=0, extra_args=None, extra_args_override=None, ignore_flags=False, compute_variables=None, max_cmds=None, max_size=None, max_jobs=None, automatic=True, collate=False)
Bases: object
Collects and then submits pipeline jobs.
This class holds a 'pool' of commands to submit as a single cluster job. Eager to submit a job, each instance's collection of commands expands until the 'pool' has been filled, at which point it's time to submit the job. The pool fills as soon as a fill criterion has been reached, which can be either total input file size or the number of individual commands.
Create a job submission manager.
The most critical inputs are the pipeline interface and the pipeline key, which provide critical pipeline information (like resource allocation packages) and determine which pipeline will be overseen by this instance, respectively.
:param PipelineInterface pipeline_interface: collection of important data for one or more pipelines, like resource allocation packages and option/argument specifications
:param prj: Project with which each sample being considered is associated (what generated each sample)
:param float delay: time (in seconds) to wait before submitting a job once it's ready
:param str extra_args: string to pass to each job generated, for example additional pipeline arguments
:param str extra_args_override: string to pass to each job generated, for example additional pipeline arguments. This deactivates the 'extra' functionality that appends strings defined in Sample.command_extra and Project.looper.command_extra to the command template.
:param bool ignore_flags: whether to ignore flag files present in the sample folder for each sample considered for submission
:param dict[str] compute_variables: a dict with variables that will be made available to the compute package. For example, this should include the name of the cluster partition to which the job or jobs will be submitted.
:param int | NoneType max_cmds: upper bound on the number of commands to include in a single job script
:param int | float | NoneType max_size: upper bound on the total file size of inputs used by the commands lumped into a single job script
:param int | float | NoneType max_jobs: upper bound on the total number of jobs to group samples for submission
:param bool automatic: whether the submission should be automatic once the pool reaches capacity
:param bool collate: whether a collate job is to be submitted (runs on the project level, rather than on the sample level)
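The pooling behavior described above can be sketched with a toy conductor: commands accumulate until either the command-count cap or the cumulative input-size cap is hit, at which point the pool is flushed as one "job". Names and structure are illustrative, not looper's internals.

```python
# Minimal sketch of command pooling, assuming two fill criteria:
# number of commands (max_cmds) and total input size (max_size).

class PoolingConductor:
    def __init__(self, max_cmds=None, max_size=None):
        self.max_cmds = max_cmds
        self.max_size = max_size
        self._pool = []
        self._size = 0.0
        self.submitted_jobs = []

    def _is_full(self):
        if self.max_cmds is not None and len(self._pool) >= self.max_cmds:
            return True
        if self.max_size is not None and self._size >= self.max_size:
            return True
        return False

    def add(self, command, input_size_gb):
        """Grow the pool; flush it as a job once a fill criterion is met."""
        self._pool.append(command)
        self._size += input_size_gb
        if self._is_full():
            self.submit(force=True)

    def submit(self, force=False):
        """Submit the pool iff forced or full; return whether a job went out."""
        if not self._pool or not (force or self._is_full()):
            return False
        self.submitted_jobs.append(list(self._pool))
        self._pool, self._size = [], 0.0
        return True

conductor = PoolingConductor(max_cmds=2)
for cmd in ["pipeline a", "pipeline b", "pipeline c"]:
    conductor.add(cmd, input_size_gb=0.1)
print(conductor.submitted_jobs)  # [['pipeline a', 'pipeline b']]
```

A final `submit(force=True)` flushes any leftover commands that never filled a pool, which mirrors why the real class exposes a `force` flag on `submit`.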
failed_samples
property
failed_samples
num_cmd_submissions
property
num_cmd_submissions
Return the number of commands that this conductor has submitted.
:return int: Number of commands submitted so far.
num_job_submissions
property
num_job_submissions
Return the number of jobs that this conductor has submitted.
:return int: Number of jobs submitted so far.
add_sample
add_sample(sample, rerun=False)
Add a sample for submission to this conductor.
:param peppy.Sample sample: sample to be included with this conductor's currently growing collection of command submissions
:param bool rerun: whether the given sample is being rerun rather than run for the first time
:return bool: indication of whether the given sample was added to the current 'pool'
:raise TypeError: if a sample subtype is provided but does not extend the base Sample class
is_project_submittable
is_project_submittable(force=False)
Check whether the current project has already been submitted
:param bool force: whether to force the project submission (ignore status/flags)
submit
submit(force=False)
Submit one or more commands as a job.
This call will submit the commands corresponding to the current pool of samples if and only if 'force' evaluates to a true value or the pool of samples is full.
:param bool force: whether submission should be done/simulated even if this conductor's pool isn't full
:return bool: whether a job was submitted (or would've been if not for dry run)
write_script
write_script(pool, size)
Create the script for job submission.
:param Iterable[peppy.Sample] pool: collection of sample instances
:param float size: cumulative size of the given pool
:return str: path to the job submission script created
ComputingConfiguration
Manages compute environment settings from divvy configuration files. Handles resource packages and submission templates.
ComputingConfiguration
ComputingConfiguration(entries=None, wait_max=None, strict_ro_locks=False, schema_source=None, validate_on_write=False)
Bases: FutureYAMLConfigManager
Represents computing configuration objects.
The ComputingConfiguration class provides a computing configuration object that is an in-memory representation of a divvy computing configuration file. This object has various functions to allow a user to activate, modify, and retrieve computing configuration files, and to use these values to populate job submission script templates.
:param str | Iterable[(str, object)] | Mapping[str, object] entries: config collection of key-value pairs
:param str filepath: YAML file specifying computing package data (the DIVCFG file)
default_config_file
property
default_config_file
Path to default compute environment settings file.
:return str: Path to default compute settings file
templates_folder
property
templates_folder
Path to folder with default submission templates.
:return str: path to folder with default submission templates
activate_package
activate_package(package_name)
Activates a compute package.
This copies the computing attributes from the configuration file into the compute attribute, where the class stores current compute settings.
:param str package_name: name for a non-resource compute bundle, the name of a subsection in an environment configuration file
:return bool: success flag for the attempt to establish compute settings
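The activation step can be illustrated as copying a named package's settings into an "active" mapping. The dict layout and key names below are assumptions for the sketch, not divvy's actual data structures.

```python
# Simplified sketch of package activation: the named package's
# settings are copied into a mapping that holds the currently
# active compute values.

def activate_package(config, compute, package_name):
    """Copy a named compute package into the active settings; return success."""
    package = config.get("compute_packages", {}).get(package_name)
    if package is None:
        return False
    compute.update(package)
    return True

config = {
    "compute_packages": {
        "default": {"submission_template": "templates/local.sub",
                    "submission_command": "sh"},
        "slurm": {"submission_template": "templates/slurm.sub",
                  "submission_command": "sbatch"},
    }
}
active = {}
activate_package(config, active, "slurm")
print(active["submission_command"])  # sbatch
```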
clean_start
clean_start(package_name)
Clear current active settings and then activate the given package.
:param str package_name: name of the resource package to activate
:return bool: success flag
get_active_package
get_active_package()
Returns settings for the currently active compute package
:return YAMLConfigManager: data defining the active compute package
get_adapters
get_adapters()
Get current adapters, if defined.
Adapters are sourced from the 'adapters' section in the root of the divvy configuration file and updated with an active compute package-specific set of adapters, if any defined in 'adapters' section under currently active compute package.
:return YAMLConfigManager: current adapters mapping
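The adapter resolution described above can be sketched as a two-level merge: root-level adapters are taken first, then overridden by any adapters defined under the active compute package. The config layout here is an illustrative assumption.

```python
# Sketch of adapter resolution: package-level adapters override
# root-level ones for the currently active package.

def get_adapters(config, active_package):
    adapters = dict(config.get("adapters", {}))
    package = config.get("compute_packages", {}).get(active_package, {})
    adapters.update(package.get("adapters", {}))
    return adapters

config = {
    "adapters": {"CODE": "looper.command", "LOGFILE": "looper.log_file"},
    "compute_packages": {
        "slurm": {"adapters": {"LOGFILE": "sample.log_file"}},
    },
}
print(get_adapters(config, "slurm"))
```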
list_compute_packages
list_compute_packages()
Returns a list of available compute packages.
:return set[str]: names of available compute packages
reset_active_settings
reset_active_settings()
Clear out current compute settings.
:return bool: success flag
template
template()
Get the currently active submission template.
:return str: submission script content template for current state
update_packages
update_packages(config_file)
Parse data from divvy configuration file.
Given a divvy configuration file, this function will update (not overwrite) the existing compute packages with the values it contains. It does not affect any currently active settings.
:param str config_file: path to file with new divvy configuration data
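The update-not-overwrite semantics can be sketched as a per-package merge: new packages are added, and keys of existing packages are updated without discarding the rest. The structure below is illustrative.

```python
# Sketch of merging new divvy packages into an existing mapping
# without discarding keys the new data doesn't mention.

def update_packages(existing, new):
    for name, spec in new.items():
        existing.setdefault(name, {}).update(spec)
    return existing

packages = {"default": {"submission_command": "sh",
                        "submission_template": "local.sub"}}
incoming = {
    "default": {"submission_command": "bash"},   # updates one key only
    "slurm": {"submission_command": "sbatch"},   # adds a new package
}
update_packages(packages, incoming)
print(packages["default"])
```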
write_script
write_script(output_path, extra_vars=None)
Given the currently active settings, populate the active template to write a submission script. Additionally, use the current adapters to adjust the selection of the provided variables.
:param str output_path: path to file to write as submission script
:param Iterable[Mapping] extra_vars: a list of dict objects with key-value pairs with which to populate template fields. These will override any values in the currently active compute package.
:return str: path to the submission script file
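Template population can be sketched as substituting `{VARIABLE}` placeholders (the style divvy submission templates use) from the active compute package, with the extra variable mappings taking precedence. This is a minimal sketch, not divvy's actual rendering code.

```python
# Simplified template population: extra_vars mappings override the
# active compute package, then placeholders are substituted.

def populate_template(template, compute, extra_vars=None):
    variables = dict(compute)
    for mapping in extra_vars or []:
        variables.update(mapping)
    for key, value in variables.items():
        template = template.replace("{" + key + "}", str(value))
    return template

template = "#!/bin/bash\n#SBATCH --mem={MEM}\n{CODE}\n"
compute = {"MEM": "8G"}
script = populate_template(
    template, compute, extra_vars=[{"CODE": "pipeline.py --sample s1"}]
)
print(script)
```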
Utility Functions
select_divvy_config
select_divvy_config
select_divvy_config(filepath)
Selects the divvy config file path to load.
This uses a priority ordering: a config file path is chosen first if one is given; otherwise, a priority list of environment variables is searched and the first available file path is returned. If none of these options succeed, the default config path is returned.
:param str | NoneType filepath: direct file path specification
:return str: path to the config file to read
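The priority ordering can be sketched as below. The environment-variable names and default filename are assumptions for illustration; the actual names looper consults may differ.

```python
# Sketch of priority-ordered config selection: explicit path first,
# then a list of environment variables, then a default.
import os

def select_divvy_config(filepath=None,
                        env_vars=("DIVCFG", "PEPENV"),
                        default="divvy_default.yaml"):
    if filepath is not None:
        return filepath
    for var in env_vars:
        candidate = os.environ.get(var)
        if candidate and os.path.isfile(candidate):
            return candidate
    return default

print(select_divvy_config("custom.yaml"))  # custom.yaml
```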