Skip to content

Multi Pipelines and Multi Result Files

Enable writing to a single results file from multiple pipelines

The pipestat API supports writing to one results.yaml file via multiple pipelines. This can be enabled during pipestat initialization by setting multi_pipelines to true:

psm = PipestatManager(results_file_path=result_file, schema_path=schema_file, multi_pipelines=True)

Example Results file

test_pipeline_01:
  project: {}
  sample:
    RECORD1:
      number_of_things: 50000
      pipestat_created_time: '2024-04-04 17:23:56'
      pipestat_modified_time: '2024-04-04 18:28:59'
      name_of_something: Another_Name

test_pipeline_02:
  project:
    RECORD2:
      number_of_things: 42
      pipestat_created_time: '2024-04-04 18:23:56'
      pipestat_modified_time: '2024-04-04 19:28:59'
  sample: {}

Enable writing each record to its own results file

Pipestat also supports writing a record's results to a separate file for every record. This can be achieved by placing the string {record_identifier} in the designated results_file_path field of the pipestat configuration file.

schema_path: sample_output_schema.yaml
results_file_path: "${DATA}/processed/results_pipeline/{record_identifier}/stats.yaml"

Here is an example from the PEPATAC pipeline which currently uses this functionality via a looper config file:

name: PEPATAC_tutorial
pep_config: tutorial_refgenie_project_config.yaml

output_dir: "${TUTORIAL}/processed/"
pipeline_interfaces:
  sample: ["${TUTORIAL}/tools/pepatac/sample_pipeline_interface.yaml"]
  project: ["${TUTORIAL}/tools/pepatac/project_pipeline_interface.yaml"]

pipestat:
  results_file_path: "${TUTORIAL}/processed/results_pipeline/{record_identifier}/stats.yaml"