Pipestat configuration

Pipestat requires a few pieces of information to run:

a pipeline_name under which results are written, usually defined in the schema file (see below).

a path to the schema file that describes results that can be reported, e.g. "your/path/sample_output_schema.yaml":

title: An example Pipestat output schema
description: A pipeline that uses pipestat to report sample and project level results.
type: object
properties:
  pipeline_name: "default_pipeline_name"
  samples:
    type: array
    items:
      type: object
      properties:
        result_name:
          type: string
          description: "ResultName"

backend info: either a path to a YAML-formatted results file (file backend) or a pipestat config file containing PostgreSQL database login credentials (database backend).

Note that the config file can also contain the path to the YAML-formatted results file:

schema_path: sample_output_schema.yaml
# The config can contain either a results_file_path (file backend) or a database connection (database backend).
# results_file_path takes priority and will create a PipestatManager with a file backend.
results_file_path: results_file.yaml
database:
  dialect: postgresql
  driver: psycopg
  name: pipestat-test
  user: postgres
  password: pipestat-password
  host: 127.0.0.1
  port: 5432
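
The priority rule described in the config comments above can be sketched in a few lines of Python. This is only an illustration of the stated rule, not pipestat's internal logic: `choose_backend` is a hypothetical helper, and the config is assumed to be already parsed into a plain dict.

```python
from typing import Any, Mapping


def choose_backend(config: Mapping[str, Any]) -> str:
    """Return "file" or "db" for a parsed config mapping.

    Hypothetical helper: it mirrors the rule stated in the config comments,
    not pipestat's actual implementation.
    """
    # A results file path wins even when a database section is also present.
    if config.get("results_file_path"):
        return "file"
    if config.get("database"):
        return "db"
    raise ValueError("config needs results_file_path or a database section")


# Same keys as the example config above, already parsed into a dict.
example_config = {
    "schema_path": "sample_output_schema.yaml",
    "results_file_path": "results_file.yaml",
    "database": {"dialect": "postgresql", "host": "127.0.0.1", "port": 5432},
}

print(choose_backend(example_config))
```

Removing `results_file_path` from the mapping would make the same call select the database backend instead.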

Beginning with v0.10.0, there is also support for reporting results directly to PEPHub. Pass PipestatManager a PEPHub registry path:

psm = PipestatManager(pephub_path="databio/pipestat_demo:default", schema_path=my_schema_file_path)

You can also place this in the configuration file:

pephub_path: "databio/pipestat_demo:default"
schema_path: sample_output_schema.yaml
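
For readers unfamiliar with registry paths, the sketch below splits one into its parts. It assumes the `namespace/name:tag` layout suggested by the example value, and it assumes that a missing tag falls back to `default`. `parse_registry_path` is a hypothetical helper written for illustration, not part of pipestat or PEPHub.

```python
from typing import NamedTuple


class RegistryPath(NamedTuple):
    namespace: str
    name: str
    tag: str


def parse_registry_path(path: str, default_tag: str = "default") -> RegistryPath:
    """Split "namespace/name:tag" into its parts (hypothetical helper)."""
    namespace, sep, rest = path.partition("/")
    if not sep or not namespace:
        raise ValueError(f"expected 'namespace/name[:tag]', got {path!r}")
    name, _, tag = rest.partition(":")
    if not name:
        raise ValueError(f"missing project name in {path!r}")
    # Assumption: an omitted tag falls back to default_tag.
    return RegistryPath(namespace, name, tag or default_tag)


print(parse_registry_path("databio/pipestat_demo:default"))
```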

Apart from that, there are many other optional configuration settings that have defaults. Please refer to the environment variables reference to learn about these options and their meaning.

Configuration sources

Pipestat configuration can come from three sources, listed here from highest to lowest priority:

PipestatManager constructor
Pipestat configuration file
Environment variables
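
The lookup order listed above can be illustrated with a minimal sketch, assuming the list runs from highest to lowest priority. `resolve_setting` and the variable name `EXAMPLE_SCHEMA_PATH` are hypothetical and chosen only for this example. The real variable names are given in the environment variables reference.

```python
import os
from typing import Any, Mapping, Optional


def resolve_setting(
    key: str,
    constructor_args: Mapping[str, Any],
    config_file: Mapping[str, Any],
    env_var: str,
    environ: Optional[Mapping[str, str]] = None,
) -> Any:
    """Return the first value found: constructor, then config file, then env."""
    environ = os.environ if environ is None else environ
    # 1. An explicit constructor argument always wins.
    if constructor_args.get(key) is not None:
        return constructor_args[key]
    # 2. Otherwise fall back to the configuration file.
    if config_file.get(key) is not None:
        return config_file[key]
    # 3. Finally consult the environment (None if the variable is unset).
    return environ.get(env_var)


# EXAMPLE_SCHEMA_PATH is a made-up variable name used only for this sketch.
fake_env = {"EXAMPLE_SCHEMA_PATH": "env_schema.yaml"}
file_cfg = {"schema_path": "config_schema.yaml"}

print(resolve_setting("schema_path", {}, file_cfg, "EXAMPLE_SCHEMA_PATH", fake_env))
```

Passing the environment as a parameter keeps the function easy to test without touching the real process environment.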