Pipestat CLI

This tutorial demonstrates how to use the pipeline command-line interface (CLI). You should have already installed pipestat. Before following this tutorial please make sure you're familiar with more information-rich "Pipestat Python API" tutorial. Also, for the following tutorial, you will need to point to a sample_output_schema.yaml in the schema path. An example file can be found here:

https://github.com/pepkit/pipestat/blob/master/tests/data/sample_output_schema.yaml

Report results from the command line by calling pipestat and passing in all relevant information:

rm ../tests/data/test_results_1.yaml
touch ../tests/data/test_results_1.yaml
pipestat report \
    --record-identifier sample_name \
    --result-identifier percentage_of_things \
    --value 12 \
    --results-file ../tests/data/test_results_1.yaml \
    --schema ../tests/data/sample_output_schema.yaml

Reported records for 'sample_name' in 'pipeline_name' namespace:
 - percentage_of_things: 12

But this is obviously pretty cumbersome, since you have to pass lots of constant information to every call to report a result. So instead, you have an option to set up environment variables for a particular pipeline run:

Prepare environment

Pipestat environment variables avoid copious repetition of arguments in subsequent pipestat calls. Refer to the Environment variables reference for the complete list of supported environment variables. We will set a few for this tutorial:

export PIPESTAT_RESULTS_SCHEMA=../tests/data/sample_output_schema.yaml
export PIPESTAT_RECORD_IDENTIFIER=sample1
export PIPESTAT_RESULTS_FILE=`mktemp` # temporary file for results storage

Before we dive in, let's take a quick glance at the schema. This is the file that describes what sort of results are reported by this pipeline:

cat $PIPESTAT_RESULTS_SCHEMA

number_of_things:
  type: integer
  description: "Number of things"
percentage_of_things:
  type: number
  description: "Percentage of things"
name_of_something:
  type: string
  description: "Name of something"
swtich_value:
  type: boolean
  description: "Is the switch on of off"
collection_of_things:
  type: array
  description: "This store collection of values"
output_object:
  type: object
  description: "Object output"
output_file:
  type: file
  description: "This a path to the output file"
output_image:
  type: image
  description: "This a path to the output image"
md5sum:
  type: string
  description: "MD5SUM of an object"
  highlight: true

Reporting

Naturally, the command line interface provides access to all the Python API functionalities of pipestat. So, for example, to report a result and back the object by a file use:

pipestat report -i number_of_things -v 100

Reported records for 'sample1' in 'test' namespace:
 - number_of_things: 100

The result has been reported and the database file has been updated:

cat $PIPESTAT_RESULTS_FILE

test:
  sample1:
    number_of_things: 100

Let's report another result:

pipestat report -i percentage_of_things -v 1.1

Reported records for 'sample1' in 'test' namespace:
 - percentage_of_things: 1.1

cat $PIPESTAT_RESULTS_FILE

test:
  sample1:
    number_of_things: 100
    percentage_of_things: 1.1

Inspection

pipestat inspect command is a way to briefly look at the general PipestatManager state, like number of records, type of backend etc.

pipestat inspect

PipestatManager (test)
Backend: File 
 - results: /var/folders/h8/8npwnh2s4rb8lr6hsy2ydrsh0000gp/T/tmp.hk8q23wT
 - status: /var/folders/h8/8npwnh2s4rb8lr6hsy2ydrsh0000gp/T)
Results schema source: ../tests/data/sample_output_schema.yaml
Status schema source: /usr/local/lib/python3.9/site-packages/pipestat/schemas/status_schema.yaml
Records count: 1
Highlighted results: md5sum

In order to display the contents of the results file or database table associated with the indicated namespace, add --data flag:

pipestat inspect --data

PipestatManager (test)
Backend: File 
 - results: /var/folders/h8/8npwnh2s4rb8lr6hsy2ydrsh0000gp/T/tmp.hk8q23wT
 - status: /var/folders/h8/8npwnh2s4rb8lr6hsy2ydrsh0000gp/T)
Results schema source: ../tests/data/sample_output_schema.yaml
Status schema source: /usr/local/lib/python3.9/site-packages/pipestat/schemas/status_schema.yaml
Records count: 1
Highlighted results: md5sum

Data:
test:
  sample1:
    number_of_things: 100
    percentage_of_things: 1.1

Retrieval

Naturally, the reported results can be retrieved. Just call pipestat retrieve to do so:

pipestat retrieve -i percentage_of_things

1.1

History Retrieval

If you overwrite results, pipestat keeps a history by default. Use pipestat history to see previously reported results:

{'percentage_of_things': {'2024-04-18 14:53:58': '1.1'}}

Removal

In order to remove a result call pipestat remove:

pipestat remove -i percentage_of_things

Removed result 'percentage_of_things' for record 'sample1' from 'test' namespace

The results file and the state of the PipestatManager object reflect the removal:

cat $PIPESTAT_RESULTS_FILE

test:
  sample1:
    number_of_things: 100

pipestat inspect --data

PipestatManager (test)
Backend: File 
 - results: /var/folders/h8/8npwnh2s4rb8lr6hsy2ydrsh0000gp/T/tmp.hk8q23wT
 - status: /var/folders/h8/8npwnh2s4rb8lr6hsy2ydrsh0000gp/T)
Results schema source: ../tests/data/sample_output_schema.yaml
Status schema source: /usr/local/lib/python3.9/site-packages/pipestat/schemas/status_schema.yaml
Records count: 1
Highlighted results: md5sum

Data:
test:
  sample1:
    number_of_things: 100

Status management

To manage pipeline status call pipestat status <subcommand>:

set to set pipeline statuses
get to retrieve pipeline statuses

Starting with pipestat 0.0.3 the --schema argument is not required for status management.

pipestat status set running

pipestat status get

running

Note that only statuses defined in the status schema are supported:

cat /usr/local/lib/python3.9/site-packages/pipestat/schemas/status_schema.yaml

running:
  description: "the pipeline is running"
  color: [30, 144, 255] # dodgerblue
completed:
  description: "the pipeline has completed"
  color: [50, 205, 50] # limegreen
failed:
  description: "the pipeline has failed"
  color: [220, 20, 60] # crimson
waiting:
  description: "the pipeline is waiting"
  color: [240, 230, 140] # khaki
partial:
  description: "the pipeline stopped before completion point"
  color: [169, 169, 169] # darkgray

HTML Report Generation

To generate a static html report, call pipestat summarize --results-file PIPESTAT_RESULTS_FILE --schema PIPESTAT_RESULTS_SCHEMA

rm $PIPESTAT_RESULTS_FILE