Eido is used to 1) validate or 2) convert the format of sample metadata. Sample metadata is stored according to the standard PEP specification. For validation, eido is based on JSON Schema and extends it with new features, such as required input files. You can write your own schema for your pipeline and use eido to validate sample metadata against it. For conversion, eido filters convert sample metadata into any output format, using either built-in or custom filters.
Why do we need eido?
Data-intensive bioinformatics projects often include metadata describing a set of samples. When it comes to handling such sample metadata, there are two common challenges that eido solves:
- Validation. Tool authors use eido to specify and describe the sample attributes their tool requires as input. These attributes are described with a schema, and eido validates the sample metadata to ensure it satisfies the tool's needs. Eido uses JSON Schema, a vocabulary for annotating and validating JSON documents. JSON Schema alone is great for validating JSON, but bioinformatics sample metadata is more complicated, so eido adds features tailored to bioinformatics projects, listed below.
- Format conversion. Tools often require sample metadata in a specific format. Eido filters take sample metadata in standard PEP format and convert it to the desired output format. Filters can be either built-in or custom. This allows a single sample metadata source to be used for multiple downstream analyses.
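To make the filter idea concrete, here is a minimal sketch of a conversion step written in plain Python. It does not use eido's filter or plugin interface; the function name `samples_to_csv` and the representation of samples as a list of dicts are assumptions made for illustration only.

```python
import csv
import io
from typing import Dict, List


def samples_to_csv(samples: List[Dict[str, object]]) -> str:
    """Hypothetical filter: render sample records as CSV text.

    Columns are the union of all attribute names, in first-seen order,
    so samples with different attributes still fit in one table.
    """
    columns: List[str] = []
    for sample in samples:
        for key in sample:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for sample in samples:
        writer.writerow(sample)  # missing attributes are written as empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    samples = [
        {"sample_name": "frog_1", "protocol": "RNA-seq"},
        {"sample_name": "frog_2", "protocol": "ATAC-seq", "read1": "frog_2_R1.fq.gz"},
    ]
    print(samples_to_csv(samples), end="")
```

The point of the sketch is the shape of a filter: one function that takes the same sample metadata and emits a different representation. Another output format would be another such function over the same input.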
Eido validation features
An eido schema is written using the JSON Schema vocabulary, plus a few additional features:
- required input files. Eido adds `required_files`, which allows a schema author to specify which attributes must point to files that exist.
- optional input files. `files` specifies which attributes point to files that may or may not exist.
- project and sample validation. Eido validates project attributes separately from sample attributes.
- schema imports. Eido adds an `imports` section listing schemas that should be validated before this schema.
- automatic multi-value support. For string, boolean, and numeric sample attributes, eido accepts either a single value or multiple values. This accommodates the PEP `subsample_table` feature, which can give one sample several values for the same attribute.
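The sketch below approximates the semantics of three of these features in plain Python, without using eido: an attribute under `required_files` must point to an existing file, an attribute listed only under `files` may point to a missing file, and each scalar-typed attribute accepts either one value or a list of values. The schema dict, the function names `validate_sample`, `has_type`, and `as_list`, and the error messages are all hypothetical. This is not eido's implementation, and a real eido schema is typically written in YAML, where the placement of these keywords may differ.

```python
import os
from typing import Any, Dict, List

# Hypothetical sample-level schema, written as a Python dict for illustration.
SAMPLE_SCHEMA: Dict[str, Any] = {
    "properties": {
        "sample_name": {"type": "string"},
        "read1": {"type": "string"},
        "read2": {"type": "string"},
    },
    "required": ["sample_name", "read1"],
    "required_files": ["read1"],  # these paths must exist
    "files": ["read1", "read2"],  # these paths may or may not exist
}


def has_type(value: Any, type_name: str) -> bool:
    """Check one scalar against a JSON Schema type name."""
    if type_name == "string":
        return isinstance(value, str)
    if type_name == "boolean":
        return isinstance(value, bool)
    if type_name == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return True  # other types are out of scope for this sketch


def as_list(value: Any) -> List[Any]:
    """Wrap a singular value so singular and plural attributes look alike."""
    return list(value) if isinstance(value, (list, tuple)) else [value]


def validate_sample(sample: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return error messages; an empty list means the sample passed."""
    errors: List[str] = []
    for attr in schema.get("required", []):
        if attr not in sample:
            errors.append(f"missing required attribute: {attr}")
    for attr, spec in schema.get("properties", {}).items():
        if attr not in sample:
            continue
        # Multi-value support: one value or a list of values of the right type.
        if not all(has_type(v, spec["type"]) for v in as_list(sample[attr])):
            errors.append(f"wrong type for attribute: {attr}")
    for attr in schema.get("required_files", []):
        for path in as_list(sample.get(attr, [])):
            if not os.path.isfile(path):
                errors.append(f"required file not found: {path}")
    # Attributes listed only under "files" may point to missing files,
    # so this sketch does not report them.
    return errors


if __name__ == "__main__":
    import tempfile

    with tempfile.NamedTemporaryFile(suffix=".fq") as fq:
        single = {"sample_name": "frog_1", "read1": fq.name}
        plural = {"sample_name": "frog_2", "read1": [fq.name, "missing_R1.fq"]}
        print(validate_sample(single, SAMPLE_SCHEMA))
        print(validate_sample(plural, SAMPLE_SCHEMA))
```

The `plural` sample stands in for what a subsample table produces: the same attribute carries several values, and each one is type-checked and, for required files, checked on disk individually.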
How to use eido
- Use eido to validate data from the command line
- Use eido to validate data from Python
- Write your own schema
Why the name 'eido'?
Eidos is a Greek term meaning form, essence, or type (see Plato's Theory of Forms). Schemas are analogous to forms, and eido tests claims that an instance is of a particular form. Eido also helps change forms using filters.