Package peppy Documentation

Package Overview

The peppy package provides a Python interface for working with Portable Encapsulated Projects (PEPs). A PEP is a standardized format for organizing metadata for biological samples and sample-intensive data.

Key Features

  • Project Management: Create and manage collections of samples with metadata
  • Sample Access: Retrieve individual samples and their attributes
  • Amendments: Activate different project configurations
  • Validation: Validate projects against schemas

Installation

pip install peppy

Quick Example

from peppy import Project

# Initialize with a project config file
prj = Project(cfg="ngs.yaml")

# Access samples
samples = prj.samples

API Reference

Project Class

The main class for working with PEPs:

Project

Project(cfg=None, amendments=None, sample_table_index=None, subsample_table_index=None, defer_samples_creation=False)

Bases: MutableMapping

A class to model a Project (collection of samples and metadata).

:param str cfg: Project config file (YAML) or sample table (CSV/TSV) with one row per sample to constitute project
:param str | Iterable[str] sample_table_index: name of the columns to set the sample_table index to
:param str | Iterable[str] subsample_table_index: name of the columns to set the subsample_table index to
:param str | Iterable[str] amendments: names of the amendments to activate
:param bool defer_samples_creation: whether the sample creation should be skipped

:Example:

.. code-block:: python

    from peppy import Project
    prj = Project(cfg="ngs.yaml")
    samples = prj.samples
Source code in peppy/project.py
def __init__(
    self,
    cfg: str = None,
    amendments: Union[str, Iterable[str]] = None,
    sample_table_index: Union[str, Iterable[str]] = None,
    subsample_table_index: Union[str, Iterable[str]] = None,
    defer_samples_creation: bool = False,
):
    _LOGGER.debug(
        "Creating {}{}".format(
            self.__class__.__name__, " from file {}".format(cfg) if cfg else ""
        )
    )
    self._project_data = {}
    super(Project, self).__init__()
    is_cfg = is_cfg_or_anno(cfg)
    if is_cfg is None:
        # no 'cfg' provided. Empty Project will be created
        self[CONFIG_FILE_KEY] = None
        self[SAMPLE_TABLE_FILE_KEY] = None
        self[SUBSAMPLE_TABLES_FILE_KEY] = None
    elif is_cfg:
        # the provided 'cfg' is a project config file
        self[CONFIG_FILE_KEY] = cfg
        self[SAMPLE_TABLE_FILE_KEY] = None
        self[SUBSAMPLE_TABLES_FILE_KEY] = None
        self.parse_config_file(cfg, amendments)
    else:
        # the provided 'cfg' is a sample table
        self[SAMPLE_TABLE_FILE_KEY] = cfg
        self[SUBSAMPLE_TABLES_FILE_KEY] = None

    self._samples = []
    self[SAMPLE_EDIT_FLAG_KEY] = False
    self.progressbar = False

    # table indexes can be specified in config or passed to the object constructor
    # That's the priority order:
    # 1. constructor specified
    # 2. config specified (already set as Project attrs if config exists)
    # 3. defaults
    self.st_index = (
        sample_table_index or getattr(self, "st_index", None) or SAMPLE_NAME_ATTR
    )

    self.sst_index = (
        ([subsample_table_index] if subsample_table_index else None)
        or (
            [getattr(self, "sst_index", None)]
            if getattr(self, "sst_index", None)
            else None
        )
        or [SAMPLE_NAME_ATTR, SUBSAMPLE_NAME_ATTR]
    )

    if not defer_samples_creation:
        self.create_samples(modify=False if self[SAMPLE_TABLE_FILE_KEY] else True)
    self._sample_table = self._get_table_from_samples(
        index=self.st_index, initial=True
    )

amendments property

amendments

Return currently active list of amendments or None if none was activated

:return Iterable[str]: a list of currently active amendment names

config property

config

Get the config mapping

:return Mapping: config. May be formatted to comply with the most recent version specifications

config_file property

config_file

Get the config file path

:return str: path to the config file

list_amendments property

list_amendments

Return a list of available amendments or None if not declared

:return Iterable[str]: a list of available amendment names

pep_version property

pep_version

The declared PEP version string

It is validated to make sure it is a valid PEP version string

:raise InvalidConfigFileException: in case of invalid PEP version
:return str: PEP version string

sample_name_colname property

sample_name_colname

Deprecated, please use Project.sample_table_index instead

Name of the column in the sample table that contains the effective sample names.

It is "sample_name" by default, but when that column is missing it may be replaced by the sample table index selected at object instantiation.

:return str: name of the column that contains sample identifiers

sample_table property

sample_table

Get sample table. If any sample edits were performed, it will be re-generated

:return pandas.DataFrame: a data frame with current samples attributes

sample_table_index property

sample_table_index

The effective sample table index.

It is sample_name by default, but can be overridden by a sample table index selected at object instantiation or in the project configuration file via the sample_table_index field.

The sample table index selection priority order:

  1. Constructor specified
  2. Config specified
  3. Default: sample_name

:return str: name of the column that contains sample identifiers

samples property

samples

Generic/base Sample instance for each of this Project's samples.

:return Iterable[Sample]: Sample instance for each of this Project's samples

subsample_table property

subsample_table

Get subsample table

:return pandas.DataFrame: a data frame with subsample attributes

subsample_table_index property

subsample_table_index

The effective subsample table indexes.

It is [subsample_name, sample_name] by default, but can be overridden by subsample table indexes selected at object instantiation or in the project configuration file via the subsample_table_index field.

The subsample table index selection priority order:

  1. Constructor specified
  2. Config specified
  3. Default: [subsample_name, sample_name]

:return List[str]: names of the columns that contain sample and subsample identifiers

__getitem__

__getitem__(item)

Fetch the value of given key.

:param hashable item: key for which to fetch value
:return object: value mapped to given key, if available
:raise KeyError: if the requested key is unmapped.

Source code in peppy/project.py
def __getitem__(self, item):
    """
    Fetch the value of given key.

    :param hashable item: key for which to fetch value
    :return object: value mapped to given key, if available
    :raise KeyError: if the requested key is unmapped.
    """
    return self._project_data[item]

__str__

__str__()

Representation in interpreter.

Source code in peppy/project.py
def __str__(self):
    """Representation in interpreter."""
    if len(self) == 0:
        return "{}"
    msg = "Project"
    if NAME_KEY in self and self[NAME_KEY] is not None:
        msg += f" '{self[NAME_KEY]}'"
    if CONFIG_FILE_KEY in self and self[CONFIG_FILE_KEY] is not None:
        msg += f" ({self[CONFIG_FILE_KEY]})"
    if DESC_KEY in self and self[DESC_KEY] is not None:
        msg += f"\n{DESC_KEY}: {self[DESC_KEY]}"
    try:
        num_samples = len(self._samples)
    except (AttributeError, TypeError):
        _LOGGER.debug("No samples established on project")
        num_samples = 0
    if num_samples > 0:
        msg = f"{msg}\n{num_samples} samples"
        sample_names = [s[self.st_index] for s in self.samples]
        repr_names = sample_names[:MAX_PROJECT_SAMPLES_REPR]
        context = (
            f" (showing first {MAX_PROJECT_SAMPLES_REPR})"
            if num_samples > MAX_PROJECT_SAMPLES_REPR
            else ""
        )
        msg = f"{msg}{context}: {', '.join(repr_names)}"
    else:
        msg = f"{msg} 0 samples"
    if CONFIG_KEY not in self:
        return msg
    msg = f"{msg}\nSections: {', '.join([s for s in self[CONFIG_KEY].keys()])}"
    if (
        PROJ_MODS_KEY in self[CONFIG_KEY]
        and AMENDMENTS_KEY in self[CONFIG_KEY][PROJ_MODS_KEY]
    ):
        msg = f"{msg}\nAmendments: {', '.join(self[CONFIG_KEY][PROJ_MODS_KEY][AMENDMENTS_KEY].keys())}"
    if self.amendments:
        msg = (
            f"{msg}\nActivated amendments: {', '.join(self[ACTIVE_AMENDMENTS_KEY])}"
        )
    return msg

activate_amendments

activate_amendments(amendments)

Update settings based on amendment-specific values.

This method will update Project attributes, adding new values associated with the amendments indicated, and in case of collision with an existing key/attribute the amendments' values will be favored.

:param Iterable[str] amendments: A string with amendment names to be activated
:return peppy.Project: Updated Project instance
:raise TypeError: if argument to amendment parameter is null
:raise NotImplementedError: if this call is made on a project not created from a config file

Source code in peppy/project.py
def activate_amendments(self, amendments):
    """
    Update settings based on amendment-specific values.

    This method will update Project attributes, adding new values
    associated with the amendments indicated, and in case of collision with
    an existing key/attribute the amendments' values will be favored.

    :param Iterable[str] amendments: A string with amendment
        names to be activated
    :return peppy.Project: Updated Project instance
    :raise TypeError: if argument to amendment parameter is null
    :raise NotImplementedError: if this call is made on a project not
        created from a config file
    """
    amendments = [amendments] if isinstance(amendments, str) else amendments
    if amendments is None:
        raise TypeError(
            "The amendment argument can not be null. To deactivate an "
            "amendment use the deactivate_amendments method."
        )
    if not self[CONFIG_FILE_KEY]:
        raise NotImplementedError(
            "amendment activation isn't supported on a project not "
            "created from a config file"
        )
    prev = [(k, v) for k, v in self.items() if not k.startswith("_")]
    conf_file = self[CONFIG_FILE_KEY]
    self.__init__(cfg=conf_file, amendments=amendments)
    for k, v in prev:
        if k.startswith("_"):
            continue
        if k not in self or (self.is_null(k) and v is not None):
            _LOGGER.debug("Restoring {}: {}".format(k, v))
            self[k] = v
    self[ACTIVE_AMENDMENTS_KEY] = amendments
    return self

add_samples

add_samples(samples)

Add list of Sample objects

:param peppy.Sample | Iterable[peppy.Sample] samples: samples to add

Source code in peppy/project.py
def add_samples(self, samples):
    """
    Add list of Sample objects

    :param peppy.Sample | Iterable[peppy.Sample] samples: samples to add
    """
    samples = [samples] if isinstance(samples, Sample) else samples
    for sample in samples:
        if not isinstance(sample, Sample):
            _LOGGER.warning("Not a peppy.Sample object, not adding")
            continue
        self._samples.append(sample)
        self[SAMPLE_EDIT_FLAG_KEY] = True

attr_constants

attr_constants()

Update each Sample with constants declared by a Project. If Project does not declare constants, no update occurs.

Source code in peppy/project.py
def attr_constants(self):
    """
    Update each Sample with constants declared by a Project.
    If Project does not declare constants, no update occurs.
    """
    if self._modifier_exists(APPEND_KEY):
        to_append = self[CONFIG_KEY][SAMPLE_MODS_KEY][APPEND_KEY]
        _LOGGER.debug("Applying constant attributes: {}".format(to_append))

        for s in track(
            self.samples,
            description="Applying constant sample attributes",
            disable=not (self.is_sample_table_large and self.progressbar),
            console=Console(file=sys.stderr),
        ):
            for attr, val in to_append.items():
                if attr not in s:
                    s.update({attr: val})

attr_derive

attr_derive(attrs=None)

Set derived attributes for all Samples tied to this Project instance

Source code in peppy/project.py
def attr_derive(self, attrs=None):
    """
    Set derived attributes for all Samples tied to this Project instance
    """
    if not self._modifier_exists(DERIVED_KEY):
        return
    da = self[CONFIG_KEY][SAMPLE_MODS_KEY][DERIVED_KEY][DERIVED_ATTRS_KEY]
    ds = self[CONFIG_KEY][SAMPLE_MODS_KEY][DERIVED_KEY][DERIVED_SOURCES_KEY]
    derivations = attrs or (da if isinstance(da, list) else [da])
    _LOGGER.debug("Derivations to be done: {}".format(derivations))
    for sample in track(
        self.samples,
        description="Deriving sample attributes",
        disable=not (self.is_sample_table_large and self.progressbar),
        console=Console(file=sys.stderr),
    ):
        for attr in derivations:
            if attr not in sample:
                _LOGGER.debug(f"sample lacks '{attr}' attribute")
                continue
            elif attr in sample._derived_cols_done:
                _LOGGER.debug(f"'{attr}' has been derived")
                continue
            _LOGGER.debug(
                f"Deriving '{attr}' attribute for '{sample[self.st_index]}'"
            )

            # Set {atr}_key, so the original source can also be retrieved
            sample[ATTR_KEY_PREFIX + attr] = sample[attr]

            derived_attr = sample.derive_attribute(ds, attr)
            if derived_attr:
                _LOGGER.debug("Setting '{}' to '{}'".format(attr, derived_attr))
                sample[attr] = derived_attr
            else:
                _LOGGER.debug(
                    f"Not setting null/empty value for data source '{attr}': {type(derived_attr)}"
                )
            sample._derived_cols_done.append(attr)

attr_imply

attr_imply()

Infer value for additional field(s) from other field(s).

Add columns/fields to the sample based on values in those already-set that the sample's project defines as indicative of implications for additional data elements for the sample.

Source code in peppy/project.py
def attr_imply(self):
    """
    Infer value for additional field(s) from other field(s).

    Add columns/fields to the sample based on values in those already-set
    that the sample's project defines as indicative of implications for
    additional data elements for the sample.
    """
    if not self._modifier_exists(IMPLIED_KEY):
        return
    implications = self[CONFIG_KEY][SAMPLE_MODS_KEY][IMPLIED_KEY]
    if not isinstance(implications, list):
        raise InvalidConfigFileException(
            f"{SAMPLE_MODS_KEY}.{IMPLIED_KEY} has to be a list of key-value pairs"
        )
    _LOGGER.debug(f"Sample attribute implications: {implications}")
    for implication in implications:
        if not all([key in implication for key in IMPLIED_COND_KEYS]):
            raise InvalidConfigFileException(
                f"{SAMPLE_MODS_KEY}.{IMPLIED_KEY} section is invalid: {implication}"
            )
    for sample in track(
        self.samples,
        description="Implying sample attributes",
        disable=not (self.is_sample_table_large and self.progressbar),
        console=Console(file=sys.stderr),
    ):
        for implication in implications:
            implier_attrs = list(implication[IMPLIED_IF_KEY].keys())
            implied_attrs = list(implication[IMPLIED_THEN_KEY].keys())
            _LOGGER.debug(f"Setting Sample attributes implied by '{implier_attrs}'")
            for implier_attr in implier_attrs:
                implier_val = implication[IMPLIED_IF_KEY][implier_attr]
                if implier_attr not in sample:
                    _LOGGER.debug(
                        f"Sample lacks implier attr ({implier_attr}), skipping:"
                    )
                    break
                sample_val = sample[implier_attr]
                if sample_val not in implier_val:
                    _LOGGER.debug(
                        "Sample attr value does not match any of implier "
                        f"requirements ({sample_val} not in {implier_val}), skipping"
                    )
                    break
            else:
                # only executed if the inner loop did NOT break
                for implied_attr in implied_attrs:
                    imp_val = implication[IMPLIED_THEN_KEY][implied_attr]
                    _LOGGER.debug(
                        f"Setting implied attr: '{implied_attr}={imp_val}'"
                    )
                    sample.__setitem__(implied_attr, imp_val)

attr_merge

attr_merge()

Merge sample subannotations (from subsample table) with sample annotations (from sample_table)

Source code in peppy/project.py
def attr_merge(self):
    """
    Merge sample subannotations (from subsample table) with
    sample annotations (from sample_table)
    """
    if SUBSAMPLE_DF_KEY not in self or self[SUBSAMPLE_DF_KEY] is None:
        _LOGGER.debug("No {} found, skipping merge".format(CFG_SUBSAMPLE_TABLE_KEY))
        return
    for subsample_table in self[SUBSAMPLE_DF_KEY]:
        for sample_name in list(subsample_table[self.st_index]):
            if sample_name not in [s[self.st_index] for s in self.samples]:
                _LOGGER.warning(
                    ("Couldn't find matching sample for subsample: {}").format(
                        sample_name
                    )
                )
        for sample in track(
            self.samples,
            description=f"Merging subsamples, adding sample attrs: {', '.join(subsample_table.keys())}",
            disable=not (self.is_sample_table_large and self.progressbar),
            console=Console(file=sys.stderr),
        ):
            sample_colname = self.st_index
            if sample_colname not in subsample_table.columns:
                raise KeyError(
                    "Subannotation requires column '{}'.".format(sample_colname)
                )
            _LOGGER.debug(
                "Using '{}' as sample name column from "
                "subannotation table".format(sample_colname)
            )
            sample_indexer = (
                subsample_table[sample_colname] == sample[self.st_index]
            )
            this_sample_rows = subsample_table[sample_indexer]
            if len(this_sample_rows) == 0:
                _LOGGER.debug(
                    "No merge rows for sample '%s', skipping",
                    sample[self.st_index],
                )
                continue
            _LOGGER.debug("%d rows to merge", len(this_sample_rows))
            _LOGGER.debug("Merge rows dict: {}".format(this_sample_rows.to_dict()))

            merged_attrs = {key: list() for key in this_sample_rows.columns}
            _LOGGER.debug(this_sample_rows)
            for subsample_row_id, row in this_sample_rows.iterrows():
                try:
                    row[SUBSAMPLE_NAME_ATTR]
                except KeyError:
                    row[SUBSAMPLE_NAME_ATTR] = str(subsample_row_id)
                rowdata = row.to_dict()

                def _select_new_attval(merged_attrs, attname, attval):
                    """
                    Select new attribute value for the merged columns
                    dictionary
                    """
                    if attname in merged_attrs:
                        return merged_attrs[attname] + [attval]
                    return [str(attval).rstrip()]

                for attname, attval in rowdata.items():
                    if attname == sample_colname or not attval:
                        _LOGGER.debug(f"Skipping KV: {attname}={attval}")
                        continue
                    _LOGGER.debug(
                        f"merge: sample '{sample[self.st_index]}'; '{attname}'='{attval}'"
                    )
                    merged_attrs[attname] = _select_new_attval(
                        merged_attrs, attname, attval
                    )

            # remove sample name from the data with which to update sample
            merged_attrs.pop(sample_colname, None)

            _LOGGER.debug(
                f"Updating Sample {sample[self.st_index]}: {merged_attrs}"
            )
            sample.update(merged_attrs)

attr_remove

attr_remove()

Remove declared attributes from all samples that have them defined

Source code in peppy/project.py
def attr_remove(self):
    """
    Remove declared attributes from all samples that have them defined
    """

    def _del_if_in(obj, attr):
        if attr in obj:
            del obj[attr]

    if self._modifier_exists(REMOVE_KEY):
        to_remove = self[CONFIG_KEY][SAMPLE_MODS_KEY][REMOVE_KEY]
        _LOGGER.debug(f"Removing attributes: {to_remove}")
        for s in track(
            self.samples,
            description="Removing sample attributes",
            disable=not (self.is_sample_table_large and self.progressbar),
            console=Console(file=sys.stderr),
        ):
            for attr in to_remove:
                _del_if_in(s, attr)

attr_synonyms

attr_synonyms()

Copy each sample's attribute values to a new attribute

Source code in peppy/project.py
def attr_synonyms(self):
    """
    Copy attribute values for all samples to a new one
    """
    if self._modifier_exists(DUPLICATED_KEY):
        synonyms = self[CONFIG_KEY][SAMPLE_MODS_KEY][DUPLICATED_KEY]
        _LOGGER.debug(f"Applying synonyms: {synonyms}")
        for sample in track(
            self.samples,
            description="Applying synonymous sample attributes",
            disable=not (self.is_sample_table_large and self.progressbar),
            console=Console(file=sys.stderr),
        ):
            for attr, new in synonyms.items():
                if attr in sample:
                    sample[new] = sample[attr]
                else:
                    _LOGGER.warning(
                        f"The sample attribute to duplicate not found: {attr}"
                    )

create_samples

create_samples(modify=False)

Populate Project with Sample objects

Source code in peppy/project.py
def create_samples(self, modify: bool = False):
    """
    Populate Project with Sample objects
    """
    self._samples: List[Sample] = self.load_samples()
    if self.samples is None:
        _LOGGER.debug("No samples found in the project.")

    if modify:
        self.modify_samples()
    else:
        self._assert_samples_have_names()
        self._auto_merge_duplicated_names()

deactivate_amendments

deactivate_amendments()

Bring the original project settings back.

:return peppy.Project: Updated Project instance
:raise NotImplementedError: if this call is made on a project not created from a config file

Source code in peppy/project.py
def deactivate_amendments(self):
    """
    Bring the original project settings back.

    :return peppy.Project: Updated Project instance
    :raise NotImplementedError: if this call is made on a project not
        created from a config file
    """
    if ACTIVE_AMENDMENTS_KEY not in self or self[ACTIVE_AMENDMENTS_KEY] is None:
        _LOGGER.warning("No amendments have been activated.")
        return self
    if not self[CONFIG_FILE_KEY]:
        raise NotImplementedError(
            "amendments deactivation isn't supported on a project that "
            "lacks a config file."
        )
    self._reinit()
    return self

from_dict classmethod

from_dict(pep_dictionary)

Init a peppy project instance from a dictionary representation of an already processed PEP.

:param Dict[Any] pep_dictionary: dict representation of the project {_config: dict, _samples: list | dict, _subsamples: list[list | dict]}

Source code in peppy/project.py
@classmethod
def from_dict(cls, pep_dictionary: dict):
    """
    Init a peppy project instance from a dictionary representation
    of an already processed PEP.

    :param Dict[Any] pep_dictionary: dict representation of the project {_config: dict,
                                                                         _samples: list | dict,
                                                                         _subsamples: list[list | dict]}
    """
    _LOGGER.info("Processing project from dictionary...")
    temp_obj = cls()
    return temp_obj._from_dict(pep_dictionary)

from_pandas classmethod

from_pandas(samples_df, sub_samples_df=None, config=None)

Init a peppy project instance from a pandas Dataframe

:param samples_df: in-memory pandas DataFrame of samples
:param sub_samples_df: in-memory list of pandas DataFrames of subsamples
:param config: project configuration as a dict (parsed from YAML)

Source code in peppy/project.py
@classmethod
def from_pandas(
    cls,
    samples_df: pd.DataFrame,
    sub_samples_df: List[pd.DataFrame] = None,
    config: dict = None,
):
    """
    Init a peppy project instance from a pandas Dataframe

    :param samples_df: in-memory pandas DataFrame object of samples
    :param sub_samples_df: in-memory list of pandas DataFrame objects of sub-samples
    :param config: dict of yaml file
    """
    tmp_obj = cls()
    if not config:
        config = {CONFIG_VERSION_KEY: PEP_LATEST_VERSION}
    tmp_obj[SAMPLE_DF_KEY] = samples_df.replace(np.nan, "")
    tmp_obj[SUBSAMPLE_DF_KEY] = sub_samples_df

    tmp_obj[SAMPLE_DF_LARGE] = tmp_obj[SAMPLE_DF_KEY].shape[0] > 1000

    tmp_obj[CONFIG_KEY] = config

    tmp_obj.create_samples(modify=False if tmp_obj[SAMPLE_TABLE_FILE_KEY] else True)
    tmp_obj._sample_table = tmp_obj._get_table_from_samples(
        index=tmp_obj.st_index, initial=True
    )
    return tmp_obj

from_pep_config classmethod

from_pep_config(cfg=None, amendments=None, sample_table_index=None, subsample_table_index=None, defer_samples_creation=False)

Init a peppy project instance from a yaml file

:param str cfg: Project config file (YAML) or sample table (CSV/TSV) with one row per sample to constitute project
:param str | Iterable[str] sample_table_index: name of the columns to set the sample_table index to
:param str | Iterable[str] subsample_table_index: name of the columns to set the subsample_table index to
:param str | Iterable[str] amendments: names of the amendments to activate
:param bool defer_samples_creation: whether the sample creation should be skipped

Source code in peppy/project.py
@classmethod
def from_pep_config(
    cls,
    cfg: str = None,
    amendments: Union[str, Iterable[str]] = None,
    sample_table_index: Union[str, Iterable[str]] = None,
    subsample_table_index: Union[str, Iterable[str]] = None,
    defer_samples_creation: bool = False,
):
    """
    Init a peppy project instance from a yaml file

    :param str cfg: Project config file (YAML) or sample table (CSV/TSV)
        with one row per sample to constitute project
    :param str | Iterable[str] sample_table_index: name of the columns to set
        the sample_table index to
    :param str | Iterable[str] subsample_table_index: name of the columns to set
        the subsample_table index to
    :param str | Iterable[str] amendments: names of the amendments to activate
    :param Iterable[str] amendments: amendments to use within configuration file
    :param bool defer_samples_creation: whether the sample creation should be skipped
    """
    # TODO: this is just a copy of the __init__ method. It should be refactored
    return cls(
        cfg=cfg,
        amendments=amendments,
        sample_table_index=sample_table_index,
        subsample_table_index=subsample_table_index,
        defer_samples_creation=defer_samples_creation,
    )

from_pephub classmethod

from_pephub(registry_path)

Init project from pephubclient.

:param registry_path: PEPhub registry path
:return: peppy Project

Source code in peppy/project.py
@classmethod
def from_pephub(cls, registry_path: str) -> "Project":
    """
    Init project from pephubclient.

    :param registry_path: PEPhub registry path
    :return: peppy Project
    """
    from pephubclient import PEPHubClient

    phc = PEPHubClient()
    return phc.load_project(project_registry_path=registry_path)

from_sample_yaml classmethod

from_sample_yaml(yaml_file)

Init a peppy project instance from a yaml file

:param str yaml_file: path to yaml file

Source code in peppy/project.py
@classmethod
def from_sample_yaml(cls, yaml_file: str):
    """
    Init a peppy project instance from a yaml file

    :param str yaml_file: path to yaml file
    """
    _LOGGER.info("Processing project from yaml...")
    with open(yaml_file, "r") as f:
        prj_dict = yaml.safe_load(f)
    pd_df = pd.DataFrame.from_dict(prj_dict)
    return cls.from_pandas(pd_df)

get_description

get_description()

Infer project description from config file.

The provided description must be coercible to a string

:return str: inferred description for the project
:raise InvalidConfigFileException: if the description is not coercible to a string

Source code in peppy/project.py
def get_description(self):
    """
    Infer project description from config file.

    The provided description has to be of class coercible to string

    :return str: inferred name for project.
    :raise InvalidConfigFileException: if description is not of class
        coercible to string
    """
    if CONFIG_KEY not in self:
        return
    if DESC_KEY in self[CONFIG_KEY]:
        desc_str = str(self[CONFIG_KEY][DESC_KEY])
        if not isinstance(desc_str, str):
            try:
                desc_str = str(desc_str)
            except Exception as e:
                raise InvalidConfigFileException(
                    "Could not convert the specified Project description "
                    "({}) to string. Caught exception: {}".format(
                        desc_str, getattr(e, "message", repr(e))
                    )
                )
        return desc_str

get_sample

get_sample(sample_name)

Get an individual sample object from the project.

Will raise a ValueError if the sample is not found. If multiple samples share the same name (which is not typically allowed), a warning is logged and the first sample is returned.

:param str sample_name: The name of a sample to retrieve
:raise ValueError: if there's no sample with the specified name defined
:return peppy.Sample: The requested Sample object

Source code in peppy/project.py
def get_sample(self, sample_name):
    """
    Get an individual sample object from the project.

    Will raise a ValueError if the sample is not found.
    In the case of multiple samples with the same name (which is not
    typically allowed), a warning is raised and the first sample is returned

    :param str sample_name: The name of a sample to retrieve
    :raise ValueError: if there's no sample with the specified name defined
    :return peppy.Sample: The requested Sample object
    """
    samples = self.get_samples([sample_name])
    if len(samples) > 1:
        _LOGGER.warning(
            f"{len(samples)} samples matched the name: {sample_name}. Returning the first one."
        )
    try:
        return samples[0]
    except IndexError:
        raise ValueError(f"Project has no sample named {sample_name}.")
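The lookup logic above reduces to filtering by the project's index column, warning on duplicates, and raising `ValueError` for a miss. A minimal stdlib sketch of that pattern, using hypothetical sample dicts and a hypothetical `sample_name` index column in place of real `peppy.Sample` objects:

```python
import warnings

def get_sample(samples, name, index_col="sample_name"):
    """Return the first sample whose index column matches `name`."""
    matches = [s for s in samples if s[index_col] == name]
    if len(matches) > 1:
        # Duplicate names are not typically allowed; keep the first match.
        warnings.warn(f"{len(matches)} samples matched {name!r}; returning the first one")
    try:
        return matches[0]
    except IndexError:
        raise ValueError(f"Project has no sample named {name}.")

samples = [{"sample_name": "frog_1", "protocol": "RNA-seq"},
           {"sample_name": "frog_2", "protocol": "ATAC-seq"}]
print(get_sample(samples, "frog_1")["protocol"])  # RNA-seq
```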

get_samples

get_samples(sample_names)

Returns a list of sample objects given a list of sample names

:param list sample_names: A list of sample names to retrieve
:return list[peppy.Sample]: A list of Sample objects

Source code in peppy/project.py
def get_samples(self, sample_names):
    """
    Returns a list of sample objects given a list of sample names

    :param list sample_names: A list of sample names to retrieve
    :return list[peppy.Sample]: A list of Sample objects
    """
    return [s for s in self.samples if s[self.st_index] in sample_names]

infer_name

infer_name()

Infer project name from config file path.

First assume the name is the folder in which the config file resides, unless that folder is named "metadata", in which case the project name is the parent of that folder.

:return str: inferred name for project.
:raise InvalidConfigFileException: if the project lacks both a name and a configuration file (no basis, then, for inference)
:raise InvalidConfigFileException: if specified Project name is invalid

Source code in peppy/project.py
def infer_name(self):
    """
    Infer project name from config file path.

    First assume the name is the folder in which the config file resides,
    unless that folder is named "metadata", in which case the project name
    is the parent of that folder.

    :return str: inferred name for project.
    :raise InvalidConfigFileException: if the project lacks both a name and
        a configuration file (no basis, then, for inference)
    :raise InvalidConfigFileException: if specified Project name is invalid
    """
    if CONFIG_KEY not in self:
        return
    if NAME_KEY in self[CONFIG_KEY]:
        if " " in self[CONFIG_KEY][NAME_KEY]:
            raise InvalidConfigFileException(
                "Specified Project name ({}) contains whitespace".format(
                    self[CONFIG_KEY][NAME_KEY]
                )
            )
        return self[CONFIG_KEY][NAME_KEY].replace(" ", "_")
    if not self[CONFIG_FILE_KEY]:
        raise NotImplementedError(
            "Project name inference isn't supported "
            "on a project that lacks a config file."
        )
    config_folder = os.path.dirname(self[CONFIG_FILE_KEY])
    project_name = os.path.basename(config_folder)
    if project_name == METADATA_KEY:
        project_name = os.path.basename(os.path.dirname(config_folder))
    return project_name.replace(" ", "_")
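The folder-based inference rule can be sketched with nothing but `os.path`: take the config file's parent folder name, skip up one level when that folder is `metadata`, and replace spaces with underscores. The paths below are hypothetical.

```python
import os

def infer_name(cfg_path):
    """Infer a project name from a config file path (sketch of the rule above)."""
    folder = os.path.dirname(cfg_path)
    name = os.path.basename(folder)
    if name == "metadata":
        # Config lives in a "metadata" folder; use its parent instead.
        name = os.path.basename(os.path.dirname(folder))
    return name.replace(" ", "_")

print(infer_name("/data/myproject/metadata/pep.yaml"))  # myproject
print(infer_name("/data/my project/pep.yaml"))          # my_project
```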

load_samples

load_samples()

Read the sample_table and subsample_tables into dataframes and store in the object root. The values sourced from the project config can be overwritten by the optional arguments.

Source code in peppy/project.py
def load_samples(self):
    """
    Read the sample_table and subsample_tables into dataframes
    and store in the object root. The values sourced from the
    project config can be overwritten by the optional arguments.
    """
    # To initiate project from pandas or dictionary we shouldn't run
    # this function otherwise it will cause errors
    if SAMPLE_DF_KEY not in self or self.amendments is not None:
        self._read_sample_data()

    samples_list = []
    if SAMPLE_DF_KEY not in self:
        return []

    if CONFIG_KEY not in self:
        self[CONFIG_KEY] = {CONFIG_VERSION_KEY: PEP_LATEST_VERSION}
        self[CONFIG_FILE_KEY] = None

    elif len(self[CONFIG_KEY]) < 1:
        self[CONFIG_KEY][CONFIG_VERSION_KEY] = PEP_LATEST_VERSION
        self[CONFIG_FILE_KEY] = None

    if SUBSAMPLE_DF_KEY not in self:
        self[SUBSAMPLE_DF_KEY] = None

    for _, r in self[SAMPLE_DF_KEY].iterrows():
        samples_list.append(Sample(r, prj=self))
    return samples_list
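Conceptually, the load step reads a sample table with one row per sample and builds one sample object per row. A minimal sketch of that idea using only the stdlib `csv` module in place of pandas; the table content is hypothetical:

```python
import csv
import io

# A tiny in-memory sample table: one row per sample, "sample_name" as index.
SAMPLE_TABLE = """sample_name,protocol
frog_1,anySampleType
frog_2,anySampleType
"""

# Read the table and build one record per row (peppy builds Sample objects here).
rows = csv.DictReader(io.StringIO(SAMPLE_TABLE))
samples = [dict(r) for r in rows]
print(len(samples), samples[0]["sample_name"])  # 2 frog_1
```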

modify_samples

modify_samples()

Perform any sample modifications defined in the config.

Source code in peppy/project.py
def modify_samples(self):
    """
    Perform any sample modifications defined in the config.
    """
    if self._modifier_exists():
        # check for unrecognizable modification keys
        mod_diff = set(self[CONFIG_KEY][SAMPLE_MODS_KEY].keys()) - set(
            SAMPLE_MODIFIERS
        )
        if len(mod_diff) > 0:
            _LOGGER.warning(
                f"Config '{SAMPLE_MODS_KEY}' section contains unrecognized "
                f"subsections: {mod_diff}"
            )
    self.attr_remove()
    self.attr_constants()
    self.attr_synonyms()
    self.attr_imply()
    self._assert_samples_have_names()
    self._auto_merge_duplicated_names()
    self.attr_merge()
    self.attr_derive()

parse_config_file

parse_config_file(cfg_path=None, amendments=None)

Parse provided yaml config file and check required fields exist.

:param str cfg_path: path to the config file to read and parse
:param Iterable[str] amendments: names of amendments to activate
:raises KeyError: if config file lacks required section(s)

Source code in peppy/project.py
def parse_config_file(
    self,
    cfg_path: str = None,
    amendments: Iterable[str] = None,
):
    """
    Parse provided yaml config file and check required fields exist.

    :param str cfg_path: path to the config file to read and parse
    :param Iterable[str] amendments: Name of amendments to activate
    :raises KeyError: if config file lacks required section(s)
    """
    if CONFIG_KEY not in self:
        self[CONFIG_KEY] = {}
        self[ORIGINAL_CONFIG_KEY] = {}
    if not os.path.exists(cfg_path) and not is_url(cfg_path):
        raise OSError(f"Project config file path does not exist: {cfg_path}")
    if os.path.islink(cfg_path):
        cfg_path = os.path.realpath(
            cfg_path
        )  # due to some problems with symlinks in Nextflow
    config = load_yaml(cfg_path)

    assert isinstance(
        config, Mapping
    ), "Config file parse did not yield a Mapping; got {} ({})".format(
        config, type(config)
    )

    _LOGGER.debug(f"Raw ({cfg_path}) config data: {config}")

    self._set_indexes(config)
    # recursively import configs
    if (
        PROJ_MODS_KEY in config
        and CFG_IMPORTS_KEY in config[PROJ_MODS_KEY]
        and config[PROJ_MODS_KEY][CFG_IMPORTS_KEY]
    ):
        _make_sections_absolute(config[PROJ_MODS_KEY], [CFG_IMPORTS_KEY], cfg_path)
        _LOGGER.info(
            "Importing external Project configurations: {}".format(
                ", ".join(config[PROJ_MODS_KEY][CFG_IMPORTS_KEY])
            )
        )
        for i in config[PROJ_MODS_KEY][CFG_IMPORTS_KEY]:
            _LOGGER.debug("Processing external config: {}".format(i))
            if os.path.exists(i):
                self.parse_config_file(cfg_path=i)
            else:
                _LOGGER.warning(
                    "External Project configuration does not" " exist: {}".format(i)
                )

    self[CONFIG_KEY].update(**config)
    self[ORIGINAL_CONFIG_KEY] = deepcopy(self[CONFIG_KEY])
    # Parse yaml into the project.config attributes
    _LOGGER.debug("Adding attributes: {}".format(", ".join(config)))
    # Overwrite any config entries with entries in the amendments
    amendments = [amendments] if isinstance(amendments, str) else amendments
    if amendments:
        for amendment in amendments:
            c = self[CONFIG_KEY]
            if (
                PROJ_MODS_KEY in c
                and AMENDMENTS_KEY in c[PROJ_MODS_KEY]
                and c[PROJ_MODS_KEY][AMENDMENTS_KEY] is not None
            ):
                _LOGGER.debug("Adding entries for amendment '{}'".format(amendment))
                try:
                    amends = c[PROJ_MODS_KEY][AMENDMENTS_KEY][amendment]
                except KeyError:
                    raise MissingAmendmentError(
                        amendment, c[PROJ_MODS_KEY][AMENDMENTS_KEY]
                    )
                _LOGGER.debug("Updating with: {}".format(amends))
                self[CONFIG_KEY].update(**amends)
                _LOGGER.info("Using amendments: {}".format(amendment))
            else:
                raise MissingAmendmentError(amendment)
        self[ACTIVE_AMENDMENTS_KEY] = amendments

    # determine config version and reformat it, if needed
    self[CONFIG_KEY][CONFIG_VERSION_KEY] = self.pep_version
    # here specify cfg sections that may need expansion
    relative_vars = [CFG_SAMPLE_TABLE_KEY, CFG_SUBSAMPLE_TABLE_KEY]
    _make_sections_absolute(self[CONFIG_KEY], relative_vars, cfg_path)

remove_samples

remove_samples(sample_names)

Remove Samples from Project

:param Iterable[str] sample_names: sample names to remove

Source code in peppy/project.py
def remove_samples(self, sample_names):
    """
    Remove Samples from Project

    :param Iterable[str] sample_names: sample names to remove
    """
    sample_names = [sample_names] if isinstance(sample_names, str) else sample_names
    samples_keep = [
        s for s in self.samples if s[self.sample_name_colname] not in sample_names
    ]
    if len(self._samples) != len(samples_keep):
        self._samples = samples_keep
        self[SAMPLE_EDIT_FLAG_KEY] = True

to_dict

to_dict(extended=False, orient='dict')

Convert the Project object to a dictionary.

:param bool extended: whether to produce a complete project dict (used to reinitialize the project)
:param Literal orient: orientation of the returned df
:return dict: a dictionary representation of the Project object

Source code in peppy/project.py
def to_dict(
    self,
    # expand: bool = False, # expand was used to expand paths. This functionality was removed because of attmap
    extended: bool = False,
    orient: Literal[
        "dict", "list", "series", "split", "tight", "records", "index"
    ] = "dict",
) -> dict:
    """
    Convert the Project object to a dictionary.

    :param bool extended: whether to produce complete project dict (used to reinit the project)
    :param Literal orient: orientation of the returned df
    :return dict: a dictionary representation of the Project object
    """
    if extended:
        if self[SUBSAMPLE_DF_KEY] is not None:
            sub_df = [
                sub_a.to_dict(orient=orient) for sub_a in self[SUBSAMPLE_DF_KEY]
            ]
        else:
            sub_df = None

        if not self.get(ORIGINAL_CONFIG_KEY):
            self[ORIGINAL_CONFIG_KEY] = self[CONFIG_KEY]
        try:
            self[ORIGINAL_CONFIG_KEY][NAME_KEY] = self.name
        except NotImplementedError:
            self[ORIGINAL_CONFIG_KEY][NAME_KEY] = "unnamed"
        self[ORIGINAL_CONFIG_KEY][DESC_KEY] = self.description
        p_dict = {
            SAMPLE_RAW_DICT_KEY: self[SAMPLE_DF_KEY].to_dict(orient=orient),
            CONFIG_KEY: dict(self[ORIGINAL_CONFIG_KEY]),
            SUBSAMPLE_RAW_LIST_KEY: sub_df,
        }
    else:
        p_dict = {
            "project": self.config.copy(),
            "samples": [s.to_dict() for s in self.samples],
        }

    return p_dict

Sample Class

Sample

Sample(series, prj=None)

Bases: SimpleAttMap

Class to model Samples based on a pandas Series.

:param Mapping | pandas.core.series.Series series: Sample's data.

Source code in peppy/sample.py
def __init__(self, series, prj=None):
    super(Sample, self).__init__()

    data = dict(series)
    _LOGGER.debug(f"Sample data: {data}")

    # Attach Project reference
    try:
        data_proj = data.pop(PRJ_REF)
    except (AttributeError, KeyError):
        data_proj = None

    self.update(**data)

    if data_proj and PRJ_REF not in self:
        self[PRJ_REF] = data_proj

    if PRJ_REF in self and prj:
        _LOGGER.warning(
            "Project data provided both in data and as separate"
            " constructor argument; using direct argument"
        )
    if prj:
        self[PRJ_REF] = prj
    if not self.get(PRJ_REF):
        # Force empty attmaps to null and ensure something's set.
        self[PRJ_REF] = None
        _LOGGER.debug("No project reference for sample")
    else:
        prefix = "Project reference on a sample must be an instance of dict"

        if not isinstance(self[PRJ_REF], Mapping):
            raise TypeError(f"{prefix}; got {type(self[PRJ_REF]).__name__}")
    self._derived_cols_done = []
    self._attributes = list(series.keys())

project property

project

Get the project mapping

:return peppy.Project: project object the sample was created from

sample_name property

sample_name

Get the sample's name

:return str: current sample name derived from project's st_index

__str__

__str__(max_attr=10)

Representation in interpreter.

Source code in peppy/sample.py
def __str__(self, max_attr=10):
    """Representation in interpreter."""
    if len(self) == 0:
        return ""
    head = "Sample"
    try:
        head += f" '{self[SAMPLE_NAME_ATTR]}'"
    except KeyError:
        pass
    try:
        prj_cfg = self[PRJ_REF][CONFIG_FILE_KEY]
    except (KeyError, TypeError):
        pass
    else:
        head += f" in Project ({prj_cfg})"
    pub_attrs = {k: v for k, v in self.items() if not k.startswith("_")}
    maxlen = max(map(len, pub_attrs.keys())) + 2
    attrs = ""
    counter = 0
    for k, v in pub_attrs.items():
        key_to_show = (k + ":").ljust(maxlen)
        if not isinstance(v, list):
            val_to_show = v
        else:
            try:
                val_to_show = ", ".join([i for i in v if i is not None])
            except TypeError:
                val_to_show = "None"
        attrs += f"\n{key_to_show}{val_to_show}"
        if counter == max_attr:
            attrs += "\n\n...".ljust(maxlen) + f"(showing first {max_attr})"
            break
        counter += 1
    return head + "\n" + attrs

derive_attribute

derive_attribute(data_sources, attr_name)

Uses the template path provided in the project config section "data_sources" to piece together an actual path by substituting variables (encoded by "{variable}") with sample attributes.

:param Mapping data_sources: mapping from key name (as a value in a cell of a tabular data structure) to, e.g., filepath
:param str attr_name: name of sample attribute (equivalently, sample sheet column) specifying a derived column
:return str: regex expansion of data source specified in configuration, with variable substitutions made
:raises ValueError: if argument to data_sources parameter is null/empty

Source code in peppy/sample.py
def derive_attribute(self, data_sources, attr_name):
    """
    Uses the template path provided in the project config section
    "data_sources" to piece together an actual path by substituting
    variables (encoded by "{variable}") with sample attributes.

    :param Mapping data_sources: mapping from key name (as a value in
        a cell of a tabular data structure) to, e.g., filepath
    :param str attr_name: Name of sample attribute
        (equivalently, sample sheet column) specifying a derived column.
    :return str: regex expansion of data source specified in configuration,
        with variable substitutions made
    :raises ValueError: if argument to data_sources parameter is null/empty
    """

    def _format_regex(regex, items):
        """
        Format derived source with object attributes

        :param str regex: string to format,
            e.g. {identifier}{file_id}_data.txt
        :param Iterable[Iterable[Iterable | str]] items: items to format
            the string with
        :raise InvalidSampleTableFileException: if after merging
            subannotations the lengths of multi-value attrs are not even
        :return Iterable | str: formatted regex string(s)
        """
        keys = [i[1] for i in Formatter().parse(regex) if i[1] is not None]
        if not keys:
            return [regex]
        if "$" in regex:
            _LOGGER.warning(
                "Not all environment variables were populated "
                "in derived attribute source: {}".format(regex)
            )
        attr_lens = [
            len(v) for k, v in items.items() if (isinstance(v, list) and k in keys)
        ]
        if not bool(attr_lens):
            return [_safe_format(regex, items)]
        if len(set(attr_lens)) != 1:
            msg = (
                "All attributes to format the {} ({}) have to be the "
                "same length, got: {}. Correct your {}".format(
                    DERIVED_SOURCES_KEY, regex, attr_lens, SAMPLE_SHEET_KEY
                )
            )
            raise InvalidSampleTableFileException(msg)
        vals = []
        for i in range(0, attr_lens[0]):
            items_cpy = cp(items)
            for k in keys:
                if isinstance(items_cpy[k], list):
                    items_cpy[k] = items_cpy[k][i]
            vals.append(_safe_format(regex, items_cpy))
        return vals

    def _safe_format(s, values):
        """
        Safely format string.

        If the values are missing the key is wrapped in curly braces.
        This is intended to preserve the environment variables specified
        using curly braces notation, for example: "${ENVVAR}/{sample_attr}"
        would result in "${ENVVAR}/populated" rather than a KeyError.

        :param str s: string with curly braces placeholders to populate
        :param Mapping values: key-value pairs to populate string with
        :return str: populated string
        """
        return Formatter().vformat(s, (), SafeDict(values))

    def _glob_regex(patterns):
        """
        Perform unix style pathname pattern expansion for multiple patterns

        :param Iterable[str] patterns: patterns to expand
        :return str | Iterable[str]: expanded patterns
        """
        outputs = []
        for p in patterns:
            if "*" in p or "[" in p:
                _LOGGER.debug("Pre-glob: {}".format(p))
                val_globbed = sorted(glob.glob(p))
                if not val_globbed:
                    _LOGGER.debug("No files match the glob: '{}'".format(p))
                else:
                    p = val_globbed
                    _LOGGER.debug("Post-glob: {}".format(p))

            outputs.extend(p if isinstance(p, list) else [p])
        return outputs if len(outputs) > 1 else outputs[0]

    if not data_sources:
        return None
    sn = self[SAMPLE_NAME_ATTR] if SAMPLE_NAME_ATTR in self else "this sample"
    try:
        source_key = self[attr_name]
    except AttributeError:
        reason = (
            "'{attr}': to locate sample's derived attribute source, "
            "provide the name of a key from '{sources}' or ensure "
            "sample has attribute '{attr}'".format(
                attr=attr_name, sources=DERIVED_SOURCES_KEY
            )
        )
        raise AttributeError(reason)

    try:
        regex = data_sources[source_key]
        _LOGGER.debug(f"Data sources: {data_sources}")
    except KeyError:
        _LOGGER.debug(
            f"{sn}: config lacks entry for {DERIVED_SOURCES_KEY} key: "
            f"'{source_key}' in column '{attr_name}'; known: {data_sources.keys()}"
        )
        return ""
    deriv_exc_base = (
        f"In sample '{sn}' cannot correctly parse derived "
        f"attribute source: {regex}."
    )
    try:
        expanded_regex = os.path.expandvars(regex)
        vals = _format_regex(expanded_regex, dict(self.items()))
        _LOGGER.debug("Formatted regex: {}".format(vals))
    except KeyError as ke:
        _LOGGER.warning(f"{deriv_exc_base} Can't access {str(ke)} attribute")
    except Exception as e:
        _LOGGER.warning(f"{deriv_exc_base} Caught exception: {str(e)}")
    else:
        return _glob_regex(vals)
    return None
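The `_safe_format` helper above is the key trick: keys missing from the sample's attributes are left wrapped in braces, so shell-style variables like `${ENVVAR}` survive formatting instead of raising `KeyError`. A self-contained sketch of the same technique using `string.Formatter` and a `dict` subclass (the template and attribute names are hypothetical):

```python
from string import Formatter

class SafeDict(dict):
    """Leave unknown placeholders intact instead of raising KeyError."""
    def __missing__(self, key):
        return "{" + key + "}"

def safe_format(template, values):
    # vformat consults SafeDict.__missing__ for any absent key.
    return Formatter().vformat(template, (), SafeDict(values))

path = safe_format("${DATA}/{organism}_{sample_name}.fastq",
                   {"organism": "human", "sample_name": "s1"})
print(path)  # ${DATA}/human_s1.fastq
```

Because `${DATA}` contains the field `{DATA}`, which is absent from the values, it round-trips unchanged and can later be expanded by `os.path.expandvars`.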

get_sheet_dict

get_sheet_dict()

Create K-V pairs for items originally passed in via the sample sheet. This is useful for summarizing; it provides a representation of the sample that excludes things like config files and derived entries.

:return dict: mapping from name to value for data elements originally provided via the sample sheet (i.e., a map-like representation of the instance, excluding derived items)

Source code in peppy/sample.py
def get_sheet_dict(self):
    """
    Create K-V pairs for items originally passed in via the sample sheet.
    This is useful for summarizing; it provides a representation of the
    sample that excludes things like config files and derived entries.

    :return OrderedDict: mapping from name to value for data elements
        originally provided via the sample sheet (i.e., a map-like
        representation of the instance, excluding derived items)
    """
    return dict([[k, self[k]] for k in self._attributes])

to_dict

to_dict(add_prj_ref=False)

Serializes itself as dict object.

:param bool add_prj_ref: whether the project reference bound to the Sample object should be included in the dict representation
:return dict: dict representation of this Sample

Source code in peppy/sample.py
def to_dict(self, add_prj_ref=False):
    """
    Serializes itself as dict object.

    :param bool add_prj_ref: whether the project reference bound to the
        Sample object should be included in the dict representation
    :return dict: dict representation of this Sample
    """

    def _obj2dict(obj, name=None):
        """
        Build representation of object as a dict, recursively
        for all objects that might be attributes of self.

        :param object obj: what to serialize to write to YAML.
        :param str name: name of the object to represent.
        """
        if name:
            _LOGGER.log(5, f"Converting to dict: {name}")
        if isinstance(obj, list):
            return [_obj2dict(i) for i in obj]
        elif isinstance(obj, Mapping):
            return {
                k: _obj2dict(v, name=k)
                for k, v in obj.items()
                if not k.startswith("_")
            }
        if isinstance(obj, set):
            return [_obj2dict(i) for i in obj]
        elif isinstance(obj, Series):
            _LOGGER.warning("Serializing series as mapping, not array-like")
            return obj.to_dict()
        elif hasattr(obj, "dtype"):  # numpy data types
            # TODO: this fails with ValueError for multi-element array.
            return obj.item()
        elif isnull(obj):
            # Missing values as evaluated by pandas.isnull().
            # This gets correctly written into yaml.
            return None
        else:
            return obj

    serial = _obj2dict(self)
    if add_prj_ref:
        serial.update({"prj": grab_project_data(self[PRJ_REF])})
    return serial

to_yaml

to_yaml(path=None, add_prj_ref=False)

Serializes itself in YAML format. Writes to file if path is provided, else returns string representation.

:param str path: a file path to write yaml to; defaults to None
:param bool add_prj_ref: whether the project reference bound to the Sample object should be included in the YAML representation
:return str | None: string representation of sample yaml, or None when written to file

Source code in peppy/sample.py
def to_yaml(
    self, path: Optional[str] = None, add_prj_ref=False
) -> Union[str, None]:
    """
    Serializes itself in YAML format. Writes to file if path is provided, else returns string representation.

    :param str path: a file path to write yaml to; defaults to None
    :param bool add_prj_ref: whether the project reference bound to the
        Sample object should be included in the YAML representation
    :return str | None: returns string representation of sample yaml or None
    """
    serial = self.to_dict(add_prj_ref=add_prj_ref)
    if path:
        path = os.path.expandvars(path)
        if os.path.exists(os.path.dirname(path)):
            with open(path, "w") as outfile:
                try:
                    yaml_data = yaml.safe_dump(serial, default_flow_style=False)
                except yaml.representer.RepresenterError:
                    _LOGGER.error("Serialized sample data: {}".format(serial))
                    raise
                outfile.write(yaml_data)
                _LOGGER.debug("Sample data written to: {}".format(path))
        else:
            _LOGGER.warning(
                "Could not write sample data to: {}. "
                "Directory does not exist".format(path)
            )
            return
    else:
        yaml_data = yaml.safe_dump(serial, stream=None, default_flow_style=False)
        return yaml_data