Package peppy Documentation
Package Overview
The peppy package provides a Python interface for working with Portable Encapsulated Projects (PEPs). A PEP is a standardized format for organizing metadata for biological samples and sample-intensive data.
Key Features
- Project Management: Create and manage collections of samples with metadata
- Sample Access: Retrieve individual samples and their attributes
- Amendments: Activate different project configurations
- Validation: Validate projects against schemas
Installation
pip install peppy
Quick Example
from peppy import Project
# Initialize with a project config file
prj = Project(cfg="ngs.yaml")
# Access samples
samples = prj.samples
API Reference
Project Class
The main class for working with PEPs:
Project
Project(cfg=None, amendments=None, sample_table_index=None, subsample_table_index=None, defer_samples_creation=False)
Bases: MutableMapping
A class to model a Project (collection of samples and metadata).
:param str cfg: Project config file (YAML) or sample table (CSV/TSV) with one row per sample to constitute project
:param str | Iterable[str] sample_table_index: name of the columns to set the sample_table index to
:param str | Iterable[str] subsample_table_index: name of the columns to set the subsample_table index to
:param str | Iterable[str] amendments: names of the amendments, declared within the configuration file, to activate
:param bool defer_samples_creation: whether the sample creation should be skipped
:Example:
.. code-block:: python
from peppy import Project
prj = Project(cfg="ngs.yaml")
samples = prj.samples
Source code in peppy/project.py
amendments
property
amendments
Return the list of currently active amendments, or None if none was activated
:return Iterable[str]: a list of currently active amendment names
config
property
config
Get the config mapping
:return Mapping: config. May be formatted to comply with the most recent version specifications
config_file
property
config_file
Get the config file path
:return str: path to the config file
list_amendments
property
list_amendments
Return a list of available amendments or None if not declared
:return Iterable[str]: a list of available amendment names
pep_version
property
pep_version
The declared PEP version string
It is checked to be a valid PEP version string.
:raise InvalidConfigFileException: in case of invalid PEP version
:return str: PEP version string
sample_name_colname
property
sample_name_colname
Deprecated, please use Project.sample_table_index instead
Name of the effective sample name containing column in the sample table.
It is "sample_name" by default, but when it's missing it could be replaced by the selected sample table index, defined on the object instantiation stage.
:return str: name of the column that contains sample identifiers
sample_table
property
sample_table
Get sample table. If any sample edits were performed, it will be re-generated
:return pandas.DataFrame: a data frame with current samples attributes
sample_table_index
property
sample_table_index
The effective sample table index.
It is sample_name by default, but could be overwritten by the selected sample table index,
defined on the object instantiation stage or in the project configuration file
via sample_table_index field.
The sample table index is selected in this priority order:
- Constructor specified
- Config specified
- Default: sample_name
:return str: name of the column that contains sample identifiers
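The selection priority described for this property can be sketched as a small resolver. This is only an illustration under stated assumptions, not peppy's implementation: `resolve_sample_table_index` and its argument names are hypothetical, and `sample_name` is taken as the default, as stated at the start of this entry.

```python
DEFAULT_SAMPLE_TABLE_INDEX = "sample_name"  # default named in the text above


def resolve_sample_table_index(constructor_value=None, config_value=None):
    """Pick the effective index: constructor first, then config, then default.

    Hypothetical helper for illustration; peppy does not expose this function.
    """
    if constructor_value is not None:
        return constructor_value
    if config_value is not None:
        return config_value
    return DEFAULT_SAMPLE_TABLE_INDEX


print(resolve_sample_table_index("id", "sample_id"))
print(resolve_sample_table_index(None, "sample_id"))
print(resolve_sample_table_index())
```

The same ordering applies to subsample_table_index, with a list of column names in place of a single name.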
samples
property
samples
Generic/base Sample instance for each of this Project's samples.
:return Iterable[Sample]: Sample instance for each of this Project's samples
subsample_table
property
subsample_table
Get subsample table
:return pandas.DataFrame: a data frame with subsample attributes
subsample_table_index
property
subsample_table_index
The effective subsample table indexes.
It is [subsample_name, sample_name] by default,
but could be overwritten by the selected subsample table indexes,
defined on the object instantiation stage or in the project configuration file
via subsample_table_index field.
The subsample table indexes are selected in this priority order:
- Constructor specified
- Config specified
- Default: [subsample_name, sample_name]
:return List[str]: names of the columns that contain sample and subsample identifiers
__getitem__
__getitem__(item)
Fetch the value of given key.
:param hashable item: key for which to fetch value
:return object: value mapped to given key, if available
:raise KeyError: if the requested key is unmapped.
Source code in peppy/project.py
__str__
__str__()
Representation in interpreter.
Source code in peppy/project.py
activate_amendments
activate_amendments(amendments)
Update settings based on amendment-specific values.
This method will update Project attributes, adding new values associated with the amendments indicated, and in case of collision with an existing key/attribute the amendments' values will be favored.
:param Iterable[str] amendments: names of the amendments to be activated
:return peppy.Project: Updated Project instance
:raise TypeError: if the amendments argument is null
:raise NotImplementedError: if this call is made on a project not created from a config file
Source code in peppy/project.py
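The collision rule described for activate_amendments (amendment values are favored over existing keys) can be sketched with plain dictionaries. This is a simplified illustration, not peppy's code: `apply_amendments`, the `available` mapping, and the config keys used are hypothetical, and the update here is shallow, whereas the library may merge nested sections differently.

```python
from copy import deepcopy


def apply_amendments(base_config, available, names):
    """Return a copy of base_config updated with each named amendment in order.

    Hypothetical helper: `available` maps amendment name -> overriding values.
    """
    if names is None:
        raise TypeError("amendments must not be None")
    if isinstance(names, str):
        names = [names]  # accept a single name as well as an iterable
    result = deepcopy(base_config)
    for name in names:
        result.update(available[name])  # amendment values win on key collision
    return result


base = {"sample_table": "samples.csv", "genome": "hg19"}
available = {"newer_genome": {"genome": "hg38"}}
amended = apply_amendments(base, available, "newer_genome")
print(amended)
```

Because the sketch works on a copy, the original settings stay available, which mirrors the idea behind deactivate_amendments restoring the original project settings.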
add_samples
add_samples(samples)
Add list of Sample objects
:param peppy.Sample | Iterable[peppy.Sample] samples: samples to add
Source code in peppy/project.py
attr_constants
attr_constants()
Update each Sample with constants declared by a Project. If Project does not declare constants, no update occurs.
Source code in peppy/project.py
attr_derive
attr_derive(attrs=None)
Set derived attributes for all Samples tied to this Project instance
Source code in peppy/project.py
attr_imply
attr_imply()
Infer value for additional field(s) from other field(s).
Add columns/fields to the sample based on values in those already-set that the sample's project defines as indicative of implications for additional data elements for the sample.
Source code in peppy/project.py
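As a rough sketch of what implication means here: when a sample's existing attributes match a declared condition, additional attributes are added to it. The rule layout below (a list of mappings with `if` and `then` keys) is an assumption modelled on the PEP specification's `imply` modifier, and `imply_attributes` is a hypothetical stand-in that works on plain dicts rather than Sample objects.

```python
def imply_attributes(sample, implications):
    """Add each rule's 'then' attributes when all of its 'if' conditions match.

    Hypothetical helper; a condition value may be a single value or a list
    of accepted values.
    """
    for rule in implications:
        conditions, implied = rule["if"], rule["then"]
        matches = all(
            sample.get(attr) in (wanted if isinstance(wanted, list) else [wanted])
            for attr, wanted in conditions.items()
        )
        if matches:
            sample.update(implied)
    return sample


rules = [{"if": {"organism": "human"}, "then": {"genome": "hg38"}}]
human = imply_attributes({"sample_name": "s1", "organism": "human"}, rules)
mouse = imply_attributes({"sample_name": "s2", "organism": "mouse"}, rules)
print(human)
print(mouse)
```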
attr_merge
attr_merge()
Merge sample subannotations (from subsample table) with sample annotations (from sample_table)
Source code in peppy/project.py
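One way to picture the merge: rows of the subsample table that share a sample name are collected, and their values are attached to that sample as lists. The sketch below uses plain dicts and a hypothetical `merge_subsamples` helper; it assumes `sample_name` is the shared key and does not reproduce how peppy handles attributes present in both tables.

```python
from collections import defaultdict


def merge_subsamples(samples, subsample_rows, index="sample_name"):
    """Attach subsample values to their sample; merged attributes become lists."""
    grouped = defaultdict(lambda: defaultdict(list))
    for row in subsample_rows:
        for attr, value in row.items():
            if attr != index:
                grouped[row[index]][attr].append(value)
    for sample in samples:
        # Samples without subsample rows are left untouched
        for attr, values in grouped.get(sample[index], {}).items():
            sample[attr] = values
    return samples


samples = [
    {"sample_name": "s1", "protocol": "ATAC"},
    {"sample_name": "s2", "protocol": "RNA"},
]
subsample_rows = [
    {"sample_name": "s1", "file": "a.fq"},
    {"sample_name": "s1", "file": "b.fq"},
]
merged = merge_subsamples(samples, subsample_rows)
print(merged)
```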
attr_remove
attr_remove()
Remove declared attributes from all samples that have them defined
Source code in peppy/project.py
attr_synonyms
attr_synonyms()
Copy attribute values for all samples to a new one
Source code in peppy/project.py
create_samples
create_samples(modify=False)
Populate Project with Sample objects
Source code in peppy/project.py
deactivate_amendments
deactivate_amendments()
Bring the original project settings back.
:return peppy.Project: Updated Project instance
:raise NotImplementedError: if this call is made on a project not created from a config file
Source code in peppy/project.py
from_dict
classmethod
from_dict(pep_dictionary)
Init a peppy project instance from a dictionary representation of an already processed PEP.
:param Dict[Any] pep_dictionary: dict representation of the project {_config: dict, _samples: list | dict, _subsamples: list[list | dict]}
Source code in peppy/project.py
from_pandas
classmethod
from_pandas(samples_df, sub_samples_df=None, config=None)
Init a peppy project instance from a pandas DataFrame
:param samples_df: in-memory pandas DataFrame object of samples
:param sub_samples_df: in-memory list of pandas DataFrame objects of sub-samples
:param config: dict of yaml file
Source code in peppy/project.py
from_pep_config
classmethod
from_pep_config(cfg=None, amendments=None, sample_table_index=None, subsample_table_index=None, defer_samples_creation=False)
Init a peppy project instance from a yaml file
:param str cfg: Project config file (YAML) or sample table (CSV/TSV) with one row per sample to constitute project
:param str | Iterable[str] sample_table_index: name of the columns to set the sample_table index to
:param str | Iterable[str] subsample_table_index: name of the columns to set the subsample_table index to
:param str | Iterable[str] amendments: names of the amendments, declared within the configuration file, to activate
:param bool defer_samples_creation: whether the sample creation should be skipped
Source code in peppy/project.py
from_pephub
classmethod
from_pephub(registry_path)
Init project from pephubclient.
:param registry_path: PEPhub registry path
:return: peppy Project
Source code in peppy/project.py
from_sample_yaml
classmethod
from_sample_yaml(yaml_file)
Init a peppy project instance from a yaml file
:param str yaml_file: path to yaml file
Source code in peppy/project.py
get_description
get_description()
Infer project description from config file.
The provided description has to be of a class coercible to string.
:return str: inferred description for project.
:raise InvalidConfigFileException: if description is not of a class coercible to string
Source code in peppy/project.py
get_sample
get_sample(sample_name)
Get an individual sample object from the project.
Raises a ValueError if the sample is not found. If multiple samples share the same name (which is not typically allowed), a warning is issued and the first sample is returned.
:param str sample_name: The name of a sample to retrieve
:raise ValueError: if there's no sample with the specified name defined
:return peppy.Sample: The requested Sample object
Source code in peppy/project.py
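The lookup behaviour described for get_sample (error when nothing matches, warning plus first match on duplicates) can be sketched over a list of plain dicts. `find_sample` is a hypothetical stand-in, not the library method, and the exact messages and warning class peppy uses are not reproduced here.

```python
import warnings


def find_sample(samples, sample_name, index="sample_name"):
    """Return the first sample whose index attribute equals sample_name."""
    matches = [s for s in samples if s.get(index) == sample_name]
    if not matches:
        raise ValueError(f"No sample named {sample_name!r} in the project")
    if len(matches) > 1:
        # Duplicate names are unusual: warn, then fall back to the first match
        warnings.warn(
            f"{len(matches)} samples named {sample_name!r}; returning the first"
        )
    return matches[0]


samples = [
    {"sample_name": "frog_1", "lane": 1},
    {"sample_name": "frog_1", "lane": 2},
    {"sample_name": "frog_2", "lane": 1},
]
print(find_sample(samples, "frog_2"))
```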
get_samples
get_samples(sample_names)
Returns a list of sample objects given a list of sample names
:param list sample_names: A list of sample names to retrieve
:return list[peppy.Sample]: A list of Sample objects
Source code in peppy/project.py
infer_name
infer_name()
Infer project name from config file path.
First assume the name is the folder in which the config file resides, unless that folder is named "metadata", in which case the project name is the parent of that folder.
:return str: inferred name for project.
:raise InvalidConfigFileException: if the project lacks both a name and a configuration file (no basis, then, for inference)
:raise InvalidConfigFileException: if specified Project name is invalid
Source code in peppy/project.py
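The folder-based rule described for infer_name can be sketched with the standard library. `infer_project_name` is a hypothetical helper covering only the path logic; it leaves out the case where the config declares a name explicitly and the validation that raises InvalidConfigFileException.

```python
import os


def infer_project_name(config_path):
    """Use the config file's folder name, or its parent if named 'metadata'."""
    config_dir = os.path.dirname(os.path.abspath(config_path))
    name = os.path.basename(config_dir)
    if name == "metadata":
        name = os.path.basename(os.path.dirname(config_dir))
    return name


print(infer_project_name("/data/myproject/config.yaml"))
print(infer_project_name("/data/myproject/metadata/config.yaml"))
```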
load_samples
load_samples()
Read the sample_table and subsample_tables into dataframes and store in the object root. The values sourced from the project config can be overwritten by the optional arguments.
Source code in peppy/project.py
modify_samples
modify_samples()
Perform any sample modifications defined in the config.
Source code in peppy/project.py
parse_config_file
parse_config_file(cfg_path=None, amendments=None)
Parse provided yaml config file and check required fields exist.
:param str cfg_path: path to the config file to read and parse
:param Iterable[str] amendments: Name of amendments to activate
:raises KeyError: if config file lacks required section(s)
Source code in peppy/project.py
remove_samples
remove_samples(sample_names)
Remove Samples from Project
:param Iterable[str] sample_names: sample names to remove
Source code in peppy/project.py
to_dict
to_dict(extended=False, orient='dict')
Convert the Project object to a dictionary.
:param bool extended: whether to produce complete project dict (used to reinit the project)
:param Literal orient: orientation of the returned df
:return dict: a dictionary representation of the Project object
Source code in peppy/project.py
Sample Class
Sample
Sample(series, prj=None)
Bases: SimpleAttMap
Class to model Samples based on a pandas Series.
:param Mapping | pandas.core.series.Series series: Sample's data.
Source code in peppy/sample.py
project
property
project
Get the project mapping
:return peppy.Project: project object the sample was created from
sample_name
property
sample_name
Get the sample's name
:return str: current sample name derived from project's st_index
__str__
__str__(max_attr=10)
Representation in interpreter.
Source code in peppy/sample.py
derive_attribute
derive_attribute(data_sources, attr_name)
Uses the template path provided in the project config section "data_sources" to piece together an actual path by substituting variables (encoded by "{variable}") with sample attributes.
:param Mapping data_sources: mapping from key name (as a value in a cell of a tabular data structure) to, e.g., filepath
:param str attr_name: Name of sample attribute (equivalently, sample sheet column) specifying a derived column.
:return str: regex expansion of data source specified in configuration, with variable substitutions made
:raises ValueError: if argument to data_sources parameter is null/empty
Source code in peppy/sample.py
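The "{variable}" substitution described for derive_attribute can be sketched with Python's `str.format`. This is a minimal illustration over a plain dict, not the library method: `derive_path` is hypothetical, returning the value unchanged when no template is registered is an assumption of the sketch, and any further expansion the library may perform on the resulting path is omitted.

```python
def derive_path(sample, data_sources, attr_name):
    """Expand the template selected by sample[attr_name] using sample attributes."""
    if not data_sources:
        raise ValueError("data_sources must be a non-empty mapping")
    source_key = sample[attr_name]
    template = data_sources.get(source_key)
    if template is None:
        return source_key  # assumption: no template registered, keep the value
    # Each "{variable}" placeholder is filled from the sample's attributes
    return template.format(**sample)


sample = {"sample_name": "frog_1", "file": "local"}
data_sources = {"local": "/data/{sample_name}.fastq"}
print(derive_path(sample, data_sources, "file"))
```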
get_sheet_dict
get_sheet_dict()
Create key-value pairs for items originally passed in via the sample sheet. This is useful for summarizing; it provides a representation of the sample that excludes things like config files and derived entries.
:return OrderedDict: mapping from name to value for data elements originally provided via the sample sheet (i.e., a map-like representation of the instance, excluding derived items)
Source code in peppy/sample.py
to_dict
to_dict(add_prj_ref=False)
Serializes itself as dict object.
:param bool add_prj_ref: whether the project reference bound to the Sample object should be included in the YAML representation
:return dict: dict representation of this Sample
Source code in peppy/sample.py
to_yaml
to_yaml(path=None, add_prj_ref=False)
Serializes itself in YAML format. Writes to file if path is provided, else returns string representation.
:param str path: A file path to write yaml to; provide this or the subs_folder_path, defaults to None
:param bool add_prj_ref: whether the project reference bound to the Sample object should be included in the YAML representation
:return str | None: returns string representation of sample yaml or None
Source code in peppy/sample.py