Package geofetch
Documentation
Package-level data
Class Finder
Class for finding GSE accessions in special period of time. Additionally, user can add specific filters for the search, while initialization of the class
def __init__(self, filters: str=None, retmax: int=10000000)
Parameters:
filters
(``): filters that have to be added to the query.Filter Patterns can be found here: https://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Using_the_Advanced_Search_Pagretmax
(``): maximum number of retrieved accessions.
def find_differences(old_list: list, new_list: list) -> list
Compare 2 lists and search for elements that are not in old list
Parameters:
old_list
(``): old list of elementsnew_list
(``): new list of elements
Returns:
- ``: list of elements that are not in old list but are in new_list
def generate_file(self, file_path: str, gse_list: list=None)
Save the list of GSE accessions stored in this Finder object to a given file
Parameters:
file_path
(``): root to the file where gse accessions have to be savedgse_list
(``): list of gse accessions
Returns:
- ``: NoReturn
def get_gse_all(self) -> list
Get list of all gse accession available in GEO
Returns:
- ``: list of gse accession
def get_gse_by_date(self, start_date: str, end_date: str=None) -> list
Search gse accessions by providing start date and end date. By default, the last date is today.
Parameters:
start_date
(``): 'YYYY/MM/DD']end_date
(``): 'YYYY/MM/DD']
Returns:
- ``: list of gse accessions
def get_gse_by_day_count(self, n_days: int=1) -> list
Get list of gse accessions that were uploaded or updated in last X days
Parameters:
n_days
(``): number of days from now [e.g. 5]
Returns:
- ``: list of gse accession
def get_gse_id_by_query(self, url: str) -> list
Run esearch (ncbi search tool) by specifying URL and retrieve gse list result
Parameters:
url
(``): url of the query
Returns:
- ``: list of gse ids
def get_gse_last_3_month(self) -> list
Get list of gse accession that were uploaded or updated in last 3 month
Returns:
- ``: list of gse accession
def get_gse_last_week(self) -> list
Get list of gse accession that were uploaded or updated in last week
Returns:
- ``: list of gse accession
def uid_to_gse(uid: str) -> str
UID to GES accession converter
Parameters:
uid
(``): uid string (Unique Identifier Number in GEO)
Returns:
- ``: GSE id string
Class Geofetcher
Class to download or get projects, metadata, data from GEO and SRA
def __init__(self, name: str='', metadata_root: str='', metadata_folder: str='', just_metadata: bool=False, refresh_metadata: bool=False, config_template: str=None, pipeline_samples: str=None, pipeline_project: str=None, skip: int=0, acc_anno: bool=False, use_key_subset: bool=False, processed: bool=False, data_source: str='samples', filter: str=None, filter_size: str=None, geo_folder: str='.', split_experiments: bool=False, bam_folder: str='', fq_folder: str='', sra_folder: str='', bam_conversion: bool=False, picard_path: str='', input: str=None, const_limit_project: int=50, const_limit_discard: int=1000, attr_limit_truncate: int=500, max_soft_size: str='1GB', discard_soft: bool=False, add_dotfile: bool=False, disable_progressbar: bool=False, add_convert_modifier: bool=False, opts=None, max_prefetch_size=None, **kwargs)
Constructor
Parameters:
input
(``): GSEnumber or path to the input filename
(``): Specify a project name. Defaults to GSE number or name of accessions file namemetadata_root
(``): Specify a parent folder location to store metadata.The project name will be added as a subfolder [Default: $SRAMETA:]metadata_folder
(``): Specify an absolute folder location to store metadata. No subfolder will be added.Overrides value of --metadata-root [Default: Not used (--metadata-root is used by default)]just_metadata
(``): If set, don't actually run downloads, just create metadatarefresh_metadata
(``): If set, re-download metadata even if it exists.config_template
(``): Project config yaml file template.pipeline_samples
(``): Specify one or more filepaths to SAMPLES pipeline interface yaml files.These will be added to the project config file to make it immediately compatible with looper. [Default: null]pipeline_project
(``): Specify one or more filepaths to PROJECT pipeline interface yaml files.These will be added to the project config file to make it immediately compatible with looper. [Default: null]acc_anno
(``): Produce annotation sheets for each accession.Project combined PEP for the whole project won't be produced.discard_soft
(``): Create project without downloading soft files on the discadd_dotfile
(``): Add .pep.yaml file that points .yaml PEP filedisable_progressbar
(``): Set true to disable progressbar
def fetch_all(self, input: str, name: str=None) -> Union[NoReturn, peppy.project.Project]
Main function driver/workflow Function that search, filters, downloads and save data and metadata from GEO and SRA
Parameters:
input
(``): GSE or input file with gse'sname
(``): Name of the project
Returns:
- ``: NoReturn or peppy Project
def fetch_processed_one(self, gse_file_content: list, gsm_file_content: list, gsm_filter_list: dict) -> Tuple
Fetche one processed GSE project and return its metadata
Parameters:
gsm_file_content
(``): gse soft file contentgse_file_content
(``): gsm soft file contentgsm_filter_list
(``): list of gsm that have to be downloaded
Returns:
- ``: Tuple of project list of gsm samples and gse samples
def get_projects(self, input: str, just_metadata: bool=True, discard_soft: bool=True) -> dict
Function for fetching projects from GEO|SRA and receiving peppy project
Parameters:
input
(``): GSE number, or path to file of GSE numbersjust_metadata
(``): process only metadatadiscard_soft
(``): clean run, without downloading soft files
Returns:
- ``: peppy project or list of project, if acc_anno is set.
Version Information: geofetch
v0.12.6, generated by lucidoc
v0.4.4