Tutorial: using geofetch as a Python package
geofetch provides Python functions to fetch data and metadata from GEO and SRA. The get_projects
function returns a dictionary of peppy projects that were found using the filters and input you specified.
peppy is a Python package that provides an API for handling standardized project and sample metadata.
More information is available here:
http://peppy.databio.org/en/latest/
http://pep.databio.org/en/2.0.0/
First, let's import Geofetcher from geofetch:
from geofetch import Geofetcher
Instantiate a Geofetcher object, specifying the parameters you want to use for downloading metadata/data.
1) If you don't specify any parameters, the defaults are used:
geof = Geofetcher()
Metadata folder: /home/bnt4me/Virginia/repos/geof2/geofetch/docs_jupyter/project_name
2) To download processed data for both samples and series, specify these two arguments:
geof = Geofetcher(processed=True, data_source="all")
Metadata folder: /home/bnt4me/Virginia/repos/geof2/geofetch/docs_jupyter/project_name
3) To tune the project parameters that control how metadata is stored, use the following arguments:
geof = Geofetcher(processed=True, data_source="all", const_limit_project=20, const_limit_discard=500, attr_limit_truncate=10000)
Metadata folder: /home/bnt4me/Virginia/repos/geof2/geofetch/docs_jupyter/project_name
4) For more filters and other options, see the documentation.
Run Geofetch
By default:
1) No actual data will be downloaded (just_metadata=True)
2) No SOFT files will be saved to disk (discard_soft=True)
projects = geof.get_projects("GSE95654")
Trying GSE95654 (not a file) as accession...
Skipped 0 accessions. Starting now.
Processing accession 1 of 1: 'GSE95654'
Total number of processed SAMPLES files found is: 40
Total number of processed SERIES files found is: 0
Expanding metadata list...
Finished processing 1 accession(s)
Cleaning soft files ...
Unifying and saving of metadata...
No files found. No data to save. File /home/bnt4me/Virginia/repos/geof2/geofetch/docs_jupyter/project_name/GSE95654_series/GSE95654_series.csv won't be created
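To fetch several accessions in one go, the call above can be wrapped in a small loop. The sketch below is an illustration only: `fetch_many` is a hypothetical helper, not part of geofetch, and it assumes only what this tutorial shows, namely that `get_projects` takes an accession string and returns a dictionary of projects. `StubFetcher` and the `GSE_PLACEHOLDER` accession are stand-ins so the example runs without network access; in real use you would pass a Geofetcher instance and real accessions instead.

```python
from typing import Any, Dict, Iterable


def fetch_many(fetcher: Any, accessions: Iterable[str]) -> Dict[str, Any]:
    """Call fetcher.get_projects() for each accession and merge the results."""
    merged: Dict[str, Any] = {}
    for accession in accessions:
        # Each call is assumed to return a dict keyed by project name.
        merged.update(fetcher.get_projects(accession))
    return merged


class StubFetcher:
    """Offline stand-in for Geofetcher (hypothetical, for illustration)."""

    def get_projects(self, accession: str) -> Dict[str, Any]:
        # Mimics the key naming seen in this tutorial: "<accession>_samples".
        return {f"{accession}_samples": object()}


# "GSE_PLACEHOLDER" is not a real accession; replace it with your own.
result = fetch_many(StubFetcher(), ["GSE95654", "GSE_PLACEHOLDER"])
print(sorted(result))
```

Because the helper only relies on the `get_projects` method, the same function should work unchanged with a real fetcher object.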
Check if projects were created by checking dict keys:
projects.keys()
dict_keys(['GSE95654_samples'])
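The key check can also be made explicit in code. This is a minimal sketch, assuming the keys follow the pattern `<accession>_samples` and `<accession>_series`; the samples key is shown above, while the series key name is inferred from the file path in the log and is an assumption. `created_kinds` is a hypothetical helper, not a geofetch function.

```python
from typing import Dict, Iterable


def created_kinds(keys: Iterable[str], accession: str) -> Dict[str, bool]:
    """Report which of the samples/series projects exist for an accession."""
    present = set(keys)
    return {
        kind: f"{accession}_{kind}" in present
        for kind in ("samples", "series")
    }


# Keys copied from the output above; with real data, pass projects.keys().
status = created_kinds(["GSE95654_samples"], "GSE95654")
print(status)
```

For this accession the check reports a samples project and no series project, which matches the log message saying that no series file was created.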
The project for samples was created. Now let's look into it.
* The values of the dictionary are peppy projects. You can find more information about the peppy Project in the documentation: http://peppy.databio.org/en/latest/
len(projects['GSE95654_samples'].samples)
40
We got 40 samples from the GSE95654 project. To verify this number, see: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE95654
Now let's look at the actual data: the first 15 samples and the first 5 columns:
projects['GSE95654_samples'].sample_table.iloc[:15 , :5]
sample_name (index) | sample_name | sample_library_strategy | genome_build | tissue | sample_organism_ch1 |
---|---|---|---|---|---|
RRBS_on_CRC_patient_8 | RRBS_on_CRC_patient_8 | Bisulfite-Seq | hg19 | primary tumor | Homo sapiens |
RRBS_on_adjacent_normal_colon_patient_8 | RRBS_on_adjacent_normal_colon_patient_8 | Bisulfite-Seq | hg19 | adjacent normal colon | Homo sapiens |
RRBS_on_CRC_patient_32 | RRBS_on_CRC_patient_32 | Bisulfite-Seq | hg19 | primary tumor | Homo sapiens |
RRBS_on_adjacent_normal_colon_patient_32 | RRBS_on_adjacent_normal_colon_patient_32 | Bisulfite-Seq | hg19 | adjacent normal colon | Homo sapiens |
RRBS_on_CRC_patient_41 | RRBS_on_CRC_patient_41 | Bisulfite-Seq | hg19 | primary tumor | Homo sapiens |
RRBS_on_adjacent_normal_colon_patient_41 | RRBS_on_adjacent_normal_colon_patient_41 | Bisulfite-Seq | hg19 | adjacent normal colon | Homo sapiens |
RRBS_on_CRC_patient_42 | RRBS_on_CRC_patient_42 | Bisulfite-Seq | hg19 | primary tumor | Homo sapiens |
RRBS_on_adjacent_normal_colon_patient_42 | RRBS_on_adjacent_normal_colon_patient_42 | Bisulfite-Seq | hg19 | adjacent normal colon | Homo sapiens |
RRBS_on_ACF_patient_173 | RRBS_on_ACF_patient_173 | Bisulfite-Seq | hg19 | aberrant crypt foci | Homo sapiens |
RRBS_on_ACF_patient_515 | RRBS_on_ACF_patient_515 | Bisulfite-Seq | hg19 | aberrant crypt foci | Homo sapiens |
RRBS_on_normal_crypts_patient_139 | RRBS_on_normal_crypts_patient_139 | Bisulfite-Seq | hg19 | normal colonic crypt | Homo sapiens |
RRBS_on_ACF_patient_143 | RRBS_on_ACF_patient_143 | Bisulfite-Seq | hg19 | aberrant crypt foci | Homo sapiens |
RRBS_on_normal_crypts_patient_143 | RRBS_on_normal_crypts_patient_143 | Bisulfite-Seq | hg19 | normal colonic crypt | Homo sapiens |
RRBS_on_normal_crypts_patient_165 | RRBS_on_normal_crypts_patient_165 | Bisulfite-Seq | hg19 | normal colonic crypt | Homo sapiens |
RRBS_on_ACF_patient_165 | RRBS_on_ACF_patient_165 | Bisulfite-Seq | hg19 | aberrant crypt foci | Homo sapiens |
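A quick way to summarize a table like this is to count samples per tissue. The sketch below uses only the standard library and the tissue values of the 15 rows shown above, typed in by hand, so it runs without geofetch or a network connection. On the real table, which is a pandas DataFrame, something like `sample_table["tissue"].value_counts()` should give the equivalent summary for all 40 samples.

```python
from collections import Counter

# Tissue values of the 15 rows shown above, in table order.
tissues = [
    "primary tumor", "adjacent normal colon",
    "primary tumor", "adjacent normal colon",
    "primary tumor", "adjacent normal colon",
    "primary tumor", "adjacent normal colon",
    "aberrant crypt foci", "aberrant crypt foci",
    "normal colonic crypt", "aberrant crypt foci",
    "normal colonic crypt", "normal colonic crypt",
    "aberrant crypt foci",
]

# Count how many of the displayed samples belong to each tissue type.
counts = Counter(tissues)
for tissue, n in counts.most_common():
    print(f"{tissue}: {n}")
```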