Package ngstk
Documentation
Package exports
Class UnsupportedFiletypeException
Restrict domain of file types.
def count_fail_reads(file_name, paired_end, prog_path)
Count the number of reads that failed platform/vendor quality checks.
Parameters:
file_name
(str
): name/path to file to examinepaired_end
(``): this parameter is ignored; samtools automaticallycorrectly responds depending on the data in the bamfile; we leave the option here just for consistency, since all the other counting functions require the parameter; this makes it easier to swap counting functions during pipeline development.prog_path
(str
): path to main program/tool to use for the counting
Returns:
int
: count of failed reads
def count_flag_reads(file_name, flag, paired_end, prog_path)
Counts the number of reads with the specified flag.
Parameters:
file_name
(str
): name/path to file to examineflag
(str int |
): SAM flag value to be readpaired_end
(``): this parameter is ignored; samtools automaticallycorrectly responds depending on the data in the bamfile; we leave the option here just for consistency, since all the other counting functions require the parameter; this makes it easier to swap counting functions during pipeline development.prog_path
(str
): path to main program/tool to use for the counting
Returns:
str
: terminal-like text output
def count_lines(file_name)
Uses the command-line utility wc to count the number of lines in a file.
For MacOS, must strip leading whitespace from wc.
Parameters:
file_name
(str
): name of file whose lines are to be counted
Returns:
str
: terminal-like text output
def count_lines_zip(file_name)
Count number of lines in a zipped file.
This function eses the command-line utility wc to count the number of lines in a file. For MacOS, strip leading whitespace from wc.
Parameters:
file_name
(str
): path to file in which to count lines
Returns:
str
: terminal-like text output
def count_reads(file_name, paired_end, prog_path)
Count reads in a file.
Paired-end reads count as 2 in this function. For paired-end reads, this function assumes that the reads are split into 2 files, so it divides line count by 2 instead of 4. This will thus give an incorrect result if your paired-end fastq files are in only a single file (you must divide by 2 again).
Parameters:
file_name
(str
): Name/path of file whose reads are to be counted.paired_end
(bool
): Whether the file contains paired-end reads.prog_path
(str
): path to main program/tool to use for the counting
Returns:
str
: terminal-like text output (if input is SAM/BAM), or actualcount value (if input isn't SAM/BAM)
def get_file_size(filename)
Get size of all files in gigabytes (Gb).
Parameters:
filename
(str | collections.Iterable[str]
): A space-separatedstring or list of space-separated strings of absolute file paths.
Returns:
float
: size of file(s), in gigabytes.
def get_input_ext(input_file)
Get the extension of the input_file.
This function assumes you're using .bam, .fastq/.fq, or .fastq.gz/.fq.gz.
Parameters:
input_file
(str
): name/path of file for which to get extension
Returns:
str
: standardized extension
Raises:
ubiquerg.ngs.UnsupportedFiletypeException
: if the given file nameor path has an extension that's not supported
def is_fastq(file_name)
Determine whether indicated file appears to be in FASTQ format.
Parameters:
file_name
(str
): Name/path of file to check as FASTQ.
Returns:
bool
: Whether indicated file appears to be in FASTQ format, zippedor unzipped.
def is_gzipped_fastq(file_name)
Determine whether indicated file appears to be a gzipped FASTQ.
Parameters:
file_name
(str
): Name/path of file to check as gzipped FASTQ.
Returns:
bool
: Whether indicated file appears to be in gzipped FASTQ format.
def is_sam_or_bam(file_name)
Determine whether a file appears to be in a SAM format.
Parameters:
file_name
(str
): Name/path of file to check as SAM-formatted.
Returns:
bool
: Whether file appears to be SAM-formatted
def is_unzipped_fastq(file_name)
Determine whether indicated file appears to be an unzipped FASTQ.
Parameters:
file_name
(str
): Name/path of file to check as unzipped FASTQ.
Returns:
bool
: Whether indicated file appears to be in unzipped FASTQ format.
def parse_ftype(input_file)
Checks determine filetype from extension.
Parameters:
input_file
(str
): String to check.
Returns:
str
: filetype (extension without dot prefix)
Raises:
UnsupportedFiletypeException
: if file does not appear of asupported type
def peek_read_lengths_and_paired_counts_from_bam(bam, sample_size)
Counting read lengths and paired reads in a sample from a BAM.
Parameters:
bam
(str
): path to BAM file to examinesample_size
(int
): number of reads to look at for estimation
Returns:
defaultdict[int, int], int
: read length observation counts, andnumber of paired reads observed
Raises:
OSError
:
def samtools_view(file_name, param, prog_path, postpend='')
Run samtools view, with flexible parameters and post-processing.
This is used to implement the various read counting functions.
Parameters:
file_name
(str
): name/path of reads tile to useparam
(str
): String of parameters to pass to samtools viewprog_path
(str
): path to the samtools programpostpend
(str
): String to append to the samtools command;useful to add cut, sort, wc operations to the samtools view output.
Returns:
str
: terminal-like text output
Version Information: ngstk
v0.0.1pre2, generated by lucidoc
v0.4.3