align module

Should take in a SAM file from an aligner like bwa aln or bwa mem and convert it into a

mavis.align.SUPPORTED_ALIGNER = MavisNamespace(BLAT='blat', BWA_MEM='bwa mem')

supported aligners

Type

MavisNamespace

class mavis.align.SplitAlignment(*pos, **kwargs)[source]

Bases: mavis.breakpoint.BreakpointPair

alignment_id()[source]
alignment_rank()[source]
static breakpoint_contig_remapped_depth(breakpoint, contig, read)[source]
mapping_quality()[source]
query_consumption()[source]

fraction of the query sequence which is aligned (everything not soft-clipped) in either alignment
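
The fraction described above can be sketched directly from CIGAR strings. This is a minimal illustration, not the MAVIS implementation: the helper names `aligned_positions` and `fraction_aligned` are hypothetical, and it assumes both alignments are reported on the same strand of the same query and use soft (not hard) clipping.

```python
import re

# CIGAR operations that consume query bases and count as aligned
ALIGNED_OPS = {"M", "=", "X", "I"}
# Operations that consume query bases at all (aligned or soft-clipped)
QUERY_OPS = ALIGNED_OPS | {"S"}


def aligned_positions(cigar):
    """Return (set of aligned 0-based query positions, query length)."""
    pos = 0
    aligned = set()
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op in ALIGNED_OPS:
            aligned.update(range(pos, pos + length))
        if op in QUERY_OPS:
            pos += length
    return aligned, pos


def fraction_aligned(cigar1, cigar2=None):
    """Fraction of query positions aligned by either alignment (hypothetical helper)."""
    aligned, query_length = aligned_positions(cigar1)
    if cigar2 is not None:
        aligned2, query_length2 = aligned_positions(cigar2)
        if query_length2 != query_length:
            raise ValueError("both alignments must describe the same query")
        aligned |= aligned2
    if query_length == 0:
        raise ValueError("CIGAR string consumes no query bases")
    return len(aligned) / query_length
```

For example, a single alignment `60M40S` covers 60 of 100 query bases, while pairing it with `55S45M` covers every base of the query.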

query_coverage()[source]

interval representing the total region of the input sequence that is covered by the combination of alignments

query_coverage_read1()[source]
query_coverage_read2()[source]
query_overlap_extension()[source]
score(consec_bonus=10)[source]

scores events between 0 and 1, penalizing events that interrupt the alignment. Counts a split alignment as a single event

mavis.align.align_sequences(sequences, input_bam_cache, reference_genome, aligner, aligner_reference, aligner_output_file='aligner_out.temp', aligner_fa_input_file='aligner_in.fa', aligner_output_log='aligner_out.log', blat_limit_top_aln=25, blat_min_identity=0.7, clean_files=True, log=<mavis.util.Log object>, **kwargs)[source]

calls the alignment tool and parses the return output for a set of sequences

Parameters
  • sequences (dict of str to str) – dictionary of sequences by name

  • input_bam_cache (BamCache) – bam cache to be used as a template for reading the alignments

  • reference_genome – the reference genome

  • aligner (SUPPORTED_ALIGNER) – the name of the aligner to be used

  • aligner_reference (str) – path to the aligner reference file
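
As an illustration of the `sequences` argument, the sketch below writes a name-to-sequence dictionary to a FASTA file of the kind an aligner would take as input. The helper `write_fasta` is hypothetical and is not part of mavis.align; the file name mirrors the default value of `aligner_fa_input_file`, and the contig names and sequences are made up for the example.

```python
import os
import tempfile


def write_fasta(sequences, path):
    """Write a dict of name -> sequence as FASTA (hypothetical helper)."""
    with open(path, "w") as fh:
        # sort by name so the output file is deterministic
        for name, seq in sorted(sequences.items()):
            fh.write(">{}\n{}\n".format(name, seq))


# illustrative input: two made-up contig sequences keyed by name
workdir = tempfile.mkdtemp()
fasta_path = os.path.join(workdir, "aligner_in.fa")
write_fasta({"contig2": "TTGACCA", "contig1": "ACGTACGT"}, fasta_path)

with open(fasta_path) as fh:
    content = fh.read()
print(content)
```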

mavis.align.call_paired_read_event(read1, read2, is_stranded=False)[source]

For a given pair of reads call all applicable events. Assume there is a major event from both reads and then call indels from the individual reads

mavis.align.call_read_events(read, secondary_read=None, is_stranded=False)[source]

Given a read, return breakpoint pairs representing all putative events

mavis.align.convert_to_duplication(alignment, reference_genome)[source]

Given a breakpoint call, tests if the untemplated sequence matches the preceding reference sequence. If it does, this is annotated as a duplication and the new breakpoint pair is returned. If not, then the original breakpoint pair is returned
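
The test described above can be sketched on plain strings. This is an assumption-laden illustration rather than the library's code: `looks_like_duplication` is a hypothetical helper, the reference is a single in-memory string, and `insert_pos` is taken to be the 0-based index in the reference before which the untemplated sequence was inserted.

```python
def looks_like_duplication(reference, insert_pos, untemplated):
    """True if the inserted sequence repeats the reference bases just before it."""
    size = len(untemplated)
    if size == 0 or size > insert_pos:
        # nothing inserted, or not enough preceding reference to compare against
        return False
    preceding = reference[insert_pos - size:insert_pos]
    return preceding.upper() == untemplated.upper()


reference = "ACGTTTGCA"
print(looks_like_duplication(reference, 6, "TTT"))  # inserted copy of the preceding TTT
print(looks_like_duplication(reference, 6, "GGA"))  # unrelated inserted sequence
```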

mavis.align.get_aligner_version(aligner)[source]

executes a subprocess to try and run the aligner without arguments and parse the version number from the output

Example

>>> get_aligner_version('blat')
'36x2'
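
A rough sketch of the approach, assuming the aligner prints a usage banner containing its version when run without arguments. The function names, the regular expressions, and the sample banner lines are assumptions for illustration and are not taken from the MAVIS source.

```python
import re
import subprocess

# Assumed banner patterns, keyed by the aligner names in SUPPORTED_ALIGNER
VERSION_PATTERNS = {
    "blat": r"BLAT v\.\s*(\S+)",
    "bwa mem": r"Version:\s*(\S+)",
}


def parse_version(aligner, output):
    """Extract the version string from captured aligner output."""
    match = re.search(VERSION_PATTERNS[aligner], output)
    if match is None:
        raise ValueError("unable to parse a version for {}".format(aligner))
    return match.group(1)


def run_and_parse_version(aligner):
    """Run the executable with no arguments and parse its banner."""
    executable = aligner.split()[0]  # 'bwa mem' -> 'bwa'
    proc = subprocess.run(
        [executable],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # usage text may be printed to stderr
        universal_newlines=True,
    )
    return parse_version(aligner, proc.stdout)


# Parsing an assumed banner line, without needing the aligner installed
banner = "blat - Standalone BLAT v. 36x2 fast sequence search command line tool\n"
print(parse_version("blat", banner))
```

`run_and_parse_version` raises `FileNotFoundError` if the executable is not on the `PATH`, so callers may want to handle that case separately.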

mavis.align.query_coverage_interval(read)[source]

Returns

The portion of the original query sequence that is aligned by this read

Return type

Interval
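
To make the returned interval concrete, here is a minimal sketch that derives it from a CIGAR string. It is not the MAVIS implementation: `cigar_query_interval` is a hypothetical helper, the result is a plain tuple of 0-based inclusive positions (the coordinate convention of the real Interval is not stated here), and leading soft or hard clips are both counted as unaligned query bases.

```python
import re


def cigar_query_interval(cigar):
    """Return (start, end), 0-based inclusive, of the aligned part of the query."""
    ops = [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]

    # query bases skipped by clipping before the first aligned base
    leading_clip = 0
    for length, op in ops:
        if op not in "SH":
            break
        leading_clip += length

    # query bases consumed by the alignment itself (matches and insertions)
    aligned = sum(length for length, op in ops if op in "MI=X")
    if aligned == 0:
        raise ValueError("CIGAR string has no aligned query bases")
    return leading_clip, leading_clip + aligned - 1


print(cigar_query_interval("10S80M10S"))
print(cigar_query_interval("5H5S40M2I48M"))
```

Deletions (`D`) and skipped regions (`N`) consume reference but not query bases, so they do not widen the interval.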

mavis.align.read_breakpoint(read)[source]

convert a given read to a single breakpoint

mavis.align.select_contig_alignments(evidence, reads_by_query)[source]

standardizes/simplifies reads and filters bad/irrelevant alignments, then adds the contig alignments to the contigs