cigar module

Holds functions for processing cigar tuples. A cigar is generally given as an iterable of tuples in which the first element of each tuple is the CIGAR operation (e.g. 1 for an insertion) and the second is its length, i.e. the number of consecutive bases the operation covers.
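To make the tuple layout concrete, the sketch below converts between cigar tuples and the text CIGAR strings found in SAM/BAM files. It assumes the integer operation codes of the SAM specification (0 for M, 1 for I, 2 for D, 3 for N, 4 for S, 5 for H, 6 for P, 7 for =, 8 for X), which pysam also uses. The helper names are hypothetical and are not part of mavis.

```python
from typing import Iterable, List, Tuple

# SAM-specification operation codes: the index in this string is the integer code
CIGAR_CHARS = "MIDNSHP=X"


def to_cigar_string(cigar: Iterable[Tuple[int, int]]) -> str:
    """Render (operation, length) tuples as a SAM-style CIGAR string."""
    return "".join(f"{length}{CIGAR_CHARS[op]}" for op, length in cigar)


def from_cigar_string(text: str) -> List[Tuple[int, int]]:
    """Parse a SAM-style CIGAR string back into (operation, length) tuples."""
    tuples: List[Tuple[int, int]] = []
    digits = ""
    for ch in text:
        if ch.isdigit():
            digits += ch  # accumulate the length prefix
        else:
            tuples.append((CIGAR_CHARS.index(ch), int(digits)))
            digits = ""
    return tuples


# 5 soft-clipped bases, 10 exact matches, a 1-base insertion (code 1), 20 exact matches
example = [(4, 5), (7, 10), (1, 1), (7, 20)]
print(to_cigar_string(example))  # 5S10=1I20=
```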


Counts the number of aligned bases irrespective of match/mismatch; this is equivalent to counting all CIGAR.M.
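A minimal sketch of this count, assuming SAM-specification operation codes (M = 0, = is 7, X is 8) and treating those three operations as the aligned ones; count_aligned_bases is a hypothetical name, not the mavis function.

```python
from typing import Iterable, Tuple

# SAM-specification codes for the three "aligned" operations
M, EQ, X = 0, 7, 8


def count_aligned_bases(cigar: Iterable[Tuple[int, int]]) -> int:
    """Sum the lengths of M, = and X operations (aligned bases, match or mismatch)."""
    return sum(length for op, length in cigar if op in (M, EQ, X))


# 4 soft-clipped, 10 exact matches, 2 mismatches, a 3-base deletion, 6 exact matches
print(count_aligned_bases([(4, 4), (7, 10), (8, 2), (2, 3), (7, 6)]))  # 18
```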

mavis.bam.cigar.compute(ref, alt, force_softclipping=True, min_exact_to_stop_softclipping=6)[source]

Given a ref and an alt sequence, computes the cigar representing the alt.

Returns the cigar tuples along with the start position of the alt relative to the ref.
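This page does not say how compute aligns the two sequences. Assuming an alignment step has already produced two equal-length gapped strings ('-' marking a gap), a minimal sketch of the final step, turning alignment columns into cigar tuples for the alt, might look like this. cigar_from_gapped is a hypothetical helper; soft-clipping and the start offset that compute also returns are not handled here.

```python
from typing import List, Tuple

# SAM-specification operation codes
INS, DEL, EQ, X = 1, 2, 7, 8


def cigar_from_gapped(ref_aln: str, alt_aln: str) -> List[Tuple[int, int]]:
    """Convert two equal-length aligned strings ('-' = gap) into cigar tuples for alt."""
    if len(ref_aln) != len(alt_aln):
        raise ValueError("aligned strings must have equal length")
    cigar: List[Tuple[int, int]] = []
    for r, a in zip(ref_aln, alt_aln):
        if r == "-" and a == "-":
            continue  # empty column contributes nothing
        if r == "-":
            op = INS  # base present in alt only
        elif a == "-":
            op = DEL  # base present in ref only
        else:
            op = EQ if r == a else X
        if cigar and cigar[-1][0] == op:
            cigar[-1] = (op, cigar[-1][1] + 1)  # extend the current run
        else:
            cigar.append((op, 1))
    return cigar


print(cigar_from_gapped("ACGT-ACGT", "ACTTGAC-T"))
# [(7, 2), (8, 1), (7, 1), (1, 1), (7, 2), (2, 1), (7, 1)]
```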


IGV does not support the extended CIGAR values for match vs. mismatch (= and X), so these are converted to M and adjacent tuples are merged.


>>> convert_for_igv([(7, 4), (8, 1), (7, 5)])
[(0, 10)]
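The conversion shown in the doctest can be sketched as follows, assuming SAM-specification codes (M = 0, = is 7, X is 8). to_igv_cigar is a hypothetical name; it rewrites = and X as M and then merges neighbouring tuples that now share an operation.

```python
from typing import Iterable, List, Tuple

M, EQ, X = 0, 7, 8  # SAM-specification operation codes


def to_igv_cigar(cigar: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Replace = and X with M, then merge adjacent tuples that share an operation."""
    result: List[Tuple[int, int]] = []
    for op, length in cigar:
        if op in (EQ, X):
            op = M  # IGV only understands the generic M
        if result and result[-1][0] == op:
            result[-1] = (op, result[-1][1] + length)
        else:
            result.append((op, length))
    return result


print(to_igv_cigar([(7, 4), (8, 1), (7, 5)]))  # [(0, 10)]
```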
mavis.bam.cigar.extend_softclipping(cigar, min_exact_to_stop_softclipping)[source]

Given some input cigar, extends the soft-clipping if there are mismatches, insertions, or deletions close to the end of the aligned portion. The stopping point is defined by the min_exact_to_stop_softclipping parameter. This function raises an error if there is no exact-match aligned portion to signal the stop.

Parameters:

  • cigar (list of CIGAR and int) – the input cigar
  • min_exact_to_stop_softclipping (int) – number of exact matches required to terminate the extension

Returns:

  • list of CIGAR and int – new cigar list
  • int – shift from the original start position

Return type:

tuple
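One way to read this behaviour is sketched below, under stated assumptions: the input uses = and X explicitly (no M), adjacent tuples with the same operation are already merged, and everything outside the first and last = run of at least min_exact bases is turned into soft-clipping. extend_softclip_sketch is a hypothetical function, not the mavis implementation. The shift is the number of reference bases consumed before the first such run, since those bases are no longer aligned.

```python
from typing import List, Tuple

INS, DEL, SKIP, SOFT, EQ, X = 1, 2, 3, 4, 7, 8  # SAM-specification codes
CONSUMES_QUERY = {INS, SOFT, EQ, X}
CONSUMES_REF = {DEL, SKIP, EQ, X}


def extend_softclip_sketch(
    cigar: List[Tuple[int, int]], min_exact: int
) -> Tuple[List[Tuple[int, int]], int]:
    """Soft-clip everything outside the first/last = run of >= min_exact bases."""
    anchors = [i for i, (op, n) in enumerate(cigar) if op == EQ and n >= min_exact]
    if not anchors:
        raise ValueError("no exact-match run long enough to stop softclipping")
    first, last = anchors[0], anchors[-1]
    # query bases before the first anchor become soft-clipped
    left_clip = sum(n for op, n in cigar[:first] if op in CONSUMES_QUERY)
    # reference bases before the first anchor move the alignment start
    shift = sum(n for op, n in cigar[:first] if op in CONSUMES_REF)
    right_clip = sum(n for op, n in cigar[last + 1:] if op in CONSUMES_QUERY)
    new_cigar = [(SOFT, left_clip)] if left_clip else []
    new_cigar += cigar[first:last + 1]
    if right_clip:
        new_cigar.append((SOFT, right_clip))
    return new_cigar, shift


# 2S 3= 1X 10= 1D 2=  ->  6S 10= 2S, start shifted by 4 reference bases
print(extend_softclip_sketch([(4, 2), (7, 3), (8, 1), (7, 10), (2, 1), (7, 2)], 6))
```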


mavis.bam.cigar.hgvs_standardize_cigar(read, reference_seq)[source]

Extends alignments as long as matches are possible, and calls insertions before deletions.


Given a number of cigar lists, joins them and merges any consecutive tuples with the same cigar value.


>>> join([(1, 1), (4, 7)], [(4, 3), (2, 4)])
[(1, 1), (4, 10), (2, 4)]
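A minimal sketch of the joining behaviour in the doctest above; join_cigars is a hypothetical name. It concatenates the input lists and folds each tuple into the previous one when their operations match.

```python
from typing import List, Tuple


def join_cigars(*cigars: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Concatenate cigar lists, merging consecutive tuples with the same operation."""
    merged: List[Tuple[int, int]] = []
    for cigar in cigars:
        for op, length in cigar:
            if merged and merged[-1][0] == op:
                merged[-1] = (op, merged[-1][1] + length)  # same op: add lengths
            else:
                merged.append((op, length))
    return merged


print(join_cigars([(1, 1), (4, 7)], [(4, 3), (2, 4)]))  # [(1, 1), (4, 10), (2, 4)]
```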

Returns the length of the longest consecutive exact match.

Parameters:

cigar (list of tuple of int and int) – the cigar tuples

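A minimal sketch, assuming an exact match means the = operation (SAM code 7) and that adjacent = tuples count as one run; longest_exact_run is a hypothetical name.

```python
from typing import Iterable, Tuple

EQ = 7  # SAM-specification code for an exact match


def longest_exact_run(cigar: Iterable[Tuple[int, int]]) -> int:
    """Length of the longest run of consecutive = bases (adjacent = tuples combine)."""
    best = current = 0
    for op, length in cigar:
        current = current + length if op == EQ else 0  # any other op breaks the run
        best = max(best, current)
    return best


print(longest_exact_run([(7, 5), (8, 1), (7, 3), (7, 4), (2, 1), (7, 6)]))  # 7
```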
mavis.bam.cigar.longest_fuzzy_match(cigar, max_fuzzy_interupt=1)[source]

Computes the longest sequence of exact matches, allowing it to be interrupted by up to max_fuzzy_interupt events.

Parameters:

  • cigar – cigar tuples
  • max_fuzzy_interupt (int) – number of mismatches allowed
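The exact counting rules are not spelled out here, so the following is only a sketch under assumptions: a run starts at an = tuple (SAM code 7), each non-= tuple after it counts as one interruption regardless of its length, only = bases add to the total, and the run stops at the first interruption beyond the allowance. longest_fuzzy_run is a hypothetical name.

```python
from typing import List, Tuple

EQ = 7  # SAM-specification code for an exact match


def longest_fuzzy_run(cigar: List[Tuple[int, int]], max_interrupts: int = 1) -> int:
    """Largest total of = bases reachable from some = tuple while passing
    through at most max_interrupts non-= tuples."""
    best = 0
    for start, (op, length) in enumerate(cigar):
        if op != EQ:
            continue  # runs may only start on an exact match
        total, interrupts = length, 0
        for next_op, next_len in cigar[start + 1:]:
            if next_op == EQ:
                total += next_len
            elif interrupts < max_interrupts:
                interrupts += 1  # tolerate this interrupting tuple
            else:
                break
        best = max(best, total)
    return best


cigar = [(7, 5), (8, 1), (7, 3), (2, 2), (7, 6)]
print(longest_fuzzy_run(cigar, 1))  # 9
```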

Calculates the percentage of aligned bases (matches or mismatches) that are matches.
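A minimal sketch, assuming the cigar uses = (code 7) and X (code 8) explicitly, since bases under a generic M cannot be classified as matches or mismatches. match_fraction is a hypothetical name and returns a fraction between 0 and 1 rather than a value scaled to 100.

```python
from typing import Iterable, Tuple

EQ, X = 7, 8  # SAM-specification codes for exact match and mismatch


def match_fraction(cigar: Iterable[Tuple[int, int]]) -> float:
    """Fraction of aligned (= or X) bases that are exact matches."""
    matches = mismatches = 0
    for op, length in cigar:
        if op == EQ:
            matches += length
        elif op == X:
            mismatches += length
    aligned = matches + mismatches
    if aligned == 0:
        raise ValueError("no aligned = or X bases")
    return matches / aligned


# 3 soft-clipped, 18 matches, 2 mismatches, 1 insertion: 18 / 20
print(match_fraction([(4, 3), (7, 18), (8, 2), (1, 1)]))  # 0.9
```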

mavis.bam.cigar.recompute_cigar_mismatch(read, ref)[source]

For cigar tuples where M is used, recomputes them to replace M with X/= for increased utility and specificity.


Returns:

the cigar tuples

Return type:

list of tuple of int and int
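The documented function takes a read object; the sketch below instead uses plain strings so it stays self-contained. Its inputs (query sequence, reference string, 0-based reference start, cigar) and the name split_m_operations are assumptions for illustration. Each M block is expanded base by base into = or X by comparing query and reference, while query and reference positions advance according to which operations consume them.

```python
from typing import List, Tuple

M, INS, DEL, SKIP, SOFT, HARD, PAD, EQ, X = range(9)  # SAM-specification codes


def split_m_operations(query: str, ref: str, ref_start: int,
                       cigar: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Replace each M block with = / X runs by comparing query and reference bases."""
    result: List[Tuple[int, int]] = []
    qpos, rpos = 0, ref_start

    def push(op: int, length: int) -> None:
        # append, merging with the previous tuple when the operation repeats
        if result and result[-1][0] == op:
            result[-1] = (op, result[-1][1] + length)
        else:
            result.append((op, length))

    for op, length in cigar:
        if op == M:
            for offset in range(length):
                same = query[qpos + offset] == ref[rpos + offset]
                push(EQ if same else X, 1)
        else:
            push(op, length)
        if op in (M, INS, SOFT, EQ, X):
            qpos += length  # query-consuming operations
        if op in (M, DEL, SKIP, EQ, X):
            rpos += length  # reference-consuming operations
    return result


print(split_m_operations("TTCGTTCG", "AACGTACGTT", 2, [(4, 2), (0, 6)]))
# [(4, 2), (7, 3), (8, 1), (7, 2)]
```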

mavis.bam.cigar.score(cigar, **kwargs)[source]

Scores the cigar using Smith-Waterman (SW) alignment properties, with gap-extension penalties.

Parameters:

  • cigar (list of CIGAR and int) – list of cigar tuple values
  • MISMATCH (int) – mismatch penalty
  • MATCH (int) – match score
  • GAP (int) – initial gap penalty
  • GAP_EXTEND (int) – gap extension penalty

Returns:

the score value

Return type:

int
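A sketch of an affine-gap score over cigar tuples. The default values, and the convention that penalties are negative numbers added to the total, are assumptions for illustration; this page does not state the defaults mavis uses. score_sketch is a hypothetical name. An indel of length n costs GAP once plus GAP_EXTEND for each additional base.

```python
from typing import Iterable, Tuple

INS, DEL, EQ, X = 1, 2, 7, 8  # SAM-specification codes


def score_sketch(cigar: Iterable[Tuple[int, int]], MATCH: int = 2, MISMATCH: int = -1,
                 GAP: int = -4, GAP_EXTEND: int = -1) -> int:
    """Affine-gap score: each indel of length n adds GAP + GAP_EXTEND * (n - 1)."""
    total = 0
    for op, length in cigar:
        if op == EQ:
            total += MATCH * length
        elif op == X:
            total += MISMATCH * length
        elif op in (INS, DEL):
            total += GAP + GAP_EXTEND * (length - 1)
        # other operations (e.g. soft-clipping) do not change the score here
    return total


# 10= 1X 3D 5=  ->  20 - 1 - 6 + 10
print(score_sketch([(7, 10), (8, 1), (2, 3), (7, 5)]))  # 23
```

Scoring each indel once with GAP and then GAP_EXTEND per extra base means one long gap is penalised less than several short gaps of the same total length.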



For a given string, returns the smallest substring that, repeated without overlap, consumes the entire string.


>>> smallest_nonoverlapping_repeat('ATATATA')
'ATATATA'
>>> smallest_nonoverlapping_repeat('ATATAT')
'AT'
>>> smallest_nonoverlapping_repeat('CCCCCCCC')
'C'
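A minimal sketch of the idea, with smallest_repeat_unit as a hypothetical name: try prefix lengths from shortest to longest, keep only lengths that divide the string evenly, and return the first prefix whose repetition rebuilds the string exactly. 'ATATATA' has odd length, so no shorter unit fits and the whole string is returned.

```python
def smallest_repeat_unit(s: str) -> str:
    """Shortest prefix u such that u repeated k times (k >= 1) equals s exactly."""
    for size in range(1, len(s) + 1):
        if len(s) % size == 0 and s[:size] * (len(s) // size) == s:
            return s[:size]
    return s  # only reached for the empty string


for seq in ("ATATATA", "ATATAT", "CCCCCCCC"):
    print(seq, smallest_repeat_unit(seq))
# ATATATA ATATATA
# ATATAT AT
# CCCCCCCC C
```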