assemble module
-
class mavis.assemble.Contig(sequence, score)
Bases: object
-
class mavis.assemble.DeBruijnGraph(data=None, **attr)
Bases: networkx.classes.digraph.DiGraph
Wrapper for a basic digraph that enforces edge weights.
Initialize a graph with edges, name, graph attributes.
Parameters: - data (input graph) – Data to initialize graph. If data=None (default) an empty graph is created. The data can be an edge list, or any NetworkX graph object. If the corresponding optional Python packages are installed the data can also be a NumPy matrix or 2d ndarray, a SciPy sparse matrix, or a PyGraphviz graph.
- name (string, optional (default='')) – An optional name for the graph.
- attr (keyword arguments, optional (default= no attributes)) – Attributes to add to graph as key=value pairs.
See also
convert
Examples
>>> G = nx.Graph()  # or DiGraph, MultiGraph, MultiDiGraph, etc
>>> G = nx.Graph(name='my graph')
>>> e = [(1, 2), (2, 3), (3, 4)]  # list of edges
>>> G = nx.Graph(e)
Arbitrary graph attribute pairs (key=value) may be assigned:
>>> G = nx.Graph(e, day="Friday")
>>> G.graph
{'day': 'Friday'}
-
add_edge(n1, n2, freq=1)
Adds the given edge to the graph. If the edge already exists, freq is added to its existing frequency count.
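The accumulate-on-repeat behaviour described above can be sketched without networkx. This is a minimal illustration, not the MAVIS implementation: the class name FreqDigraph, the dict-of-dicts storage and the helper freq_of are all assumptions made for the example.

```python
from collections import defaultdict


class FreqDigraph:
    """Toy stand-in for a weighted digraph (hypothetical, not the MAVIS class)."""

    def __init__(self):
        # successor map: node -> {neighbour: accumulated freq}
        self.succ = defaultdict(dict)

    def add_edge(self, n1, n2, freq=1):
        # a repeated edge adds to the stored count instead of replacing it
        self.succ[n1][n2] = self.succ[n1].get(n2, 0) + freq
        # make sure the target node is registered even if it has no out-edges
        self.succ.setdefault(n2, {})

    def freq_of(self, n1, n2):
        # hypothetical accessor used only to inspect the result
        return self.succ[n1][n2]


g = FreqDigraph()
g.add_edge('AC', 'CG')          # new edge, freq 1
g.add_edge('AC', 'CG', freq=3)  # same edge again, freq becomes 1 + 3
```

After the two calls, the single edge AC -> CG carries the summed frequency rather than the most recent value.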
-
trim_forks_by_freq(min_weight)
For every node in the graph with an out-degree > 1, any outgoing edge with freq < min_weight is deleted.
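A literal reading of this rule can be sketched on a plain dict-of-dicts graph. The function name trim_forks_sketch and the data layout are assumptions for illustration. Note that this reading removes every low-frequency edge at a fork, even if none would remain; the actual method may treat that case differently.

```python
def trim_forks_sketch(succ, min_weight):
    """succ maps node -> {neighbour: freq}; the mapping is edited in place."""
    for node, out_edges in succ.items():
        if len(out_edges) > 1:  # out-degree > 1, i.e. the node is a fork
            # collect targets first so the dict is not mutated while iterating
            weak = [tgt for tgt, freq in out_edges.items() if freq < min_weight]
            for tgt in weak:
                del out_edges[tgt]
    return succ


graph = {
    'AC': {'CG': 5, 'CT': 1},  # fork: the freq-1 branch is trimmed
    'CG': {'GT': 1},           # not a fork: kept despite the low freq
    'CT': {},
    'GT': {},
}
trim_forks_sketch(graph, min_weight=2)
```

Only edges leaving a fork are candidates for removal, so the low-frequency edge CG -> GT survives because CG has a single outgoing edge.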
-
mavis.assemble.assemble(sequences, assembly_max_kmer_size=None, assembly_min_nc_edge_weight=3, assembly_min_edge_weight=2, assembly_min_match_quality=0.95, assembly_min_read_mapping_overlap=None, assembly_min_contig_length=None, assembly_min_exact_match_to_remap=6, assembly_max_paths=20, assembly_min_uniq=0.01, assembly_max_kmer_strict=False, log=<function <lambda>>)
For a set of sequences, creates a DeBruijnGraph, simplifies trailing and leading paths where edges fall below a weight threshold, and then returns all possible unitigs/contigs.
Parameters: - sequences (list of str) – a list of strings/sequences to assemble
- assembly_max_kmer_size – see assembly_max_kmer_size
- assembly_min_nc_edge_weight – see assembly_min_nc_edge_weight
- assembly_min_edge_weight – see assembly_min_edge_weight
- assembly_min_match_quality – see assembly_min_match_quality
- assembly_min_read_mapping_overlap – see assembly_min_read_mapping_overlap
- assembly_min_contig_length – see assembly_min_contig_length
- assembly_min_exact_match_to_remap – see assembly_min_exact_match_to_remap
- assembly_max_paths – see assembly_max_paths
- log (function) – the log function
Returns: a list of putative contigs
Return type:
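The first step of the description, building a De Bruijn graph from the input sequences, might look like the sketch below. It follows a common convention in which each kmer contributes one edge from its prefix to its suffix; whether MAVIS builds its edges exactly this way is an assumption, and build_edge_counts is a hypothetical helper, not part of the module.

```python
from collections import Counter


def build_edge_counts(sequences, kmer_size):
    """Count how often each (prefix, suffix) edge is seen across all reads."""
    edges = Counter()
    for seq in sequences:
        for i in range(len(seq) - kmer_size + 1):
            kmer = seq[i:i + kmer_size]
            # one edge per kmer: first k-1 characters -> last k-1 characters
            edges[(kmer[:-1], kmer[1:])] += 1
    return edges


edges = build_edge_counts(['ACGTA', 'CGTAC'], kmer_size=3)
```

Edges supported by both reads end up with a count of 2, which is the kind of weight that thresholds such as assembly_min_edge_weight would be compared against.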
-
mavis.assemble.digraph_connected_components(graph, subgraph=None)
The networkx module does not support deriving connected components from digraphs (only from simple graphs). This function assumes that connection != reachable, which means there is no difference between the connected components of a simple graph and those of a digraph.
Parameters: graph (networkx.DiGraph) – the input graph to gather components from
Returns: a list of components, each of which is a list of node names
Return type: list of list
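Treating connection as independent of edge direction amounts to finding components of the underlying undirected graph. The sketch below shows that idea with a breadth-first search over plain Python containers; weak_components and its (nodes, edges) arguments are assumptions for the example, not the signature of the function documented here.

```python
from collections import deque


def weak_components(nodes, edges):
    """Group nodes into components, ignoring the direction of each edge."""
    neighbours = {n: set() for n in nodes}
    for src, tgt in edges:
        # record the link both ways so direction no longer matters
        neighbours[src].add(tgt)
        neighbours[tgt].add(src)
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for nxt in sorted(neighbours[node]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return components


# 'a' cannot reach 'c' along directed edges, yet both connect through 'b'
components = weak_components(['a', 'b', 'c', 'd'], [('a', 'b'), ('c', 'b')])
```

Here a, b and c fall into one component even though no directed path joins a and c, while the isolated node d forms its own component.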
-
mavis.assemble.filter_contigs(contigs, assembly_min_uniq=0.01)
Given a list of contigs, removes similar contigs so that, within each group of similar contigs, only the highest-scoring one remains.
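One way to picture this filtering is a greedy pass over the contigs in descending score order. Everything specific in the sketch is an assumption: contigs are plain (sequence, score) tuples, similarity is measured with difflib's SequenceMatcher, and min_uniq is read as the minimum fraction by which a contig must differ from those already kept. The source does not state how MAVIS measures similarity.

```python
from difflib import SequenceMatcher


def filter_similar(contigs, min_uniq=0.01):
    """contigs: list of (sequence, score) tuples (hypothetical layout)."""
    kept = []
    # visit the best-scoring contigs first so they win over similar ones
    for seq, score in sorted(contigs, key=lambda c: c[1], reverse=True):
        is_unique = all(
            SequenceMatcher(None, seq, other).ratio() <= 1 - min_uniq
            for other, _ in kept
        )
        if is_unique:
            kept.append((seq, score))
    return kept


kept = filter_similar(
    [('ACGTACGTAC', 3), ('ACGTACGTAC', 5), ('TTTTGGGGCC', 2)]
)
```

The two identical sequences collapse to the copy with the higher score, and the dissimilar third contig is retained.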
-
mavis.assemble.kmers(s, size)
For a sequence, computes and returns a list of all kmers of the specified size.
Parameters:
Returns: the list of kmers
Return type:
Example
>>> kmers('abcdef', 2)
['ab', 'bc', 'cd', 'de', 'ef']
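A sliding-window implementation that reproduces the example above could look like this. It is a sketch under the name kmers_sketch, not the module's source, and the empty list returned for an over-long size is a property of this sketch only.

```python
def kmers_sketch(s, size):
    """Return every contiguous substring of length `size`, left to right."""
    # range() is empty when size exceeds len(s), which yields an empty list
    return [s[i:i + size] for i in range(len(s) - size + 1)]


result = kmers_sketch('abcdef', 2)
```

A sequence of length n yields n - size + 1 kmers, so 'abcdef' with size 2 gives five.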
-
mavis.assemble.pull_contigs_from_component(assembly, component, assembly_min_nc_edge_weight, assembly_max_paths, log=<function devnull>)
Builds contigs from a connected component of the assembly DeBruijn graph.
Parameters: - assembly (DeBruijnGraph) – the assembly graph
- component (list) – list of nodes which make up the connected component
- assembly_min_nc_edge_weight (int) – the minimum weight a non-cutting edge/path must have to avoid being removed
- assembly_max_paths (int) – the maximum number of paths allowed before the graph is further simplified
- log (function) – the log function
Returns: the paths/contigs and their scores
Return type: