Guidelines for Contributors

Getting Started

If you are new to the project, a good way to get started is to add to the documentation or to add unit tests where code coverage is lacking.

Installing for Development (new to python projects?)

Clone the repository and switch to the development branch

git clone https://github.com/bcgsc/mavis.git
cd mavis
git checkout develop

Set up a Python virtual environment. A virtual environment gives you a clean install to test against, isolated from the packages installed elsewhere on your system. Set one up as follows

pip install virtualenv
virtualenv venv
source venv/bin/activate

Install the MAVIS Python package. Installing in develop mode ensures that your code changes take effect when you run MAVIS from within that virtual environment

python setup.py develop

Run the unit tests and compute code coverage

python setup.py nosetests

Make the user manual (optional)

cd docs
make html

The user manual can then be viewed by opening build/html/index.html in any web browser (e.g. google-chrome, firefox).

Reporting a Bug

Please make sure to search through the issues before reporting a bug to ensure there isn’t already an open issue.

Coding Conventions

Formatting/Style

  • In general, follow the PEP 8 style guide (except for the maximum line width)

  • docstrings should follow the Google docstring style as supported by sphinx

  • any column name which may appear in any of the intermediate or final output files must be defined in mavis.constants.COLUMNS

Types in docstrings

If you want to be more explicit with nested types, the following conventions are used throughout the code

  • dictionary: d = {<key>: <value>} becomes dict of <value> by <key>

  • list: l = [1, 2, 3] becomes list of int

  • mixed: d = {'a': [1, 2, 3], 'b': [4, 5, 6]} becomes dict of list of int by str

  • tuples: ('a', 1) becomes tuple of str and int
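As an illustration of these conventions, here is a minimal sketch of a Google-style docstring. The function group_positions_by_chromosome and its arguments are hypothetical and are not part of MAVIS; only the way the types are written is the point.

```python
def group_positions_by_chromosome(positions):
    """
    Group positions by chromosome name (hypothetical example function).

    Args:
        positions (list of tuple of str and int): (chromosome, position) pairs

    Returns:
        dict of list of int by str: sorted positions keyed by chromosome name
    """
    grouped = {}
    for chromosome, position in positions:
        # create the list for a chromosome the first time it is seen
        grouped.setdefault(chromosome, []).append(position)
    for chromosome in grouped:
        grouped[chromosome].sort()
    return grouped
```

The argument type reads as "list of tuple of str and int" and the return type as "dict of list of int by str", following the list, tuple, and mixed conventions above.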

Tests

  • all new code must have unit tests in the tests subdirectory

  • in general for assertEqual statements, the expected value is given first
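A minimal sketch of the argument order for assertEqual, assuming a unittest-style test case. The helper reverse_complement is hypothetical and is defined here only so that the test has something to check.

```python
import unittest


def reverse_complement(sequence):
    """hypothetical helper, not part of MAVIS, used only to demonstrate the test"""
    complement = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C', 'N': 'N'}
    return ''.join(complement[base] for base in reversed(sequence))


class TestReverseComplement(unittest.TestCase):

    def test_simple_sequence(self):
        # expected value first, computed value second
        self.assertEqual('TTGCAT', reverse_complement('ATGCAA'))

    def test_empty_sequence(self):
        self.assertEqual('', reverse_complement(''))
```

Keeping the expected value first means a failure message always reads the same way round across the test suite.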

Major Assumptions

Some assumptions have been made when developing this project. The major ones have been listed here to facilitate debugging/development if any of these are violated in the future.

  • The input BAM reads store the sequence with respect to the positive/forward strand, and do not store the reverse complement.

  • The distribution of the fragment sizes in the BAM file approximately follows a normal distribution.
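When debugging, a rough check of the second assumption can be sketched as follows. This is not how MAVIS models fragment sizes; it is a simple standalone heuristic that compares the share of values within one and two standard deviations of the mean against the roughly 68% and 95% expected for a normal distribution. The function names and the tolerance are assumptions made for this example, and the fragment sizes are assumed to have already been extracted into a list of numbers.

```python
import statistics


def fraction_within(sizes, num_stdevs):
    """fraction of the sizes that fall within num_stdevs standard deviations of the mean"""
    mean = statistics.mean(sizes)
    stdev = statistics.stdev(sizes)
    lower = mean - num_stdevs * stdev
    upper = mean + num_stdevs * stdev
    return sum(1 for size in sizes if lower <= size <= upper) / len(sizes)


def looks_roughly_normal(sizes, tolerance=0.05):
    """
    Heuristic only: True if the one and two standard deviation coverage is
    close to the 68% / 95% expected for a normal distribution
    """
    return (
        abs(fraction_within(sizes, 1) - 0.68) <= tolerance
        and abs(fraction_within(sizes, 2) - 0.95) <= tolerance
    )
```

A strongly skewed or multi-modal set of fragment sizes will usually fail this check, which is a hint that the assumption may not hold for that input.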

Current Limitations

  • Assembling contigs will always fail for repeat sequences, as no attempt is made to resolve repeats. Unlike traditional assemblies, we cannot assume even input coverage because only a select portion of the reads is assembled.

  • Currently no attempt is made to group/pair single events into complex events.

  • Transcriptome validation uses a collapsed model of all overlapping transcripts and is not isoform specific. Allowing for isoform specific validation would be computationally expensive but may be considered as an optional setting for future releases.

Computing Code Coverage

Since MAVIS uses multiple processes, computing the code coverage is more complicated. Running coverage normally will under-report. To ensure that the coverage module captures the information from the subprocesses, we need to do the following

In the development Python virtual environment, create a coverage.pth file (e.g. venv/lib/python3.6/site-packages/coverage.pth) containing the following

import coverage; coverage.process_startup()

Additionally, you will need to set the following environment variable

export COVERAGE_PROCESS_START=/path/to/mavis/repo/mavis/.coveragerc
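The two steps above can also be scripted. The following is a minimal sketch, not part of MAVIS: the helper names are hypothetical, and it assumes it is run with the interpreter of the development virtual environment so that sysconfig reports that environment's site-packages directory.

```python
import os
import sysconfig

# the single line that makes every new python process start coverage measurement
PTH_LINE = 'import coverage; coverage.process_startup()\n'


def write_coverage_pth(site_packages=None):
    """
    Write coverage.pth into the given site-packages directory
    (defaults to the site-packages of the running interpreter)
    """
    if site_packages is None:
        site_packages = sysconfig.get_paths()['purelib']
    pth_path = os.path.join(site_packages, 'coverage.pth')
    with open(pth_path, 'w') as handle:
        handle.write(PTH_LINE)
    return pth_path


def coverage_start_is_configured(environ=None):
    """True if COVERAGE_PROCESS_START is set and points to an existing file"""
    environ = os.environ if environ is None else environ
    config_path = environ.get('COVERAGE_PROCESS_START')
    return bool(config_path) and os.path.isfile(config_path)
```

Calling write_coverage_pth() once inside the activated virtual environment creates the file, and coverage_start_is_configured() can be used before a test run to confirm that the environment variable points at a real configuration file.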