.. _configuration-and-settings:

Configuration and Settings
=============================

.. _pipeline-config:

Pipeline Configuration File
-------------------------------

The pipeline can be run in steps, or it can be configured using a configuration file and set up in a single step. Scripts will be generated to run all steps following clustering. The configuration file can be built from scratch, or a template can be output as shown below

.. code:: bash

    mavis config --write template.cfg

This will create a template config file called ``template.cfg``, which can then be edited by the user. However, this will be a simple config with no library information. To generate a configuration file with the library information as well as estimates for the fragment size parameters, more inputs are required (see :ref:`generating the config file ` for more information).

.. _config-environment:

Environment Variables
---------------------------

Most of the default settings can be changed by using environment variables. The value given by the environment variable will be used as the new default. Config or command-line parameters will still override these settings.

All environment variables are prefixed with ``MAVIS`` and an underscore. Otherwise the variable name is the same as that used for the command line parameter or config setting (uppercased). For example, to change the default minimum mapping quality used during the validate stage

.. code:: bash

    export MAVIS_MIN_MAPPING_QUALITY=10

Adjusting the Resource Requirements
------------------------------------

Choosing the Number of Validation/Annotation Jobs
.....................................................

MAVIS chooses the number of jobs to split the validate/annotate stages into based on two settings: :term:`max_files` and :term:`min_clusters_per_file`.

For example, say you have 1000 clusters, ``max_files=10``, and ``min_clusters_per_file=10``. Then MAVIS will set up 10 validation jobs, each with 100 events.
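
As a rough illustration of this arithmetic, the split can be sketched in bash. The ``estimate_jobs`` function below is a hypothetical helper and not part of MAVIS; it assumes the job count is capped both by :term:`max_files` and by the largest number of jobs that still leaves each job with at least :term:`min_clusters_per_file` clusters. The actual implementation may differ in rounding and edge cases.

.. code:: bash

    # Hypothetical helper (not a MAVIS command): estimate the validate/annotate split.
    # Usage: estimate_jobs <clusters> <max_files> <min_clusters_per_file>
    estimate_jobs() {
        local clusters=$1 max_files=$2 min_per_file=$3
        # Most jobs possible while keeping at least min_per_file clusters in each
        local by_min=$(( clusters / min_per_file ))
        if (( by_min < 1 )); then by_min=1; fi
        # Assumption: the job count never exceeds max_files
        local jobs=$(( by_min < max_files ? by_min : max_files ))
        # Clusters per job, rounded up so that no cluster is left out
        local per_job=$(( (clusters + jobs - 1) / jobs ))
        echo "$jobs $per_job"
    }

    estimate_jobs 1000 10 10

Called with the values from the example above, the sketch prints ``10 100``: the job count followed by the number of clusters per job.
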
However, if ``min_clusters_per_file=500``, then MAVIS would only set up 2 jobs, each with 500 events. This is because :term:`min_clusters_per_file` takes precedence over :term:`max_files`.

Splitting into more jobs will lower the resource requirements per job (see :ref:`resource requirements `). The memory and time requirements for validation are linear with respect to the number of events to be validated.

Uninformative Filter
......................

If the user is only interested in events in genes, for example, then the :term:`uninformative_filter` can be used. This will drop all events that are not within a certain distance (:term:`max_proximity`) of any annotation in the annotations reference file. These events will be dropped prior to the validation stage, which results in a significant speed-up. This can be set using the environment variable

.. code:: bash

    export MAVIS_UNINFORMATIVE_FILTER=True

or in the pipeline config file

.. code::

    [cluster]
    uninformative_filter = True

or as a command line argument to the cluster stage

.. code:: bash

    mavis cluster --uninformative_filter True ....