sparse

NAME

sparse - manual page for sparse (LocARNA 2.0.0)

DESCRIPTION

sparse - fast pairwise alignment of RNAs.

USAGE: sparse [options] <Input 1> <Input 2>

sparse is a pairwise alignment algorithm that is even faster than locarna because it uses stronger sparsification. Like locarna, it performs fast simultaneous folding and alignment of two RNA sequences (or alignments). In addition to filtering the considered base pairs by their probabilities, it filters by the conditional probabilities of bases and base pairs in their enclosing loops.

Input, Constraints, and Output

The usage, input, constraint specifications and output are analogous to locarna; please refer to the help or man page of locarna for explanations and examples.

OPTIONS

-h, --help

: Print this help.

--galaxy-xml

: Print galaxy xml wrapper.

-V, --version

: Print only version string.

-v, --verbose

: Be verbose. Prints input parameters, sequences and size information.

-q, --quiet

: Be quiet.

Scoring parameters:

-i, --indel=<score>(-150)

: Indel score. Score contribution of each single base insertion or deletion. Indel opening score and indel score define the affine scoring of gaps.

--indel-loop=<score>(-300)

: Score for insertions and deletions of loops, per base.

--indel-opening=<score>(-750)

: Indel opening score. Score contribution of opening an insertion or deletion, i.e. score for a consecutive run of deletions or insertions. Indel opening score and indel score define the affine scoring of gaps.
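
The affine gap model defined by --indel and --indel-opening can be illustrated with a small Python sketch. It assumes that one run of k consecutive insertions (or deletions) scores indel-opening + k * indel, i.e. the opening score is charged once per run; the function name affine_gap_score is hypothetical and not part of LocARNA. The loop variants (--indel-loop, --indel-opening-loop) would follow the same shape.

```python
def affine_gap_score(length: int, indel: int = -150, indel_opening: int = -750) -> int:
    """Score of one run of `length` consecutive insertions or deletions.

    Assumption: the opening score is charged once per run and the indel
    score once per gapped base. Defaults mirror --indel and --indel-opening.
    """
    if length < 0:
        raise ValueError("gap length must be non-negative")
    if length == 0:
        return 0  # no gap, no contribution
    return indel_opening + length * indel


# One run of three gaps is much cheaper than three separate one-base gaps.
print(affine_gap_score(3))      # -750 + 3 * (-150) = -1200
print(3 * affine_gap_score(1))  # 3 * (-750 - 150) = -2700
```

Because the opening score dominates the per-base score, this model prefers a few long gaps over many short ones.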

--indel-opening-loop=<score>(-900)

: Opening score for insertions and deletions of loops.

--ribosum-file=<f>(RIBOSUM85_60)

: File specifying the Ribosum base and base-pair similarities. [default: use RIBOSUM85_60 without requiring a Ribosum file.]

--use-ribosum=<bool>(true)

: Use ribosum scores for scoring base matches and base pair matches; note that tau=0 suppresses any effect on the latter.

-m, --match=<score>(50)

: Set score contribution of a base match (unless ribosum scoring).

-M, --mismatch=<score>(0)

: Set score contribution of a base mismatch (unless ribosum scoring).

--unpaired-penalty=<score>(0)

: Penalty for unpaired bases.

-s, --struct-weight=<score>(200)

: Maximal weight of 1/2 arc match. Balances structure vs. sequence score contributions.

-e, --exp-prob=<prob>

: Expected base pair probability. Used as background probability for base pair scoring [default: calculated from sequence length].

-t, --tau=<factor>(100)

: Tau factor. Factor for contribution of sequence similarity in an arc match (in percent). tau=0 does not penalize any sequence information including compensatory mutations at arc matches, while tau=100 scores sequence similarity at ends of base matches (if a scoring matrix like ribosum is used, this adds the contributions for base pair match from the matrix). [default tau=0!]
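
The role of tau can be pictured as a percentage weight on the sequence part of an arc match score. The Python sketch below assumes the arc match score decomposes into a structure term (from base pair probabilities, bounded by --struct-weight) plus tau/100 times a sequence term (e.g. a RIBOSUM base pair match score); this decomposition and the name arc_match_score are illustrative assumptions, not the exact LocARNA formula.

```python
def arc_match_score(structure_term: float, sequence_term: float, tau: int) -> float:
    """Illustrative arc match score: structure part plus tau percent of the sequence part.

    Assumption: tau is given in percent, as for --tau; tau=0 ignores the
    sequence term entirely, tau=100 adds it in full.
    """
    if tau < 0:
        raise ValueError("tau must be non-negative")
    return structure_term + (tau / 100.0) * sequence_term


for tau in (0, 50, 100):
    print(tau, arc_match_score(structure_term=200.0, sequence_term=40.0, tau=tau))
```

With tau=0, compensatory mutations and other sequence differences at the arc ends leave the arc match score unchanged, which matches the description above.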

-E, --exclusion=<score>(0)

: Score contribution per exclusion in structure local alignment. Set to zero for unrestricted structure locality.

--stacking

: Use stacking terms (requires stacking probabilities from RNAfold -p2).

--new-stacking

: Use new stacking terms (requires stacking probabilities from RNAfold -p2).

Partition function representation (for sequence envelopes):

--extended-pf

: Use extended precision for the computation of sequence envelopes. This enables handling significantly larger instances. [default]

--quad-pf

: Use quad precision for partition function values. Even more precision than extended pf, but usually much slower (overrides extended-pf).

Controlling output:

-w, --width=<columns>(120)

: Width of alignment output.

--clustal=<file>

: Write alignment in ClustalW (aln) format to given file.

--stockholm=<file>

: Write alignment in Stockholm format to given file.

--pp=<file>

: Write alignment in PP format to given file.

--alifold-consensus-dp

: Compute consensus dot plot by alifold (warning: this may fail for long sequences).

--consensus-structure=<type>(alifold)

: Type of consensus structures written to screen and stockholm output [alifold|mea|none] (default: none).

--consensus-gamma=<float>(1.0)

: Base pair weight for mea consensus computation. For MEA, base pairs are scored by their pair probability times 2 gamma; unpaired bases, by their unpaired probability.
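
The MEA objective stated above can be written down directly. The Python sketch below only scores a given dot-bracket structure under that objective (it does not search for the optimal structure); the input layout (a dictionary of pair probabilities, a list of unpaired probabilities) and the helper names are illustrative assumptions, not LocARNA code.

```python
from typing import Dict, List, Sequence, Tuple

BasePair = Tuple[int, int]


def parse_dot_bracket(structure: str) -> List[BasePair]:
    """Return the base pairs (0-based, i < j) of a dot-bracket string."""
    stack: List[int] = []
    pairs: List[BasePair] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def mea_consensus_score(
    structure: str,
    pair_prob: Dict[BasePair, float],
    unpaired_prob: Sequence[float],
    gamma: float = 1.0,
) -> float:
    """Each base pair contributes 2 * gamma * P(pair); each unpaired base P(unpaired)."""
    pairs = parse_dot_bracket(structure)
    paired_positions = {pos for pair in pairs for pos in pair}
    pair_part = sum(2.0 * gamma * pair_prob.get(pair, 0.0) for pair in pairs)
    unpaired_part = sum(
        unpaired_prob[pos] for pos in range(len(structure)) if pos not in paired_positions
    )
    return pair_part + unpaired_part


probs = {(0, 5): 0.9, (1, 4): 0.8}
unpaired = [0.1, 0.2, 0.95, 0.9, 0.2, 0.1]
print(mea_consensus_score("((..))", probs, unpaired))  # about 3.4 + 1.85 = 5.25
print(mea_consensus_score("......", probs, unpaired))  # sum of all unpaired probabilities
```

Raising gamma makes base pairs more valuable relative to unpaired bases, so the highest-scoring structure tends to contain more pairs.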

-L, --local-output

: Output only local sub-alignment (to standard output).

--local-file-output

: Write only local sub-alignment to output files.

-P, --pos-output

: Output only local sub-alignment positions.

--write-structure

: Write guidance structure in output.

--special-gap-symbols

: Use special, distinct gap symbols for loop gaps and for gaps caused by sparsification.

--stopwatch

: Print run-time information.

Heuristics for speed accuracy trade off:

-p, --min-prob=<prob>(0.001)

: Minimal probability. Only base pairs of at least this probability are taken into account.
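
A minimal Python sketch of the --min-prob filter, assuming base pair probabilities are available as a dictionary keyed by (i, j) positions (for example, read from an RNAfold dot plot); this data layout and the function name are hypothetical.

```python
from typing import Dict, Tuple

BasePair = Tuple[int, int]


def filter_by_min_prob(
    pair_prob: Dict[BasePair, float], min_prob: float = 0.001
) -> Dict[BasePair, float]:
    """Keep only base pairs whose probability is at least min_prob (cf. --min-prob)."""
    return {pair: p for pair, p in pair_prob.items() if p >= min_prob}


dot_plot = {(0, 9): 0.85, (1, 8): 0.004, (2, 7): 0.0002}
print(filter_by_min_prob(dot_plot))  # {(0, 9): 0.85, (1, 8): 0.004}
```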

--prob-unpaired-in-loop-threshold=<threshold>(0.00005)

: Threshold for prob_unpaired_in_loop, the conditional probability that a base is unpaired in its enclosing loop.

--prob-basepair-in-loop-threshold=<threshold>(0.0001)

: Threshold for prob_basepair_in_loop, the conditional probability that a base pair occurs in its enclosing loop.

--max-bps-length-ratio=<factor>(0.0)

: Maximal ratio of #base pairs divided by sequence length. This serves as a second filter on the "significant" base pairs. [default: 0.0 = no effect].
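
The length-ratio filter can be sketched the same way. This Python version assumes that, for a positive ratio, the most probable base pairs are kept up to floor(ratio * length); the ranking, the rounding and the function name are assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

BasePair = Tuple[int, int]


def cap_by_length_ratio(
    pair_prob: Dict[BasePair, float], seq_length: int, max_ratio: float = 0.0
) -> Dict[BasePair, float]:
    """Keep at most floor(max_ratio * seq_length) of the most probable base pairs.

    max_ratio <= 0 disables the cap, mirroring the documented default of 0.0.
    """
    if max_ratio <= 0.0:
        return dict(pair_prob)
    limit = math.floor(max_ratio * seq_length)
    ranked = sorted(pair_prob.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:limit])


pairs = {(0, 9): 0.9, (1, 8): 0.5, (2, 7): 0.2}
print(cap_by_length_ratio(pairs, seq_length=10, max_ratio=0.2))  # keeps the two most probable pairs
print(cap_by_length_ratio(pairs, seq_length=10))                 # no cap: all three pairs
```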

--max-uil-length-ratio=<factor>(0.0)

: Maximal ratio of #unpaired bases in loops divided by sequence length [default: 0.0 = no effect].

--max-bpil-length-ratio=<factor>(0.0)

: Maximal ratio of #base pairs in loops divided by loop length [default: 0.0 = no effect].

-D, --max-diff-am=<diff>(-1)

: Maximal difference for sizes of matched arcs. [-1=off]

-d, --max-diff=<diff>(-1)

: Maximal difference for positions of alignment traces (and aligned bases). [-1=off]
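
The effect of --max-diff is a band around the main diagonal of the alignment matrix. The Python sketch below computes the admissible columns for one position; it assumes the band is centred on the diagonal scaled by the two sequence lengths (for equal lengths this is simply |i - j| <= max_diff). The exact band definition used by sparse may differ, and the function name is hypothetical.

```python
import math


def allowed_columns(i: int, len_a: int, len_b: int, max_diff: int) -> range:
    """Positions j (1-based) of sequence B that position i of sequence A may align to.

    Assumption: the band follows the diagonal scaled to the sequence lengths,
    i.e. |i * len_b / len_a - j| <= max_diff; max_diff = -1 switches it off.
    """
    if max_diff < 0:
        return range(1, len_b + 1)
    center = i * len_b / len_a  # diagonal position of i, scaled to B's length
    low = max(1, math.ceil(center - max_diff))
    high = min(len_b, math.floor(center + max_diff))
    return range(low, high + 1)


print(list(allowed_columns(5, 10, 10, 2)))  # [3, 4, 5, 6, 7]
print(list(allowed_columns(5, 10, 20, 2)))  # [8, 9, 10, 11, 12]
print(len(allowed_columns(5, 10, 20, -1)))  # 20 (band switched off)
```

With the band active, each row of this sketch has at most 2 * max_diff + 1 admissible cells instead of len_b.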

--max-diff-at-am=<diff>(-1)

: Maximal difference for positions of alignment traces at arc match ends. [-1=off]

--max-diff-aln=<aln file>()

: Maximal difference relative to given alignment (file in clustalw format)

--max-diff-pw-aln=<alignment>()

: Maximal difference relative to given alignment (given as a string in which the two alignment rows are delimited by an ampersand, '&').
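
The string form of --max-diff-pw-aln holds both alignment rows in one argument. A small Python sketch of how such a string can be split and turned into aligned position pairs; the gap symbol '-', the 1-based numbering and the function name are assumptions for illustration, not sparse's own parser.

```python
from typing import List, Tuple


def parse_pairwise_alignment(alignment: str, gap: str = "-") -> List[Tuple[int, int]]:
    """Split an 'rowA&rowB' alignment string and return its matched positions.

    Returns 1-based (i, j) pairs of aligned, non-gap residues.
    Assumption: '-' is the gap symbol.
    """
    rows = alignment.split("&")
    if len(rows) != 2:
        raise ValueError("expected exactly two rows separated by '&'")
    row_a, row_b = rows
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    matches: List[Tuple[int, int]] = []
    i = j = 0  # residue counters for rows A and B
    for char_a, char_b in zip(row_a, row_b):
        if char_a != gap:
            i += 1
        if char_b != gap:
            j += 1
        if char_a != gap and char_b != gap:
            matches.append((i, j))
    return matches


print(parse_pairwise_alignment("AC-GU&A-CGU"))  # [(1, 1), (3, 3), (4, 4)]
```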

--max-diff-relax

: Relax deviation constraints in multiple alignment.

--min-trace-probability=<probability>(1e-5)

: Minimal sequence alignment probability of potential traces (probability-based sequence alignment envelope) [default=1e-4].

MEA score:

--mea-alignment

: Perform maximum expected accuracy alignment (instead of using the default similarity scoring).

--match-prob-method=<int>(0)

: Select method for computing sequence-based base match probabilities (to be used for MEA-type alignment scores). Methods: 1=probcons-style from HMM, 2=probalign-style from PFs, 3=from PFs, local.

--probcons-file=<file>

: Read parameters for probcons-like calculation of match probabilities from probcons parameter file.

--temperature-alipf=<int>(300)

: Temperature for the sequence alignment partition functions used by the probcons-like sequence-based match/trace probability computation (this temperature is different from the 'physical' temperature of RNA folding!).

--pf-struct-weight=<weight>(200)

: Structure weight in PF computations (for the computation of sequence-based match probabilities from partition functions).

--mea-gapcost

: Use gap cost in MEA alignment.

--mea-alpha=<weight>(0)

: Weight alpha for MEA

--mea-beta=<weight>(200)

: Weight beta for MEA

--mea-gamma=<weight>(100)

: Weight gamma for MEA

--probability-scale=<scale>(10000)

: Scale for probabilities/resolution of mea score
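
Assuming that MEA scores are handled as integers, probabilities must be mapped to integers with the resolution given by --probability-scale. The Python sketch below assumes simple rounding of prob * scale; the rounding rule and the function name scale_probability are assumptions.

```python
def scale_probability(prob: float, scale: int = 10000) -> int:
    """Map a probability in [0, 1] to an integer score with resolution 1/scale.

    Assumption: values are rounded to the nearest integer; cf. --probability-scale.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return round(prob * scale)


print(scale_probability(0.5))              # 5000
print(scale_probability(0.00001))          # 0: below the resolution of the default scale
print(scale_probability(0.00001, 10**6))   # 10
```

A larger scale preserves small probabilities at the price of larger integer score values.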

--write-match-probs=<file>

: Write match probabilities to file (don't align!).

--read-match-probs=<file>

: Read match probabilities from file.

--write-arcmatch-scores=<file>

: Write arc match scores to file (don't align!).

--read-arcmatch-scores=<file>

: Read arcmatch scores.

--read-arcmatch-probs=<file>

: Read arcmatch probabilities (weighted by factor mea_beta/100)

Constraints:

--noLP

: Disallow lonely pairs in prediction and alignment.

--maxBPspan=<span>(-1)

: Limit maximum base pair span [default=off].

--relaxed-anchors

: Use relaxed semantics of anchor constraints [default=strict semantics].

Input files:

The tool is called with two input files <Input 1> and <Input 2>, which specify the two input sequences or input alignments. Different input formats (Fasta, Clustal, Stockholm, LocARNA PP, ViennaRNA postscript dotplots) are accepted and automatically recognized (by file content); the two input files can be in different formats. Extended variants of the Clustal and Stockholm formats enable specifying anchor and structure constraints.

AVAILABILITY

The latest LocARNA package release is available online on GitHub at https://github.com/s-will/LocARNA and at http://www.bioinf.uni-freiburg.de/Software/LocARNA/

COPYING (LICENSE)

REFERENCES

Sebastian Will, Christina Otto, Milad Miladi, Mathias Möhl, and Rolf Backofen. SPARSE: quadratic time simultaneous alignment and folding of RNAs without sequence-based heuristics. Bioinformatics 31 (15): 2489-2496, 2015. doi:10.1093/bioinformatics/btv185

AUTHOR

This man page is written and maintained by Sebastian Will; it is part of the LocARNA package.

The sparse tool and the sparse alignment algorithm are written by Milad Miladi. Library classes for strong ensemble-based sparsification are written by Christina Otto.

REPORTING BUGS

Report bugs to <miladim (at) informatik.uni-freiburg.de>.
