LocARNA-2.0.0
locarna

NAME

LocARNA - manual page for LocARNA 2.0.0

DESCRIPTION

locarna - pairwise (global and local) alignment of RNA.

USAGE: locarna [options] <Input 1> <Input 2>

locarna is the pairwise alignment tool of the LocARNA package, which performs fast simultaneous folding and alignment based on two RNA sequences (or alignments).

Input

Input consists of two sequences or alignments, which are specified in fasta, clustal, stockholm, or LocARNA pp format. Optionally, structure and anchor constraints can be specified in the input files. If alignments are given in the input, they are aligned without revising the gap structure within the given alignments. Unless specified, base pair probabilities of the input sequences or alignments are predicted using the ViennaRNA package. Optionally, base pair probability information can be passed for one or both input sequences (or alignments) using the input formats LocARNA PP 2.0 or ViennaRNA postscript dotplot format.

Constraints

Anchor and structure constraints can be specified in the input files. Anchor constraints for sequences (alignments) are defined by assigning names to sequence positions (alignment columns), respectively. The exact semantics is either strict or relaxed (controlled by --relaxed-anchors). In strict semantics, anchor names have to be sorted lexicographically in the input as well as in the result alignment (in the sense that result columns inherit the name from one or both input positions, where conflicts are disallowed). In relaxed semantics, anchors of the same name are forced into the same alignment column. The actual syntax of the constraint specification depends on the file format (see Constraint Examples below).

Output

The final pairwise alignment is reported in standard and/or variants of the clustal and stockholm format, as well as LocARNA's own pp format.

OPTIONS

-h, --help

: Print this help.

--galaxy-xml

: Print galaxy xml wrapper.

-V, --version

: Print only version string.

-v, --verbose

: Be verbose. Prints input parameters, sequences and size information.

-q, --quiet

: Be quiet.

Scoring parameters:

-i, --indel=<score>(-150)

: Indel score. Score contribution of each single base insertion or deletion. Indel opening score and indel score define the affine scoring of gaps.

--indel-opening=<score>(-750)

: Indel opening score. Score contribution of opening an insertion or deletion, i.e. score for a consecutive run of deletions or insertions. Indel opening score and indel score define the affine scoring of gaps.

--ribosum-file=<f>(RIBOSUM85_60)

: File specifying the Ribosum base and base-pair similarities. [default: use RIBOSUM85_60 without requiring a Ribosum file.]

--use-ribosum=<bool>(true)

: Use ribosum scores for scoring base matches and base pair matches; note that tau=0 suppresses any effect on the latter.

-m, --match=<score>(50)

: Set score contribution of a base match (unless ribosum scoring).

-M, --mismatch=<score>(0)

: Set score contribution of a base mismatch (unless ribosum scoring).

--unpaired-penalty=<score>(0)

: Penalty for unpaired bases.

-s, --struct-weight=<score>(200)

: Maximal weight of 1/2 arc match. Balances structure vs. sequence score contributions.

-e, --exp-prob=<prob>

: Expected base pair probability. Used as background probability for base pair scoring [default: calculated from sequence length].

-t, --tau=<factor>(50)

: Tau factor. Factor for contribution of sequence similarity in an arc match (in percent). tau=0 does not penalize any sequence information including compensatory mutations at arc matches, while tau=100 scores sequence similarity at ends of base matches (if a scoring matrix like ribosum is used, this adds the contributions for base pair match from the matrix). [default tau=0!]

-E, --exclusion=<score>(0)

: Score contribution per exclusion in structure local alignment. Set to zero for unrestricted structure locality.

--stacking

: Use stacking terms (requires stack-probs by RNAfold -p2).

--new-stacking

: Use new stacking terms (requires stack-probs by RNAfold -p2).
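As a rough illustration of the affine gap model defined by --indel and --indel-opening, the following Python sketch computes the score contribution of one gap, that is, one consecutive run of insertions or deletions. The function name gap_score is hypothetical and not part of LocARNA; the sketch assumes that the opening score is charged once per gap, in addition to the indel score for each gapped base.

```python
def gap_score(length, indel=-150, indel_opening=-750):
    """Score contribution of one gap of `length` consecutive indels.

    Assumption: affine model, opening score once per gap plus
    the indel score for every inserted or deleted base.
    Defaults are the documented defaults of --indel and --indel-opening.
    """
    if length <= 0:
        return 0
    return indel_opening + length * indel


for k in (1, 2, 5):
    print(k, gap_score(k))
```

Under these assumptions, one long gap is cheaper than several short gaps of the same total length, because the opening score is paid only once.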

Partition function representation (for sequence envelopes):

--extended-pf

: Use extended precision for the computation of sequence envelopes. This enables handling significantly larger instances. [default]

--quad-pf

: Use quad precision for partition function values. Even more precision than extended pf, but usually much slower (overrides extended-pf).

Locality:

--struct-local=<bool>(false)

: Turn on/off structure locality. Allow exclusions in alignments of connected substructures.

--sequ-local=<bool>(false)

: Turn on/off sequence locality. Find best alignment of arbitrary subsequences of the input sequences.

--free-endgaps=<spec>(----)

: Control where end gaps are allowed for free. String of four +/- symbols, allowing/disallowing free end gaps at the four sequence ends in the order left end of first sequence, right end of first sequence, left end of second sequence, right end of second sequence. For example, "+---" allows free end gaps at the left end of the first alignment string; "----" forbids free end gaps [default].

--normalized=<L>(0)

: Perform normalized local alignment with parameter L. This causes locarna to compute the best local alignment according to 'Score' / ( L + 'length' ), where length is the sum of the lengths of the two locally aligned subsequences. Thus, the larger L, the larger the local alignment; the size of value L is in the order of local alignment lengths. Verbose yields info on the iterative optimizations.

--penalized=<PP>(0)

: Penalized local alignment with penalty PP.
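Two of the locality options can be illustrated with a short Python sketch: the four-symbol string of --free-endgaps and the objective 'Score' / ( L + 'length' ) of --normalized. Both function names and the dictionary keys are hypothetical and chosen for illustration only; the scores and lengths in the example are made-up numbers, not output of locarna.

```python
def parse_free_endgaps(spec="----"):
    """Decode a --free-endgaps string into one flag per sequence end.

    Order as documented: left end of first sequence, right end of first,
    left end of second, right end of second ('+' = free, '-' = not free).
    """
    ends = ("left_1", "right_1", "left_2", "right_2")
    if len(spec) != 4 or any(symbol not in "+-" for symbol in spec):
        raise ValueError(f"expected four +/- symbols, got {spec!r}")
    return {end: symbol == "+" for end, symbol in zip(ends, spec)}


def normalized_score(score, length_1, length_2, L):
    """Objective of normalized local alignment: Score / (L + length)."""
    return score / (L + length_1 + length_2)


print(parse_free_endgaps("+---"))

# Hypothetical candidates: (score, length of subsequence 1, length of subsequence 2)
short_candidate = (1000, 10, 10)
long_candidate = (3000, 50, 50)
for L in (1, 100):
    print(L, normalized_score(*short_candidate, L), normalized_score(*long_candidate, L))
```

In this toy comparison the short candidate has the better normalized score for L=1, while the long candidate wins for L=100, which matches the statement that larger L favors larger local alignments.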

Output:

-w, --width=<columns>(120)

: Width of alignment output.

--clustal=<file>

: Write alignment in ClustalW (aln) format to given file.

--stockholm=<file>

: Write alignment in Stockholm format to given file.

--pp=<file>

: Write alignment in PP format to given file.

--alifold-consensus-dp

: Compute consensus dot plot by alifold (warning: this may fail for long sequences).

--consensus-structure=<type>(none)

: Type of consensus structures written to screen and stockholm output [alifold|mea|none] (default: none).

--consensus-gamma=<float>(1.0)

: Base pair weight for mea consensus computation. For MEA, base pairs are scored by their pair probability times 2 gamma; unpaired bases, by their unpaired probability.

-L, --local-output

: Output only local sub-alignment (to std out).

--local-file-output

: Write only local sub-alignment to output files.

-P, --pos-output

: Output only local sub-alignment positions.

--write-structure

: Write guidance structure in output.

--score-components

: Output components of the score (experimental).

--stopwatch

: Print run time information.
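The role of --consensus-gamma in the MEA consensus can be sketched in Python as follows. The sketch only evaluates the score of one given structure according to the rule stated for that option (pair probability times 2 gamma for base pairs, unpaired probability for unpaired bases); it does not search for the best structure. The function name, the data layout, and the probabilities are assumptions made for illustration and are not taken from LocARNA.

```python
def mea_structure_score(pairs, pair_prob, unpaired_prob, gamma=1.0):
    """Score one structure under the MEA rule described for --consensus-gamma.

    pairs         -- iterable of (i, j) base pairs (0-based positions)
    pair_prob     -- dict mapping (i, j) to the base pair probability
    unpaired_prob -- list with the unpaired probability of each position
    """
    pairs = list(pairs)
    paired_positions = {pos for pair in pairs for pos in pair}
    score = sum(2 * gamma * pair_prob[pair] for pair in pairs)
    score += sum(q for pos, q in enumerate(unpaired_prob)
                 if pos not in paired_positions)
    return score


# Hypothetical probabilities for a sequence of length 4
pair_prob = {(0, 3): 0.9}
unpaired_prob = [0.1, 0.8, 0.7, 0.1]
print(mea_structure_score([(0, 3)], pair_prob, unpaired_prob))
print(mea_structure_score([], pair_prob, unpaired_prob))
```

A larger gamma raises the contribution of base pairs relative to unpaired bases, so structures with more base pairs score comparatively better.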

Heuristics for speed/accuracy trade-off:

-p, --min-prob=<probability>(0.001)

: Minimal probability. Only base pairs of at least this probability are taken into account.

--max-bps-length-ratio=<factor>(0.0)

: Maximal ratio of #base pairs divided by sequence length. This serves as a second filter on the "significant" base pairs. [default: 0.0 = no effect].

-D, --max-diff-am=<diff>(-1)

: Maximal difference for sizes of matched arcs. [-1=off]

-d, --max-diff=<diff>(-1)

: Maximal difference for positions of alignment traces (and aligned bases). [-1=off]

--max-diff-at-am=<diff>(-1)

: Maximal difference for positions of alignment traces at arc match ends. [-1=off]

--max-diff-aln=<aln file>()

: Maximal difference relative to given alignment (file in clustalw format).

--max-diff-pw-aln=<alignment>()

: Maximal difference relative to given alignment (string, delim=AMPERSAND).

--max-diff-relax

: Relax deviation constraints in multiple alignment.

--min-trace-probability=<probability>(1e-4)

: Minimal sequence alignment probability of potential traces (probability-based sequence alignment envelope) [default=1e-4].
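The effect of the --min-prob threshold can be pictured with a minimal Python sketch: base pairs whose probability falls below the threshold are dropped before alignment. The function name and the dictionary layout are hypothetical, and the probabilities are made-up values; the sketch does not reproduce how locarna stores base pair probabilities internally.

```python
def filter_base_pairs(pair_probs, min_prob=0.001):
    """Keep only base pairs with probability of at least `min_prob`.

    pair_probs -- dict mapping (i, j) positions to base pair probability
    """
    return {pair: p for pair, p in pair_probs.items() if p >= min_prob}


# Hypothetical base pair probabilities
pair_probs = {(1, 20): 0.95, (2, 19): 0.4, (3, 18): 0.0005, (5, 12): 0.001}
print(filter_base_pairs(pair_probs))
print(filter_base_pairs(pair_probs, min_prob=0.5))
```

Raising the threshold leaves fewer base pairs to consider, which is the speed side of the trade-off; base pairs that are filtered out can no longer contribute to the structure score.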

Special sauce options:

--kbest=<k>(-1)

: Enumerate k-best alignments.

--better=<t>(-1000000)

: Enumerate alignments better than threshold t.

MEA score:

--mea-alignment

: Perform maximum expected accuracy alignment (instead of using the default similarity scoring).

--match-prob-method=<int>(0)

: Select method for computing sequence-based base match probabilities (to be used for mea-type alignment scores). Methods: 1=probcons-style from HMM, 2=probalign-style from PFs, 3=from PFs, local.

--probcons-file=<file>

: Read parameters for probcons-like calculation of match probabilities from probcons parameter file.

--temperature-alipf=<int>(300)

: Temperature for the sequence alignment partition functions used by the probcons-like sequence-based match/trace probability computation (this temperature is different from the 'physical' temperature of RNA folding!).

--pf-struct-weight=<weight>(200)

: Structure weight in PF computations (for the computation of sequence-based match probabilities from partition functions).

--mea-gapcost

: Use gap cost in mea alignment.

--mea-alpha=<weight>(0)

: Weight alpha for MEA.

--mea-beta=<weight>(200)

: Weight beta for MEA.

--mea-gamma=<weight>(100)

: Weight gamma for MEA.

--probability-scale=<scale>(10000)

: Scale for probabilities/resolution of mea score.

--write-match-probs=<file>

: Write match probs to file (don't align!).

--write-trace-probs=<file>

: Write trace probs to file (don't align!).

--read-match-probs=<file>

: Read match probabilities from file.

--write-arcmatch-scores=<file>

: Write arcmatch scores (don't align!).

--read-arcmatch-scores=<file>

: Read arcmatch scores.

--read-arcmatch-probs=<file>

: Read arcmatch probabilities (weighted by factor mea_beta/100).

Constraints:

--noLP

: Disallow lonely pairs in prediction and alignment.

--maxBPspan=<span>(-1)

: Limit maximum base pair span [default=off].

--relaxed-anchors

: Use relaxed semantics of anchor constraints [default=strict semantics].

Input files:

The tool is called with two input files <Input 1> and <Input 2>, which specify the two input sequences or input alignments. Different input formats (Fasta, Clustal, Stockholm, LocARNA PP, ViennaRNA postscript dotplots) are accepted and automatically recognized (by file content); the two input files can be in different formats. Extended variants of the Clustal and Stockholm formats enable specifying anchor and structure constraints.

DISCLAIMER

For many purposes, it is more convenient to use the multiple alignment tool mlocarna (even for pairwise alignment). However, certain tasks, such as aligning two specific alignments, are supported only by the pairwise tool or can be better controlled with it. Note that the performance of locarna (as well as basically all tools in the LocARNA package) is often significantly improved by the use of suitable application-specific options, deviating from the default settings.

REFERENCES

If you use locarna please cite us:

Sebastian Will, Kristin Reiche, Ivo L. Hofacker, Peter F. Stadler, and Rolf Backofen. Inferring non-coding RNA families and classes by means of genome-scale structure-based clustering. PLOS Computational Biology, 3 no. 4 pp. e65, 2007. doi:10.1371/journal.pcbi.0030065

Sebastian Will, Tejal Joshi, Ivo L. Hofacker, Peter F. Stadler, and Rolf Backofen. LocARNA-P: Accurate boundary prediction and improved detection of structural RNAs. RNA, 18(5):900-914, 2012. doi:10.1261/rna.029041.111

AVAILABILITY

The latest LocARNA package release is available online at GitHub https://github.com/s-will/LocARNA and at http://www.bioinf.uni-freiburg.de/Software/LocARNA/

EXAMPLES

In the simplest case, the tool is called with two sequences in fasta format or two alignments in multiple fasta, clustal or stockholm format like

locarna file1.fa file2.fa

or

locarna file1.aln file2.aln

Note that input formats can be mixed like in

locarna file1.aln file2.stk
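When locarna is driven from a script, the call pattern "locarna [options] <Input 1> <Input 2>" can be assembled programmatically. The following Python sketch builds such a command line from option names documented above; the helper locarna_command and the file names are hypothetical, and the sketch only prints the command instead of running it.

```python
import shlex


def locarna_command(input_1, input_2, **options):
    """Build an argument list of the form: locarna [options] <Input 1> <Input 2>.

    Keyword names are long option names with '_' in place of '-';
    a value of True yields a bare flag, anything else '--name=value'.
    """
    argv = ["locarna"]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        argv.append(flag if value is True else f"{flag}={value}")
    argv += [input_1, input_2]
    return argv


cmd = locarna_command(
    "file1.aln", "file2.stk",
    clustal="result.aln",   # --clustal=<file>
    struct_local="true",    # --struct-local=<bool>
    noLP=True,              # --noLP
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run on a system where locarna is installed; building a list rather than a single string avoids shell quoting problems with file names.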

Constraint Examples

Anchor and structure constraints can be specified in extended versions of the Clustal format, in the LocARNA PP 2.0 format, as well as in Stockholm format. Currently, the pairwise alignment tools of the package do not support constraints in fasta-like input. Here is an example of constraints in Clustal format:

CLUSTAL W

vhuU            AGCUCACAACCGAACCCAUUUGGGAGGUUGUGAGCU
fruA            CC-UCGAGGG-GAACCCGAAA-GGGACCCGAGA-GG
#S              (<<<<<<<<<......xxxx...............)
#A1             .............AAABB..................
#A2             .............12312..................

The syntax (and semantics) of structure constraint strings (prefixed by #S) is that of RNAfold of the ViennaRNA package. Moreover, fixed structures prefixed by #FS are accepted; fixed structures can contain pseudoknots encoded by different bracket symbols.

Anchors are specified by naming columns, where names can consist of several places (characters). In the example, each name consists of two characters, such that the names are A1, A2, A3, B1, B2 for the respective columns.
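How the anchor lines of the Clustal example translate into column names can be sketched in Python: the names are read vertically, taking one character from each anchor line, and columns that contain only dots carry no anchor. The function names are hypothetical, and the ordering check is only a simplified reading of the strict semantics described under Constraints.

```python
def anchor_names(anchor_lines):
    """Map 0-based column index to anchor name, reading names vertically.

    anchor_lines -- the strings of the #A1, #A2, ... lines (without prefix);
                    columns consisting only of '.' have no anchor.
    """
    names = {}
    for column, chars in enumerate(zip(*anchor_lines)):
        if all(c == "." for c in chars):
            continue
        names[column] = "".join(chars)
    return names


def is_sorted_lexicographically(names):
    """Simplified check of the ordering required by strict anchor semantics."""
    ordered = [names[column] for column in sorted(names)]
    return ordered == sorted(ordered)


# Anchor lines of the example above (13 dots, five anchored columns, 18 dots)
a1 = "." * 13 + "AAABB" + "." * 18
a2 = "." * 13 + "12312" + "." * 18
names = anchor_names([a1, a2])
print(names)
print(is_sorted_lexicographically(names))
```

For the example, this yields the names A1, A2, A3, B1, B2 for five consecutive columns, in agreement with the description above.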

Constraints in PP format are specified in the same way; however, in Stockholm format we use different prefixes, such that the example would look like

# STOCKHOLM 1.0

vhuU            AGCUCACAACCGAACCCAUUUGGGAGGUUGUGAGCU
fruA            CC-UCGAGGG-GAACCCGAAA-GGGACCCGAGA-GG
#=GC cS         (<<<<<<<<<......xxxx...............)
#=GC cA1        .............AAABB..................
#=GC cA2        .............12312..................

The prefix for fixed structures is '#=GC cFS'.
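The correspondence between the two notations (#S, #FS, #A1, ... in Clustal and PP format versus '#=GC cS', '#=GC cFS', '#=GC cA1', ... in Stockholm format) can be expressed as a small line rewriting step. The Python sketch below is a hypothetical helper, not a LocARNA tool; it assumes a name column of 16 characters, as in the examples shown here, and leaves all other lines unchanged.

```python
def to_stockholm_constraint(line, name_width=16):
    """Rewrite a Clustal-style constraint line with the Stockholm '#=GC c' prefix.

    '#S ...' becomes '#=GC cS ...', '#A1 ...' becomes '#=GC cA1 ...', and so on.
    Lines that are not Clustal-style constraint lines are returned unchanged.
    """
    parts = line.rstrip("\n").split(None, 1)
    if len(parts) != 2:
        return line
    tag, content = parts
    if not tag.startswith("#") or tag.startswith("#="):
        return line
    return f"#=GC c{tag[1:]}".ljust(name_width) + content


clustal_lines = [
    "fruA            CC-UCGAGGG-GAACCCGAAA-GGGACCCGAGA-GG",
    "#S              (<<<<<<<<<......xxxx...............)",
    "#A1             .............AAABB..................",
    "#A2             .............12312..................",
]
for line in clustal_lines:
    print(to_stockholm_constraint(line))
```

Sequence lines pass through untouched, so the helper can be applied to every line of the alignment body; the file header still has to be changed separately.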

AUTHOR

This man page is written and maintained by Sebastian Will. It is part of the LocARNA package.

REPORTING BUGS

Report bugs to <will (at) informatik.uni-freiburg.de>.

COPYRIGHT

Copyright 2005- Sebastian Will. The LocARNA package is released under the GNU General Public License v3.0.

SEE ALSO

The LocARNA PP 2.0 format is described online at http://www.bioinf.uni-freiburg.de/Software/LocARNA/PP/