LocARNA-2.0.0
Public Types | Public Member Functions | Protected Member Functions | Protected Attributes | Friends | List of all members
LocARNA::RnaData Class Reference

represent sparsified data of RNA ensemble More...

#include <rna_data.hh>

Inheritance diagram for LocARNA::RnaData:
Inheritance graph
[legend]
Collaboration diagram for LocARNA::RnaData:
Collaboration graph
[legend]

Public Types

typedef SparseMatrix< double > arc_prob_matrix_t
 arc probability matrix
 
typedef size_t size_type
 usual size type
 

Public Member Functions

 RnaData (const RnaEnsemble &rna_ensemble, double p_bpcut, double max_bps_length_ratio, const PFoldParams &pfoldparams)
 Construct from RnaEnsemble with cutoff probability. More...
 
 RnaData (const std::string &filename, double p_bpcut, double max_bps_length_ratio, const PFoldParams &pfoldparams)
 Construct from file. More...
 
 RnaData (const RnaData &rna_dataA, const RnaData &rna_dataB, const Alignment &alignment, double p_expA, double p_expB, bool only_local=false)
 Construct as consensus of two aligned RNAs. More...
 
virtual ~RnaData ()
 destructor
 
const Sequencesequence () const
 Get the multiple alignment as sequence. More...
 
const MultipleAlignmentmultiple_alignment () const
 Get the multiple alignment. More...
 
size_type length () const
 Get the sequence length. More...
 
double arc_cutoff_prob () const
 Get base pair cutoff probability. More...
 
double arc_prob (pos_type i, pos_type j) const
 Get arc probability. More...
 
std::string mea_structure (double gamma=1.) const
 maximum expected accuracy structure More...
 
std::unique_ptr< vrna_plist_t[]> plist () const
 Construct plist (pair list of Vienna RNA) More...
 
double joint_arc_prob (pos_type i, pos_type j) const
 Get arc probability. More...
 
double stacked_arc_prob (pos_type i, pos_type j) const
 Get arc probability. More...
 
double prob_paired_upstream (pos_type i) const
 Probability that a position is paired upstream. More...
 
double prob_paired_downstream (pos_type i) const
 Probability that a position is paired downstream. More...
 
double prob_unpaired (pos_type i) const
 Unpaired probability. More...
 
std::ostream & write_pp (std::ostream &out, double p_outbpcut=0) const
 
std::ostream & write_size_info (std::ostream &out) const
 Write object size information. More...
 
bool has_stacking () const
 Availability of stacking terms. More...
 
void set_anchors (const SequenceAnnotation &anchors)
 Write access to alignment anchors. More...
 

Protected Member Functions

 RnaData (double p_bpcut, size_t max_bp_span)
 Almost empty constructor. More...
 
const arc_prob_matrix_tarc_probs () const
 arcs with probability above cutoff
 
virtual void init_from_fixed_structure (const RnaStructure &structure, const PFoldParams &pfoldparams)
 initialize from fixed structure More...
 
virtual void init_from_rna_ensemble (const RnaEnsemble &rna_ensemble, const PFoldParams &pfoldparams)
 initialize from rna ensemble More...
 
bool read_autodetect (const std::string &filename, const PFoldParams &pfoldparams)
 read and initialize from file, autodetect format More...
 
virtual bool inloopprobs_ok () const
 check in loop probabilities More...
 
virtual void read_pp (const std::string &filename)
 
virtual std::istream & read_pp (std::istream &in)
 
void read_old_pp (const std::string &filename)
 
void read_ps (const std::string &filename)
 

Protected Attributes

std::unique_ptr< RnaDataImplpimpl_
 pointer to corresponding implementation object
 

Friends

class RnaDataImpl
 
class ExtRnaDataImpl
 

Detailed Description

represent sparsified data of RNA ensemble

knows sequence, cutoff probability and base pair probabilities greater than the cutoff probability; potentially knows stacking probabilities

Note
This class knows the cutoff probability of its data. This cutoff can be different from the cutoff in classes like BasePairs which defines the structure elements that are considered by algorithms.
the class guarantees that sequences are normalized (uppercase, T->U) even when read in unnormalized form, e.g. from file or stream

Constructor & Destructor Documentation

◆ RnaData() [1/4]

LocARNA::RnaData::RnaData ( const RnaEnsemble rna_ensemble,
double  p_bpcut,
double  max_bps_length_ratio,
const PFoldParams pfoldparams 
)

Construct from RnaEnsemble with cutoff probability.

Parameters
rna_ensembleRNA ensemble data
p_bpcutcutoff probability
max_bps_length_ratiomax ratio of bps to length (0=no effect)
pfoldparamsfolding parameters (controls stacking)
Note
RnaData copies all required data from rna_ensemble and does not keep a reference; if pfoldparams.stacking is true, copy stacking terms

◆ RnaData() [2/4]

LocARNA::RnaData::RnaData ( const std::string &  filename,
double  p_bpcut,
double  max_bps_length_ratio,
const PFoldParams pfoldparams 
)

Construct from file.

Parameters
filenameinput file name
p_bpcutcutoff probability
pfoldparamsfolding parameters
max_bps_length_ratiomaximal ratio of number of base pairs divided by sequence length. This serves as a second filter on the "significant" base pairs. Value 0 turns this filter off.
Note
autodetect format of input; for fa or aln input formats, predict base pair probabilities
Note
If probabilities have to be computed by folding the input sequence(s), folding is subject to the parameters pfoldparams. Also when reading ensemble probabilities from file, the outcome depends on settings in pfoldparams (stacking, max_bp_span). Without stacking, stacking probs are ignored and max_bp_span is used to filter the base pairs by their maximum span.

◆ RnaData() [3/4]

LocARNA::RnaData::RnaData ( const RnaData rna_dataA,
const RnaData rna_dataB,
const Alignment alignment,
double  p_expA,
double  p_expB,
bool  only_local = false 
)

Construct as consensus of two aligned RNAs.

Parameters
rna_dataARNA ensemble data A
rna_dataBRNA ensemble data B
alignmentAlignment of A and B
p_expAbackground probability for A
p_expBbackground probability for B
only_localif true, construct only local alignment

The object uses mean cutoff probability of the given objects; The background probability is used in computing consensus probabilities. If both input rna data objects have stacking probabilities, stacking consensus probabilities are computed as well. If the object contain sequence anchors, we construct the new object with a consensus anchor string. (The latter is done as part of the consensus sequence computation.)

◆ RnaData() [4/4]

LocARNA::RnaData::RnaData ( double  p_bpcut,
size_t  max_bp_span 
)
explicitprotected

Almost empty constructor.

Parameters
p_bpcutcutoff probability
max_bp_spanmaximum base pair span

Member Function Documentation

◆ arc_cutoff_prob()

double LocARNA::RnaData::arc_cutoff_prob ( ) const

Get base pair cutoff probability.

Returns
cutoff probability p_bpcut

◆ arc_prob()

double LocARNA::RnaData::arc_prob ( pos_type  i,
pos_type  j 
) const

Get arc probability.

Parameters
ileft sequence position
jright sequence position
Returns
probability p_ij of basepair (i,j) if p_ij>p_bpcut; otherwise, 0

◆ has_stacking()

bool LocARNA::RnaData::has_stacking ( ) const

Availability of stacking terms.

Returns
whether stacking terms are available

◆ init_from_fixed_structure()

void LocARNA::RnaData::init_from_fixed_structure ( const RnaStructure structure,
const PFoldParams pfoldparams 
)
protectedvirtual

initialize from fixed structure

Parameters
structurefixed structure
pfoldparamsfolding parameters
  • stacking: whether to initialize stacking terms
Note
can be overloaded to initialize with additional information (in loop probabilities)

Reimplemented in LocARNA::ExtRnaData.

◆ init_from_rna_ensemble()

void LocARNA::RnaData::init_from_rna_ensemble ( const RnaEnsemble rna_ensemble,
const PFoldParams pfoldparams 
)
protectedvirtual

initialize from rna ensemble

Parameters
rna_ensemblerna ensemble
pfoldparamsfolding parameters
  • stacking: whether to initialize stacking terms
Note
can be overloaded to initialize with additional information (in loop probabilities)
this method never removes lonely or too long base pairs (according to noLP or maxBPspan, resp.)

Reimplemented in LocARNA::ExtRnaData.

◆ inloopprobs_ok()

virtual bool LocARNA::RnaData::inloopprobs_ok ( ) const
inlineprotectedvirtual

check in loop probabilities

Returns
true iff loop probabilities are available or not required
Note
use to indicate the need for recomputation in read_autodetect(); always true in RnaData

Reimplemented in LocARNA::ExtRnaData.

◆ joint_arc_prob()

double LocARNA::RnaData::joint_arc_prob ( pos_type  i,
pos_type  j 
) const

Get arc probability.

Parameters
ileft sequence position
jright sequence position
Returns
joint probability p^(2)_ij of basepair (i,j) and (i+1,j-1) if p_i+1j-1>p_bpcut and p^(2)_ij > p_bpcut; otherwise, 0

◆ length()

size_type LocARNA::RnaData::length ( ) const

Get the sequence length.

Returns
length of RNA sequence

◆ mea_structure()

std::string LocARNA::RnaData::mea_structure ( double  gamma = 1.) const

maximum expected accuracy structure

Parameters
gamma
Returns
maximum non-crossing expected accuracy structure

Works as interface to the RNAlib function MEA. From the ViennaRNA docu: Each base pair (i,j) gets a score 2*gamma*p_ij and the score of an unpaired base is given by the probability of not forming a pair (compare RNAfold)

◆ multiple_alignment()

const MultipleAlignment & LocARNA::RnaData::multiple_alignment ( ) const

Get the multiple alignment.

Returns
multiple alignment

◆ plist()

std::unique_ptr< vrna_plist_t[]> LocARNA::RnaData::plist ( ) const

Construct plist (pair list of Vienna RNA)

Note
the plist has to be deleted by the caller
Returns
unique pointer to plist

◆ prob_paired_downstream()

double LocARNA::RnaData::prob_paired_downstream ( pos_type  i) const

Probability that a position is paired downstream.

Parameters
isequence position
Returns
probability that a position i is paired with a position j<i (downstream)
Note
O(sequence.length()) implementation
See also
prob_paired_upstream

◆ prob_paired_upstream()

double LocARNA::RnaData::prob_paired_upstream ( pos_type  i) const

Probability that a position is paired upstream.

Parameters
isequence position
Returns
probability that a position i is paired with a position j>i (upstream)
Note
O(sequence.length()) implementation
See also
prob_paired_downstream

◆ prob_unpaired()

double LocARNA::RnaData::prob_unpaired ( pos_type  i) const

Unpaired probability.

Parameters
isequence position
Returns
probability that a position i is unpaired
Note
O(sequence.length()) implementation

◆ read_autodetect()

bool LocARNA::RnaData::read_autodetect ( const std::string &  filename,
const PFoldParams pfoldparams 
)
protected

read and initialize from file, autodetect format

Parameters
filenamename of input file
pfoldparamsfolding parameters
  • stacking: whether to initialize stacking terms
  • max_bp_span: maximum base pair span
Returns
whether probabilities were read completely
Note
: this method is designed such that it can be used for RnaData and ExtRnaData
the method delegates actual reading to methods read_pp(), read_old_pp(), read_ps(), and the MultipleAlignment class.
when reading in, base pairs exceeding max_bp_span_ or structure information below the probability thresholds are ignored (which is -in part- job of the delegates).

◆ read_old_pp()

void LocARNA::RnaData::read_old_pp ( const std::string &  filename)
protected

Read data in the old pp format

Parameters
filenamename of input file

Reads only base pairs with probabilities greater than p_bpcut_; reads stacking probabilities only if has_stacking_ is true

Note
the old pp format starts with the sequence/alignment and then simply lists the arcs (i,j) with their probabilities p and optionally stacking probabilities p2. pp-files contain entries i j p [p2]. p denotes the probabilitiy of base pair (i,j). The optional stacking probability p2 is the joint probability of base pairs (i,j) and (i+1,j+1).
handling of stacking: after the call, has_stacking_ is true only if the file specified at least one stacking probability and has_stacking_ was true before.

◆ read_pp() [1/2]

void LocARNA::RnaData::read_pp ( const std::string &  filename)
protectedvirtual

Read data in pp format 2.0

Parameters
filenamename of input file

Reads only base pairs with probabilities greater than p_bpcut_; reads stacking probabilities only if has_stacking_ is true

Note
can be overloaded to read extension sections
pp is a proprietary format of LocARNA. In its simplest version, it starts with the sequence/alignment and then simply lists the arcs (i,j) with their probabilities p and optionally stacking probabilities p2. pp-files contain entries i j p [p2]. p denotes the probabilitiy of base pair (i,j). The optional stacking probability p2 is the joint probability of base pairs (i,j) and (i+1,j+1).
handling of stacking: after the call, has_stacking_ is true only if the file specified the STACK keyword and has_stacking_ was true before.

◆ read_pp() [2/2]

std::istream & LocARNA::RnaData::read_pp ( std::istream &  in)
protectedvirtual

Read data in pp format 2.0

Parameters
ininput stream
See also
read_pp(std::string)

Reimplemented in LocARNA::ExtRnaData.

◆ read_ps()

void LocARNA::RnaData::read_ps ( const std::string &  filename)
protected

Read data in Vienna's dot plot ps format

Parameters
filenamename of input file

Reads only base pairs with probabilities greater than p_bpcut_; reads stacking probabilities only if has_stacking_ is true

Note
reads sequence name from file (instead of guessing from filename!); stacking probabilities are read if available (then, sets has_stacking_ to true)
throws wrong_format exception if not in ps format

sequence characters should be upper case, and Ts translated to Us

◆ sequence()

const Sequence & LocARNA::RnaData::sequence ( ) const

Get the multiple alignment as sequence.

Returns
sequence

◆ set_anchors()

void LocARNA::RnaData::set_anchors ( const SequenceAnnotation anchors)

Write access to alignment anchors.

Parameters
anchorsalignment anchors
See also
MultipleAlignment::set_annotation()

◆ stacked_arc_prob()

double LocARNA::RnaData::stacked_arc_prob ( pos_type  i,
pos_type  j 
) const

Get arc probability.

Parameters
ileft sequence position
jright sequence position
Returns
conditional probability p_ij|i+1j-1 of basepair (i,j) under condition of base pair (i+1,j-1) if p_i+1j-1>p_bpcut and p^(2)_ij > p_bpcut; throw exception if p_i+1j-1<=p_bpcut; otherwise, 0

◆ write_pp()

std::ostream & LocARNA::RnaData::write_pp ( std::ostream &  out,
double  p_outbpcut = 0 
) const

Write data in pp format

Parameters
outoutput stream
p_outbpcutcutoff probability
Returns
stream

Writes only base pairs with probabilities greater than p_outbpcut

◆ write_size_info()

std::ostream & LocARNA::RnaData::write_size_info ( std::ostream &  out) const

Write object size information.

Parameters
outoutput stream

Writes numbers of stored probabilities to stream


The documentation for this class was generated from the following files: