tRNAscan-SE Output Legend and Search Methods
Sequence          tRNA Bounds     tRNA  Anti   Intron Bounds   Cove
Name      tRNA #  Begin   End     Type  Codon  Begin   End     Score
--------  ------  -----   ---     ----  -----  -----   -----   -----
CELF22B7  1       12619   12738   Leu   CAA    12657   12692   60.01
CELF22B7  2       19480   19561   Ser   AGA    0       0       80.44
CELF22B7  3       26367   26439   Phe   GAA    0       0       80.32
CELF22B7  4       26992   26920   Phe   GAA    0       0       80.32
CELF22B7  5       23765   23694   Pro   CGG    0       0       75.76
Each new tRNA in a sequence is consecutively numbered in the "tRNA #" column. "tRNA Bounds" specify the starting (5') and ending (3') nucleotide bounds for the tRNA. tRNAs found on the reverse (lower) strand are indicated by having the Begin (5') bound greater than the End (3') bound (see tRNAs #4 & #5 in output above).
The "tRNA Type" is the predicted amino acid charged to the tRNA molecule, based on the predicted "Anticodon" (written 5'->3') displayed in the next column. tRNAs that fit the criteria for potential pseudogenes (poor primary or secondary structure, see Pseudogene Detection) are marked with "Pseudo" in the "tRNA Type" column. If there is a predicted intron in the tRNA, the next two columns give its nucleotide bounds. If there is no predicted intron, both of these columns contain zero. The final column is the Cove score for the tRNA in bits. Note that this score varies somewhat depending on the particular tRNA covariance model used in the analysis (the search mode selects which tRNA covariance model is used: eukaryote-specific, prokaryote-specific, archaea-specific, or general). tRNAscan-SE counts any sequence that attains a score of >= 20.0 bits as a tRNA (based on empirical studies conducted by Eddy & Durbin, 1994).
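The column rules above can be applied mechanically. The following is a minimal sketch, assuming whitespace-separated data rows exactly like those in the sample output; the `TrnaHit` class and `parse_hit` function are hypothetical names introduced here for illustration and are not part of tRNAscan-SE.

```python
from dataclasses import dataclass


@dataclass
class TrnaHit:
    """One data row of the tabular output (hypothetical helper class)."""
    seq_name: str
    number: int
    begin: int
    end: int
    trna_type: str
    anticodon: str
    intron_begin: int
    intron_end: int
    score: float

    @property
    def strand(self) -> str:
        # Begin (5') greater than End (3') marks the reverse (lower) strand.
        return "-" if self.begin > self.end else "+"

    @property
    def length(self) -> int:
        # Bounds are inclusive, so add one to the coordinate difference.
        return abs(self.end - self.begin) + 1

    @property
    def has_intron(self) -> bool:
        # Both intron columns contain zero when no intron is predicted.
        return not (self.intron_begin == 0 and self.intron_end == 0)


def parse_hit(line: str) -> TrnaHit:
    """Parse one whitespace-separated data row into a TrnaHit."""
    f = line.split()
    return TrnaHit(f[0], int(f[1]), int(f[2]), int(f[3]), f[4], f[5],
                   int(f[6]), int(f[7]), float(f[8]))


# Data rows copied from the sample output above.
ROWS = """\
CELF22B7 1 12619 12738 Leu CAA 12657 12692 60.01
CELF22B7 2 19480 19561 Ser AGA 0 0 80.44
CELF22B7 3 26367 26439 Phe GAA 0 0 80.32
CELF22B7 4 26992 26920 Phe GAA 0 0 80.32
CELF22B7 5 23765 23694 Pro CGG 0 0 75.76
"""

hits = [parse_hit(row) for row in ROWS.splitlines()]
for h in hits:
    print(h.number, h.trna_type, h.strand, h.length, h.has_intron)
```

In this sketch, tRNAs #4 and #5 come out on the "-" strand, and only tRNA #1 reports an intron.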
CELF22B7.trna4 (26992-26920)  Length: 73 bp
Type: Phe  Anticodon: GAA at 34-36 (26959-26957)  Score: 73.88
         *    |    *    |    *    |    *    |    *    |    *    |    *    |
Seq: GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA
Str: >>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<.
     |     |  |              | |               |     |               ||     |
     +-----+  +--------------+ +---------------+     +---------------++-----+
     |          D-stem/loop        Anticodon           TPC stem/loop        |
     |                             stem/loop                                |
     +----------------------------------------------------------------------+
                                  Acceptor stem
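The "Str:" line can be read programmatically: each ">" opens a base pair that is closed by a matching "<", and "." marks an unpaired position. The sketch below, which assumes only that bracket convention, builds a table of paired positions and pulls the anticodon out of the "Seq:" line using the 1-based positions reported in the header (34-36). The function name `pair_table` is hypothetical.

```python
def pair_table(structure: str) -> dict[int, int]:
    """Map every paired position (1-based) to its partner position."""
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for pos, ch in enumerate(structure, start=1):
        if ch == ">":
            stack.append(pos)          # 5' half of a stem
        elif ch == "<":
            if not stack:
                raise ValueError(f"unmatched '<' at position {pos}")
            partner = stack.pop()      # most recently opened 5' base
            pairs[partner] = pos
            pairs[pos] = partner
    if stack:
        raise ValueError(f"unmatched '>' at positions {stack}")
    return pairs


# Seq and Str lines copied from the secondary structure display above.
seq = "GCCTCGATAGCTCAGTTGGGAGAGCGTACGACTGAAGATCGTAAGGtCACCAGTTCGATCCTGGTTCGGGGCA"
struct = ">>>>>>>..>>>>........<<<<.>>>>>.......<<<<<.....>>>>>.......<<<<<<<<<<<<."

pairs = pair_table(struct)
anticodon = seq[34 - 1:36]             # positions 34-36, converted to a slice

print(anticodon)                       # anticodon bases read from the Seq line
print(pairs[1], pairs[10], pairs[27], pairs[49])  # partners of each stem's first base
```

Position 1 pairs with position 72, which is the outermost pair of the acceptor stem in the diagram; positions 10, 27, and 49 open the D, anticodon, and TPC stems respectively.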
tRNAscan-SE does no tRNA detection itself. Instead, it combines the strengths of three independent tRNA prediction programs by negotiating the flow of information between them, performing a limited amount of post-processing, and outputting the results. The program works in three main phases. In the first stage, it runs two independent tRNA detection programs on the input DNA sequence. These relatively fast, first-pass detection programs are a modified, optimized version of tRNAscan 1.3 (1) and EufindtRNA, an implementation of a previously described tRNA search algorithm (3).
tRNAscan 1.3 detects tRNAs by initially looking for short, well conserved intragenic promoter sequences (A & B boxes in eukaryotes) found in the TPC and D arm regions of prototypic tRNAs. Once a specific number of nucleotides in the sequence match the consensus promoter (defined by an arbitrary score threshold), the program then progressively attempts to identify the various stem-loop structures found in the tRNA "clover leaf". As each arm is identified by the presence of base-pairing in the stem, correct loop size, and several invariant and semi-invariant bases, a "general score" counter is incremented. If the final score exceeds an empirically determined threshold, the tRNA location, anticodon, and type are saved.
EufindtRNA, on the other hand, searches only for linear sequence signals. A step-wise algorithm uses newly developed log-odds score matrices to first identify A and B box promoter elements that exceed an empirically determined cutoff. The scores for these A and B boxes are then added to a log-odds score for the nucleotide distance between the A and B boxes to produce an intermediate score. Finally, a log-odds score for the distance to the nearest downstream poly-T pol III termination signal is added to the intermediate score to obtain a final score. If the final score is above the final score cutoff, the tRNA identity and location are saved.
tRNAscan-SE uses a less selective version of this algorithm that does not look for pol III termination signals, and therefore applies its cutoff to the intermediate score. In addition, the intermediate score cutoff is loosened slightly relative to the intermediate cutoff described in the original algorithm (3). These modifications increase the algorithm's sensitivity but greatly reduce EufindtRNA's selectivity. This does not reduce the final selectivity of tRNAscan-SE, since a secondary filter (Cove) is used to eliminate false positives. The sensitivity of EufindtRNA is roughly comparable to that of tRNAscan 1.3, but the two appear to be complementary in that EufindtRNA tends to identify tRNAs missed by tRNAscan 1.3 and vice versa (3). tRNAscan-SE takes advantage of this fact: it saves the results from both tRNAscan 1.3 and EufindtRNA, then merges them into one list of non-redundant "candidate" tRNA identifications.
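One way to picture the merge into a non-redundant candidate list is as an interval union: hits from the two first-pass programs that overlap on the same strand collapse into a single candidate. The sketch below illustrates that idea only. The exact merging rule used by tRNAscan-SE is not described here, so the overlap criterion, the `merge_candidates` function, and the coordinates are all assumptions made for illustration.

```python
Hit = tuple[int, int, str]  # (start, end, strand) with start <= end


def merge_candidates(first: list[Hit], second: list[Hit]) -> list[Hit]:
    """Union two hit lists; overlapping hits on the same strand become one."""
    merged: list[Hit] = []
    # Sort by strand, then by start, so overlapping hits end up adjacent.
    for start, end, strand in sorted(first + second,
                                     key=lambda h: (h[2], h[0], h[1])):
        if merged and merged[-1][2] == strand and start <= merged[-1][1]:
            prev_start, prev_end, _ = merged[-1]
            # Extend the previous candidate to cover both hits.
            merged[-1] = (prev_start, max(prev_end, end), strand)
        else:
            merged.append((start, end, strand))
    return merged


# Hypothetical coordinates, not taken from any real run.
tRNAscan_hits = [(100, 172, "+"), (500, 573, "+")]
eufind_hits = [(98, 170, "+"), (900, 972, "-")]

candidates = merge_candidates(tRNAscan_hits, eufind_hits)
print(candidates)
```

Here the two overlapping "+" strand hits near position 100 become one candidate, while the hits found by only one program are kept, which is how the complementary sensitivity of the two detectors is preserved.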
In the second stage, tRNAscan-SE extracts the DNA subsequences identified as possible tRNAs and passes only these segments to an RNA search program in the Cove program suite (covels) for analysis. Cove looks for tRNAs in a very different way. A probabilistic model for tRNA has been built by aligning known tRNAs and assigning a base-specific probability to every nucleotide position in the tRNA model. Cove also captures RNA secondary structure information using a type of formal language referred to as a stochastic context-free grammar (SCFG). Cove applies this probabilistic model to the entire windowed sequence and produces a score, in bits, for how well the sequence matches the tRNA model. If the score is at least 20.0 bits, the candidate is considered a true tRNA (based on empirical studies in ref. 2).
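To give a feel for what a score "in bits" means, the sketch below scores a sequence against a toy position-specific profile as a sum of base-2 log-odds terms. This is a deliberate simplification: it covers only per-position base probabilities and ignores the paired states that the SCFG adds in a real covariance model. The profile values, the uniform background of 0.25, and the function name `log_odds_bits` are all made up for illustration.

```python
import math

CUTOFF_BITS = 20.0  # cutoff quoted in the text for calling a tRNA


def log_odds_bits(seq: str, profile: list[dict[str, float]],
                  background: float = 0.25) -> float:
    """Sum log2(model probability / background probability) over positions."""
    if len(seq) != len(profile):
        raise ValueError("sequence and profile must have the same length")
    return sum(math.log2(column[base] / background)
               for base, column in zip(seq, profile))


# Toy three-column profile with invented probabilities.
toy_profile = [
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
]

good = log_odds_bits("ACG", toy_profile)   # matches the favoured base everywhere
poor = log_odds_bits("TTT", toy_profile)   # matches nowhere
print(good > 0, poor < 0)
print(good >= CUTOFF_BITS)                 # a 3-base toy cannot reach 20 bits
```

A positive score means the sequence is more probable under the model than under the background; a full-length tRNA accumulates such terms across the whole model before being compared with the cutoff.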
In the final phase, tRNAscan-SE takes those tRNAs confirmed as such
and runs another Cove program (coves) that displays RNA secondary
structure. The tRNA type is predicted by identifying the anticodon
within the structure output. Introns are also automatically
identified from the structure output as runs of five or more
consecutive non-consensus nucleotides within the anticodon loop.
tRNAscan-SE uses heuristics to try to distinguish pseudogenes from
true tRNAs, primarily on lack of tRNA-like secondary structure. A
second tRNA covariance model was created from the original 1415-tRNA
alignment, under the constraint that no secondary structure is
conserved (this model is effectively just a sequence profile, or
hidden Markov model (HMM)). By subtracting a tRNA's similarity score
to the primary structure-only model ("HMM Score" column) from that
using the complete tRNA model, a secondary structure-only score
("2'Str Score" column) is obtained. We have observed that tRNAs with
low scores for either component of the total score were often
pseudogenes. Thus, tRNAs are marked as likely pseudogenes if they
have either a score of less than 10 bits for the primary sequence
component of the total score (HMM Score), or a score of less than 5
bits for the secondary structure component (2'Str Score) of the total
score. Selenocysteine tRNAs are not checked by these rules since they
have atypical primary and secondary structure. Also, use of the -O
option (search for organellar tRNAs) disables pseudogene checking
since these criteria are geared towards detecting cytoplasmic
pseudogenes (some true non-eukaryotic tRNAs are marked as pseudogenes
by this analysis).
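The pseudogene rule described above reduces to one subtraction and two comparisons. The following is a minimal sketch using the thresholds stated in the text (10 bits for the HMM Score, 5 bits for the 2'Str Score). The function names, the "SeC" type label used to recognise selenocysteine tRNAs, the `organellar_mode` flag standing in for the -O option, and the example scores are assumptions made for illustration.

```python
HMM_MIN_BITS = 10.0   # minimum primary sequence (HMM) score
STR_MIN_BITS = 5.0    # minimum secondary structure (2'Str) score


def secondary_structure_score(total_score: float, hmm_score: float) -> float:
    """2'Str Score: the full-model score minus the primary-structure-only score."""
    return total_score - hmm_score


def is_likely_pseudogene(total_score: float, hmm_score: float,
                         trna_type: str, organellar_mode: bool = False) -> bool:
    """Apply the two score thresholds, with the exemptions named in the text."""
    # Selenocysteine tRNAs and organellar searches are not checked.
    # "SeC" is an assumed label for selenocysteine tRNAs.
    if organellar_mode or trna_type == "SeC":
        return False
    str_score = secondary_structure_score(total_score, hmm_score)
    return hmm_score < HMM_MIN_BITS or str_score < STR_MIN_BITS


# Hypothetical scores, not taken from any real run.
print(is_likely_pseudogene(45.0, 30.0, "Phe"))   # both components are high
print(is_likely_pseudogene(25.0, 22.0, "Leu"))   # 2'Str Score is only 3 bits
print(is_likely_pseudogene(22.0, 8.0, "Ala"))    # HMM Score is below 10 bits
print(is_likely_pseudogene(25.0, 22.0, "SeC"))   # exempt from the check
```

Note that a candidate can pass the 20.0-bit total score cutoff and still be flagged, because the check looks at how the total splits between the two components.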
For more details on the program algorithm & implementation, see
the Nucleic Acids Research paper (Lowe
& Eddy, 1997).
Pseudogene Detection