Using the snoscan Web Server

The snoscan server is accessed via the Lowe Lab Webserver Interface at http://lowelab.ucsc.edu/snoscan/.

User Interface

The snoscan interface consists of four major components:
* Search mode selection
* Query sequence selection
* Target sequence selection
* Configuration of search-mode and options for displaying results

The search mode determines which probabilistic model is used in a search. Each model is based on snoRNA training data from selected species or phylogenetic groups (e.g., mammals, yeasts, archaea). If the interface offers no explicit model for the species of interest, specifying either a general model or a model from a related species generally yields good results. Search modes can differ in speed and sensitivity.

Query sequence selection specifies the sequences to be searched for snoRNAs. Raw or formatted sequence data can be pasted directly into the query sequence box or uploaded from a local file. Links to sample query sequence data are also available for demonstration purposes.

snoscan also expects “target sequences”, i.e. sequences that may base-pair with, and be modified by, a snoRNA in the query sequence. Preloaded target sequences may be chosen, including rRNA from human, yeast, and other model organisms. Alternatively, the user can specify a custom target RNA sequence. As with the query sequence, a custom target sequence can be pasted into a box in either raw or formatted form, or uploaded from a file.

By default, every nucleotide in a custom target sequence is treated as a potential target. Alternatively, the user can restrict the search to a subset of the target nucleotides by uploading a custom “methylation file” that indicates which nucleotides to use as target sites. Sample human and yeast methylation files are included on the server. When methylation positions are known, restricting the search to these sites decreases both the search time and the number of false positive “hits”.

The server also has a set of program-specific search and output-display options, such as limits on the distances between some of the sequence motifs (e.g. C and D boxes). In addition, an adjustable cutoff score allows a tradeoff between scan sensitivity and specificity. In most cases the default parameters are satisfactory and are recommended, especially for new users. More experienced users can adjust these parameters to exert some control over the program’s results.

Output format

The snoscan output consists of a summary information line for each predicted C/D box snoRNA sequence, followed by the candidate in FASTA format.  The summary listing for each hit includes:

* Query sequence name and snoRNA start and end positions within the query sequence
* snoscan overall bit score
* Target sequence name and target methylation position
* Total number of base pairings and mismatches in the guide region
* Whether the guide region is adjacent to the D' box or D box
* The length of the candidate subsequence
* Whether or not a terminal stem was detected
 
Also included in the display are graphical representations of the base-pairing in the target-guide region and the secondary structure of the terminal stem motif. Snoscan scores for known snoRNA sequences for various species are available on the website for comparison.

Sample (abbreviated) snoscan Output

>> snR24  26.40  (1-87)  Cmpl: ySc-25S-Am1447 (U24)  12/0 bp  Gs-DpBox: 18 (18)  Len: 87  TS
Meth site found: 1447 (U24)  Guide Seq Sc: 11.88  (21.36 -1.12 -7.36 -1.00)
                     *                                       
Db seq:  5'-      AGUAGCAAAUAU -3'   ySc-25S    (1444-1456)
                  ||||||||||||
Qry seq: 3'- AGACUUCAUCGUUUAUA -5'     snR24    (29-18)

Terminal stem:+- [C Box]-N- ACU - 5'          Stem Sc: 0.84 (3 bp)
              |             |||
              +---[D Box] - UGAA - 3'          Stem Transit Sc: -1.11

>Summary      [ C Box ] --         -- [ Cmpl/ Mism ]  X [D'Bx] --       -- [D Bx]  Length
>Meth Am1447  [AUGAUGU] --   6 bp  -- [  12 / 0    ]  1 [CAGA] -- 47 bp -- [CUGA]   87 bp
>Sc    26.40  [  7.48 ] --  -1.59  -- [ 21.36 bits ]    [3.94]    -2.44    [8.05]

Candidate sequence:
>snR24  26.40  (1-87)  Cmpl: ySc-25S-Am1447  Len: 87
TCAAATGATGTAATAACATATTTGCTACTTCAGATGGAACTTTGAGTTCCGAATGAGACA
TACCAATTATCACCAAGATCTCTGATG

Each predicted C/D box snoRNA begins with a summary header line (starting with ">>"), followed by further detail, including graphical representations of the base-pairing in the target-guide region and of the terminal stem motif. A sample header line is labelled below:
     (1)     (2)    (3)           (4)    (5)    (6)    (7)      (8)   (9) (10)  (11)  (12)
>> My-query 26.40 (11-97) Cmpl: ySc-25S-Am1447 (U24) 12/0 bp Gs-DpBox: 28 (18) Len: 87 TS
Key:
  1. Query sequence name
  2. Snoscan overall score (in bits)
  3. Start and end coordinates of the predicted snoRNA within the query sequence
  4. Target sequence name (ySc-25S in this example)
  5. Target methylation nucleotide and position in target sequence (A at position 1447 here)
  6. If the methylation site matches a position in the methylation file, this is the annotation provided for that methylation site (in this case, the site is annotated as known to be guided by U24). If no methylation file was provided, or the site has no match in the methylation file, this appears as a dash "-".
  7. Total number of base pairings / mismatches in the guide region (G-U pairs count as a base pair)
  8. Whether the guide region is adjacent to the D' box ("Gs-DpBox") or D box ("Gs-D box")
  9. Position of the start of the guide region in the snoRNA candidate (relative to the entire query)
  10. Position of the start of the guide region in the snoRNA candidate (relative to beginning of the snoRNA hit)
  11. The length of the candidate snoRNA
  12. Whether or not a terminal stem was detected (TS=terminal stem present, blank=not present)
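
The twelve fields in the key above can be pulled out of a header line with a regular expression. The following is a minimal Python sketch, not part of snoscan itself: the pattern is inferred only from the sample header lines on this page, and the names HEADER_RE and parse_header, as well as the dictionary keys, are hypothetical. Real output may contain variations this pattern does not cover.

```python
import re

# Pattern inferred from the sample ">>" header lines on this page (an assumption,
# not an official grammar for snoscan output).
HEADER_RE = re.compile(
    r"^>>\s+(?P<query>\S+)"                                   # (1) query name
    r"\s+(?P<score>-?\d+(?:\.\d+)?)"                          # (2) overall bit score
    r"\s+\((?P<start>\d+)-(?P<end>\d+)\)"                     # (3) hit coordinates
    r"\s+Cmpl:\s+(?P<target>\S+)-(?P<site>[ACGTU]m\d+)"       # (4) target, (5) site
    r"\s+(?P<annotation>\([^)]*\)|-)"                         # (6) annotation or "-"
    r"\s+(?P<pairs>\d+)/(?P<mismatches>\d+)\s+bp"             # (7) pairs/mismatches
    r"\s+(?P<guide_side>Gs-[^\s:]+(?:\s+box)?):"              # (8) D' box or D box
    r"\s+(?P<guide_in_query>\d+)\s+\((?P<guide_in_hit>\d+)\)" # (9), (10)
    r"\s+Len:\s+(?P<length>\d+)"                              # (11) length
    r"(?:\s+(?P<stem>TS))?\s*$"                               # (12) terminal stem flag
)


def parse_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    m = HEADER_RE.match(line.strip())
    if m is None:
        return None
    d = m.groupdict()
    site = d["site"]                      # e.g. "Am1447"
    annotation = d["annotation"]
    return {
        "query": d["query"],
        "score": float(d["score"]),
        "start": int(d["start"]),
        "end": int(d["end"]),
        "target": d["target"],
        "meth_nucleotide": site[0],
        "meth_position": int(site[2:]),
        "annotation": None if annotation == "-" else annotation.strip("()"),
        "pairs": int(d["pairs"]),
        "mismatches": int(d["mismatches"]),
        "guide_side": d["guide_side"],
        "guide_in_query": int(d["guide_in_query"]),
        "guide_in_hit": int(d["guide_in_hit"]),
        "length": int(d["length"]),
        "terminal_stem": d["stem"] is not None,
    }


hit = parse_header(
    ">> My-query 26.40 (11-97) Cmpl: ySc-25S-Am1447 (U24) "
    "12/0 bp Gs-DpBox: 28 (18) Len: 87 TS"
)
print(hit["target"], hit["meth_nucleotide"], hit["meth_position"])  # ySc-25S A 1447
```

In the sample line the fields are mutually consistent: the hit spans 97 - 11 + 1 = 87 nucleotides (field 11), and the guide start relative to the hit is 28 - 11 + 1 = 18 (field 10). A parsed score can also be compared against a chosen cutoff when post-filtering saved results.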

An example of the middle part of the output for each hit follows below, and is fairly self-explanatory. Abbreviations: "Db seq" = target (database) sequence, "Qry seq" = query sequence, "Sc" = bit score for that feature of the model, "Meth site found" means that this position matches a position found in the methylation file, and "(U24)" is the annotation provided for this site within the methylation file (same as described above). Also note that an asterisk (*) appears above the nucleotide predicted to be methylated by this snoRNA candidate:
Meth site found: 1447 (U24)     Guide Seq Sc: 11.88  (21.36 -1.12 -7.36 -1.00)
                     *
Db seq:  5'-      AGUAGCAAAUAU -3'   ySc-25S    (1444-1456)
                  ||||||||||||
Qry seq: 3'- AGACUUCAUCGUUUAUA -5'   My-query   (39-28)

Terminal stem:+- [C Box]-N- ACU - 5'          Stem Sc: 0.84 (3 bp)
              |             |||
              +---[D Box] - UGAA - 3'          Stem Transit Sc: -1.11
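
The "12/0 bp" figure in the header can be checked by hand against the alignment. As a minimal sketch (the function name count_pairs is hypothetical, and this is not how snoscan itself scores the duplex), the Python below compares the target strand, read 5' to 3', with the paired part of the query strand, read 3' to 5' as displayed, counting G-U as a base pair as stated in the key.

```python
# Watson-Crick pairs plus the G-U wobble pair, which the key counts as a base pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_pairs(target_5to3, guide_3to5):
    """Count paired and mismatched positions between two strands of equal length,
    both written exactly as they appear in the alignment display."""
    if len(target_5to3) != len(guide_3to5):
        raise ValueError("strands must have the same length")
    paired = sum(
        (t, g) in PAIRS
        for t, g in zip(target_5to3.upper(), guide_3to5.upper())
    )
    return paired, len(target_5to3) - paired


target = "AGUAGCAAAUAU"              # "Db seq" line, 5' to 3'
query = "AGACUUCAUCGUUUAUA"          # "Qry seq" line, 3' to 5'
guide = query[-len(target):]         # the 12 nucleotides under the "|" marks

print(count_pairs(target, guide))    # (12, 0)
```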

The next part of the output is a graphical summary of the same information, with the bit score of each feature (the ">Sc" line) shown beneath it:
>Summary      [ C Box ] --         -- [ Cmpl/ Mism ]  X [D'Bx] --       -- [D Bx]  Length
>Meth Am1447  [AUGAUGU] --   6 bp  -- [  12 / 0    ]  1 [CAGA] -- 47 bp -- [CUGA]   87 bp
>Sc    26.40  [  7.48 ] --  -1.59  -- [ 21.36 bits ]    [3.94]    -2.44    [8.05]
And finally, the candidate sequence is given in FASTA format:
Candidate sequence:
>snR24 26.40 (1-87) Cmpl: ySc-25S-Am1447 Len: 87
TCAAATGATGTAATAACATATTTGCTACTTCAGATGGAACTTTGAGTTCCGAATGAGACA
TACCAATTATCACCAAGATCTCTGATG
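
The summary lines and the FASTA record can be cross-checked against each other. The sketch below, in Python, reads the candidate sequence and walks along it using the numbers from the ">Meth" line. It assumes, based only on this sample, that "6 bp" and "47 bp" are spacer lengths and that X is the number of nucleotides between the guide region and the D' box; read_fasta and locate_features are hypothetical helper names.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def locate_features(seq, c_box, gap1, guide_len, x, dprime_box, gap2, d_box):
    """Return 1-based start positions of the features, or None if the sequence
    does not match the layout described by the summary line."""
    rna = seq.upper().replace("T", "U")   # the FASTA record is DNA, the summary is RNA
    pos = rna.find(c_box)
    if pos < 0:
        return None
    layout = {"c_box": pos + 1}
    pos += len(c_box) + gap1              # skip the C box and the first spacer
    layout["guide"] = pos + 1
    pos += guide_len + x                  # skip the guide region and X
    if rna[pos:pos + len(dprime_box)] != dprime_box:
        return None
    layout["dprime_box"] = pos + 1
    pos += len(dprime_box) + gap2         # skip the D' box and the second spacer
    if rna[pos:pos + len(d_box)] != d_box:
        return None
    layout["d_box"] = pos + 1
    return layout


fasta = """>snR24 26.40 (1-87) Cmpl: ySc-25S-Am1447 Len: 87
TCAAATGATGTAATAACATATTTGCTACTTCAGATGGAACTTTGAGTTCCGAATGAGACA
TACCAATTATCACCAAGATCTCTGATG
"""
(header, seq), = read_fasta(fasta).items()
print(len(seq))  # 87
print(locate_features(seq, "AUGAUGU", 6, 12, 1, "CAGA", 47, "CUGA"))
# {'c_box': 5, 'guide': 18, 'dprime_box': 31, 'd_box': 82}
```

Under these assumptions the guide region starts at position 18 of the hit, in agreement with field 10 of the header line, and the sequence length matches the "Len: 87" value.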

Further information

The snoscan algorithm is described in:
        Lowe, T.M. & Eddy, S.R. (1999) "A computational screen for methylation guide snoRNAs in yeast", Science 283:1168-71.

Additional information can be found in the documentation to the stand-alone version of the program available at:
http://lowelab.ucsc.edu/software/snoscan.tar.gz