Ariadne tutorial 1: RNA identification


0. Introduction

In this tutorial, we introduce how to characterize RNA MS/MS data using the Ariadne server over the Internet. Section 1 explains how to start a sequence database search with MS/MS data, and section 2 explains how to browse the search results interactively. A separate tutorial on characterizing heavily modified synthetic oligonucleotides for pharmaceutical purposes is available at the Ariadne site (ariadne.riken.jp/). Tutorials on the Nucleic acid mass calculator and the User-parts editor will also be available at the site soon. If you have any questions or comments, please feel free to contact us at Ariadne_dev_team@riken.jp.

Ariadne is a web service that helps researchers identify the RNAs in a sample and characterize their post-transcriptional modifications by searching a sequence database with MS/MS data. To identify the RNAs in the sample, the software runs a two-step search algorithm, MS/MS ion search and then nucleotide mapping, as shown in Figure 1.

Figure 1. Schematic of Ariadne service

The Ariadne server is publicly available at ariadne.riken.jp/. A screenshot of the home page is shown in Figure 2.

Figure 2. The homepage of Ariadne service at ariadne.riken.jp/

Figure 3. The transition of html pages of Ariadne

Because the user interface to Ariadne is a web browser, a search is defined interactively through a web form. In the form, a user specifies the MS/MS data of the sample's RNA(s), the sequence database to be searched (which contains the RNA sequence(s) and any expected post-transcriptional modifications), and other search parameters. The query entered in the form is uploaded to the Ariadne server, where it is processed to identify and characterize the RNAs in the sample. As soon as the query has been transferred, the server issues a search ID, which can be used to retrieve the search later. When the search is complete, the server returns the results to the user's web browser as an html report containing summary and detailed views of the results.

1. How to set up an Ariadne search?

In this section, the data, parameters and environment needed for a search are explained in detail. How to enter peak lists, sequences, modifications and search parameters is described in section 1.2.

Requirements

An Ariadne search needs at least a peak-list file containing MS/MS data in Mascot™ generic format (MGF) and a nucleotide sequence database. The sequence database can be entered as text in the text area field, uploaded as a Fasta file in the file upload field, or selected from the preinstalled databases. If the RNAs in the sample are expected to carry post-transcriptional modifications, the search can also include information on those modifications. Available modifications are listed on the Parts table page. To identify RNA that was hydrolysed with an RNase, the Enzyme parameter must be set to the enzyme used in the experiment.

We strongly recommend the latest version of Mozilla Firefox or Google Chrome for Ariadne; other web browsers have not been tested yet. Because Ariadne uses JavaScript to set up the web form and to show the result reports, JavaScript must be allowed to run in the browser.

MS/MS peak list

The peak list should be a text file containing the mass and charge of the precursor ion as well as the mass values of the product ions and their intensities. At present, Ariadne supports only MGF.
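Because Ariadne reads only MGF, a quick local sanity check of a peak list before uploading can catch formatting problems early. The sketch below is our own illustration, not part of Ariadne. It assumes the Mascot MGF conventions: query units delimited by 'BEGIN IONS' and 'END IONS', KEY=value header lines such as PEPMASS and CHARGE, and peak lines holding m/z, intensity and an optional charge. The names `MgfQuery` and `parse_mgf` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MgfQuery:
    """One BEGIN IONS ... END IONS unit of an MGF file (hypothetical helper)."""
    params: dict = field(default_factory=dict)  # header lines, e.g. TITLE, PEPMASS, CHARGE
    peaks: list = field(default_factory=list)   # (m/z, intensity, charge string or None)


def parse_mgf(text: str) -> list:
    """Parse MGF text into MgfQuery objects; raise ValueError on malformed units."""
    queries = []
    current: Optional[MgfQuery] = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and '#' comment lines
        if line == "BEGIN IONS":
            if current is not None:
                raise ValueError(f"line {lineno}: nested BEGIN IONS")
            current = MgfQuery()
        elif line == "END IONS":
            if current is None:
                raise ValueError(f"line {lineno}: END IONS without BEGIN IONS")
            queries.append(current)
            current = None
        elif current is None:
            continue  # global parameters outside a unit are ignored in this sketch
        elif "=" in line and not line[0].isdigit():
            key, value = line.split("=", 1)
            current.params[key.strip().upper()] = value.strip()
        else:
            cols = line.split()  # tab- or space-delimited peak line
            if len(cols) not in (2, 3):
                raise ValueError(f"line {lineno}: expected 'm/z intensity [charge]'")
            charge = cols[2] if len(cols) == 3 else None  # optional product-ion charge
            current.peaks.append((float(cols[0]), float(cols[1]), charge))
    if current is not None:
        raise ValueError("file ends inside a BEGIN IONS unit")
    return queries
```

Keeping CHARGE and the optional third column as strings avoids guessing how a given vendor writes the sign (for example "2-").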
An MGF file contains at least one MS/MS query unit. As shown in Figure 4, each unit begins with a line containing only 'BEGIN IONS' and ends with a line containing only 'END IONS'. Between 'BEGIN IONS' and 'END IONS', the unit holds information on the precursor ion, such as 'CHARGE' and 'PEPMASS' (blue letters in Figure 4), and on the corresponding product ions, given as pairs of mass value and intensity in tab- or space-delimited format (green letters in Figure 4). The charge of a product ion can optionally be given as a third column. See the 'Data file format' page of the Mascot documentation for more information on MGF. Most MS vendors' software tools can export MS/MS data as MGF; consult each vendor's manual for details.

Figure 4. An example of MGF file

The lines between BEGIN IONS and END IONS represent a peak list for a single MS/MS measurement, or for multiple accumulated MS/MS spectra with the same precursor.

Sequence database

Ariadne accepts two types of sequence database: the user's own sequence database and a preinstalled database. To use your own sequences as the database, use the upload search function, which accepts the user's sequence file in Fasta format. The symbols that can be used in a Fasta file are the canonical ribonucleosides adenosine (A), cytidine (C), guanosine (G) and uridine (U), as well as modified nucleosides (see Modifications). The sequence database can be typed or pasted directly into the text area field of the search form, or specified as a file in the file upload field of the form.

The genome search function, on the other hand, searches one of the preinstalled databases. The databases preinstalled on the server are eukaryotic genome sequences (Homo sapiens, Mus musculus, Saccharomyces cerevisiae and Schizosaccharomyces pombe), which are usually too large to upload to the server over the Internet. At present, the genome search is available only to registered users. Please contact us to request a user ID.

Modifications

Users can incorporate modified residues into their search. The upload search supports two types of modifications, site-directed and variable, whereas the genome search supports only variable modifications. If you know which modified species occur and at which positions, use site-directed modifications; otherwise, use variable modifications. Both types can be used in a single upload search. All the nucleosides in publicly available databases such as 'MODOMICS' and 'The RNA modification database' are available for a search. To show the list of preinstalled nucleosides, press the 'Show table' button in the 'Select Mass table (optional)' section of the upload search web form (see 1.2).

Users can specify modifications in a separate text file with the extension .mods. A .mods file consists of a site-directed modification part and a variable modification part. In the site-directed modification part, each RNA entry must start with a line identical to the description line of the corresponding sequence in the accompanying Fasta file. Each following line holds two tab- or white-space-delimited columns: a position and the nucleoside symbol at that position, as shown below.

An example of the site-directed modification part of a .mods file

    >Thr_GGT_C AP Gammaproteobacteria Escherichia coli W3110 Thr GGT
    16 D
    17 D
    20 D
    37 m6t6a
    46 m7g
    54 m5u
    55 Y
    >Gly_CCC_C AP Gammaproteobacteria Escherichia coli W3110 Gly CCC
    8 s4u
    19 D
    52 m5u
    53 Y

The Fasta file must contain the same accession/description lines as the .mods file. If the two files are inconsistent, the search will fail.

The variable modification part starts with a header line containing only the 'max_var_mods' label and its value, delimited by a tab or white space. The value is the maximum number of modified residues in an (enzyme-cleaved) oligonucleotide. The recommended value is 1 or 2; the maximum is 4. The header line is followed by nucleoside symbol lines, each holding a single symbol, as shown below.

An example of the variable modification part of a .mods file

    max_var_mods 2
    ma
    mc
    mg
    mu

Symbols for modified nucleosides can also be placed directly in the sequences of a Fasta file as site-directed modifications. Symbols longer than one character must be enclosed in parentheses when they are placed in a sequence, as shown below. One-character symbols such as I and Y may be written as they are or enclosed in parentheses. Hence, AYUG and A(Y)UG are both acceptable and are recognized as the same sequence.
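The parenthesis rule can be expressed as a small tokenizer, which is also a convenient way to count residue positions when writing a .mods file. This is a minimal sketch, not Ariadne's own parser: it assumes that every character outside parentheses is a one-character symbol and that everything inside a pair of parentheses (for example m1G or m2,2G) is a single multi-character symbol. `tokenize_modseq` is a hypothetical name.

```python
def tokenize_modseq(seq: str) -> list:
    """Split a sequence such as 'GU(m1G)A(Y)' into nucleoside symbols (sketch)."""
    tokens = []
    i = 0
    while i < len(seq):
        ch = seq[i]
        if ch.isspace():
            i += 1  # tolerate line wrapping inside Fasta sequences
            continue
        if ch == "(":
            end = seq.find(")", i + 1)
            if end == -1:
                raise ValueError(f"unclosed '(' at position {i}")
            symbol = seq[i + 1:end]
            if not symbol:
                raise ValueError(f"empty parentheses at position {i}")
            tokens.append(symbol)  # multi-character (or bracketed one-character) symbol
            i = end + 1
        elif ch == ")":
            raise ValueError(f"unmatched ')' at position {i}")
        else:
            tokens.append(ch)  # one-character symbol: A, C, G, U, I, Y, D, ...
            i += 1
    return tokens


# AYUG and A(Y)UG tokenize identically, so they describe the same sequence.
assert tokenize_modseq("AYUG") == tokenize_modseq("A(Y)UG") == ["A", "Y", "U", "G"]
print(tokenize_modseq("GCGU(m1G)GC(m2,2G)"))
# ['G', 'C', 'G', 'U', 'm1G', 'G', 'C', 'm2,2G']
```

Under this reading, the 1-based position of a residue is its index in the token list plus one.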

Sequences with site-directed modifications in Fasta format

    > trna Ala AGC Saccharomyces cerevisiae cytosolic
    GGGCGUGU(m1G)GCGUAG(D)CGG(D)AGCGC(m2,2G)CUCCCUU(I)GC(m1I)(Y)GG
    GAGAGG(D)CUCCGG(m5U)(Y)CGAUUCCGGACUCGUCCACCA
    > trna Arg UCU Saccharomyces cerevisiae cytosolic
    GCUCGCGU(m1G)(m2G)CGUAA(D)GGCAACGC(m2,2G)(Y)CUGACU(mcm5U)CU(t6A)
    A(Y)CAGAAGA(D)UAUGGG(m5U)(Y)CG(m1A)CCCCCAUCGUGAGUGCCA

By default, site-directed modifications are assumed to be fully modified. If the 'partial modification' box in the form is checked (see 1.2.1; Figure 5), the software also generates combinations of partially modified sequences and searches them.

Other data and parameters

The software also takes into account the expected structures of the RNAs, including the 5' and 3' termini of intact molecules, and their isotopic distribution. Search parameters related to the experiment must also be specified: the polarity, the mass tolerances for MS1 and MS2, and the precision of the reported mass values. Any parameter that is not specified is set to its default value (see Table 2 for the defaults).

1.2 Inputting MS/MS data, sequence database and search parameters to the web form

Ariadne offers two search programs: upload search and genome search. Read the Sequence database subsection above to select the appropriate one. In this section, we look first at the upload search and then at the genome search. Several sets of sample data and parameters are available on the Ariadne server.

Upload search

Showing search form

Click the 'Try Ariadne search' link on the top page (ariadne.riken.jp/), and the web form will appear as shown in Figure 5. We offer example data and parameter sets for evaluating the program; they can be downloaded from 'Examples: mgf, fasta and mods...' in the form.
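Because an inconsistency between the Fasta description lines and the .mods file makes the search fail, it can be worth checking your own pair of files locally before loading them into the form. The sketch below is our own illustration, and it rests on assumptions beyond what this tutorial states: the site-directed part comes first, the variable part begins at the 'max_var_mods' header line, description lines are compared after stripping the leading '>' and surrounding white space, and Fasta sequences are written in one-letter symbols so that their length equals the number of residues. `parse_mods`, `read_fasta` and `check_against_fasta` are hypothetical names.

```python
def parse_mods(text: str):
    """Return (site_mods, max_var_mods, var_symbols) from .mods text (sketch)."""
    site_mods = {}       # description line -> list of (position, symbol)
    max_var_mods = None  # stays None when there is no variable part
    var_symbols = []
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        cols = line.split()  # tab- or white-space-delimited
        if line.startswith(">"):
            if max_var_mods is not None:
                raise ValueError(f"line {lineno}: site-directed entry after max_var_mods")
            current = line[1:].strip()
            site_mods[current] = []
        elif cols[0] == "max_var_mods":
            if len(cols) != 2 or not cols[1].isdigit():
                raise ValueError(f"line {lineno}: expected 'max_var_mods <number>'")
            max_var_mods = int(cols[1])
            if max_var_mods > 4:  # the tutorial gives 4 as the maximum
                raise ValueError(f"line {lineno}: max_var_mods above 4")
        elif max_var_mods is not None:
            if len(cols) != 1:
                raise ValueError(f"line {lineno}: expected one symbol per line")
            var_symbols.append(cols[0])
        elif current is not None and len(cols) == 2 and cols[0].isdigit():
            site_mods[current].append((int(cols[0]), cols[1]))
        else:
            raise ValueError(f"line {lineno}: unexpected line {line!r}")
    return site_mods, max_var_mods, var_symbols


def read_fasta(text: str) -> dict:
    """Map each description line (without '>') to its concatenated sequence."""
    records, name = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = ""
        elif line and name is not None:
            records[name] += line
    return records


def check_against_fasta(site_mods: dict, fasta_records: dict) -> list:
    """List problems that would likely make the search fail (empty list = OK)."""
    problems = []
    for desc, mods in site_mods.items():
        if desc not in fasta_records:
            problems.append(f"no Fasta entry with description {desc!r}")
            continue
        length = len(fasta_records[desc])  # assumes one-letter symbols only
        for position, symbol in mods:
            if not 1 <= position <= length:
                problems.append(f"{desc!r}: position {position} ({symbol}) outside 1..{length}")
    return problems
```

The check reports every problem it finds instead of stopping at the first one, so a single run lists all mismatches to fix.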

Figure 5. Web form of upload search

Loading MS/MS data

Press the 'Browse file...' button in the 'Peak list file' section. Choose the MGF file you would like to search and press the OK button. If the MGF file is loaded correctly, the status changes to 'loaded' (Figure 6).

Figure 6. Loading MS/MS peak list file

Loading sequence database and, if needed, modifications

Press the 'Browse file...' button for 'Fasta file' in the 'Sequences and modifications' section of the form. Choose the Fasta file you would like to search and press the OK button. If the file is loaded correctly, the status changes to 'loaded' (green letters). Then, if necessary, press the 'Browse file...' button for 'Modification file' in the same section (Figure 7). Choose the .mods file you would like to use and press the OK button. If the file is loaded correctly, the status changes to 'loaded'.

If you would like to type sequences directly or paste them from another application, select 'input sequence directly with modification(s)'. Symbols longer than one letter should be enclosed in parentheses (Figure 8). In this mode, modifications have to be placed in the sequences themselves, so variable modifications are not available.

Figure 7. Loading sequence and modification files

Figure 8. Input sequence with site-directed modifications through text area of the form

Table 1 summarizes the ways of loading sequences and modifications in the upload search. A user can choose one of the 6 combinations.

Table 1. 6 ways of loading sequences and modifications to upload search

    No.  Sequences    Site-directed modifications          Variable modifications
    1    Fasta file   -                                    -
    2    Fasta file   -                                    .mods file
    3    Fasta file   modified sequence(s) in Fasta file   -
    4    Fasta file   .mods file                           -
    5    Fasta file   .mods file                           .mods file
    6    Fasta file   modified sequence(s) in Fasta file   .mods file
    7    Text area    modified sequence(s) in text area    -

Specifying parameters

Choose search parameters appropriate to your experimental conditions. The detailed description and default value of each parameter are shown in Table 2.

Table 2. Search parameters for Ariadne search

Enzyme (default: RNase T1)
The specificity of the endonuclease used in the experiment. Available choices are RNase T1, RNase T2, colicin_e5, MazF, RNase A, RNase U2, RNase U2 + bacterial alkaline phosphatase, 'noenzyme' (cuts no positions) and 'unknownenzyme' (cuts every position).

Max missed cleavages (default: 0)
The maximum number of missed cleavages considered. Choose 1 or 2 if your search includes modifications, because some modified sites are hardly hydrolyzed by RNases. This parameter has no effect when 'noenzyme' is selected.

Intact 5' term (default: OH)
The 5' functional group of intact nucleic acids: hydroxy (OH) or phosphate (P).

Intact 3' term (default: OH)
The 3' functional group of intact nucleic acids: hydroxy (OH), 2',3'-cyclic phosphate (cp) or phosphate (P).

Mass tolerance (default: 50)
The mass tolerance of the precursor ion, in parts per million (ppm).

MS2 tolerance (default: 50)
The mass tolerance of the product ions, in ppm.

Polarity (default: negative)
Positive or negative. Data acquired with polarity switching are not supported.

base_number
The lower limit of oligonucleotide length considered in the MS/MS ion search.

reject_length_for_mapping
The lower limit of oligonucleotide length considered in nucleotide mapping. If your sample contains many short oligonucleotides (e.g., an RNase A digest), specify a smaller number.

Mass table (default: default)
Ariadne calculates the mass values of RNA from the specified mass table. This parameter is especially useful for characterizing site-specific stable-isotope-labeled RNAs. The content of the table appears when the 'Show table' button is pressed. At present, the following tables are available: default (natural isotope distribution; non-labeled), 13C10_G for SILNAS with RNase T1, 13C10_A, 13C9_C, 13C9_U, 13C9_CU, and 5D_CU or 56D2_CU for pseudouridine identification. If you would like to use another mass table for a specific isotope labeling, please contact us.

Specificities for each enzyme are summarized in Table 3.

Table 3. Specificities for available RNases

    Enzyme name     Specificity    5' terminus   3' terminus
    RNaseT1         G ^ N          OH            cp or P
    RNaseA          [C U]* ^ N     OH            cp or P
    RNaseT2         C ^ [A G U]    OH            cp or P
    RNaseU2         [A G] ^ N      OH            cp or P
    RNaseU2+BAP     [A G] ^ N      OH            OH
    colicin_e5      G ^ U          OH            cp or P
    MazF            N ^ AC         OH            cp or P
    unknownenzyme   N ^ N          OH or P       OH, cp or P
    noenzyme

    ^: cleavage site; *: [C U] means C or U; cp: 2',3'-cyclic phosphate; P: phosphate

Starting the search

After all the data and parameters above are set, an upload search can be started by pressing the 'Search' button at the bottom of the web form (Figure 5). Searching a large sequence database with many modifications may take some time. When the search is complete, the browser window is updated to show an html search report. See section 2 to browse and interpret the results.

Genome search

Showing search form

Click 'Go to search' in the 'For registered user' section of the top page (ariadne.riken.jp/) and enter your ID and password in the authentication dialog box. If you do not have an ID, contact us to have one issued. Click the 'Genome search' hyperlink, and the web form will appear as shown in Figure 9.

Figure 9. Web form of genome search

Loading MS/MS data

Press the 'Browse file...' button in the 'Peak list file' section (Figure 9). Choose the MGF file you would like to search and press the OK button. If the MGF file is loaded correctly, the status changes to 'Loaded' (Figure 10).

Figure 10. Loading MS/MS peak list file

Selecting sequence database and modifications

Four eukaryotic genome databases are currently available: S. cerevisiae, S. pombe, M. musculus and H. sapiens. Select a database in the web form. If you would like to use another genome database, please contact us. The variable modifications considered in the genome search are methylation of the 4 canonical nucleosides and reduction of uridine to dihydrouridine. To search for modified RNAs, set the Max mods parameter to 1 or 2.
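One way to picture these genome-search settings is as candidate generation: each sequence is cleaved in silico after every G (RNase T1, G ^ N in Table 3, the only enzyme the genome search supports), and each fragment may carry up to Max mods modifications chosen from methylation of A, C, G or U and reduction of U to dihydrouridine. The sketch below illustrates only that combinatorics; it is not Ariadne's implementation. The symbols ma/mc/mg/mu and D are borrowed from the .mods examples for readability, and the mass shifts are the monoisotopic masses of CH2 (methylation) and H2 (reduction). The function names are hypothetical.

```python
from itertools import combinations, product

METHYL = 14.01565    # monoisotopic mass of CH2 added by methylation (Da)
REDUCTION = 2.01565  # monoisotopic mass of H2 added when U becomes dihydrouridine (Da)

# Hypothetical symbol choices: m-prefixed methylated nucleosides and D.
OPTIONS = {
    "A": [("ma", METHYL)],
    "C": [("mc", METHYL)],
    "G": [("mg", METHYL)],
    "U": [("mu", METHYL), ("D", REDUCTION)],
}


def rnase_t1_digest(seq: str, max_missed: int = 0) -> list:
    """Cut after every G, then join up to max_missed adjacent pieces."""
    pieces, start = [], 0
    for i, base in enumerate(seq):
        if base == "G":
            pieces.append(seq[start:i + 1])
            start = i + 1
    if start < len(seq):
        pieces.append(seq[start:])  # 3'-terminal piece not ending in G
    fragments = []
    for i in range(len(pieces)):
        for j in range(i, min(i + max_missed + 1, len(pieces))):
            fragments.append("".join(pieces[i:j + 1]))
    return fragments


def variable_mod_variants(oligo: str, max_mods: int):
    """Yield (symbols, mass_shift) for 0..max_mods modified positions."""
    for k in range(max_mods + 1):
        for positions in combinations(range(len(oligo)), k):
            choices = [OPTIONS[oligo[p]] for p in positions]
            for picked in product(*choices):
                symbols = list(oligo)
                shift = 0.0
                for p, (symbol, delta) in zip(positions, picked):
                    symbols[p] = symbol
                    shift += delta
                yield symbols, shift


print(rnase_t1_digest("AGCUGAAG"))                  # ['AG', 'CUG', 'AAG']
print(len(list(variable_mod_variants("CUG", 1))))   # 5
```

The variant count grows quickly with Max mods and fragment length, which is one plausible reason the genome search caps Max mods at 2.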

Specifying parameters

Detailed descriptions of the parameters are given in Table 2. For the genome search, however, there are several limitations, as described below.

Enzyme: only RNase T1 is supported.
Max mods: the maximum number of modifications (methylation and reduction of uridine) in an enzyme-digested oligonucleotide. Choose a number from the select box; the maximum is 2.
reject_length_for_mapping: the lower limit of oligonucleotide length considered in nucleotide mapping. For the genome search, 5 or more is recommended.

Starting the search

After all the data and parameters above are set, a genome search can be started by pressing the 'Search' button at the bottom of the web form (Figure 9). Searching a large genome sequence database with many modifications may take some time. When the search is complete, the browser window is updated to show an html search report. See section 2 to browse and interpret the results.

2. How to browse / interpret an Ariadne RNA search result?

2.1 Showing your search result

You can view a specific search result by entering its search ID in the 'Browse search result' section of the home page. A search ID is issued when the search is accepted correctly, so please write the ID down. If you have a user ID, a list view of your search results is available after basic authentication: click the 'For Registered User' link on the top page (ariadne.riken.jp/), then click one of the search IDs in the list, and a new window or browser tab will open to show the 'Results view' of the search.

2.2 Browsing Results view page

As shown in Figure 11, the 'Results view' has three parts: 'Search parameter', 'Nucleotide mapping' and 'MS/MS ion search'. The topmost 'Search parameter' part shows the main parameters used in the search; all parameters can be seen by clicking the 'more' link (Figure 12).
The second part shows the 'Nucleotide mapping' result, consisting of the score histogram (Figure 13), the identification summary (Figure 14) and the oligonucleotide-nucleic acid matrix (Figure 15). The third part lists the oligonucleotides identified by the MS/MS ion search (Figure 16).

Figure 11. An example of Results view page

Figure 12.

Figure 13. An example of score histogram for nucleotide mapping

Figure 14. An example of Identification summary for nucleotide mapping

Figure 15. An example of oligonucleotide-RNA matrix

Figure 16. An example of matched nucleotide list

Nucleotide mapping

Score histogram

The histogram shows the distribution of nucleotide mapping scores for the search. The x-axis is the nucleotide mapping score, and the y-axis is the logarithm of the number of identified RNA regions having that nominal score (Figure 13). Strictly, the y-value is log(n + 1) rather than log n, where n is the number of regions: with log n, a score containing a single region (n = 1) would be plotted at log 1 = 0 and would not appear in the histogram.

Identification summary

The identification summary is a table that lists, for each identified sequence region of an RNA, its region number, its description, its nucleotide mapping score, the number of oligonucleotides matched by the MS/MS ion search, the total number of oligonucleotides obtained by in silico cleavage with the specified RNase, and the positions and sequences of the matched oligonucleotides. The rows are sorted in descending order of nucleotide mapping score. Clicking a region number in the leftmost RNA column opens the Mapping results view, which shows detailed nucleotide mapping results for the corresponding RNA.

Oligonucleotide-Nucleic acid matrix

This matrix shows the relationship between the oligonucleotides identified by the MS/MS ion search and the RNAs identified by nucleotide mapping. A typical matrix is shown in Figure 15. Each row shows how many times the identified oligonucleotide appears in the RNAs, and each column shows which oligonucleotides are mapped onto the corresponding RNA.

MS/MS ion search

Matched oligonucleotide list (overall)

Because this list summarizes the MS/MS ion search results for all MS/MS queries, it can be very large. For each MS/MS peak list, it shows the identified sequence, its mass values and the title field from the MGF. The meaning of each column is explained in Table 4 below (2.3.2). Each q# is a hyperlink to the Nucleotide view of the corresponding MS/MS spectrum and its assignment table.

Table 4. Entities in Matched oligonucleotide list

    Name      Explanation
    q#        sequential number unique to each MS/MS spectrum
    modseq    nucleotide sequence including modifications
    score     probability-based score of the MS/MS ion search
    thre      statistical threshold for the score. The significance level is
    start     start position of the oligonucleotide in the sequence region (subset)
    end       end position of the oligonucleotide in the sequence region (subset)
    m/z       observed mass-to-charge ratio of the precursor ion
    z         charge of the precursor ion
    obs_mw    MW calculated from m/z and z
    calc_mw   MW calculated from modseq
    delta     relative mass difference between obs_mw and calc_mw, in parts per million
    q_title   title field of the MGF file

2.3 Browsing Mapping results view page

The Mapping results view page shows a Nucleotide map and a Matched nucleotide list for the identified region (subset) of the RNA, after a brief header giving the description of the corresponding database entry and the positions of the region within that entry.

Nucleotide map

The map in Figure 17 shows the identified sequence region of the database entry. The region is shown in bold black letters, and the neighboring sequence is shown in gray letters. The identified oligonucleotides in the region are highlighted with red underlines. Each underline is a hyperlink to the MS/MS assignment page of the corresponding MS/MS spectrum and its assignment table (Figure 17). A summary of the nucleotide mapping is shown just below the map.
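The mass columns of Table 4 (m/z, z, obs_mw, calc_mw, delta) are linked by simple arithmetic. The sketch below shows one common way to compute them; Ariadne's exact conventions are not stated in this tutorial, so the following are assumptions: obs_mw is the neutral monoisotopic mass obtained by adding (negative mode) or removing (positive mode) z protons, z is passed as a positive charge magnitude, and delta is (obs_mw - calc_mw) / calc_mw × 10^6. The function names are hypothetical.

```python
PROTON = 1.007276  # proton mass in Da (rounded)


def neutral_mass(mz: float, z: int, polarity: str = "negative") -> float:
    """Neutral mass from an observed m/z and charge magnitude z.

    Assumes [M - zH]z- ions in negative mode and [M + zH]z+ ions in positive mode.
    """
    if z < 1:
        raise ValueError("z must be a positive charge magnitude")
    if polarity == "negative":
        return z * mz + z * PROTON  # add back the z protons that were removed
    if polarity == "positive":
        return z * mz - z * PROTON  # remove the z protons that were added
    raise ValueError("polarity must be 'negative' or 'positive'")


def delta_ppm(obs_mw: float, calc_mw: float) -> float:
    """Relative mass difference in parts per million (sign convention assumed)."""
    return (obs_mw - calc_mw) / calc_mw * 1e6


obs = neutral_mass(500.0, 2)            # hypothetical 2- precursor at m/z 500.0
print(round(obs, 6))                    # 1002.014552
print(round(delta_ppm(obs, 1002.0), 2))
```

Comparing |delta| with the precursor Mass tolerance (Table 2, default 50 ppm) shows how far an assignment sits from the search limit.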

Figure 17. An example of Mapping results view for a single RNA

Matched nucleotide list

The list summarizes the MS/MS ion search results, showing for each MS/MS peak list the identified oligonucleotide sequence, its mass values and the title field from the MGF. The columns are explained in Table 4. Each q# is a hyperlink to the Nucleotide view of the corresponding MS/MS spectrum and its assignment table.

2.4 Browsing each 'MS/MS assignment view' page

Clicking an underline link in the subset sequence, or a q# link in the Matched nucleotide list, opens a new tab showing the MS/MS assignment. This page shows an MS/MS spectrum with peak assignments and the corresponding assignment table. Peaks matched within the search tolerance are shown in red. If a peak matches in m/z but its charge (z) differs, which may indicate incorrect peak detection, it is shown in blue.

The spectrum can be enlarged by dragging the mouse from upper left to lower right (press at the start m/z and release at the end m/z; see Figure 19 and the video tutorial). The y-axis is automatically normalized to the most intense peak in the range. The last enlargement can be undone by dragging in the opposite direction, toward the upper left. The m/z range can also be rescaled by entering start and end values in the text boxes just to the right of the spectrum.
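The red/blue coloring rule can be sketched as a tolerance match. This is an illustration under our own assumptions, not Ariadne's code: the tolerance is applied in ppm of the theoretical m/z (as for the MS2 tolerance in Table 2), a peak without a charge column is treated as compatible with any charge, a charge-consistent match takes precedence over a charge-inconsistent one, and `classify_peaks` is a hypothetical name.

```python
def within_ppm(observed: float, theoretical: float, tol_ppm: float) -> bool:
    """True if observed lies within tol_ppm of theoretical."""
    return abs(observed - theoretical) <= theoretical * tol_ppm * 1e-6


def classify_peaks(peaks, fragments, tol_ppm: float = 50.0) -> list:
    """Label each observed peak 'red', 'blue' or None.

    peaks: list of (mz, z or None) observed in the spectrum.
    fragments: list of (mz, z) theoretical product ions.
    """
    labels = []
    for mz, z in peaks:
        label = None
        for theo_mz, theo_z in fragments:
            if not within_ppm(mz, theo_mz, tol_ppm):
                continue
            if z is None or z == theo_z:
                label = "red"  # m/z and charge agree
                break
            label = "blue"     # m/z agrees but charge differs; keep looking for red
        labels.append(label)
    return labels


fragments = [(500.0, 1), (250.0, 2)]
peaks = [(500.01, 1), (500.01, 2), (250.0, None), (600.0, 1)]
print(classify_peaks(peaks, fragments))  # ['red', 'blue', 'red', None]
```

Re-running the same function with a different `tol_ppm` mirrors what the MS2 tolerance box on the page lets you explore, without touching the search score.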

Figure 18. An example of MS/MS spectrum and the corresponding assignment table

Figure 19. Zooming in and out of assigned MS/MS spectrum

The assignment table shows the calculated mass values of the product ions expected from the identified sequence. Only singly charged m/z values are shown by default; selecting a value from the Charge select box rewrites the table to show m/z values for all charges up to the selected one (Figure 20). The colors of the mass values in the table correspond to those of the peaks in the spectrum. If the MGF contains the charges of the product ions, a Deconvolution check box is displayed below the table; when it is checked, both the spectrum and the table are redrawn in the deconvoluted state (Figure 21).

The spectrum can be reassigned with a different MS2 tolerance by entering the desired value in the text box above the spectrum and pressing the update button (Figure 22). The spectrum can also be reassigned against a different sequence by entering that sequence in the text box (see Figure 23). Neither reassignment recalculates the MS/MS ion search score.

Figure 20. Showing multiple charged ions in assignment table

Figure 21. Showing deconvoluted MS/MS spectrum

Figure 22. Re-assignment of MS/MS spectrum with different MS2 tolerance

Figure 23. Re-assignment of MS/MS spectrum with different sequence and/or MS2 tolerance
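Both the Charge select box and the Deconvolution check box amount to converting between charge states. The minimal sketch below covers negative mode (the default polarity in Table 2). It assumes that the singly charged values in the assignment table are [M - H]- ions, that multiply charged ions are [M - zH]z-, and that "deconvoluted" means mapped back to the singly charged equivalent; Ariadne may display deconvoluted values differently. The function names are hypothetical.

```python
PROTON = 1.007276  # proton mass in Da (rounded)


def mz_at_charge(mz_1minus: float, z: int) -> float:
    """Convert a singly charged [M-H]- m/z into the m/z of [M-zH]z-."""
    neutral = mz_1minus + PROTON  # neutral mass M
    return (neutral - z * PROTON) / z


def charge_ladder(mz_1minus: float, max_z: int) -> dict:
    """m/z values for charges 1..max_z, like the table after choosing max_z."""
    return {z: mz_at_charge(mz_1minus, z) for z in range(1, max_z + 1)}


def deconvolute(mz: float, z: int) -> float:
    """Map an observed [M-zH]z- peak back to its singly charged m/z."""
    return z * mz + (z - 1) * PROTON


ladder = charge_ladder(1001.0, 3)
for z, value in ladder.items():
    print(z, round(value, 4))
```

`deconvolute` is the exact inverse of `mz_at_charge`, so a peak whose charge is read from the third MGF column lands on the same singly charged value as the corresponding table entry.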