It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change


1 Generation of transcriptome resources in rubber in response to Corynespora cassiicola causing Corynespora leaf disease for gene discovery and marker identification using NGS platform
C. Bindu Roy and T. Saha
Rubber Research Institute of India, Rubber Board (Ministry of Commerce and Industry, Government of India), Kottayam, Kerala, India
"It is not the strongest of the species that survive, nor the most intelligent, but the one most responsive to change"

2 Plant response to stress
The stress response is initiated at the cellular level. Stress recognition activates signal transduction pathways that transmit information within the individual cell and throughout the plant.
[Figure labels: Drought, Biotic, Cold, Ethylene, Altered cellular metabolism; Corynespora leaf disease in Hevea]

3 An effort was made to identify genes induced in rubber during disease development due to interaction with Corynespora cassiicola.
[Figure: Challenge inoculation of rubber plants and development of disease symptoms]

4 To speed up the discovery of genes involved in host tolerance during disease development, Next Generation Sequencing (NGS) based RNA-Seq technology was adopted as a resource for gene discovery and development of functional molecular markers. Hevea transcriptome sequencing in response to pathogen infection has not been attempted to date.
Methodology adopted
1. Clone selection: RRII 105 and GT 1
2. Challenge inoculation
3. Sample collection
4. RNA isolation, QC check, cDNA library construction
5. Sequencing: HiSeq
Bioinformatic analysis: de novo assembly of contigs, finding overlapping transcripts, finding homologous genes, functional annotations, characterization of transcripts, gene ontology and pathway assignments, inter and intra comparison of gene expression, identification of disease-related transcription factors, data mining, detection of SNPs, SSRs etc.

5 Next generation sequencing workflow: library preparation, cluster generation, sequencing using HiSeq, data analysis
Summary of Illumina HiSeq 2000 sequencing results for the transcriptomes of a Corynespora-susceptible (RRII 105) and a resistant (GT 1) rubber clone in response to C. cassiicola challenge inoculation
Table columns: Sample name (C1, C2, T1, T2)
Table rows: Maximum read length; Minimum read length; Mean read length; Total no. of high quality reads (million); Percentage of high quality reads (%); Percentage of high quality bases (%); Percentage of reads with non-ATGC characters (%)
C1 = RRII 105 control; C2 = GT 1 control; T1 = RRII 105 challenged; T2 = GT 1 challenged

6 De novo assembly statistics using Velvet (for contigs) and Oases (for transcripts)
Table columns: Sample name (C1, C2, T1, T2)
Table rows: Hash length; Program used (Velvet and Oases for each sample); Contigs generated; Maximum contig length; Minimum contig length; Average contig length; Contigs >= 100 b or 200 b*; Contigs >= 500 b; Contigs >= 1 Kb; Contigs >= 10 Kb; N50 value; Percentage of assembled reads (%)
* Minimum length was 100 bp for contigs and 200 bp for transcripts.
Statistics of SSR detected
Control: Total SSR in Control; C1C2 Common SSRs; Unique to C1; Unique to C
Treated: Total SSR in Treated; T1T2 Common SSRs; Unique to T1; Unique to T2
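The assembly statistics named on this slide (contig counts by length class, average length, N50) can be derived from a plain list of contig lengths. The following is a minimal sketch, not the Velvet or Oases implementation; the function names `n50` and `assembly_summary` are hypothetical, and the 100 bp minimum simply mirrors the slide's footnote for contigs.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contig lengths supplied")


def assembly_summary(lengths, min_length=100):
    """Summarise contig lengths, ignoring contigs shorter than min_length."""
    kept = [n for n in lengths if n >= min_length]
    if not kept:
        raise ValueError("no contigs at or above the minimum length")
    return {
        "contigs": len(kept),
        "max": max(kept),
        "min": min(kept),
        "mean": sum(kept) / len(kept),
        "ge_500": sum(1 for n in kept if n >= 500),
        "ge_1kb": sum(1 for n in kept if n >= 1000),
        "ge_10kb": sum(1 for n in kept if n >= 10000),
        "n50": n50(kept),
    }
```

For transcripts, the same summary would be called with `min_length=200`, following the footnote.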

7 SSR motif lengths and their numbers
Table columns: Repeat motif length; Set of repeat bases; Repeat length threshold; Unique to C1; Unique to C2; Common to C1 and C2; Unique to T1; Unique to T2; Common to T1 and T2
Table rows:
Mononucleotide repeat: 1 (A)
Dinucleotide repeat: 2 (CA)
Trinucleotide repeat: 3 (ATG)
Tetranucleotide repeat: 4 (TGAG)
Pentanucleotide repeat: 5 (CTAGT)
Hexanucleotide repeat: 6 (ATGCAG)
[Figure: Frequency distribution of EST-SSR; axis: SSR Type; motifs shown: T, ATC, CT, AGGA, TA, AT, AAAT, TCT, TTTA, AATTG, TTC, GAA, AAG, TTTC, ATT, AGA, CTT, AGAA, TTA, AAAG, AC, CTTT, TTAT, CAGT, AAGA, AATA, ATAA, ATTT, TTCT, TATT, TCTT, GAT, TAA, TGA, ATA, CTC, GAG, TCA, TGG, AATT, TAAA, CCA, GGA, TAT, GAAA]
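As an illustration of how SSRs of motif length 1 to 6 can be located in a transcript, here is a small regular-expression sketch. It is not the tool used in this study. The repeat thresholds in `MIN_REPEATS` are assumed values chosen only for the example, since the thresholds from the slide are not available here, and `find_ssrs` is a hypothetical name.

```python
import re

# Assumed minimum repeat counts per motif length (illustrative only;
# these are NOT the thresholds used in the study).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples, 0-based start."""
    sequence = sequence.upper()
    hits = []
    for motif_len, threshold in sorted(min_repeats.items()):
        # One motif copy followed by at least (threshold - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, threshold - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit ("AA", "ATAT"),
            # because the shorter motif length already reports them.
            if any(
                motif == motif[:k] * (motif_len // k)
                for k in range(1, motif_len)
                if motif_len % k == 0
            ):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)
```

Running `find_ssrs` on each transcript of two samples and comparing the resulting sets is one way to obtain "unique" and "common" SSR counts of the kind tabulated on this slide.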

8 Longest SSR motifs observed
Repeat type               C1          C2          C1C2        T1          T2          T1T2
Mononucleotide repeat     (T)80       (T)73       (A)53       (A)87       (A)71       (A)64
Dinucleotide repeat       (GA)24      (AT)31      (CT)26      (CT)28      (GA)38      (GA)30
Trinucleotide repeat      (ATA)17     (TTA)17     (AAT)17     (TTA)18     (TAA)18     (ATT)17
Tetranucleotide repeat    (AAAG)7     (TGTT)7     (TTTA)8     (TTTC)7     (TTTG)7     (AATC)8
Pentanucleotide repeat    (TCAAC)7    (TGTTT)6    (CTATT)7    (TGTTT)6    (AATTA)6    (TCAAC)7
Hexanucleotide repeat     (TGCTGG)8   (CACCAT)6   (TGCTGG)8   (TTGGGC)7   (GCACCA)9   (CGATTC)7
C1 = RRII 105 control; C2 = GT 1 control; T1 = RRII 105 challenged; T2 = GT 1 challenged
Most common repeat motifs identified
Table columns: Motif; Count; Percentage
Table rows: AAG/CTT; ACG/CTG; AGG/CCT; AAAC/GTTT
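Motif labels such as AAG/CTT pair a motif with its reverse complement, so that the same repeat read from either strand, or starting at a different position within the repeat, is counted once. The sketch below shows one common way to compute such a class label and a count/percentage table. It is an assumption about how the grouping could be done, not the procedure used in the study, and all function names are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of a DNA motif."""
    return motif.translate(_COMPLEMENT)[::-1]


def rotations(motif):
    """Return every cyclic rotation of the motif (AAG -> AAG, AGA, GAA)."""
    return [motif[i:] + motif[:i] for i in range(len(motif))]


def motif_class(motif):
    """Label a motif by its smallest rotation on each strand, e.g. 'AAG/CTT'."""
    forward = min(rotations(motif))
    reverse = min(rotations(reverse_complement(motif)))
    first, second = sorted((forward, reverse))
    return first if first == second else f"{first}/{second}"


def motif_frequencies(motifs):
    """Return (class, count, percentage) tuples, most frequent first."""
    counts = Counter(motif_class(m.upper()) for m in motifs)
    total = sum(counts.values())
    return [(name, n, 100.0 * n / total) for name, n in counts.most_common()]
```

With this grouping, GAA, TTC and CTT all fall into the class AAG/CTT, and GTTT falls into AAAC/GTTT.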

9 Single Nucleotide Polymorphisms (SNPs)
Positions at which a single DNA base differs between sequences
[Figure: SNP snapshot]
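To make the SNP definition concrete, the following sketch lists the positions at which two already-aligned sequences of equal length carry different bases. It is a deliberate simplification: real SNP detection from RNA-Seq data works on read alignments and applies coverage and quality filters that are not modelled here. `find_snps` is a hypothetical name.

```python
def find_snps(reference, sample):
    """Return (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both inputs must already be aligned and of equal
    length; gaps ('-') and ambiguous bases such as 'N' are ignored.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    snps = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        if ref_base != alt_base and ref_base in "ACGT" and alt_base in "ACGT":
            snps.append((position, ref_base, alt_base))
    return snps
```

For example, comparing `ATGCCGTA` with `ATGTCGAA` reports a C to T change at position 4 and a T to A change at position 7.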

10 Differential gene expression statistics
Table columns: Sample name; Number of transcripts down regulated; Number of transcripts up regulated
Table rows: C1C2 common; C1T1 common; C2T2 common; T1T2 common
C1 = RRII 105 control; C2 = GT 1 control; T1 = RRII 105 challenged; T2 = GT 1 challenged
Utilizing the transcriptome data
1. To identify genes involved in disease-related/resistance pathways
2. To identify SNPs and SSRs for marker development and validation
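The up- and down-regulated counts in the differential expression table compare transcripts common to two samples (for example C1 and T1). The slides do not state how regulation was called, so the sketch below uses a generic log2 fold-change rule as an assumption. The two-fold cutoff, the pseudocount and the function name `classify_transcripts` are all illustrative choices, not the study's method.

```python
import math


def classify_transcripts(control, treated, min_log2_fold=1.0, pseudocount=1.0):
    """Split transcripts common to both samples into (up, down) lists.

    control and treated map transcript IDs to expression values.
    The pseudocount avoids division by zero for unexpressed transcripts.
    """
    up, down = [], []
    for transcript in sorted(set(control) & set(treated)):
        ratio = (treated[transcript] + pseudocount) / (control[transcript] + pseudocount)
        log2_fold = math.log2(ratio)
        if log2_fold >= min_log2_fold:
            up.append(transcript)
        elif log2_fold <= -min_log2_fold:
            down.append(transcript)
    return up, down
```

The lengths of the two returned lists correspond to the "up regulated" and "down regulated" columns for one sample pair.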