Whole Transcriptome Sequencing/RNA-Seq

RNA-Seq refers to the use of high-throughput next-generation sequencing technologies to sequence complementary DNA (cDNA). Successful whole transcriptome analysis depends on RNA quality and efficient conversion into cDNA libraries. The entire pool of RNA present in the sample is converted into cDNA copies, sequenced and mapped onto a reference genome.

Xcelris Genomics offers transcriptome sequencing solutions on various platforms, including Illumina MiSeq, Illumina HiSeq 2000/2500, SOLiD 4, Ion Torrent (Life Technologies) and Roche GS FLX Titanium. These technologies offer a powerful combination of read length and fragment or paired-end library flexibility for transcriptomes of various sizes. This allows profiling of the whole population of mRNA and enables mapping and digital quantification of the whole transcriptome. We have successfully established accurate and comprehensive methodologies for animals, plants, bacteria, fungi and many more. Barcoding/indexing can be carried out to allow RNA-Seq of multiple samples in a single run/lane/slide.

Analysis of the reads generated by the sequencer depends on the presence or absence of a reference genome. For non-model organisms, the data generated is assembled using the de novo approach. The reference-based approach is preferred for model organisms whose genomes are available along with proper annotations.

Requirement for WTA:
We accept isolated RNA, microbial cultures (non-pathogenic), plant tissues etc.

Isolated total RNA: 8-10 µg of total RNA should be provided, with RNA Integrity Number (RIN) > 6. RNA must not be degraded and should be free from DNA contamination.

Quality control of RNA samples: Samples are subjected to both qualification and quantification, and those having RIN > 6 pass QC. Samples with a lower RIN value are processed only upon the customer's confirmation.
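The QC rule above can be sketched as a small triage function. This is only an illustration under stated assumptions: the function name `triage_rna_sample`, its parameters and the returned labels are hypothetical, and the threshold simply restates the RIN > 6 criterion given above.

```python
def triage_rna_sample(rin, customer_confirmed=False, min_rin=6.0):
    """Classify one RNA sample by its RNA Integrity Number (RIN).

    Hypothetical helper: samples with RIN > min_rin pass QC; samples at or
    below it are processed only after the customer confirms.
    """
    if rin > min_rin:
        return "qc_passed"
    if customer_confirmed:
        return "process_on_customer_confirmation"
    return "awaiting_customer_confirmation"


if __name__ == "__main__":
    for rin, confirmed in [(7.5, False), (5.1, False), (5.1, True)]:
        print(rin, confirmed, triage_rna_sample(rin, confirmed))
```

The comparison is strict, so a sample with RIN exactly 6 does not pass automatically; whether that matches actual practice is an assumption.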
Optional: Isolation of total RNA from plant, microbial (non-pathogenic) and fungal samples
Xcelris will isolate total RNA from various parts of plant tissues and from fungal and bacterial cultures. Standardization is required for certain samples to obtain the quality and quantity suitable for an RNA-Seq experiment, and will be charged separately.

Microbial cultures (non-pathogenic): Pure isolated culture in 10-15% glycerol stock.

Plant tissues (seedling, leaf, stem, flower, fruit, grain etc.): A minimum of 3-5 g of tissue sample should be transported in RNAlater. The volume of RNAlater should be at least ten times the volume of tissue. Plant tissue samples should be harvested and immediately immersed in RNAlater solution.

Note: All types of samples should be transported in dry ice (-20 °C) with cool packs to Xcelris Genomics, Ahmedabad, Gujarat, India.

Brief methodology for WTA/RNA-Seq:
The comprehensive methodology that Xcelris follows for transcriptome sequencing, built on years of experience with next-generation technology, is:
RNA isolation (optional)
Quality check of RNA sample to assess RIN > 6
Preparation of whole transcriptome/RNA-Seq library for the various platforms

Data generation on Illumina HiSeq 2000, Ion Torrent, SOLiD or Roche GS FLX platforms (as per catalogue no. HiSeq_WTA, Ion Torrent_WTA_200bp, SOLiD_WTA or GSFLX_Trans respectively)
Quality filtration of SE/PE read data
High-quality raw data with QV 20 (fastq, fasta & qual, or sff)

De novo based transcriptome analysis/deliverables:
De novo assembly of transcripts using Velvet/SOAP-Trans/Trinity/gsAssembler or CLC Genomics Workbench to generate contigs/unigenes
Assembly validation using SOAPaligner/gsMapper/CLC Genomics Workbench or ESTScan
Assembly statistics, including depth of coverage in base pairs, transcript length range, maximum length, minimum length and N50
Coding DNA sequence (CDS) prediction
Functional annotation of the predicted CDS
Annotation of CDS with the nr database or a customer-specified database
GO functional classification of CDS/unigenes into biological process, cellular component and molecular function
Comparative analysis of CDS annotation (if more than 1 sample)
Comparative analysis of sequence distribution based on Gene Ontology study (including biological process, molecular function and cellular component) (if more than 1 sample)
List of common and exclusive annotated CDS across the samples in FASTA format, based on hit accession (if more than 1 sample)
Differential gene expression based on FPKM (fragments per kilobase of transcript per million mapped reads) (if more than 1 sample)
COG/KOG functional classification of unigenes/CDS
Pathway analysis using KEGG
Identification of transcription factors
Simple sequence repeat (SSR) discovery
Comprehensive compiled report and data

Optional:
Submission of data to NCBI (raw data and assembled contigs)
Further analysis, figures and tables required by specific journals are provided as per the customer's requirement

Timeline: It depends upon genome size, technology selected, number of samples, complexity and coverage required. The generalized timeline is six to ten weeks.
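Among the assembly statistics listed in the deliverables above, N50 is the least self-explanatory: it is the contig length L such that contigs of length at least L together contain at least half of all assembled bases. The sketch below computes it from a list of contig lengths. The function names are hypothetical, and this is an illustration of the definition, not the implementation used by any of the assemblers named above.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the assembly
        # size defines N50.
        if running >= half_total:
            return length
    return 0


def assembly_stats(contig_lengths):
    """Summarise an assembly: contig count, total bases, min/max length, N50."""
    lengths = list(contig_lengths)
    return {
        "contigs": len(lengths),
        "total_bases": sum(lengths),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    print(assembly_stats([80, 70, 50, 40, 30, 20, 10]))
```

For the seven example contigs the total is 300 bases, and the two longest contigs (80 + 70) already reach half of that, so the N50 is 70.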
Data Generation

Platform                   Library type           Data (GB)     Avg. read length       File format
Illumina HiSeq 2000/2500   PE                     2, 4 or 6     2 x 100 / 2 x 150 bp   fastq
Illumina MiSeq             PE                     4 to 5        2 x 250 bp             fastq
SOLiD 4                    SE                     3             50 bp                  csfasta & qual
Ion Torrent                SE                     ~ to 0.4      100-150 bp             fastq
Roche GS FLX               Shotgun, Full PTP      ~0.5 to 0.6   450-600 bp             fasta & qual / sff
Roche GS FLX               Shotgun, Half PTP      0.18          450-600 bp             fasta & qual / sff
Roche GS FLX               Shotgun, Quarter PTP   0.08          450-600 bp             fasta & qual / sff

CDS/UNIGENE SEQUENCES

>transcript_unigene_1
CGGAGATATCTTTTGTTCGCCAACAAATTTTGGATGGGTGATGGGACCAATCTTGATGTATTCATGCTTT
TTGTGTGGCTCTACTCTTGCTCTTTATCATGGGTCTCCTCTTGATCGTGGTTTTGGAAAGTTTATTCAAG
ATGCAGGTGTTACTACGTTAGGTACCGTACCAAGCTTAGTGAAAACTTGGAAGAGCACAAGGTGTATGGA
AGGCCTTGACTGGACAAAGATAAAGTTATTTGCTTCAACTGGGGAATCTTCCAATGTCGATGATGACCTA
TGGCTTTCTTCAAGAGCTTATTACAAGCCAGTCATTGAATGCTGTGGAGGTACAGAGCTTGCATCTTCTT
ATATTCAAGGAACTGTGCTTCAACCACAAGCTTTTGGAGCATTTAGCACTGCTACAATGACTACCGGATT
TATCATCTTTGATGAGAATGGAGAAGCTTATCCAGATCATCAACCTTGTGTTGGAGAAGTGGGTTTGTTT
CCTCTTTATATGGGAGCGAGTGATAGATTGCTGAATGCAGATCATGACGTTATTTACTACAAGGGGATGC
CATCATACAAAGGAATGAAACTTAGACGTCACGGAGATATCATTAAAAGAACGGTGGGAGGATATTACAT
TGTGCAGGGCAGGGCTGATGATACCATGAACCTTGGTGGCATTAAGACTAGTTCAGTTGAAATCGAGCGT
GTTTGTGATCGTTGCGATGAAAATGTACTAGAGACTGCTGCAATTGGCATTCCTCCGGTGAACGGTGGAC
CAGAGCAGCTAGTCATATTTGTAGTGCTA

>transcript_cds_1
ATGGGTCTCCTCTTGATCGTGGTTTTGGAAAGTTTATTCAAGATGCAGGTGTTACTACGTTAGGTACCGT
ACCAAGCTTAGTGAAAACTTGGAAGAGCACAAGGTGTATGGAAGGCCTTGACTGGACAAAGATAAAGTTA
TTTGCTTCAACTGGGGAATCTTCCAATGTCGATGATGACCTATGGCTTTCTTCAAGAGCTTATTACAAGC
CAGTCATTGAATGCTGTGGAGGTACAGAGCTTGCATCTTCTTATATTCAAGGAACTGTGCTTCAACCACA
AGCTTTTGGAGCATTTAGCACTGCTACAATGACTACCGGATTTATCATCTTTGATGAGAATGGAGAAGCT
TATCCAGATCATCAACCTTGTGTTGGAGAAGTGGGTTTGTTTCCTCTTTATATGGGAGCGAGTGATAGAT
TGCTGAATGCAGATCATGACGTTATTTACTACAAGGGGATGCCATCATACAAAGGAATGAAACTTAGACG
TCACGGAGATATCATTAAAAGAACGGTGGGAGGATATTACATTGTGCAGGGCAGGGCTGATGATACCATG
AACCTTGGTGGCATTAAGACTAGTTCAGTTGAAATCGAGCGTGTTTGTGATCGTTGCGATGAAAATGTAC
TAGAGACTGCTGCAATTGGCATTCCTCCGGTGAACGGTGGACCAGAGCAGCTAG

COG FUNCTION CLASSIFICATION
Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the CDS/unigenes.
[Figure: COG function classification of CDS/unigenes, by function class]

SSR DISCOVERY

Contig          Repeat Type   Repeat Core   # of Repeats   SSR                      Repeat Length   Start   End
CON_002_01185   Di            TA            6              TATATATATATA             12              477     488
CON_003_02869   Tri           TCT           7              TCTTCTTCTTCTTCATCTTCTT   22              44      65
CON_001_07736   Tetra         TCTT          3              TCTTTCTTTCTTT            13              40      52
CON_002_00632   Penta         AAAAC         2              AAAACAAAACAAAA           14              330     343
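To illustrate what an SSR discovery step reports, the sketch below scans a sequence for perfect di- to penta-nucleotide repeats with a regular expression and returns the same kind of fields as the SSR table (repeat type, core, number of repeats, length, start, end). It is a minimal sketch and not the tool used to produce that table: the function names and the minimum repeat counts in `MIN_REPEATS` are assumptions, and imperfect or partial repeats are not detected.

```python
import re

# Assumed minimum number of tandem copies per motif length (illustrative only).
MIN_REPEATS = {2: 6, 3: 5, 4: 3, 5: 2}
REPEAT_TYPE = {2: "Di", 3: "Tri", 4: "Tetra", 5: "Penta"}


def is_primitive(core):
    """True if the motif is not itself a repeat of a shorter unit (e.g. TATA)."""
    for k in range(1, len(core)):
        if len(core) % k == 0 and core == core[:k] * (len(core) // k):
            return False
    return True


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return perfect SSRs with 1-based, inclusive start/end coordinates."""
    sequence = sequence.upper()
    hits = []
    for motif_len, min_copies in sorted(min_repeats.items()):
        # A motif of motif_len bases followed by at least (min_copies - 1) copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for match in pattern.finditer(sequence):
            core = match.group(1)
            if not is_primitive(core):
                continue  # homopolymers and motifs built from shorter units
            ssr = match.group(0)
            hits.append({
                "repeat_type": REPEAT_TYPE.get(motif_len, str(motif_len)),
                "repeat_core": core,
                "num_repeats": len(ssr) // motif_len,
                "ssr": ssr,
                "repeat_length": len(ssr),
                "start": match.start() + 1,
                "end": match.end(),
            })
    return hits


if __name__ == "__main__":
    for hit in find_ssrs("GGTATATATATATAGG"):
        print(hit)
```

On the toy sequence in the usage example, the only hit is the dinucleotide repeat TA x 6 (12 bases) at positions 3 to 14. The same stretch also matches as TATA x 3, which is why non-primitive motifs are filtered out.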

GO CLASSIFICATION FOR CDS/UNIGENE
[Figure: bar chart of the percentage of CDS in each GO category, grouped into biological process, molecular function and cellular component]

Cellular component: cell, organelle, macromolecular complex, membrane-enclosed lumen, extracellular region, symplast
Molecular function: binding, catalytic activity, transporter activity, structural molecule activity, molecular transducer activity, electron carrier activity, transcription regulator activity, enzyme regulator activity, antioxidant activity, nutrient reservoir activity
Biological process: metabolic process, cellular process, response to stimulus, biological regulation, localization, developmental process, multicellular organismal process, cellular component organization, reproduction, signaling, multi-organism process, cellular component biogenesis, growth, cell wall organization or biogenesis, immune system process, death, cell proliferation, rhythmic process, viral reproduction, locomotion, biological adhesion, pigmentation

DISTRIBUTION OF TRANSCRIPTION FACTOR

Family          Count
Tify            12
WRKY            20
TUB             17
Trihelix        14
TRAF            47
Orphans         65
AP2-EREBP       50
GNAT            13
Alfin-like      14
ARF             32
AUX/IAA         43
BES1            13
bZIP            33
SNF2            29
C2C2-GATA       18
C2C2-CO-like    11
SBP             19
CCAAT           34
NAC             24
MYB-related     68
MYB             27
mTERF           36
G2-like         14
GRAS            27

Reference based transcriptome analysis/deliverables:
Mapping of transcriptome data on the reference genome
Differential gene expression based on FPKM (more than one sample)
Functional annotation of expressed genes in samples
Information on significantly up-regulated and down-regulated genes, along with heat map
Novel transcript identification
Various types of plots/graphs, such as volcano plot and scatter plot
Comprehensive compiled report and data

Optional:
Submission of data to NCBI (raw data)
Further analysis, figures and tables required by specific journals are provided as per the customer's requirement

Timeline: It depends upon genome size, technology selected, number of samples, complexity and coverage required. The generalized timeline is six to ten weeks.

HEAT MAP FOR UP AND DOWN REGULATED GENES
[Figure: heat map of up- and down-regulated genes across sample 1, sample 2 and sample 3]

VOLCANO PLOT DEPICTING THE DIFFERENTIAL EXPRESSED GENES
[Figure: volcano plot of gene regulation, Stress/Control; x-axis: log2(fold change); y-axis: -log10(p value); genes marked as significant (yes/no), upregulated or downregulated]
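The quantities behind FPKM-based differential expression and the volcano plot can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used for the analysis above: the function names, the pseudocount, and the cut-offs (`fc_cutoff`, `p_cutoff`) are hypothetical, and the p-value is assumed to come from a statistical test that is not shown here.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fold_change(fpkm_treated, fpkm_control, pseudocount=1.0):
    """log2 ratio of two FPKM values; the pseudocount (assumed) avoids log(0)."""
    return math.log2((fpkm_treated + pseudocount) / (fpkm_control + pseudocount))


def classify_gene(log2_fc, p_value, fc_cutoff=1.0, p_cutoff=0.05):
    """Label a gene using assumed cut-offs on fold change and p-value."""
    if p_value >= p_cutoff or abs(log2_fc) < fc_cutoff:
        return "not significant"
    return "upregulated" if log2_fc > 0 else "downregulated"


def volcano_point(log2_fc, p_value):
    """Coordinates of one gene on a volcano plot: (log2 FC, -log10 p)."""
    return (log2_fc, -math.log10(p_value))


if __name__ == "__main__":
    # 500 fragments on a 2,000 bp transcript out of 10 million mapped fragments
    print(fpkm(500, 2000, 10_000_000))
    fc = log2_fold_change(31.0, 7.0)
    print(fc, classify_gene(fc, 0.001), volcano_point(fc, 0.001))
```

In the usage example, 500 fragments on a 2 kb transcript in a library of 10 million mapped fragments gives an FPKM of 25. Genes far from zero on the x-axis and high on the y-axis of the volcano plot are the ones labeled as significantly up- or down-regulated.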

KEGG PATHWAY
[Figure: KEGG categories of transcripts; contigs mapped to the KEGG database]