RNA-Seq Module 2 From QC to differential gene expression.
1 RNA-Seq Module 2 From QC to differential gene expression. Ying Zhang Ph.D, Informatics Analyst Research Informatics Support System (RISS) MSI Apr. 24, 2012
2 RNA-Seq Tutorials
- Tutorial 1: Introductory (Mar. 28 & Apr. 19). RNA-Seq experiment design and analysis. Instruction on individual software will be provided in other tutorials.
- Tutorial 2: Introductory (Apr. 3 & Apr. 24). Analysis of RNA-Seq data using TopHat and Cufflinks.
- Tutorial 3: Intermediate (May 23). Advanced RNA-Seq analysis topics and troubleshooting.
3 Tutorial Outline
- Review: key definitions and concepts
- Pre-processing of RNA-Seq data: QC and data cleaning
- Applications of RNA-Seq
  - Identification of differential gene expression, using TopHat, Cufflinks and Cuffdiff
  - Definition of the transcriptome: transcriptome assembly with / without a reference genome
  - Comparison of transcriptomes
  - Identification of novel transcripts
4 RNA-Seq Definitions & Concepts
5 Key definitions (I)
- SE: single-end sequencing
- PE: paired-end sequencing
- Mate-pair sequencing
[Figure: Sample -> mRNA isolation -> fragmentation -> library preparation -> sequencing of the fragment end(s). SE and PE sequencing use fragments of ~200 bp; mate-pair sequencing uses fragments of ~2000 bp that are circularized and fragmented again before the fragment ends are sequenced.]
6 Key definitions (II)
- Fragment size selection: only fragments of around 200 bp are sequenced, in order to reduce sequencing bias.
- Sequencing depth is the average read coverage of the target sequences: sequencing depth = total number of reads × read length / estimated target sequence length.
- Example: for a 5 Mb transcriptome, 1 million 50 bp reads give a depth of 1 M × 50 bp / 5 Mb = 10X.
Recommended library type and sequencing depth by application (ENCODE RNA-Seq guidelines):
Application | Library type | Sequencing depth
De novo assembly of transcriptome | PE, Mated PE | Extensive (> 50X)
Refine gene model | PE, SE | Extensive
Differential gene expression | PE | Moderate (10X ~ 30X)
Identification of structural variants | PE | Extensive
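The depth formula on this slide can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name `sequencing_depth` is made up for illustration.

```python
def sequencing_depth(num_reads, read_length_bp, target_length_bp):
    """Average coverage = total sequenced bases / estimated target length."""
    if target_length_bp <= 0:
        raise ValueError("target length must be positive")
    return num_reads * read_length_bp / target_length_bp


# The slide's example: 1 million 50 bp reads over a 5 Mb transcriptome.
depth = sequencing_depth(1_000_000, 50, 5_000_000)
print(f"{depth:.0f}X")
```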
7 Key definitions (III): Phred (quality) score
- The Phred score (Q) is the log transformation of the error rate (P) at each base-calling position: Q = -10 log10(P)
- Encoded using ASCII codes. Sanger standard: ASCII 33-126 = Phred score 0-93
- Phred score 30 ~ 1 error per 1000 nucleotides
- Phred score 20 ~ 1 error per 100 nucleotides
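A minimal sketch of the Phred relationship and of decoding a Sanger-encoded quality string, assuming the usual offset of 33 (ASCII code = Q + 33). The function names are illustrative only.

```python
import math

SANGER_OFFSET = 33  # Sanger standard: ASCII code = Phred score + 33


def phred_to_error(q):
    """Error probability P for a Phred score Q (Q = -10 * log10(P))."""
    return 10 ** (-q / 10)


def error_to_phred(p):
    """Phred score Q for an error probability P."""
    return -10 * math.log10(p)


def decode_quality(quality_string, offset=SANGER_OFFSET):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(char) - offset for char in quality_string]


print(phred_to_error(30))    # about 1 error per 1000 bases
print(phred_to_error(20))    # about 1 error per 100 bases
print(decode_quality("I5!"))
```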
8 NGS File formats
9 File formats in NGS (I): CASAVA software -> fastq -> mapping -> SAM/BAM -> assembly -> GTF
10 File formats in NGS (II): FASTQ and FASTQ_flt (MSI)
- CASAVA: Illumina software package for base calling. It produces the fastq files.
- Fastq format: text format that stores sequence and quality information, 4 lines per sequence: read ID (header), sequence, "+", quality score.
- CASAVA 1.8 header line: machine ID, then read pair #, QC filter flag (Y = bad, N = good) and barcode.
- Example record (the machine ID part of the header is missing from this transcript):
  1:N:0:AGATC
  TTCAGAGAGAATGAATTGTACGTGCTTTTTTTGT
  +
  =1:?7A7+?77+<<@AC<3<,33@A;<A?A=:4=
- FASTQ_flt data: fastq files processed by the MSI standard to remove reads with QC flag Y.
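As an illustration of the 4-line record layout and the CASAVA 1.8 filter flag, here is a small sketch that parses FASTQ text and drops reads flagged Y, which is roughly what the FASTQ_flt step is described as doing. The machine-ID part of the headers (`MACHINE:1:FLOWCELL:...`) is hypothetical, since it is missing from the transcript; the function names are also made up.

```python
def parse_fastq(lines):
    """Yield (header, sequence, quality) tuples from FASTQ lines (4 per read)."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per read")
    for i in range(0, len(rows), 4):
        header, seq, plus, qual = rows[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def casava18_tags(header):
    """Split the second header field: read pair #, filter flag, control, barcode."""
    _, _, tags = header.partition(" ")
    read_number, filter_flag, control, barcode = tags.split(":")
    return {
        "read": int(read_number),
        "filtered": filter_flag == "Y",  # Y = bad read, N = good read
        "control": control,
        "barcode": barcode,
    }


def drop_filtered(records):
    """Keep only reads whose QC filter flag is N."""
    return [rec for rec in records if not casava18_tags(rec[0])["filtered"]]


SEQ = "TTCAGAGAGAATGAATTGTACGTGCTTTTTTTGT"
QUAL = "=1:?7A7+?77+<<@AC<3<,33@A;<A?A=:4="
example = [
    # Hypothetical machine IDs; only the "1:N:0:AGATC" part comes from the slide.
    "@MACHINE:1:FLOWCELL:1:1:1000:2000 1:N:0:AGATC", SEQ, "+", QUAL,
    "@MACHINE:1:FLOWCELL:1:1:1000:2001 1:Y:0:AGATC", SEQ, "+", QUAL,
]
kept = drop_filtered(parse_fastq(example))
print(len(kept), "read(s) kept")
```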
11 File formats in NGS (III): SAM/BAM
- CASAVA software -> fastq -> mapping -> SAM/BAM
- SAM/BAM format: sequence alignment format. SAM is the text format; BAM is the binary version of SAM.
- Bitwise flag field: indicates whether a read is mapped or not, paired or not, etc.
- SAM/BAM is the standard format for mapped reads and can be used by almost all NGS tools, e.g. assemblers, viewers, quantifiers.
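The bitwise flag can be decoded by testing each bit in turn. The sketch below uses the bit values defined in the SAM specification; the short names attached to each bit are my own labels.

```python
# Bit values from the SAM specification; the labels are informal.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


print(decode_flag(99))  # a typical properly paired forward read
print(decode_flag(4))   # an unmapped single-end read
```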
12 File formats in NGS (IV): GTF
- CASAVA software -> fastq -> mapping -> SAM/BAM -> assembly -> GTF
- GTF: Gene Transfer Format, a widely used format for annotated genomes and transcriptomes.
- Downloadable from major browser sites, e.g. UCSC, Ensembl, NCBI.
- Illumina also provides a set of annotated genomes: iGenomes.
- Available through Galaxy and the command line.
- Columns: seqname, source, feature, start, end, score, strand, frame, attributes
- Example (truncated in this transcript): chr1 unknown exon gene_id "Xkr4"; transcript_id "NM_ ;
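A short sketch of how one GTF line splits into the nine columns and an attribute dictionary. The example line reuses the `chr1 / unknown / exon / gene_id "Xkr4"` pieces from the slide; the coordinates, strand and the transcript ID `TX_EXAMPLE` are hypothetical placeholders because the original values are not in the transcript.

```python
GTF_COLUMNS = ["seqname", "source", "feature", "start", "end",
               "score", "strand", "frame", "attributes"]


def parse_gtf_line(line):
    """Parse one tab-separated GTF line into a dict with typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(GTF_COLUMNS):
        raise ValueError(f"expected 9 columns, got {len(fields)}")
    record = dict(zip(GTF_COLUMNS, fields))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    attributes = {}
    for item in record["attributes"].split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip().strip('"')
    record["attributes"] = attributes
    return record


# Hypothetical coordinates and transcript ID, for illustration only.
example_line = ('chr1\tunknown\texon\t1000\t2000\t.\t-\t.\t'
                'gene_id "Xkr4"; transcript_id "TX_EXAMPLE";')
record = parse_gtf_line(example_line)
print(record["feature"], record["attributes"]["gene_id"])
```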
13 Steps in RNA-Seq Data Analysis
- Step 1: Quality control (FastQC). Input: fastq
- Step 2: Data prepping (Filter/Trimmer/Converter). Output: fastqsanger
- Step 3: Map reads to reference genome/transcriptome (TopHat). Output: bam/sam
- Step 4: Assemble transcriptome (Cufflinks). Output: gtf; fpkm
- Identify differentially expressed genes (Cuffdiff). Output: fpkm; diff
- Other applications: de novo assembly, refine gene models
14 Step 1 Quality control of the input data Step 2 Data prepping
15 Quality control of the raw reads
Goal: to determine the quality of the sequencing process.
Recommended program: FastQC, available both in Galaxy and on the Linux platform.
Checklist of read quality:
- File format: Basic Statistics
- Reliability of base calling: Per base sequence quality; Per sequence quality score
- Contamination: Per sequence GC content; Overrepresented sequences
16 Data prepping (I)
Data prepping is NOT NEEDED if:
- The data is in the right format
- Read quality is good: Phred score per base and per sequence >= 20 (better if >= 30)
- No contamination is detected
- Paired reads are synchronized (bad mapping efficiency of PE reads is symptomatic of desynchronization)
17 BAD format vs. GOOD format
Wrong Fastq format (CASAVA header ending in 1:N:0:AGATC, the CASAVA 1.8 style described on slide 10; the rest of the header is missing from this transcript):
1:N:0:AGATC
TTCAGAGAGAATGAATTGTACGTGCTTTTTTTGT
+
Right Fastq format (CASAVA 1.7; the header line is missing from this transcript):
TTCAGAGAGAATGAATTGTACGTGCTTTTTTTGT
+
=1:?7A7+?77+<<@AC<3<,33@A;<A?A=:4=
18 Data prepping needed (I)
The data format is incorrect:
- paired reads are not indicated as RNAME/1, RNAME/2
- the quality score is not in Sanger/Illumina 1.9 format
Action: change the format, using edit attributes, fastq groomer, or header line converter.
Note: if you are using Galaxy to analyze your data, changing the file name WILL NOT change the file format.
19 Distribution of Phred score in reads [Figure: three example quality distributions labeled Bad, Trimming needed, Good]
20 Data prepping needed (II)
The data contains bad reads: the quality score of the reads, or of part of the reads, is < 20.
Action: remove the low-quality reads, using fastq filter and fastq trimmer:
- fastq filter: removes entire reads
- fastq column trimmer: uniformly removes the same nucleotide positions from all reads
- fastq quality trimmer: removes nucleotide positions with low quality
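To make the difference between filtering and trimming concrete, here is a deliberately simplified sketch. It is not the algorithm used by the Galaxy tools (the quality trimmer there works with a sliding window); it only assumes Sanger-encoded qualities (offset 33) and the Q20 threshold mentioned on the slides.

```python
def passes_filter(quality_string, min_q=20, offset=33):
    """Filter: keep a read only if every base has quality >= min_q."""
    return bool(quality_string) and all(
        ord(char) - offset >= min_q for char in quality_string
    )


def trim_3prime(sequence, quality_string, min_q=20, offset=33):
    """Trimmer: strip low-quality bases from the 3' end of one read."""
    end = len(sequence)
    while end > 0 and ord(quality_string[end - 1]) - offset < min_q:
        end -= 1
    return sequence[:end], quality_string[:end]


# 'I' encodes Q40 and '#' encodes Q2 in the Sanger standard.
print(passes_filter("IIIII###"))            # whole read rejected by the filter
print(trim_3prime("ACGTACGT", "IIIII###"))  # read rescued by trimming instead
```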
21 Example of bad data: sequence contamination
22 Data prepping Needed (III) Adapter sequences are detected Action: remove the adapter sequences, using CutAdapt
23 Data prepping needed (IV)
The data is out of sync: the forward and reverse reads are not arranged in the same order.
Action: synchronize the files, using fastq interlacer and fastq de-interlacer.
Note: the synchronization check and correction should be the last step in data prepping, because the previous prepping steps can cause de-synchronization of PE data.
24 Summary: Data prepping
Data prepping is NEEDED if:
- The data format is incorrect
- The data contains bad reads
- Adapter sequences are detected
- The data is out of sync, meaning the forward and reverse reads are not in the same order
25 Summary: Galaxy tools for pre-processing
1. Fastq Groomer: converts quality scores to the Sanger standard format. Necessary for data generated with CASAVA 1.7 or earlier; not needed for CASAVA 1.8 and above.
2. Convert read header format from 1.8 to [target version missing from this transcript]
3. Fastq filter: removal of low-quality reads
4. Fastq Trimmer: removal of low-quality end bases
5. Cutadapt: cuts adapter sequences
6. Synchronization: Fastq Interlacer/De-Interlacer. Critical for PE data analysis.
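A quick way to see whether two paired files are still in sync is to walk both header lists in parallel and compare read names. The sketch below assumes read names end in /1 and /2 (the RNAME/1, RNAME/2 convention mentioned earlier) or carry the pair number after a space; the helper names are illustrative.

```python
from itertools import zip_longest


def read_name(header):
    """Strip the leading '@' and any pair suffix from a FASTQ header."""
    name = header.lstrip("@").split()[0]
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def first_desync(forward_headers, reverse_headers):
    """Return the index of the first mismatched pair, or None if in sync."""
    pairs = zip_longest(forward_headers, reverse_headers)
    for index, (fwd, rev) in enumerate(pairs):
        if fwd is None or rev is None or read_name(fwd) != read_name(rev):
            return index
    return None


forward = ["@r1/1", "@r2/1", "@r3/1"]
reverse = ["@r1/2", "@r3/2"]  # r2's mate was removed by an earlier filter step
print(first_desync(forward, reverse))
```

This only detects the problem; re-pairing the reads is what the interlacer/de-interlacer step is for.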
26 Applications of RNA-Seq
27
1. Evaluation of a tissue's transcriptome: what is the composition of the transcriptome?
2. Comparative analysis of two or more transcriptomes: how do the transcriptomes of two or more species compare?
3. Differential gene expression: what genes are differentially regulated in two or more conditions? (This tutorial)
28 Differential Gene Expression (DGE) Two Scenarios
29
1. DGE, non-discovery mode: DGE without detection of novel transcripts
2. DGE, discovery mode: DGE with detection of novel transcripts
30 1. DGE, non-discovery mode
[Workflow: for each condition (Condition 1, Condition 2): fastq -> Quality Control (FastQC) -> Map reads to reference sequence or genome (TopHat) -> mapped reads (bam/sam). Mapped reads of sample 1 and sample 2 + pre-defined annotation -> Identify differential expression (Cuffdiff) -> fpkm; diff]
31 2. DGE, discovery mode
[Workflow: fastq -> Data prepping -> Map reads to reference sequence or genome (TopHat) -> SAM/BAM -> Assemble sample transcriptome with discovery of novel transcripts (Cufflinks) -> gtf -> Merge sample transcriptomes into one (Cuffcompare* / Cuffmerge) -> gtf -> Identify differential expression (Cuffdiff) -> fpkm; diff]
* Only available in Galaxy
32 RNA-Seq analytical tool: Tuxedo
- A mapper: Bowtie. Maps short reads to the reference genome.
- A splice junction aligner: TopHat. Uses Bowtie to align short reads to the reference genome or sequence. It infers and estimates the splicing sites.
- A transcriptome assembler: Cufflinks, with cuffcompare (comparing transcriptomes), cuffmerge (merging transcriptomes) and cuffdiff (identifying differentially expressed genes).
- A visualization (R) package: cummeRbund.
Data flow: Bowtie/TopHat (bam/sam) -> Cufflinks (gtf; fpkm) -> Cuffcompare/Cuffmerge -> Cuffdiff (diff; fpkm) -> cummeRbund
Reference: Trapnell C. et al. (2012) Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nature Protocols
33 Quantify expression abundance (Cufflinks): FPKM
[Figure: Sample 1 -> mRNA isolation -> fragmentation -> RNA to cDNA -> paired-end sequencing -> map reads to the reference genome/transcriptome -> calculate transcript abundance. For Gene A and Gene B in each sample the figure tabulates: # of fragments (paired reads); # fragments per kilobase of exon; # fragments per kilobase of exon per million mapped reads (FPKM).]
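The normalization behind FPKM can be written out directly from its name: fragments, per kilobase of exon, per million mapped fragments. The sketch below is only that plain definition with hypothetical counts; Cufflinks itself estimates abundances with a statistical model rather than this simple division.

```python
def fpkm(fragments, exon_length_bp, total_mapped_fragments):
    """Fragments per kilobase of exon per million mapped fragments."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("lengths and totals must be positive")
    # fragments / (length / 1e3) / (total / 1e6)
    return fragments * 1e9 / (exon_length_bp * total_mapped_fragments)


# Hypothetical numbers: two genes with the same fragment count
# but different exon lengths, in a library of 10 million mapped fragments.
gene_a = fpkm(fragments=500, exon_length_bp=2000, total_mapped_fragments=10_000_000)
gene_b = fpkm(fragments=500, exon_length_bp=4000, total_mapped_fragments=10_000_000)
print(gene_a, gene_b)  # the longer gene gets half the FPKM
```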
34 DGE Non discovery mode
35 DGE without detection of novel transcripts
Approach: treat RNA-Seq as a high-resolution microarray.
Most appropriate for:
- Quick identification and analysis of differentially expressed genes
- Systems with an annotated reference genome, such as human and mouse
Limitations of the analysis:
- Reads are only mapped to previously known transcripts
- Only previously known transcripts are tested for differential expression
Programs to use: 1. TopHat: the mapper 2. Cuffdiff: the tester
36 Why choose TopHat as the mapper?
Mapping RNA-Seq reads to the genome is a big challenge: RNA-Seq reads are continuous, but the genomic sequence is discontinuous. Initially, genome aligners (such as BWA) treated introns as gaps. [Figure: BWA alignment at Dgcr2]
Reference: Trapnell C. et al. (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics
37 INTRONs shouldn't be treated as GAPs. If there were no introns, reads would continuously cover the splice junctions. [Figure: BWA alignment at Dgcr2, with exons marked]
38 TopHat: the splice junction aligner
- It became intuitive to incorporate splicing information in the mapping process.
- Later, it became necessary to build splice junctions ab initio, because the set of known junctions is incomplete.
- Splicing signals: donor and acceptor sites.
- So TopHat was developed.
[Figure: TopHat vs. BWA alignment at Dgcr2]
Reference: Trapnell C. et al. (2009) TopHat: discovering splice junctions with RNA-Seq. Bioinformatics
39 Basis of TopHat
- Step 1: mapping. Direct un-spliced, un-paired mapping (using Bowtie).
- Step 2: building splice junctions (with or without reference junctions). Assemble contiguous coverage islands, identify possible splice donors and acceptors, and predict possible splice junctions.
- Step 3: second mapping. Uses Bowtie again in a closure search (finding splice junctions with mapping support).
Output: mapping results (SAM/BAM file) and junctions.bed. Only the SAM/BAM file is used by Cufflinks and Cuffdiff.
Courtesy of Dr. Kevin Silverstein
40 TopHat: General considerations
Q1: Is the project in human or mouse? TopHat is optimized for the human and mouse genomes.
A: Yes. Action: nothing to change; the default parameters can be used.
A: No. Action: the default parameters cannot be used. Input all species-specific parameters, e.g. gene-model related parameters such as intron length.
41 TopHat: General considerations
Q2: Is the library paired-end?
A: Yes. Action: set the parameters for the mean inner distance between paired reads (-r) and the standard deviation of the inner distance (--mate-std-dev), where inner distance = fragment length (220) - 2 × read length.
A: No. Action: nothing to change.
42 TopHat options: non-discovery analytical approach
Q3: What are the parameters to select?
1. Select "Full parameter list".
2. Select Yes for the option "Use own junctions".
3. Select Yes for the option "Use gene annotation model", AND provide a known annotation (gtf file).
4. Select Yes for the option "Only look for supplied junctions".
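The inner distance passed to -r is the fragment length minus both read lengths. A tiny sketch, assuming the 220 bp fragment length from the slide and a hypothetical read length of 50 bp:

```python
def mate_inner_distance(fragment_length, read_length):
    """Inner distance between mates = fragment length - 2 * read length."""
    return fragment_length - 2 * read_length


inner = mate_inner_distance(fragment_length=220, read_length=50)
print(f"-r {inner}")
```

A negative result simply means the two reads of a pair overlap, which happens when the fragments are shorter than twice the read length.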
43 Assessing mapping efficiency
A: Review the mapping statistics.
1. % of reads mapped, % of reads properly paired
2. Use Samtools and Picard tools:
- flagstat: line 3 and line 7. For TopHat, first filter the BAM file on a MAPQ value of 255 (Galaxy tool: Filter SAM or BAM files on FLAG MAPQ RG LN or by region).
- SAM/BAM Alignment Summary Metrics
3. Estimate the insertion size: Insertion size metrics
Recommendations: for human and mouse, good mapping will result in >= 80% mapping percentage and >= 70% paired reads.
44 Mapping visualization: Integrative Genomics Viewer (IGV)
45 Run IGV locally to view multiple tracks. Directions to install IGV: [link missing from this transcript] [Figure: IGV tracks for Healthy Sample and Cancer]
46 DGE Workflow: non-discovery mode
[Workflow: for each condition (Condition 1, Condition 2): fastq -> Quality Control (FastQC) -> Map reads to reference sequence or genome (TopHat) -> mapped reads (bam/sam). Mapped reads of sample 1 and sample 2 + pre-defined annotation -> Identify differential expression (Cuffdiff) -> fpkm; diff]
47 Cuffdiff facts
Cuffdiff quantifies gene expression abundance and statistically evaluates differential expression.
Considerations on handling tail data: exclude the low-expressed genes to remove transcription artifacts, by setting the parameter Min Alignment Count.
[Figure: density plot of global gene expression, x-axis log10(fpkm)]
48 Cuffdiff facts
Cuffdiff quantifies gene expression abundance and statistically evaluates differential expression.
Considerations on handling tail data: exclude the highly expressed genes, such as some house-keeping genes, by setting Yes for Perform quartile normalization.
[Figure: density plot of global gene expression, x-axis log10(fpkm)]
49 Cuffdiff output [Figure: Cuffdiff output for Healthy_sample and Cancer_sample]
50 Post-analysis processing and iterations
Check for non-biological variation, also known as technical variation or within-group variation. This type of variation is detected among samples of the same group.
Sources of technical variation:
- Batch effect: how were the samples collected and processed? Were the samples processed as groups, and if so, what was the grouping?
- Non-synchronized cell cultures: were all the cells from the same genetic background and growth phase?
- Use of technical replicates rather than biological replicates
Detection of non-biological variation: PCA analysis, MDS analysis, or unsupervised clustering analysis of FPKM values.
51 Steps in PCA analysis
Construct the multiple-variable matrix, e.g. a table of FPKM values with transcripts (genes) as rows and samples as columns: Sample A, Sample V, Sample O (Group 1) and Sample E, Sample I, Sample U (Group 2).
[Figure: PCA plot of the six samples along PC1 and a second principal component]
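The two percentages can also be derived from the SAM FLAG values themselves. The sketch below is a simplification and does not reproduce samtools flagstat output line by line: it assumes you already have the FLAG column as integers, and it ignores secondary and supplementary alignments so that each read is counted once.

```python
def mapping_summary(flags):
    """Percent of reads mapped and properly paired, from SAM FLAG values."""
    # 0x100 = secondary, 0x800 = supplementary: skip so reads count once.
    primary = [f for f in flags if not f & 0x900]
    if not primary:
        return {"total": 0, "pct_mapped": 0.0, "pct_proper_pair": 0.0}
    mapped = sum(1 for f in primary if not f & 0x4)            # 0x4 = unmapped
    proper = sum(1 for f in primary if f & 0x2 and not f & 0x4)  # 0x2 = proper pair
    return {
        "total": len(primary),
        "pct_mapped": 100.0 * mapped / len(primary),
        "pct_proper_pair": 100.0 * proper / len(primary),
    }


# Hypothetical FLAG values for a handful of paired-end reads.
example_flags = [99, 147, 99, 147, 77, 141, 73, 133, 355]
summary = mapping_summary(example_flags)
print(summary)
# Compare against the slide's rule of thumb (>= 80% mapped, >= 70% paired).
print(summary["pct_mapped"] >= 80, summary["pct_proper_pair"] >= 70)
```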
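As a rough illustration of what the PCA step does with an FPKM table, the sketch below computes first-principal-component scores per sample in plain Python (power iteration on the sample-by-sample Gram matrix of log-transformed, gene-centered values). The FPKM numbers are invented; in practice one would use an R or Python statistics package, and the log10(FPKM + 1) transform is my assumption, not something stated on the slide.

```python
import math


def pc1_scores(matrix, iterations=200):
    """PC1 score per sample; matrix rows are samples, columns are genes."""
    n, g = len(matrix), len(matrix[0])
    logged = [[math.log10(v + 1.0) for v in row] for row in matrix]
    means = [sum(row[j] for row in logged) / n for j in range(g)]
    centered = [[row[j] - means[j] for j in range(g)] for row in logged]
    # Gram matrix between samples; its top eigenvector gives the PC1 scores.
    gram = [[sum(a * b for a, b in zip(centered[i], centered[k]))
             for k in range(n)] for i in range(n)]
    vec = [float(i + 1) for i in range(n)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        nxt = [sum(gram[i][k] * vec[k] for k in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            return [0.0] * n
        vec = [x / norm for x in nxt]
    eigenvalue = sum(vec[i] * sum(gram[i][k] * vec[k] for k in range(n))
                     for i in range(n))
    return [math.sqrt(max(eigenvalue, 0.0)) * x for x in vec]


# Invented FPKM values: rows are samples A, V, O (group 1) and E, I, U (group 2).
sample_names = ["A", "V", "O", "E", "I", "U"]
fpkm_table = [
    [10, 200, 5, 50],
    [12, 180, 6, 55],
    [11, 210, 4, 48],
    [100, 20, 5, 52],
    [90, 25, 6, 49],
    [110, 18, 5, 51],
]
scores = pc1_scores(fpkm_table)
for name, score in zip(sample_names, scores):
    print(name, round(score, 3))
```

With data like this, samples of the same group land on the same side of PC1. A sample that falls on the wrong side, or far from its group, is the kind of non-biological variation the previous slide asks you to look for.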
52 DGE Discovery mode
53 Workflow:
For each condition (CONDITION A: SampA.fq; CONDITION B: SampB.fq):
1. QC with FastQC
2. Alignment with TopHat against the reference index (Genome.fa) -> SampA.bam / SampB.bam
3. Assemble with Cufflinks, using the reference gene annotation (Genes.gtf) -> SampA.gtf / SampB.gtf
Discovery phase: merge assemblies with Cuffcompare -> merged.gtf
Quantitation and differential expression with Cuffdiff -> gene_exp.diff; isoform_exp.diff (store results)
Visualization with cummeRbund
54 DGE with detection of novel transcripts
- Novel transcripts will be assembled and tested for differential expression.
- Potential identification of new splicing variants.
- Key advantage (over microarray): not limited by previous knowledge; extends current knowledge banks.
Programs used: 1. TopHat: the mapper; 2. Cufflinks: the assembler; 3. Cuffdiff: the tester
55 TopHat: best practice in discovery mode
Same as before, but TopHat needs to be run at least TWICE in order to identify the splice junctions reliably and consistently.
- The first run generates a full list of junctions.
- The second run applies the full junction file to all the samples to keep the mapping consistent.
The TWO-STEP running of TopHat:
1. Run TopHat as before.
2. Re-run TopHat with a list of junctions (see the settings on the next slide).
56 TopHat options: discovery analytical approach
1. Combine the sample junctions.bed files into one using Concatenate.
2. Turn on "Full parameter list".
3. Turn on (set Yes for) the option "Use own junctions".
4. Provide the junctions file (bed file).
5. Turn on the option "Use Closure Search".
6. Turn on "Use Microexon Search".
57 Considerations for Cufflinks
58 Cufflinks facts
- Optimized for human and mouse genomes
- Uses a parsimonious method to assemble the transcripts, with or without a known annotation
- Can estimate transcript abundances. FPKM: # of Fragments Per Kilobase of exon model per Million mapped fragments
- Can estimate the fragment length distribution (not available in Galaxy)
- Output file: GTF file
59 Cufflinks: General considerations
Q1: Is the project in human or mouse? Cufflinks is optimized for the human and mouse genomes.
A: Yes. Action: nothing to change.
A: No. Action: the default parameters cannot be used. Input all species-specific parameters, e.g. gene-model related parameters such as intron length.
60 Cufflinks: General considerations
Q2: Do you want to use a known annotation in the transcriptome assembly and report the novel transcripts assembled?
A: Yes. Action: use the option "Use Reference Annotation" and select "Use Reference Annotation as Guide".
A: No. Action: nothing to change.
61 Cufflinks: General considerations
Q3: Can I pool samples as one input to Cufflinks?
A: No, because we might lose some isoforms this way. It is possible that an isoform is only called from one sample, due to some uncontrollable sample preparation process. Cufflinks only reports isoforms above a certain abundance threshold (10% of the major transcript). A rare isoform is diluted in the pooled samples, so it may go missing from the assembly.
[Table: Isoform A (FPKM), Isoform B (FPKM), Called? One sample: Yes; the other sample: No; pooled: No. The FPKM values are missing from this transcript.]
62 Cuffcompare and Cuffmerge facts
- Cuffcompare: compares multiple transcriptomes and reports the similarity between them. Available in Galaxy.
- Cuffmerge: a new function implemented in the Cufflinks package. Its purpose is to remove assembly artifacts. Available using command line tools.
63 Follow the same instructions to run Cuffdiff and post-processing as in DGE non-discovery mode.
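The dilution argument can be illustrated with made-up numbers, since the FPKM values of the original table are not in the transcript. The sketch assumes the 10% threshold quoted on the slide and models pooling as a simple average of two samples, which is a simplification.

```python
MIN_ISOFORM_FRACTION = 0.10  # threshold quoted on the slide


def minor_isoform_called(major_fpkm, minor_fpkm, min_fraction=MIN_ISOFORM_FRACTION):
    """True if the minor isoform reaches the fraction of the major isoform."""
    if major_fpkm <= 0:
        return False
    return minor_fpkm / major_fpkm >= min_fraction


# Hypothetical (Isoform A, Isoform B) FPKM values.
samples = {"Sample 1": (100.0, 15.0), "Sample 2": (100.0, 0.0)}
pooled = tuple(
    sum(values[i] for values in samples.values()) / len(samples) for i in range(2)
)

for name, (major, minor) in samples.items():
    print(name, minor_isoform_called(major, minor))
print("Pooled", pooled, minor_isoform_called(*pooled))
```

Isoform B is called in Sample 1 (15% of the major isoform) but falls to 7.5% after pooling and drops below the threshold.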
64 Reproducibility and the value of Workflow
65 Analysis strategy in Workflow. A workflow is a sequential collection of Galaxy operations to complete an analysis.
66 Create a Workflow From scratch From current history Edit existing workflow
67 Share/Publish/Use Workflow
68 Tutorial optional material: Evaluation of transcriptome, two scenarios
69 1 De novo assembly of transcriptome Assemble transcriptome without a reference transcriptome/genome 2 Reference-guided assembly of transcriptome
70 Key definitions
- Short reads
- Contigs = consensus of overlapping reads
- Scaffolds = contigs + known-length gaps (the gap lengths can be estimated by mate-pair sequencing)
- Draft transcriptome/genome = a collection of non-ordered scaffolds
71 De novo assembly of the transcriptome
[Workflow: Samples (RNA-Seq) -> fastq -> Pre-processing: QC and data cleaning -> fastq -> De novo assembly of transcriptome (Trans-ABySS*)]
* Only one assembler is shown in this diagram, to illustrate the concept of assembly. To construct a reliable transcriptome, multiple assemblers should be used to generate a consensus assembly.
72 Trans-ABySS facts
- ABySS is a de novo, parallel sequence assembler designed for short reads.
- It works on single-end and paired-end reads.
- It is a de Bruijn graph assembler.
- It takes two steps: (1) use all possible k-mers from the reads to build the initial contigs; (2) use mate-pair information to extend the contigs.
- Trans-ABySS is a pipeline for analyzing ABySS-assembled contigs from RNA-Seq data. It uses several k-mer lengths.
- Availability: command line
- Homepage: [link missing from this transcript]
73 Reference-guided assembly of transcriptome (also known as transcriptome reconstruction)
[Workflow: Samples (RNA-Seq) -> fastq -> Pre-processing: QC and data cleaning -> fastq -> Map short reads to reference genome (TopHat), with known annotation -> BAM/SAM -> Assemble transcriptome from mapped reads (Cufflinks) -> GTF]
74 If choosing TopHat and Cufflinks as the assembler, follow the instructions in DGE discovery mode.
75 Specific notes for prokaryote samples
Cufflinks developer: "We don't recommend assembling bacteria transcripts using Cufflinks at first. If you are working on a new bacteria genome, consider a computational gene finding application such as Glimmer."
So for a bacterial transcriptome:
- If the genome is available, do genome annotation first, then reconstruct the transcriptome.
- If the genome is not available, try de novo assembly, followed by gene annotation.
76 Summary: hybrid method for transcriptome assembly. Reference: Martin J. et al. (2011) Next-generation transcriptome assembly. Nature Reviews Genetics
77 Comparative Study of Transcriptomes
78 Q: How can I compare different transcriptomes?
Input: Sample 1.gtf, Sample 2.gtf, Sample 3.gtf, ..., Sample N.gtf
- Cuffcompare: compares individual transcriptomes
- Generic tools: Operate on Genomic Intervals
79 Galaxy Tools for pre-processing Cuffcompare Operate on Genomic Intervals
80 Downstream visualization and analysis (will be covered in Tutorial Module 3)
- IGV: Integrative Genomics Viewer
- IPA: Ingenuity Pathway Analysis
- Other analysis packages (R): ArrayExpressHTS, cummeRbund
81 Discussion and Questions? Get support at MSI. General questions: subject line "RISS:". Galaxy questions: subject line "Galaxy:".
More informationNext-Generation Sequencing in practice
Next-Generation Sequencing in practice Bioinformatics analysis techniques and some medical applications Salvatore Alaimo, MSc. Email: alaimos@dmi.unict.it Overview Next Generation Sequencing: how it works
More informationBCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers
BCHM 6280 Tutorial: Gene specific information using NCBI, Ensembl and genome viewers Web resources: NCBI database: http://www.ncbi.nlm.nih.gov/ Ensembl database: http://useast.ensembl.org/index.html UCSC
More informationA step-by-step guide to ChIP-seq data analysis
A step-by-step guide to ChIP-seq data analysis December 03, 2014 Xi Chen, Ph.D. EMBL-European Bioinformatics Institute Wellcome Trust Sanger Institute Target audience Wet-lab biologists with no experience
More informationmeasuring gene expression December 5, 2017
measuring gene expression December 5, 2017 transcription a usually short-lived RNA copy of the DNA is created through transcription RNA is exported to the cytoplasm to encode proteins some types of RNA
More informationCNV and variant detection for human genome resequencing data - for biomedical researchers (II)
CNV and variant detection for human genome resequencing data - for biomedical researchers (II) Chuan-Kun Liu 劉傳崑 Senior Maneger National Center for Genome Medican bioit@ncgm.sinica.edu.tw Abstract Common
More informationGreen Center Computational Core ChIP- Seq Pipeline, Just a Click Away
Green Center Computational Core ChIP- Seq Pipeline, Just a Click Away Venkat Malladi Computational Biologist Computational Core Cecil H. and Ida Green Center for Reproductive Biology Science Introduc
More informationQuality assessment and control of sequence data. Naiara Rodríguez-Ezpeleta
Quality assessment and control of sequence data Naiara Rodríguez-Ezpeleta Workshop on Genomics 2014 Quality control is important Some of the artefacts/problems that can be detected with QC Sequencing Sequence
More informationBioinformatics small variants Data Analysis. Guidelines. genomescan.nl
Next Generation Sequencing Bioinformatics small variants Data Analysis Guidelines genomescan.nl GenomeScan s Guidelines for Small Variant Analysis on NGS Data Using our own proprietary data analysis pipelines
More informationL3: Short Read Alignment to a Reference Genome
L3: Short Read Alignment to a Reference Genome Shamith Samarajiwa CRUK Autumn School in Bioinformatics Cambridge, September 2017 Where to get help! http://seqanswers.com http://www.biostars.org http://www.bioconductor.org/help/mailing-list
More informationGene Regulation Solutions. Microarrays and Next-Generation Sequencing
Gene Regulation Solutions Microarrays and Next-Generation Sequencing Gene Regulation Solutions The Microarrays Advantage Microarrays Lead the Industry in: Comprehensive Content SurePrint G3 Human Gene
More informationAnalysis of Differential Gene Expression in Cattle Using mrna-seq
Analysis of Differential Gene Expression in Cattle Using mrna-seq mrna-seq A rough guide for green horns Animal and Grassland Research and Innovation Centre Animal and Bioscience Research Department Teagasc,
More informationIncorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits
Incorporating Molecular ID Technology Accel-NGS 2S MID Indexing Kits Molecular Identifiers (MIDs) MIDs are indices used to label unique library molecules MIDs can assess duplicate molecules in sequencing
More informationNext Generation Sequencing Data Analysis with BioHPC. Updated for
Next Generation Sequencing Data Analysis with BioHPC 1 Updated for 2015-04-15 Next Generation Sequencing Genomic, transcriptomic sequencing now commonplace in projects. Now very cheap! UTSW McDermott Core
More informationBIOINFORMATICS ORIGINAL PAPER
BIOINFORMATICS ORIGINAL PAPER Vol. 27 no. 21 2011, pages 2957 2963 doi:10.1093/bioinformatics/btr507 Genome analysis Advance Access publication September 7, 2011 : fast length adjustment of short reads
More informationAssembly of Ariolimax dolichophallus using SOAPdenovo2
Assembly of Ariolimax dolichophallus using SOAPdenovo2 Charles Markello, Thomas Matthew, and Nedda Saremi Image taken from Banana Slug Genome Project, S. Weber SOAPdenovo Assembly Tool Short Oligonucleotide
More informationGenomics and Transcriptomics of Spirodela polyrhiza
Genomics and Transcriptomics of Spirodela polyrhiza Doug Bryant Bioinformatics Core Facility & Todd Mockler Group, Donald Danforth Plant Science Center Desired Outcomes High-quality genomic reference sequence
More informationBioinformatics Advice on Experimental Design
Bioinformatics Advice on Experimental Design Where do I start? Please refer to the following guide to better plan your experiments for good statistical analysis, best suited for your research needs. Statistics
More informationNext Generation Sequencing: An Overview
Next Generation Sequencing: An Overview Cavan Reilly November 13, 2017 Table of contents Next generation sequencing NGS and microarrays Study design Quality assessment Burrows Wheeler transform Next generation
More informationSNP calling and VCF format
SNP calling and VCF format Laurent Falquet, Oct 12 SNP? What is this? A type of genetic variation, among others: Family of Single Nucleotide Aberrations Single Nucleotide Polymorphisms (SNPs) Single Nucleotide
More informationContact us for more information and a quotation
GenePool Information Sheet #1 Installed Sequencing Technologies in the GenePool The GenePool offers sequencing service on three platforms: Sanger (dideoxy) sequencing on ABI 3730 instruments Illumina SOLEXA
More informationIntroduction to genome biology
Introduction to genome biology Lisa Stubbs We ve found most genes; but what about the rest of the genome? Genome size* 12 Mb 95 Mb 170 Mb 1500 Mb 2700 Mb 3200 Mb #coding genes ~7000 ~20000 ~14000 ~26000
More informationCSE182-L16. LW statistics/assembly
CSE182-L16 LW statistics/assembly Silly Quiz Who are these people, and what is the occasion? Genome Sequencing and Assembly Sequencing A break at T is shown here. Measuring the lengths using electrophoresis
More informationWorkflow of de novo assembly
Workflow of de novo assembly Experimental Design Clean sequencing data (trim adapter and low quality sequences) Run assembly software for contiging and scaffolding Evaluation of assembly Several iterations:
More informationShuji Shigenobu. April 3, 2013 Illumina Webinar Series
Shuji Shigenobu April 3, 2013 Illumina Webinar Series RNA-seq RNA-seq is a revolutionary tool for transcriptomics using deepsequencing technologies. genome HiSeq2000@NIBB (Wang 2009 with modifications)
More informationOutline. Annotation of Drosophila Primer. Gene structure nomenclature. Muller element nomenclature. GEP Drosophila annotation projects 01/04/2018
Outline Overview of the GEP annotation projects Annotation of Drosophila Primer January 2018 GEP annotation workflow Practice applying the GEP annotation strategy Wilson Leung and Chris Shaffer AAACAACAATCATAAATAGAGGAAGTTTTCGGAATATACGATAAGTGAAATATCGTTCT
More informationWelcome to the NGS webinar series
Welcome to the NGS webinar series Webinar 1 NGS: Introduction to technology, and applications NGS Technology Webinar 2 Targeted NGS for Cancer Research NGS in cancer Webinar 3 NGS: Data analysis for genetic
More informationAGILENT S BIOINFORMATICS ANALYSIS SOFTWARE
ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS
More informationDe novo metatranscriptome assembly and coral gene expression profile of Montipora capitata with growth anomaly
Additional File 1 De novo metatranscriptome assembly and coral gene expression profile of Montipora capitata with growth anomaly Monika Frazier, Martin Helmkampf, M. Renee Bellinger, Scott Geib, Misaki
More informationB&DA Committee Bioinformatics and Data Analysis. PAG January 2016
B&DA Committee Bioinformatics and Data Analysis PAG January 2016 B&DA progress so far Aim: to define standard bfx pipelines for FAANG data Group has skyped many times We have a Wiki: hosted by EBI Sub-groups
More informationHow much sequencing do I need? Emily Crisovan Genomics Core
How much sequencing do I need? Emily Crisovan Genomics Core How much sequencing? Three questions: 1. How much sequence is required for good experimental design? 2. What type of sequencing run is best?
More informationGenomic DNA ASSEMBLY BY REMAPPING. Course overview
ASSEMBLY BY REMAPPING Laurent Falquet, The Bioinformatics Unravelling Group, UNIFR & SIB MA/MER @ UniFr Group Leader @ SIB Course overview Genomic DNA PacBio Illumina methylation de novo remapping Annotation
More informationRNA-Seq analysis workshop. Zhangjun Fei
RNA-Seq analysis workshop Zhangjun Fei Outline Background of RNA-Seq Application of RNA-Seq (what RNA-Seq can do?) Available sequencing platforms and strategies and which one to choose RNA-Seq data analysis
More informationNext Generation Sequencing Lecture Saarbrücken, 19. March Sequencing Platforms
Next Generation Sequencing Lecture Saarbrücken, 19. March 2012 Sequencing Platforms Contents Introduction Sequencing Workflow Platforms Roche 454 ABI SOLiD Illumina Genome Anlayzer / HiSeq Problems Quality
More informationSMARTer Ultra Low RNA Kit for Illumina Sequencing Two powerful technologies combine to enable sequencing with ultra-low levels of RNA
SMARTer Ultra Low RNA Kit for Illumina Sequencing Two powerful technologies combine to enable sequencing with ultra-low levels of RNA The most sensitive cdna synthesis technology, combined with next-generation
More informationIntroduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017
Introduction to Next Generation Sequencing (NGS) Andrew Parrish Exeter, 2 nd November 2017 Topics to cover today What is Next Generation Sequencing (NGS)? Why do we need NGS? Common approaches to NGS NGS
More informationIntroduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis. Jenny Wu
Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis Jenny Wu Outline Introduction to NGS data analysis in Cancer Genomics NGS applications in cancer research Typical NGS
More informationFrom reads to results. Dr Torsten Seemann
From reads to results Dr Torsten Seemann AGRF/EMBL Introduction to Bioinformatics - Monash University - Wed 1 Aug 2012 What I will cover * NGS Applications Sequences Sequence quality Read file formats
More informationNOW GENERATION SEQUENCING. Monday, December 5, 11
NOW GENERATION SEQUENCING 1 SEQUENCING TIMELINE 1953: Structure of DNA 1975: Sanger method for sequencing 1985: Human Genome Sequencing Project begins 1990s: Clinical sequencing begins 1998: NHGRI $1000
More informationAbout Strand NGS. Strand Genomics, Inc All rights reserved.
About Strand NGS Strand NGS-formerly known as Avadis NGS, is an integrated platform that provides analysis, management and visualization tools for next-generation sequencing data. It supports extensive
More informationHaploid Assembly of Diploid Genomes
Haploid Assembly of Diploid Genomes Challenges, Trials, Tribulations 13 October 2011 İnanç Birol Assembly By Short Sequencing IEEE InfoVis 2009 2 3 in Literature ~40 citations on tool comparisons ~20 citations
More informationGene Finding Genome Annotation
Gene Finding Genome Annotation Gene finding is a cornerstone of genomic analysis Genome content and organization Differential expression analysis Epigenomics Population biology & evolution Medical genomics
More informationRNA- seq data analysis tutorial. Andrea Sboner
RNA- seq data analysis tutorial Andrea Sboner 2015-05- 21 NGS Experiment Data management: Mapping the reads CreaCng summaries Downstream analysis: the interes)ng stuff DifferenCal expression, chimeric
More informationSequence Annotation & Designing Gene-specific qpcr Primers (computational)
James Madison University From the SelectedWorks of Ray Enke Ph.D. Fall October 31, 2016 Sequence Annotation & Designing Gene-specific qpcr Primers (computational) Raymond A Enke This work is licensed under
More informationless sensitive than RNA-seq but more robust analysis pipelines expensive but quantitiatve standard but typically not high throughput
Chapter 11: Gene Expression The availability of an annotated genome sequence enables massively parallel analysis of gene expression. The expression of all genes in an organism can be measured in one experiment.
More informationGene Identification in silico
Gene Identification in silico Nita Parekh, IIIT Hyderabad Presented at National Seminar on Bioinformatics and Functional Genomics, at Bioinformatics centre, Pondicherry University, Feb 15 17, 2006. Introduction
More informationVariant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4
WHITE PAPER Oncomine Comprehensive Assay Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4 Contents Scope and purpose of document...2 Content...2 How Torrent
More informationIntroduction to Bioinformatics
Introduction to Bioinformatics Alla L Lapidus, Ph.D. SPbSU St. Petersburg Term Bioinformatics Term Bioinformatics was invented by Paulien Hogeweg (Полина Хогевег) and Ben Hesper in 1970 as "the study of
More informationresequencing storage SNP ncrna metagenomics private trio de novo exome ncrna RNA DNA bioinformatics RNA-seq comparative genomics
RNA Sequencing T TM variation genetics validation SNP ncrna metagenomics private trio de novo exome mendelian ChIP-seq RNA DNA bioinformatics custom target high-throughput resequencing storage ncrna comparative
More informationGene Expression Profiling and Validation Using Agilent SurePrint G3 Gene Expression Arrays
Gene Expression Profiling and Validation Using Agilent SurePrint G3 Gene Expression Arrays Application Note Authors Bahram Arezi, Nilanjan Guha and Anne Bergstrom Lucas Agilent Technologies Inc. Santa
More informationTop 5 Lessons Learned From MAQC III/SEQC
Top 5 Lessons Learned From MAQC III/SEQC Weida Tong, Ph.D Division of Bioinformatics and Biostatistics, NCTR/FDA Weida.tong@fda.hhs.gov; 870 543 7142 1 MicroArray Quality Control (MAQC) An FDA led community
More informationRNA-Seq analysis using R: Differential expression and transcriptome assembly
RNA-Seq analysis using R: Differential expression and transcriptome assembly Beibei Chen Ph.D BICF 12/7/2016 Agenda Brief about RNA-seq and experiment design Gene oriented analysis Gene quantification
More informationHLA and Next Generation Sequencing it s all about the Data
HLA and Next Generation Sequencing it s all about the Data John Ord, NHSBT Colindale and University of Cambridge BSHI Annual Conference Manchester September 2014 Introduction In 2003 the first full public
More informationNext-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX
Next-Generation Sequencing Gene Expression Analysis Using Agilent GeneSpring GX Technical Overview Introduction RNA Sequencing (RNA-Seq) is one of the most commonly used next-generation sequencing (NGS)
More informationNCBI web resources I: databases and Entrez
NCBI web resources I: databases and Entrez Yanbin Yin Most materials are downloaded from ftp://ftp.ncbi.nih.gov/pub/education/ 1 Homework assignment 1 Two parts: Extract the gene IDs reported in table
More information