Background Reading and tools: Proposal:

Background Reading and tools:

1. Ivanov II, Atarashi K, Manel N, Brodie EL, Shima T, Karaoz U, Wei D, Goldfarb KC, Santee CA, Lynch SV, Tanoue T, Imaoka A, Itoh K, Takeda K, Umesaki Y, Honda K, Littman DR. Induction of intestinal Th17 cells by segmented filamentous bacteria. Cell. 2009 Oct 30;139(3):485-98.
2. Sczesnak A, Segata N, Qin X, Gevers D, Petrosino JF, Huttenhower C, Littman DR, Ivanov II. The genome of Th17 cell-inducing segmented filamentous bacteria reveals extensive auxotrophy and adaptations to the intestinal environment. Cell Host Microbe. 2011 Sep 15;10(3):260-72.
3. Prakash T, Oshima K, Morita H, Fukuda S, Imaoka A, Kumar N, Sharma VK, Kim SW, Takahashi M, Saitou N, Taylor TD, Ohno H, Umesaki Y, Hattori M. Complete genome sequences of rat and mouse segmented filamentous bacteria, a potent inducer of Th17 cell differentiation. Cell Host Microbe. 2011 Sep 15;10(3):273-84.
4. Sonnenburg JL, Xu J, Leip DD, Chen CH, Westover BP, Weatherford J, Buhler JD, Gordon JI. Glycan foraging in vivo by an intestine-adapted bacterial symbiont. Science. 2005 Mar 25;307(5717):1955-9.
5. Perkins TT, Kingsley RA, Fookes MC, Gardner PP, James KD, et al. (2009) A Strand-Specific RNA-Seq Analysis of the Transcriptome of the Typhoid Bacillus Salmonella Typhi. PLoS Genet 5(7): e1000569.
6. Langmead et al. Genome Biology 2010, 11:R83; http://genomebiology.com/content/11/8/r83
7. MATLAB RNA-seq analysis demo: http://www.mathworks.com/products/bioinfo/demos.html?file=/products/demos/shipping/bioinfo/rnaseqdedemo.html
8. Java app that provides RNA-seq QC (FastQC): http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/

Proposal:

The human gut, which is sterile at birth, is colonized by a large array of microbes over the course of life. With 500-1000 resident bacterial species, many complex and diverse interactions occur among resident microbes as well as between microbes and the host.

Segmented Filamentous Bacteria (SFB) are Gram-positive, anaerobic, currently unculturable bacteria that reside in the small intestine of various mammals. Their presence is associated with a reduction in the growth and colonization of pathogenic bacteria (Garland et al. 1982 and Heczko et al. 2000). SFB not only affect the microbiota but also promote the development of the normal mouse immune system by inducing numerous pro- and anti-inflammatory T cell responses as well as B cell responses, including IgA secretion, without causing any harm to the host (Ivanov et al. 2009). Thus, understanding the mechanisms SFB use to alter host biology and microbe-microbe interactions may inform the causes and treatments of irritable bowel syndrome (IBS), pathogen resistance, and gut immune development.

SFB have been shown to adhere to the intestinal epithelium, a niche which typically contains few bacteria. This intimate association with the host may contribute to the ability of SFB to affect the colonization of other members of the microbiota, including pathogens, and may contribute to their unique and specific ability to affect the host's immune development. However, it is unclear how SFB are able to localize to this privileged niche. The unique nature of SFB provides a platform for addressing some important questions in the field, including: what are the minimal gene sets or pathways needed to survive in the gut, and what mechanisms do commensals use to modulate the immune system and affect other members of the microbiota?

To begin to answer some of these questions, we will use germ-free mice, which lack a microbiota and can be colonized with defined populations of bacteria to generate gnotobiotic mice. We will colonize two strains of germ-free mice with SFB in the presence and absence of Bacteroides thetaiotaomicron (B. theta), a human symbiont which can cleave and consume mucin glycans and which also alters the mouse intestinal niche in numerous ways, including inducing fucosylation of epithelial and mucin glycans. To understand how SFB colonize their unique niche, we will conduct transcriptional profiling studies by RNA sequencing of SFB and B. theta under each in vivo condition. By transcriptionally profiling SFB, we can identify the key pathways upregulated for life in the gut as well as candidate mechanisms used by commensals to affect the host's immune system and to affect each other's survival in the gut.

RNA sequencing studies have been conducted for bacteria (Perkins et al. 2009) as well as eukaryotic organisms (Langmead et al. Genome Biology 2010). The method uses Illumina platforms to sequence the cDNA created from a sample's mRNA (see figure below). Briefly, RNA is extracted and cleaned up by removing contaminating DNA as well as ribosomal RNA (which can account for ~80% of the sample), then converted to cDNA, from which a library is generated and sequenced.

The programs created in this project will help not only to answer some of the key biological questions in our datasets, but will also address important issues that researchers face when analyzing RNA sequencing data. For instance, there is no consensus on the best or most efficient way to normalize RNA-seq data or to compare samples to each other. Also, no program allows one to look at whether certain operons or pathways are differentially expressed across samples. Ultimately, the programs created in this project will help to streamline the data analysis process not only for this project, but also for many others!
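The normalization question raised in the proposal can be made concrete with one common option, RPKM (reads per kilobase of gene per million mapped reads), which the weekly plan lists as a candidate alongside raw counts. The following is a minimal sketch, not a recommended method: it assumes per-gene read counts and gene lengths are already available as dictionaries, and the function name `rpkm`, the `exclude` parameter, the gene names, and all numbers are hypothetical illustrations.

```python
def rpkm(counts, lengths_bp, exclude=frozenset()):
    """Return RPKM per gene: count * 1e9 / (gene_length_bp * total_mapped_reads).

    counts      -- dict mapping gene name to number of mapped reads (assumed input)
    lengths_bp  -- dict mapping gene name to gene length in base pairs
    exclude     -- gene names (e.g. rRNA genes) dropped from both the output
                   and the total used as the denominator
    """
    kept = {gene: n for gene, n in counts.items() if gene not in exclude}
    total_mapped = sum(kept.values())
    if total_mapped == 0:
        raise ValueError("no mapped reads left after exclusion")
    return {
        gene: n * 1e9 / (lengths_bp[gene] * total_mapped)
        for gene, n in kept.items()
    }


# Made-up toy data: one rRNA gene that soaks up 80% of the reads.
counts = {"rrn16S": 8000, "geneA": 1000, "geneB": 1000}
lengths_bp = {"rrn16S": 1500, "geneA": 1000, "geneB": 2000}

with_rrna = rpkm(counts, lengths_bp)
without_rrna = rpkm(counts, lengths_bp, exclude={"rrn16S"})
print(with_rrna["geneA"], without_rrna["geneA"])
```

In this toy example, excluding the rRNA gene changes the absolute RPKM values of the remaining genes while leaving their ratios to each other unchanged, which is one reason the choice of what to include in the denominator matters when comparing samples with different levels of rRNA contamination.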

Weekly plan:

Week 1: Choose project
My project will include two main datasets and a third conceptual dataset:
1. SFB dataset: 4 samples from SFB mono-colonized mice and 4 samples from SFB and B. theta bi-colonized mice. 8 total samples. This is the main dataset we want to work with to address biological questions!
2. CS dataset: For validation and other work, I also have 1 sample from a CS mono-colonized mouse and 1 sample from a CS + BT bi-colonized mouse. 2 total samples.
3. Third fake/tweakable dataset: For proofs of concept and working on programs, we can make our own simplified dataset. Chop up a small genome and insert other sequences and errors.

Week 2: Reading/understanding project and getting comfortable with the data
1. Go over the protocol for getting biological samples and sequencing, what RNA-seq is, how RNA-seq can be used, and what the needs are in the field, specifically for prokaryotes.
2. Go through a CLC Genomics Workbench (program available through CMGM) analysis:
   a. Understand the formats: input and output.
   b. Potential for programming: what is expected, what is needed.
   c. Use data generated by CLC Genomics Workbench as a starting point (let's not reinvent the wheel)!

Week 3: Pre-data analysis
1. Determine the minimal read length required to map uniquely to a genome. Generate a theoretical genome of size x, vary the read length y (in base pairs), and plot the results as a graph or dot plot.
2. Write programs to see how different parameters affect read mapping to one or more reference genomes. Generate a graph or dot plot showing the false discovery rate and the number of uniquely mapped reads for each. Also evaluate what the optimal parameter would be for each condition.
   a. How does the sequence length affect the ability to map reads (36 vs. 72 bp reads)?
   b. How does allowing for sequence mismatches affect the ability to map reads (0 vs. 1 vs. 2 mismatches)?
   c. How does having paired-end vs. single-end reads affect the ability to map reads?
   d. What if parts of the genomes are highly similar? What if the genomes are very different?
   e. How do you deal with reads that don't map uniquely, i.e. map to more than one gene?

Week 4: Perform quality control on RNA-seq data
1. Can you identify whether the sequencing data consist of high-quality reads? (Use the Java FastQC app.)
   a. Evaluate the quality scores that were assigned by the sequencing facility.

      i. Exclude reads with low quality scores.
   b. Evaluate the sequences for potential sequencing biases or artifacts.
      i. Nucleotide bias, GC bias, etc.
2. Can you write a program to identify the range of expression per gene in each sample?
   a. How does RNA contamination (rRNA and tRNA) affect the signal per sample (graph)? How much RNA contamination would be too much? (Spike rRNA into your fake dataset.)
3. Write a program to normalize the samples so that they can be compared to each other.
   a. Try excluding vs. including rRNA sequences as well as sequences that map to other genomes or don't map uniquely.
   b. Try raw counts vs. RPKM values.
   c. Justify which method is best, what you would choose, and why!
4. Use MATLAB to average 4 samples together and determine the mean and standard deviation of each gene's expression.
   a. Can you determine a cutoff for background signal vs. differences in gene expression?

Week 5: Preliminary analysis of RNA-seq data
Can you write programs or MATLAB GUIs to answer the following questions?
1. What genes are highly expressed by SFB in all of the in vivo conditions, i.e. what genes/pathways/metabolic processes appear to be important for living in vivo in the SFB niche?
   a. What are the most highly expressed genes in vivo?
2. What genes are differentially expressed between the conditions, i.e. what genes/pathways are functionally important for SFB in different conditions?
   a. What are the most highly expressed genes in mono-association as compared to bi-association?
   b. What are the most highly expressed genes in bi-association as compared to mono-association?
3. SFB is not a clonal population. Can you identify gene variants, SNPs, mutations, etc., and are they differentially expressed?
   a. We have DNA sequencing data suggesting that the SFB population contains multiple variants of flagellin. Can you detect these variants in the RNA-seq data, and is one variant expressed more highly than the other?

Week 6: Secondary analysis of RNA-seq data
1. Generate a program to identify highly expressed operons (based on genome proximity). What are the most highly expressed operons?
2. Which non-ORFs are highly expressed (small RNAs)?
3. Can you identify potential immune modulators or microbiota modulators from the transcriptional data? Which genes are cell surface or secreted? Which genes resemble toxins?
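As a starting point for the operon question in Week 6, proximity-based grouping can be sketched as follows. This is only one naive heuristic under stated assumptions: adjacent genes are grouped when they lie on the same strand and the gap between them is at most `max_gap` base pairs. The `Gene` record, the function names, the 50 bp default, and the toy coordinates and expression values are all hypothetical and are not taken from the SFB annotation.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Gene:
    name: str
    start: int         # 1-based start coordinate on the genome
    end: int           # end coordinate (end >= start)
    strand: str        # "+" or "-"
    expression: float  # e.g. a normalized expression value


def candidate_operons(genes, max_gap=50):
    """Group neighboring same-strand genes separated by at most max_gap bp."""
    operons = []
    for gene in sorted(genes, key=lambda g: g.start):
        if operons:
            prev = operons[-1][-1]
            if gene.strand == prev.strand and gene.start - prev.end <= max_gap:
                operons[-1].append(gene)
                continue
        operons.append([gene])  # start a new group
    return operons


def rank_operons(operons, min_genes=2):
    """Keep groups with at least min_genes genes, highest mean expression first."""
    multi = [op for op in operons if len(op) >= min_genes]
    return sorted(multi, key=lambda op: mean(g.expression for g in op), reverse=True)


# Made-up toy genome annotation with expression values.
genes = [
    Gene("a", 1, 100, "+", 10.0),
    Gene("b", 120, 300, "+", 30.0),
    Gene("c", 1000, 1200, "+", 5.0),
    Gene("d", 1210, 1500, "-", 100.0),
    Gene("e", 1520, 1800, "-", 200.0),
]
ranked = rank_operons(candidate_operons(genes))
print([[g.name for g in op] for op in ranked])
```

Ranking by mean expression is a design choice made here for simplicity; a real analysis might also require that expression be roughly uniform across the genes in a group before calling it an operon.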

Week 7: Conquer the biology/functionality
Using Pathway Tools, KEGG (a database of metabolic maps and pathways), and BioCyc (collections of metabolic pathways, reactions, and enzymes), let's analyze differences in metabolic pathways. Let's perform some functional analyses of the transcriptome!
1. Compare the list of pathways and metabolic processes to the genome as a whole.
2. Compare the representation of COG Functional Categories in the expression datasets to that in the genome as a whole. Is there an over- or under-representation of certain COG categories in the expression data? Can you write a program to determine whether certain COG categories are used at a statistically significantly different rate? Remember to normalize to genome content.
3. Are any metabolic pathways overrepresented in the transcriptional data? Between the two sets of samples? What pathways are used differentially by SFB in the mono- and bi-associated conditions?
4. Can you identify any biases in the location of the highly expressed genes across the genome, or are these genes randomly distributed across the genome? Are there any biases in the PSORT predictions?
5. Can you come up with a method to look at all of these different variables together and give an overall ranking of important genes/pathways/operons? Ideally this would narrow your list of genes to the most important ones. Think about how much weight each analysis should be given, and possibly come up with a ranking system. Think about clustering your outputs by proximity (operons) and by overlapping pathways between genes.

Week 8: Discuss biological implications, future studies, and interesting findings
1. What are the 3 most exciting findings you made in this project, and why do you think they are exciting?
2. What are some explanations for the differences you have found in transcriptional expression between the samples?
3. If there are differences in metabolic pathways, how can one validate them experimentally?
4. What implications does this have for the biology of the gut microbiota and the host?

Week 9: Write up findings
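For the COG over-representation question in Week 7 (item 2), one possible approach is a one-sided hypergeometric test, which accounts for genome content by asking how surprising it is to see a given number of genes from a category among the expressed genes, given how common that category is in the whole genome. The sketch below is an assumption about how this could be done rather than a prescribed method; the function names, the single-letter category labels, and the toy gene sets are hypothetical, and no multiple-testing correction is applied.

```python
from collections import Counter
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) when drawing `draws` genes without replacement from a
    population of `population` genes, `successes` of which are in the category."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favorable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favorable / total


def cog_enrichment(gene_to_cog, expressed_genes):
    """Return {COG category: p-value for over-representation among expressed genes}."""
    genome_counts = Counter(gene_to_cog.values())
    expressed = [g for g in expressed_genes if g in gene_to_cog]
    expressed_counts = Counter(gene_to_cog[g] for g in expressed)
    n_genome, n_expressed = len(gene_to_cog), len(expressed)
    return {
        cog: hypergeom_upper_tail(expressed_counts[cog], n_genome, in_genome, n_expressed)
        for cog, in_genome in genome_counts.items()
    }


# Made-up toy genome: 5 genes in category "C" and 5 in category "E".
gene_to_cog = {f"c{i}": "C" for i in range(5)}
gene_to_cog.update({f"e{i}": "E" for i in range(5)})
highly_expressed = {"c0", "c1", "c2", "c3"}  # all four fall in category "C"

p_values = cog_enrichment(gene_to_cog, highly_expressed)
print(p_values)
```

With several categories tested at once, the resulting p-values would normally be corrected for multiple testing before any category is called over-represented. Under-representation can be tested analogously with the lower tail.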