EECS730: Introduction to Bioinformatics

Similar documents
Expressed genes profiling (Microarrays) Overview Of Gene Expression Control Profiling Of Expressed Genes

Microarray Technique. Some background. M. Nath

Introduction to Microarray Technique, Data Analysis, Databases Maryam Abedi PhD student of Medical Genetics

STATISTICAL CHALLENGES IN GENE DISCOVERY

Gene Expression Data Analysis

Gene expression. What is gene expression?

Measuring gene expression

Bioinformatics: Microarray Technology. Assc.Prof. Chuchart Areejitranusorn AMS. KKU.

3.1.4 DNA Microarray Technology

Measuring and Understanding Gene Expression

Introduction to Bioinformatics: Chapter 11: Measuring Expression of Genome Information

Measuring gene expression (Microarrays) Ulf Leser

Gene Expression Data Analysis (I)

Exploration and Analysis of DNA Microarray Data

Gene expression profiling experiments:

DNA/RNA MICROARRAYS NOTE: USE THIS KIT WITHIN 6 MONTHS OF RECEIPT.

Introduction to Microarray Data Analysis and Gene Networks. Alvis Brazma European Bioinformatics Institute

Gene Expression Technology

Moc/Bio and Nano/Micro Lee and Stowell

Optimizing crna fragmentation for microarray experiments using the Agilent 2100 bioanalyzer. Application Deborah Vitale.

6. GENE EXPRESSION ANALYSIS MICROARRAYS

Lecture #1. Introduction to microarray technology

Methods of Biomaterials Testing Lesson 3-5. Biochemical Methods - Molecular Biology -

FACTORS CONTRIBUTING TO VARIABILITY IN DNA MICROARRAY RESULTS: THE ABRF MICROARRAY RESEARCH GROUP 2002 STUDY

SIMS2003. Instructors:Rus Yukhananov, Alex Loguinov BWH, Harvard Medical School. Introduction to Microarray Technology.

Normalization. Getting the numbers comparable. DNA Microarray Bioinformatics - #27612

Gene expression: Microarray data analysis. Copyright notice. Outline: microarray data analysis. Schedule

DNA Microarray Technology

Computing with large data sets

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques)

David M. Rocke Division of Biostatistics and Department of Biomedical Engineering University of California, Davis

Introduction to Bioinformatics. Fabian Hoti 6.10.

DNA Chip Technology Benedikt Brors Dept. Intelligent Bioinformatics Systems German Cancer Research Center

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA

V10-8. Gene Expression

Introduction to Microarray Analysis

CS-E5870 High-Throughput Bioinformatics Microarray data analysis

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies

Analysis of Microarray Data

CAP BIOINFORMATICS Su-Shing Chen CISE. 10/5/2005 Su-Shing Chen, CISE 1

Announcements. Lecture 2: DNA Microarray Overview. Gene Expression: The Central Dogma. Talks. Go to class web page

Functional Genomics Overview RORY STARK PRINCIPAL BIOINFORMATICS ANALYST CRUK CAMBRIDGE INSTITUTE 18 SEPTEMBER 2017

Analysis of Microarray Data

ChIP-seq and RNA-seq. Farhat Habib

Microarray Data Analysis Workshop. Preprocessing and normalization A trailer show of the rest of the microarray world.

10.1 The Central Dogma of Biology and gene expression

MICROARRAYS: CHIPPING AWAY AT THE MYSTERIES OF SCIENCE AND MEDICINE

Microarray. Key components Array Probes Detection system. Normalisation. Data-analysis - ratio generation

Gene expression analysis: Introduction to microarrays

Lecture 22 Eukaryotic Genes and Genomes III

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods

STATISTICAL ANALYSIS OF 70-MER OLIGONUCLEOTIDE MICROARRAY DATA FROM POLYPLOID EXPERIMENTS USING REPEATED DYE-SWAPS

Motivation From Protein to Gene

Exploration and Analysis of DNA Microarray Data

Marker types. Potato Association of America Frederiction August 9, Allen Van Deynze

SolCAP. Executive Commitee : David Douches Walter De Jong Robin Buell David Francis Alexandra Stone Lukas Mueller AllenVan Deynze

Introduction to gene expression microarray data analysis

Microarray Experiment Design

Bioinformatics for Biologists

Deoxyribonucleic Acid DNA

Outline. Analysis of Microarray Data. Most important design question. General experimental issues

Introduction to BioMEMS & Medical Microdevices DNA Microarrays and Lab-on-a-Chip Methods

Human Genomics. 1 P a g e

STATC 141 Spring 2005, April 5 th Lecture notes on Affymetrix arrays. Materials are from

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods

Computational Biology I

ChIP-seq and RNA-seq

Exam 1 from a Past Semester

Gene expression analysis. Gene expression analysis. Total RNA. Rare and abundant transcripts. Expression levels. Transcriptional output of the genome

Lecture 11 Microarrays and Expression Data

Gene expression analysis. Biosciences 741: Genomics Fall, 2013 Week 5. Gene expression analysis

Humboldt Universität zu Berlin. Grundlagen der Bioinformatik SS Microarrays. Lecture

1. Introduction Gene regulation Genomics and genome analyses

New Statistical Algorithms for Monitoring Gene Expression on GeneChip Probe Arrays

The Agilent Total RNA Isolation Kit

Design. Construction. Characterization

Introduction to Bioinformatics and Gene Expression Technology

Introduction to microarrays

Selected Techniques Part I

Some Principles for the Design and Analysis of Experiments using Gene Expression Arrays and Other High-Throughput Assay Methods

Granulometric Analysis of Spots in DNA Microarray Images

Gene Expression Analysis with Pathway-Centric DNA Microarrays

Chapter 1. from genomics to proteomics Ⅱ

Analysis of Microarray Data

In vivo genome-wide profiling of RNA secondary structure reveals novel regulatory features

7.03, 2006, Lecture 23 Eukaryotic Genes and Genomes IV

7.05, 2005, Lecture 23 Eukaryotic Genes and Genomes IV

Quantitative Real Time PCR USING SYBR GREEN

Microarray Informatics

Preprocessing Methods for Two-Color Microarray Data

Roche Molecular Biochemicals Technical Note No. LC 10/2000

COS 597c: Topics in Computational Molecular Biology. DNA arrays. Background

Preprocessing of Microarray data I: Normalization and Missing Values. For example, a list of possible sources in spotted arrays:

Tool for the identification of differentially expressed genes using a user-defined threshold

Introduction to Molecular Biology

less sensitive than RNA-seq but more robust analysis pipelines expensive but quantitiatve standard but typically not high throughput

Microarray pipeline & Pre-processing

Analysis of Microarray Data

Microarray Informatics

Transcription:

EECS730: Introduction to Bioinformatics Lecture 14: Microarray Some slides were adapted from Dr. Luke Huan (University of Kansas), Dr. Shaojie Zhang (University of Central Florida), and Dr. Dong Xu and Trupti Joshi (University of Missouri Columbia)

Review of Central Dogma Gene Expression mrna level Protein level

Gene expression profile Gene expression profile represents a specific state of the cell

Microarray Profile the DNA transcription level Microarrays measure the activity (expression level) of the genes under varying conditions/time points Expression level is estimated by measuring the amount of mrna for that particular gene A gene is active if it is being transcribed More mrna usually indicates more gene activity

Information we can measure nucleus cytosol inactive mrna DNA transcriptional control Primary RNA transcript RNA processing control Microarray mrna Mass-spec protein mrna RNA transport control nucleus membrane mrna mrna degradation control translation control protein activity control protein inactive protein

Applications Gene discovery Biological mechanisms (gene regulatory network, etc.) Disease diagnosis (cancer, infectious disease, etc.) Drug discovery: Pharmacogenomics Toxicological research: Toxicogenomics Microbial diversity in the environment

Caveat mrna levels and protein levels are not always directly correlated. Translational control But we roughly get ~50-70% correlation Measuring mrna is much cheaper!!!

Microarray technology Typically a glass slide with cdna or oligo Printed by robot or synthesized by photolithography. Typical arrays are 25x75 mm. Contains up to 500,000 probed gene fragments.

Microarray technology

Hybridization

The Chip is scanned

Summary cdna clones (probes) PCR product amplification purification laser 2 scanning laser 1 printing mrna target) emission overlay images and normalize microarray Hybridise target to microarray analysis

What does the data look like Green: expressed only from control Red: expressed only from experimental cell Yellow: equally expressed in both samples Black: NOT expressed in either control or experimental cells

What does the data look like begin with a data matrix (gene expression values versus samples) Typically, there are many genes (> 10,000) and few samples (~ 10) The log2 ratio

log2 transformation Logarithm base 2 transformation, has the advantage of producing a continuous spectrum of values and treating up and down regulated genes in a similar fashion. The logarithms of the expression ratios are also treated symmetrically, such that genes up regulated by a factor of 2 has a log2(ratio) of 1, gene down regulated by a factor of 2 has a log2(ratio) of 1, gene expressed at a constant level (ratio of 1) has a log2(ratio) equal to zero.

The data needs to be normalized Unequal quantities of starting RNA Differences in labeling Differences in detecting efficiencies between the fluorescent dyes Scanning saturation Systematic biases in the measured expression levels

Normalization by total intensity G i and R i are the measured intensities for the ith array element Log2(T i ) is the normalized value

Normalizing the quenching effect quenching (a phenomenon where dye molecules in close proximity, re-absorb light from each other, thus diminishing the signal) The log ratio is also dependent of the absolute values of the intensities

Normalize the quenching effect You can view the intensity of a given gene is a linear combination of the quenching effect, its true expression change, and measurement error. Genes that are adjacent to each other should have similar strength of quenching effect, but we can assume that they have independent expression change and measurement error. So we can perform linear regression on a set of adjacent genes in the graph, depict the potential quenching effect, and normalize it.

Normalizing the quenching effect LOWESS (locally weighted scatterplot smoothing) regression Normalize the value point by point

Normalizing the quenching effect

Statistical analysis of significance What can we tell if we find a gene whose expression is upregulated by two fold between two samples? Unfortunately nothing (at least this is what the statistician would argue) Biological variation / measure errors

Is the gene significantly differentially expressed??? Rank results by confidence with significance metrics (e.g. p-value) Estimate the false positive (Type I errors) and false negatives (Type II errors) Achieve the desired balance of sensitivity and specificity Result in a certain amount of flexibility (and arbitrariness) when interpreting significance metrics generated by a test

T-test Paired t test: the size of two groups should be same Comparison for organism before or after treatment (before and after heat shock) Unpaired t test: the size of two groups do not need to be same Comparison between organisms with treatment or non-treatment Assume equal variance (otherwise use Welch s test)

T-test Paired T-test Un-Paired T-test

Example Paired T test Unpaired T test

Wilcoxon Signed-Rank Test Use if sample is not distributed normally Similar to paired T test but non-parametric Rank the absolute difference between arrays, i.e. If the difference between two pairs is 0, the value is not used If the difference is identical between 2 pairs, the average rank of the two groups is used Compute W value using Look up Wilcoxon Table for significance value

Mann-Whitney Test Use if sample is not distributed normally Similar to non-paired T test but non-parametric Use the rankings of the numerical values instead of variance Take the less U value and look up table for significance

Mann-Whitney Test

Measure of performance of prediction -wikipedia

Multiple testing Statistical hypothesis testing is based on rejecting the null hypothesis if the likelihood of the observed data under the null hypotheses is low. If multiple comparisons are done or multiple hypotheses are tested, the chance of a rare event increases, and therefore, the likelihood of incorrectly rejecting a null hypothesis (i.e., making a Type I error) increases. The development of "high-throughput" sciences, such as genomics, allowed for rapid data acquisition. -wikipedia

Correction Bonferroni Correction: Reject the null hypothesis if False discovery rate: Benjamini Hochberg procedure And many more other correction methods -wikipedia