STATC 141 Spring 2005, April 5th: Lecture notes on Affymetrix arrays

Materials are from http://www.ohsu.edu/gmsr/amc/amc_technology.html

The GeneChip high-density oligonucleotide arrays are fabricated by in-situ, light-directed synthesis of short oligonucleotide sequences on a small glass chip. This technique allows for the precise construction of a highly ordered matrix of DNA oligomers on the chip. In the GeneChip system, a known gene or potentially expressed sequence is represented on the chip by 11-20 unique oligomeric probes, each 25 bases in length. The group of probes corresponding to a given gene, or to a small group of highly similar genes, is known as the probe set. It generally spans a region of about 600 bases, known as the target sequence. Many copies of each oligomer are synthesized in discrete features (or cells) on the GeneChip array.

In addition, for each oligomer on the array there is a matched oligomer, synthesized in an adjacent cell, that is identical except for a mismatched base at the central position (i.e., base 13). These are designated Perfect Match (PM) and Mismatch (MM) probes, respectively. The MM probes serve as a control for non-specific hybridization.
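The PM/MM relationship can be illustrated with a short Python sketch. It assumes the mismatch is made by replacing base 13 with its complement (A<->T, G<->C), which is how the glossary in these notes describes the homomeric mismatch. The function name and the example sequence are made up for illustration and are not taken from any Affymetrix software or array.

```python
# Substitution applied at the central base (assumption: A<->T, G<->C).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def mismatch_probe(pm: str) -> str:
    """Return the MM sequence for a 25-base PM probe by swapping base 13."""
    if len(pm) != 25:
        raise ValueError("expected a 25-base probe")
    centre = 12  # zero-based index of base 13
    return pm[:centre] + COMPLEMENT[pm[centre]] + pm[centre + 1:]


# Hypothetical 25-base PM sequence, used only to exercise the function.
pm = "TGTGATGGTGGGGAATGGGTCAGAA"
mm = mismatch_probe(pm)
print(pm)
print(mm)
```

The two printed sequences differ only at position 13, so any signal on the MM cell has to come from hybridization that tolerates that single mismatch.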

Appendix (optional)

Assay Overview
The GeneChip arrays are scanned and the images are processed using the Affymetrix Microarray Suite software (MAS 5.0). For more information on the GeneChip expression assay, please see http://www.affymetrix.com/support/technical/manual/expression_manual.affx

Data Overview
Affymetrix GeneChip experiments are managed with the Affymetrix Microarray Suite (MAS 5.0) software. The MAS software interfaces with the equipment to run a probe array experiment and is also used to generate preliminary analysis data from an experiment. Below we cover the basics of the files generated by MAS 5.0 and explain some of the most widely used variables generated by MAS.

MAS File Types
MAS 5.0 generates five file types during a GeneChip array experiment:

Experiment File (*.EXP): This file contains the parameters of the experiment, such as Probe Array Type, Experiment Name, Equipment parameters, Sample Description, and others. This file is not used for analysis, but is required to open the other MAS files for the designated chip experiment.

Image Data File (*.DAT): This is the image file generated by the scanner from the Probe Array after processing on the Fluidics Station. It can be viewed in MAS 5.0 or exported as a *.TIFF image. MAS 5.0 uses this file to generate the *.CEL file (see below).

Cell Intensity File (*.CEL): The cell file contains the processed cell intensities from the primary image in the *.DAT file. The cell file can be viewed in MAS 5.0, but cannot be exported. MAS 5.0 uses the cell file to generate the *.CHP file, which contains the numerical data derived from the *.DAT and *.CEL files.

Probe Array Results File (*.CHP): The chip file is the output file from the MAS expression analysis of the Probe Array. It contains the data that will be used for statistical analysis and data mining.

Report File (*.RPT): The report file is generated from the chip file. This expression report summarizes information about the expression analysis settings and the probe set hybridization intensity data.
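As a compact summary of the file types above, the following Python sketch records each extension's role and the order in which the files are derived from one another (*.DAT, then *.CEL, then *.CHP, then *.RPT). The dictionary and function names are invented for this illustration. Nothing here reads or writes the actual MAS file formats.

```python
import os

# Role of each MAS 5.0 file type, paraphrased from the notes above.
MAS_FILE_TYPES = {
    ".EXP": "experiment parameters (needed to open the other files)",
    ".DAT": "scanner image of the probe array",
    ".CEL": "processed cell intensities",
    ".CHP": "expression analysis results",
    ".RPT": "expression report",
}

# Which file each type is generated from, as described in the notes.
DERIVED_FROM = {".CEL": ".DAT", ".CHP": ".CEL", ".RPT": ".CHP"}


def describe(filename: str) -> str:
    """Look up the role of a file from its extension (case-insensitive)."""
    ext = os.path.splitext(filename)[1].upper()
    return MAS_FILE_TYPES.get(ext, "not a MAS 5.0 file type listed here")


def lineage(ext: str) -> list:
    """Trace a file type back to the file types it is derived from."""
    chain = [ext]
    while chain[-1] in DERIVED_FROM:
        chain.append(DERIVED_FROM[chain[-1]])
    return chain


print(describe("sample01.cel"))  # hypothetical file name
print(lineage(".RPT"))
```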

MAS Analysis Metrics
Signal: a measure of the abundance of a transcript.
Detection: the call that indicates whether the transcript is detected (P, present), undetected (A, absent), or at the limit of detection (M, marginal).
Detection p-value: the p-value that indicates the significance of the detection call.
Signal Log Ratio: the change in expression level of a transcript between a baseline array and an experiment array. This change is expressed as a log2 ratio. A log2 ratio of 1 is equal to a fold change of 2.
Change: the call that indicates the change in the transcript level between a baseline and an experiment: increase (I), marginal increase (MI), no change (NC), marginal decrease (MD), or decrease (D).
Change p-value: the p-value that indicates the significance of the change call.

Each probe set on a GeneChip array has a unique name known as the Probe set ID. Probe set IDs have different extensions that denote important information about how the probe set was designed. The nomenclature for the probe set extensions is given in the following section.
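The relationship between the Signal Log Ratio and fold change stated above can be checked with a small Python sketch. The first function is just the definition of a log2 ratio. The second uses a signed fold-change convention, in which decreases are reported as negative numbers. That convention is an assumption of this sketch and is not stated in these notes. Both function names are invented for illustration.

```python
def ratio_from_slr(slr: float) -> float:
    """Convert a log2 ratio (experiment vs. baseline) to a plain ratio."""
    return 2.0 ** slr


def signed_fold_change(slr: float) -> float:
    """Signed fold change (assumed convention): decreases are negative."""
    return 2.0 ** slr if slr >= 0 else -(2.0 ** -slr)


for slr in (1.0, 0.0, -1.0):
    print(slr, ratio_from_slr(slr), signed_fold_change(slr))
```

A Signal Log Ratio of 0 corresponds to no change (ratio 1), and equal-sized increases and decreases are symmetric around 0 on the log scale, which is the usual reason for reporting log ratios rather than raw ratios.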

Probe Set Extension Nomenclature
All probe sets have one of the following two extensions:
_at : anti-sense target (most probe sets on the array).
_st : sense target (only some control probes are in sense orientation on the array).

A few probe sets are designated as follows:
_i : reduced number of pairs in the probe set.

Some probe sets represent more than one gene or EST:
_s_at : designates probe sets that share common probes among multiple transcripts from different genes.
_a_at : designates probe sets that recognize multiple alternative transcripts from the same gene (on HG-U133 these probe sets have an "_s" suffix).
_x_at : designates probe sets where it was not possible to select either a unique probe set or a probe set with identical probes among multiple transcripts. Rules for cross-hybridization were dropped. Therefore, these probe sets may cross-hybridize in an unpredictable manner with other sequences.
_g_at : similar genes; unique probe sets also exist elsewhere on the array.
_f_at : similarity rules dropped; the probe set will recognize more than one gene.
_i_at : designates sequences for which there are fewer than the required number of unique probes specified in the design.
_b_at : all probe selection rules were ignored. Withdrawn from GenBank.
_l_at : sequence represented by more than 20 probe pairs.
_r_ : designates sequences for which it was not possible to pick a full set of unique probes using Affymetrix probe selection rules. Probes were picked after dropping some of the selection rules.

Most of the descriptions for the probe set ID extensions above were taken from the Affymetrix GeneChip Expression Analysis Data Analysis Fundamentals.

Glossary of Analysis Terms
Target: Fragmented, biotinylated anti-sense cRNA prepared from the mRNA to be analyzed. Target molecules are hybridized to the probe array, and the levels of hybridization are measured with the GeneArray scanner after the array is stained with streptavidin-phycoerythrin (SAPE).
Probe: Single-stranded DNA oligonucleotide synthesized directly on the surface of the GeneChip array using photolithography and combinatorial chemistry. The 25-base oligonucleotide is designed to be complementary to a specific gene transcript.
Probe Cell: Single square-shaped feature on an array containing probes with a unique sequence. The size can vary depending on the array type, typically 20 µm or 18 µm. Each probe cell contains millions of probe molecules.
Perfect Match (PM): Probes that are designed to be complementary to a reference sequence.
Mismatch (MM): Probes that are designed to be complementary to a reference sequence except for a homomeric mismatch at the central position (e.g., the 13th position of a 25-base probe: A->T or G->C). Mismatch probes serve as a control for cross-hybridization.
Probe Pair: Two probe cells, a PM and its corresponding MM. On the probe array, a probe pair is arranged with a PM cell directly above an MM cell.
Probe set: A set of probes designed to detect one transcript. A probe set usually consists of 11-20 probe pairs. For example, an 11 probe pair set is made up of 11 PM probes and 11 MM probes, for a total of 22 probe cells. Newer array designs from Affymetrix, e.g., HG-U133, contain probe sets with 11 probe pairs. Older designs have an average of 16 or 20 probe pairs per probe set.
Target Sequence: The portion of a transcript reference sequence that is interrogated by a probe set on the array. The target sequence extends from the first base of the most 5' probe to the last base of the most 3' probe.
Absolute Analysis: An analysis of a single GeneChip array using the Affymetrix Microarray Suite software. The software applies an algorithm developed by Affymetrix to determine the expression level for each gene represented on the array.
Analysis Metrics: Probe set performance descriptors calculated by the software from measured probe cell intensities. Analysis metrics are used to determine biologically meaningful results, such as the presence or absence of gene transcripts.
Analysis Parameters: Variables with user-defined values used in the expression analysis (default values in the software are empirically determined at Affymetrix).

*More extensive glossaries can be found in the Statistical Algorithms Reference Guide and Data Analysis Fundamentals, available on the Affymetrix website (www.affymetrix.com).
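
Returning to the probe set extension nomenclature described earlier, the following Python sketch shows one way to pull the extension off a Probe set ID and look up its meaning. It is only a sketch: the regular expression, the function names, and the example IDs are assumptions made for illustration, the descriptions are shortened paraphrases of these notes, and the bare "_i" and "_r_" forms listed in the nomenclature are deliberately not handled.

```python
import re

# Shortened paraphrases of the extension meanings given in these notes.
EXTENSIONS = {
    "_at": "anti-sense target",
    "_st": "sense target",
    "_s_at": "shares probes among transcripts from different genes",
    "_a_at": "recognizes alternative transcripts from the same gene",
    "_x_at": "cross-hybridization rules dropped",
    "_g_at": "similar genes; unique probe sets elsewhere on the array",
    "_f_at": "similarity rules dropped; recognizes more than one gene",
    "_i_at": "fewer unique probes than the design requires",
    "_b_at": "all probe selection rules ignored",
    "_l_at": "more than 20 probe pairs",
}

# Optional single-letter qualifier followed by _at or _st at the end of the ID.
_SUFFIX = re.compile(r"(?:_[a-z])?_(?:at|st)$")


def probe_set_extension(probe_set_id: str):
    """Return the extension of a Probe set ID, or None if none is found."""
    match = _SUFFIX.search(probe_set_id)
    return match.group(0) if match else None


def describe_probe_set(probe_set_id: str) -> str:
    """Describe a Probe set ID using the table above."""
    ext = probe_set_extension(probe_set_id)
    return EXTENSIONS.get(ext, "extension not covered by this sketch")


# Example IDs chosen for illustration only.
for psid in ("1053_at", "1007_s_at", "1234_x_at"):
    print(psid, "->", describe_probe_set(psid))
```

Flagging probe sets by extension in this way can help when interpreting results, since extensions such as _x_at indicate that the signal may partly reflect cross-hybridization.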