Integration of heterogeneous omics data
Transcription
1 Integration of heterogeneous omics data
Andrea Rau, March 11, 2016
Formation doctorale: Biologie expérimentale animale et modélisation prédictive
2 Outline
1. Introduction
   - Integromics
   - Example data: TCGA multi-omics data
2. Descriptive integration with multiple factor analysis
3. Clustering integration with iCluster+
4. Discussion
Contact: andrea.rau@jouy.inra.fr
3 Integrative data analysis
4 Integrative omics data analysis ("integromics")
- Public genome databases like NCBI already house petabytes ($10^6$ GB) of data, and are growing exponentially each year
- It is increasingly difficult to extract full value from massive omics data in a unified and meaningful way:
  - Gene expression (RNA-seq, microarrays)
  - Protein expression
  - Methylation
  - Metabolome
  - Copy number variants
  - Genomic mutations
  - Functional annotations
  - Gene pathway membership
  - Protein-protein interactions
  - High-throughput phenotypic information
- Focusing on a single platform runs the risk of missing an obvious signal!
5 A relatively new phenomenon
6 The broad umbrella of integrative data analysis
Ultimate goal: understanding complex processes
Lots of different meanings:
- Exploration
- Description
- Classification (supervised, unsupervised, semi-supervised)
- Variable selection / biomarker identification
- Phenotype prediction
- Meta-analysis
- ...
7 Integrative multi-omics analysis: What? Why?
1. Exploration
   - Multiple factor analysis (MFA)
   - Regularized canonical correlation
   - Multiple co-inertia analysis
2. Classification
   - Clustering (iClusterPlus)
3. Prediction
   - Integrative lasso with penalty factors (IPF-Lasso)
   - Multi-group partial least squares
   - Penalized linear discriminant analysis
8 ... with lots of statistical and practical difficulties!
- Missing or incomplete data
- Potentially heterogeneous quality across datasets
- Need for normalization / standardization / preprocessing (???)
- Many (!!) more variables than observations (ultra-high dimensionality)
- Multiple testing
- Datasets of differing sizes
- Potentially large requirements for data storage and computing power
- ... and of course, biological interpretation!
9 Introduction to the TCGA data
- The Cancer Genome Atlas (TCGA): a comprehensive and coordinated effort to improve the molecular understanding of major types and sub-types of cancer through high-throughput genomics
- Clinical information + genomic characterization data + high-level sequence analysis of tumor genomes
- 34 cancer types/sub-types
- Open-access tier (public data not unique to individuals) and controlled-access tier (primary sequence data, raw SNP data)
10 TCGA data (matched/unmatched tumor/normal samples)
- Clinical (demographic, treatment, survival information)
- miRNA sequencing
- Protein expression
- mRNA sequencing
- DNA methylation
- Copy number variants
- Somatic mutations
- Biospecimen data
- Diagnostic / tissue / radiological images
- Whole exome / genome sequencing
- Total RNA sequencing
- Array-based expression
11 TCGA breast cancer data
For illustration, we make use of tumoral data from 104 patients with breast invasive carcinoma:
- Clinical information: cancer subtype (Basal, Luminal A, Luminal B, HER2-enriched), estrogen / progesterone status, survival time, pathologic stage, race, age, ...
  - Subtype: Basal-like, HER2-enriched, Luminal A, Luminal B
  - ER status: Negative, Positive
  - PR status: Negative, Positive
12 TCGA breast cancer data
For illustration, we make use of tumoral data from 104 patients with breast invasive carcinoma:
- miRNA-seq (Illumina Hi-Seq): 725 miRNAs
- Normalized protein expression (reverse phase protein arrays): 156 proteins
- RNA-seq (Illumina Hi-Seq): genes
- Methylation (Infinium HumanMethylation27 BeadChip): genes
- Somatic mutations: 4398 genes
- Copy number alterations: genes
13 Descriptive integration with multiple factor analysis
14 Multi-table analyses
Individuals are described by a set of (possibly related) variables that are structured into several groups.
Several potential goals:
- Identify relationships between tables (inter-structure): canonical correlation
- Identify a consensus (common structure) among tables: multiple factor analysis (Escofier and Pagès, 1997)
These approaches borrow from multivariate methods developed for ecological / survey / chemometrics data.
15 Multiple factor analysis (MFA)
16 Multiple factor analysis (MFA)
We seek common structures present in some or all of the data tables:
- Simultaneously deal with tables containing information on the same individuals
- ... but first, groups of variables must be made comparable!
  - Balanced weighting of different groups of variables
  - Differing numbers of variables in each group
  - Type of variables (quantitative, categorical) may differ between groups
17 Multiple factor analysis
Four major steps:
1. Perform principal components analysis (PCA) on each dataset individually
2. Normalize each dataset by dividing its elements by the square root of the first eigenvalue obtained in step 1
3. Merge the normalized data, and perform a global PCA on the merged data
4. Project the individual datasets onto the global analysis to analyze commonalities and discrepancies
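Steps 1 to 3 can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the ade4 implementation used for these slides: the function names, the toy data, and the use of power iteration to obtain the first eigenvalue are all choices made here for illustration, and step 4 (projecting each table onto the global axes) is left out.

```python
import math
import random


def center_columns(table):
    """Subtract each column's mean (rows = individuals, columns = variables)."""
    n, p = len(table), len(table[0])
    means = [sum(row[j] for row in table) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in table]


def first_eigenvalue(table, n_iter=500):
    """Largest eigenvalue of (1/n) X X^T, estimated by power iteration.

    X X^T and X^T X share their non-zero eigenvalues, so for a centered
    table this is the variance carried by its first principal component.
    """
    n = len(table)
    gram = [[sum(a * b for a, b in zip(ri, rj)) / n for rj in table]
            for ri in table]
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary start vector
    eig = 0.0
    for _ in range(n_iter):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        eig = norm
    return eig


def mfa_weight(tables):
    """Steps 1-3: per-table PCA, divide by sqrt(first eigenvalue), merge."""
    weighted = []
    for table in tables:
        centered = center_columns(table)
        lam = first_eigenvalue(centered)              # step 1
        if lam == 0.0:
            raise ValueError("table has no variance")
        scale = 1.0 / math.sqrt(lam)                  # step 2
        weighted.append([[x * scale for x in row] for row in centered])
    n = len(tables[0])
    # step 3: put the weighted tables side by side, one row per individual
    merged = [[x for t in weighted for x in t[i]] for i in range(n)]
    return weighted, merged


def make_demo_tables(n=6, seed=0):
    """Toy data (hypothetical): two tables sharing one latent structure."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    small = [[z[i] + rng.gauss(0.0, 0.1) for _ in range(3)] for i in range(n)]
    large = [[50.0 * z[i] + rng.gauss(0.0, 5.0) for _ in range(5)]
             for i in range(n)]
    return [small, large]


weighted, merged = mfa_weight(make_demo_tables())
global_eig = first_eigenvalue(merged)  # first eigenvalue of the global PCA
```

After weighting, every table has a first eigenvalue of 1, so no table can dominate the first global axis simply because it has more variables or a larger measurement scale. The first eigenvalue of the global PCA then lies between 1 and the number of tables.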
18 Multiple factor analysis
Superposed graphical representation of partial PCAs
19 Multiple factor analysis for TCGA data via ade4
Measure of proximity between each data table and the consensus = projected inertia from each table on the first two MFA axes
Notes: all MFA graphics courtesy of Denis Laloë. Pre-processing: $\log_2(x + 1)$ for RNA-seq and miRNA-seq, arcsine transformation for methylation.
20 Multiple factor analysis for TCGA data
Similarity between MFA and individual PCA results
22 Multiple factor analysis for TCGA data
Projection of data tables onto consensus
23 Clustering integration with iCluster+
24 Integrative clustering
- Goal: discover new phenotype subgroups (e.g., cancer subtypes) and their molecular drivers in a comprehensive genetic context
- Jointly model discrete and continuous variables arising from genomic / epigenomic / transcriptomic profiling
- Hypothesis: diverse molecular phenotypes can be predicted by a set of orthogonal latent (i.e., not directly observable) variables representing distinct molecular drivers
25 iCluster+ integrative clustering
26 iCluster+ integrative clustering
- Integrates binary (mutation), categorical (copy number gain/normal/loss), continuous or count (gene expression) data
- Generalized linear regression for the joint model, with a common set of latent variables + penalization via lasso terms:
  $$f(X_t) = \beta_t Z + E_t$$
  where $X_t$ is the $p_t \times n$ data matrix for data type $t$, $\beta_t$ the $p_t \times K$ loading matrix, $Z$ the shared $K \times n$ matrix of latent variables, and $E_t$ the uncorrelated Gaussian error terms
- Assume $Z_i \sim N(0, I_K)$
- Sparse model obtained via data-specific lasso penalties $\lambda_t$
Note: the latent variable model is similar to PCA but better suited to heteroscedastic data.
27 A word on sparse methods
- High-dimensional data often contain many variables that are irrelevant for predicting a response / assigning observations to a group
- Including these irrelevant variables in a predictive model leads to a loss in predictive performance
- Sparse methods add an appropriate penalty term to the objective function of the method to suppress these irrelevant variables
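To see how a penalty term can suppress variables, consider the simplest lasso setting, where each coefficient can be handled separately (an orthonormal design). There the lasso solution is the unpenalized estimate passed through soft-thresholding. The sketch below illustrates only that textbook case; it is not the estimation procedure of iCluster+, and the feature names and coefficient values are hypothetical.

```python
def soft_threshold(coef, lam):
    """Shrink coef toward zero by lam; anything within [-lam, lam] becomes 0."""
    if coef > lam:
        return coef - lam
    if coef < -lam:
        return coef + lam
    return 0.0


def selected_features(coefs, lam):
    """Names of the features whose penalized coefficient is non-zero."""
    return [name for name, c in coefs.items() if soft_threshold(c, lam) != 0.0]


# Hypothetical unpenalized coefficients for three features.
raw = {"feature_a": 0.50, "feature_b": -0.01, "feature_c": -0.20}
kept = selected_features(raw, lam=0.03)
```

Larger values of `lam` set more coefficients exactly to zero, which is what makes the fitted model sparse and lets the surviving features be read as the selected ones.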
30 iCluster+ integrative clustering
Let $x_{ijt}$ be the $j$th genomic feature in sample $i$ of data type $t$.
- If $x_{ijt}$ is binary (e.g., mutation status):
  $$\log \frac{P(x_{ijt} = 1 \mid Z_i)}{1 - P(x_{ijt} = 1 \mid Z_i)} = \alpha_{jt} + \beta_{jt} Z_i$$
- If $x_{ijt}$ is categorical (e.g., copy number status: loss/normal/gain):
  $$P(x_{ijt} = c \mid Z_i) = \frac{\exp(\alpha_{jct} + \beta_{jct} Z_i)}{\sum_{l=1}^{C} \exp(\alpha_{jlt} + \beta_{jlt} Z_i)}, \quad c = 1, \ldots, C$$
- If $x_{ijt}$ is continuous (e.g., expression):
  $$x_{ijt} = \alpha_{jt} + \beta_{jt} Z_i + \varepsilon_{ijt}, \quad \varepsilon_{ijt} \sim N(0, \sigma^2_{jt})$$
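The three cases share the same linear predictor in the latent variables and differ only in how it is mapped to the observed data. A minimal sketch of the three mappings follows, assuming plain Python lists for the length-$K$ vectors; the function names are illustrative and are not part of the iClusterPlus package.

```python
import math


def linear_predictor(alpha, beta, z):
    """alpha + beta . z for one feature; beta and z are length-K lists."""
    return alpha + sum(b * zk for b, zk in zip(beta, z))


def prob_binary(alpha, beta, z):
    """P(x = 1 | z) under the logistic model (binary case)."""
    eta = linear_predictor(alpha, beta, z)
    return 1.0 / (1.0 + math.exp(-eta))


def probs_categorical(alphas, betas, z):
    """[P(x = c | z) for c = 1..C] under the multinomial logit model."""
    etas = [linear_predictor(a, b, z) for a, b in zip(alphas, betas)]
    top = max(etas)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


def mean_continuous(alpha, beta, z):
    """E[x | z] under the Gaussian model (continuous case)."""
    return linear_predictor(alpha, beta, z)
```

For example, with all coefficients set to zero the binary probability is 0.5 and the categorical probabilities are uniform over the $C$ categories, whatever the value of the latent variables.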
31 iClusterPlus Bioconductor package
- Estimation via a modified Monte Carlo Newton-Raphson algorithm
- Optimization of the number of latent variables $K$ (deviance ratio) and lasso penalty terms $\lambda_t$ (BIC) needed...
33 Preparing data for integrative clustering
- Somatic mutation data: keep genes that have mutations in at least 2% of the samples
- RNA-seq data: keep the 1000 most variable genes (i.e., those with the largest coefficient of variance), and center data for each individual
- CNA data: keep the 1000 most variable genes (i.e., those with the largest coefficient of variance), set all values between and 0.25 equal to 0
- Protein data: keep all values
- Set $K = 4$ latent variables (equal to the number of cancer subtypes), use default values of lasso penalty parameters ($\lambda_t = 0.03$ for all $t$)
Note: for now, only 4 datasets may be integrated in iCluster+.
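The mutation and RNA-seq filters can be sketched as below. This is an illustration under assumptions rather than the code behind the slides: data are held as dictionaries mapping a gene name to a list of per-sample values, "most variable" is taken to mean largest sample variance, and all function names are hypothetical. The CNA thresholding step is not sketched.

```python
from statistics import variance


def filter_mutations(mutations, min_fraction=0.02):
    """Keep genes mutated (value 1) in at least min_fraction of the samples."""
    return {gene: calls for gene, calls in mutations.items()
            if sum(calls) / len(calls) >= min_fraction}


def top_variable_genes(values, k=1000):
    """Keep the k genes with the largest sample variance (assumed criterion)."""
    ranked = sorted(values, key=lambda gene: variance(values[gene]),
                    reverse=True)
    return {gene: values[gene] for gene in ranked[:k]}


def center_per_sample(values):
    """Subtract, for each sample, its mean over the retained genes."""
    genes = list(values)
    n_samples = len(values[genes[0]])
    means = [sum(values[g][j] for g in genes) / len(genes)
             for j in range(n_samples)]
    return {g: [values[g][j] - means[j] for j in range(n_samples)]
            for g in genes}
```

A typical order of use would be `center_per_sample(top_variable_genes(rna, k=1000))`, so that centering is computed only over the genes that are actually passed on to the clustering step.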
35 iCluster+ results (K = 4 latent variables)
36 iCluster+ results (K = 4 latent variables)
Top features based on lasso penalized coefficients for each data type:
    $mutation
    "CDH1" "GATA3" "PCDH15" "PIK3CA" "RYR1" "TP53" ...
    $protein
    "4E-BP1-R-V" "Akt_pS473-R-V" "AR-R-V" "Bcl-2-M-V" "Bim-R-V" "c-kit-r-v" ...
    $rna
    "A2ML " "ABCA " "ABCC8 6833" "ADCY1 107" "ADH1B 125" "ADIPOQ 9370" ...
    $tumor
    "ASIC1" "ACVR1B" "ACVRL1" "APOF" "AQP2" "AQP5" ...

    > lapply(sigfeatures, length)
    $mutation
    [1] 46
    $protein
    [1] 39
    $rna
    [1] 250
    $tumor
    [1] 247
37 Discussion
38 Two major integrative strategies
Description
- Variable symmetry
- No matrix inversions
- Multi-table analysis through MFA (supervised analysis possible between groups)
Explanation / Prediction
- Asymmetry of variables: one group explains another group
- Matrix inversion
- Collinearity
- $n < p$ and matrix ranks
- Regularization procedures needed
- Clustering via iCluster+, supervised (discriminant) analysis via predictive methods like IPF-Lasso
39 Discussion
Integrative predictive / explanatory methods like iClusterPlus seem very promising for integrative omics analysis, but data preprocessing and model tuning are often needed (and not straightforward to perform):
- Choice of number of latent variables $K$
- Choice of lasso penalty terms
- Influence of pre-processing steps on results
- ...
40 Discussion
Integrative approaches can (should?) account for the intrinsic structures of biological relationships from different high-throughput platforms
41 Thank you!
R/Bioconductor packages:
- ade4: multiple factor analysis, multiple co-inertia analysis, STATIS
- FactoMineR: multiple factor analysis
- mixOmics: correlation analysis, partial least squares
- iClusterPlus
42 Some references...
- Meng, C. et al. (2014). A multivariate approach to the integration of multi-omics datasets. BMC Bioinformatics, 15:162.
- de Tayrac, M. et al. (2009). Simultaneous analysis of distinct Omics data sets with integration of biological knowledge: Multiple Factor Analysis approach. BMC Genomics, 10(1), 32.
- Culhane, A. C. et al. (2005). MADE4: an R package for multivariate analysis of gene expression data. Bioinformatics, 21(11).
- Dray, S. and Dufour, A.-B. (2007). The ade4 package: implementing the duality diagram for ecologists. Journal of Statistical Software, 22(4).
- Escofier, B. and Pagès, J. (1998). Analyses factorielles simples et multiples. Dunod.
- Lebart, L., Piron, M., Morineau, A. (2006). Statistique exploratoire multidimensionnelle. Dunod.
- Lê Cao, K.-A. et al. (2008). A sparse PLS for variable selection when integrating omics data. Statistical Applications in Genetics and Molecular Biology, 7(1).
- Salmi, B. et al. (2010). Multivariate analysis to compare pig meat quality traits according to breed and rearing system. Proceedings of the 9th WCGALP, Leipzig, August 1-6, 2010, 442.
- Tenenhaus, A. and Tenenhaus, M. (2014). Regularized generalized canonical correlation analysis for multiblock or multigroup data analysis. European Journal of Operational Research, 238(2).
More informationBasics of RNA-Seq. (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly, PhD Team Lead, NCI Single Cell Analysis Facility
2018 ABRF Meeting Satellite Workshop 4 Bridging the Gap: Isolation to Translation (Single Cell RNA-Seq) Sunday, April 22 Basics of RNA-Seq (With a Focus on Application to Single Cell RNA-Seq) Michael Kelly,
More informationIntroduction to Bioinformatics and Gene Expression Technology
Vocabulary Introduction to Bioinformatics and Gene Expression Technology Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 1.1 Gene: Genetics: Genome: Genomics: hereditary DNA
More informationIPA Advanced Training Course
IPA Advanced Training Course Academia Sinica 2015 Oct Gene( 陳冠文 ) Supervisor and IPA certified analyst 1 Review for Introductory Training course Searching Building a Pathway Editing a Pathway for Publication
More informationStudy on the Application of Data Mining in Bioinformatics. Mingyang Yuan
International Conference on Mechatronics Engineering and Information Technology (ICMEIT 2016) Study on the Application of Mining in Bioinformatics Mingyang Yuan School of Science and Liberal Arts, New