Downstream analysis of transcriptomic data

Size: px
Start display at page:

Download "Downstream analysis of transcriptomic data"

Transcription

1 Downstream analysis of transcriptomic data Shamith Samarajiwa CRUK Bioinforma3cs Summer School July 2015

2 General Methods Dimensionality reduc3on methods (clustering, PCA, MDS) Visualizing PaKerns (heatmaps, dendrograms) Over representa3on Analysis (ORA) Ontology enrichment Gene Set Enrichment analysis (GSEA/GSA) Pathway enrichment analysis Network biology methods Promoter analysis of co- regulated genes Gene/Protein interac3on analysis Biomedical Bibliomics

3 The curse of dimensionality Curse of dimensionality (Bellman 1961) phenomena that arise when analyzing and organizing data in high- dimensional spaces that do not occur in low- dimensional sexngs. Caused by number of features > number of samples high dimensional (d): hundreds or thousands of dimensions. gene expression: d SNP data: d 10 6

4 A common theme of these situa3ons is that when the dimensionality increases, the volume of the space increases so fast that the available data become sparse. This sparsity is problema3c for any method that requires sta3s3cal significance.

5 Dimensionality Reduc3on Dimensionality reduc3on techniques are a common approach for dealing with noisy, high- dimensional data. These unsupervised methods can help uncover interes3ng structure in complex datasets. However they give very likle insight into the biological and technical aspects that might explain the uncovered structure.

6 Principal Component Analysis (PCA) Principal component analysis (PCA) reduces the dimensionality of the data while retaining most of the variance in the data set. It accomplishes this reduc3on by iden3fying direc3ons (Eigenvectors + Eigenvalues), called principal components, along which the variance in the data is maximal. By using a few components, each sample can be represented by a rela3vely few numbers instead of by values for thousands of variables. Data can then be visualized, making it possible to assess similari3es and differences between samples and determine whether samples are grouped together.

7

8 Mul3- dimensional Scaling (MDS) Mul7dimensional scaling (MDS) is a means of visualizing the level of similarity of individual cases of a dataset. It refers to a set of related ordina3on techniques used in informa3on visualiza3on, in par3cular to display the informa3on contained in a distance matrix.

9 Clustering The process of grouping together similar en33es. Need to define similarity: distance metric Hierarchical clustering K- means SOFM PAM Biclustering

10 Clustering Given enough genes, a set of genes will always cluster. Clusters produced by a given algorithm is highly dependent on the distance metric used. Same clustering method applied too the same data may produce different results.

11 Visualizing complex datasets with heatmaps There are mul3ple R packages and func3ons that can generate heatmaps. i. Heatplus ii. Pheatmap iii. Heatmap.2{gplots} iv. Heatmap.plus Correla3on Heatmap

12 Over Representa3onal Analysis (ORA) Given a list of genes/features and one or more lists of annota3ons, are any of he annota3ons surprisingly enriched in the gene list? How to assess surprisingly? - Sta3s3cs How to correct for repeated tes3ng?- False Discovery Rate correc3on Test for under and/or over enrichment rela3ve to a background popula3on; selec3ng the right background is important!! i. Hypergeometric test ii. Fisher s exact test iii. Two tailed t- test (sig. difference between means of two distribu3ons) iv. Kolmogorov Smirnov test (test of the equality of con3nuous, one- dimensional probability distribu3ons that can be used to compare a sample with a reference probability distribu3on)

13 Fisher s Exact Test Fisherʼs exact test is used for ORA of gene lists for a single type of annota3on. P- value for Fisherʼs exact test is the probability that a random draw of the same size as the gene list from the background popula3on would produce the observed number of annota3ons in the gene list or more., and depends on size of both gene list and background popula3on as well and the number of specific genes in gene list and background.

14 Ontologies and Ontology Enrichment Gene Ontology : a controlled vocabulary and machine readable, Directed Acyclic Graph with mul3ple levels of classifica3on, can be used across species 3 types of gene ontology (BP, MF, CC) 9 levels (generic to extremely specific) is a and part of rela3onships Evidence codes Allows for incomplete knowledge More than 80 types of biomedical ontologies. Enrichment tools: more than 400 Ex: NIH DAVID, GREAT, GOseq

15 Pathway Analysis Iden3fy pathway that are perturbed or significantly impacted in a given condi3on or phenotype Many pathway databases (~550). Incomplete pathway descrip3ons. Provenance of annota3on. Signalling, Metabolic, Regulatory pathways

16 Gene Set Enrichment (GSEA) A method for assessing whether a predefined set of genes show sta3s3cally significant differences between two condi3ons.

17 Msigdb & GMT files A collec3on of gene set No centralized cura3on- user submiked gene sets GSEA uses the list rank informa3on without using a threshold. The gene sets in the Molecular Signatures Database (MSigDB) are divided into 8 major collec3ons, and several sub- collec3ons.

18 Gene Set Analysis (GSA) It differs from a Gene Set Enrichment Analysis (Subramanian et al 2006) in its use of the "maxmean" sta3s3c: this is the mean of the posi3ve or nega3ve part of gene scores in the gene set, whichever is large in absolute values.

19 Promoter Enrichment Analysis Iden3fy co- regulated gene set or signature Extremely Hard to do!! Co regulated genes may have similar biological func3on or involved in a biological process, show differen3al expression, involved in the same pathway, belong to the same cluster? Iden3fy the upstream regulators of that gene set. How? Locate promoters / regulatory elements of the genes set. Then search for transcrip3on factor binding sites (TFBS) that are enriched in that promoter set (compared to a random promoter set).

20 PSCAN

Pathway Analysis Adding Func2onal Context to High- Throughput Results

Pathway Analysis Adding Func2onal Context to High- Throughput Results Pathway Analysis Adding Func2onal Context to High- Throughput Results Stephen D. Turner, Ph.D. Bioinforma2cs Core Director bioinforma2cs@virginia.edu Outline Bioinforma2cs & the Bioinforma2cs Core Service

More information

Downstream analysis of ChIP- seq data

Downstream analysis of ChIP- seq data Downstream analysis of ChIP- seq data Shamith Samarajiwa Integra/ve Systems Biomedicine Group MRC Cancer Unit University of Cambridge CRUK Bioinforma/cs Summer School July 2015 ChIP- seq workflow overview

More information

Canadian Bioinforma3cs Workshops

Canadian Bioinforma3cs Workshops Canadian Bioinforma3cs Workshops www.bioinforma3cs.ca Module #: Title of Module 2 1 Module 3 Expression and Differen3al Expression (lecture) Obi Griffith & Malachi Griffith www.obigriffith.org ogriffit@genome.wustl.edu

More information

Gene List Enrichment Analysis

Gene List Enrichment Analysis Outline Gene List Enrichment Analysis George Bell, Ph.D. BaRC Hot Topics March 16, 2010 Why do enrichment analysis? Main types Selecting or ranking genes Annotation sources Statistics Remaining issues

More information

Lecture 2: Population Structure Advanced Topics in Computa8onal Genomics

Lecture 2: Population Structure Advanced Topics in Computa8onal Genomics Lecture 2: Population Structure 02-715 Advanced Topics in Computa8onal Genomics 1 What is population structure? Popula8on Structure A set of individuals characterized by some measure of gene8c dis8nc8on

More information

Lecture: Genetic Basis of Complex Phenotypes Advanced Topics in Computa8onal Genomics

Lecture: Genetic Basis of Complex Phenotypes Advanced Topics in Computa8onal Genomics Lecture: Genetic Basis of Complex Phenotypes 02-715 Advanced Topics in Computa8onal Genomics Genome Polymorphisms A Human Genealogy TCGAGGTATTAAC The ancestral chromosome From SNPS TCGAGGTATTAAC TCTAGGTATTAAC

More information

Introduc)on to Pathway and Network Analysis

Introduc)on to Pathway and Network Analysis Introduc)on to Pathway and Network Analysis Alison Motsinger-Reif, PhD Associate Professor Bioinforma)cs Research Center Department of Sta)s)cs North Carolina State University Pathway and Network Analysis

More information

Pathway Analysis in other data types

Pathway Analysis in other data types Pathway Analysis in other data types Alison Motsinger-Reif, PhD Associate Professor Bioinforma

More information

From reads to results: differen1al expression analysis with RNA seq. Alicia Oshlack Bioinforma1cs Division Walter and Eliza Hall Ins1tute

From reads to results: differen1al expression analysis with RNA seq. Alicia Oshlack Bioinforma1cs Division Walter and Eliza Hall Ins1tute From reads to results: differen1al expression analysis with RNA seq Alicia Oshlack Bioinforma1cs Division Walter and Eliza Hall Ins1tute Purported benefits and opportuni1es of RNA seq All transcripts are

More information

Gene List Enrichment Analysis - Statistics, Tools, Data Integration and Visualization

Gene List Enrichment Analysis - Statistics, Tools, Data Integration and Visualization Gene List Enrichment Analysis - Statistics, Tools, Data Integration and Visualization Aik Choon Tan, Ph.D. Associate Professor of Bioinformatics Division of Medical Oncology Department of Medicine aikchoon.tan@ucdenver.edu

More information

User Guide. MAGNET : MicroArray & RNAseq Gene expression Network Evalua=on Toolkit. Page 1

User Guide. MAGNET : MicroArray & RNAseq Gene expression Network Evalua=on Toolkit. Page 1 User Guide MAGNET : MicroArray & RNAseq Gene expression Network Evalua=on Toolkit Page 1 Case Western Reserve University February 2012 Page 2 Page 3 1 - Introduction This sec=on will introduce MAGNET:

More information

Popula'on Structure Computa.onal Genomics Seyoung Kim

Popula'on Structure Computa.onal Genomics Seyoung Kim Popula'on Structure 02-710 Computa.onal Genomics Seyoung Kim What is Popula'on Structure? Popula.on Structure A set of individuals characterized by some measure of gene.c dis.nc.on A popula.on is usually

More information

Genetic Variation, Biological Pathways, and Networks. Sarah Pendergrass Center for Systems Genomics

Genetic Variation, Biological Pathways, and Networks. Sarah Pendergrass Center for Systems Genomics Genetic Variation, Biological Pathways, and Networks Sarah Pendergrass Center for Systems Genomics Outline What are networks and why are they important in biology? Biological Pathways Why do we care about

More information

Our goal....to understanding (wisdom)...to knowledge...to information data

Our goal....to understanding (wisdom)...to knowledge...to information data Knowledge Discovery Our goal...to understanding (wisdom)...to knowledge...to information data Why do we need Knowledge Discovery? Data Explosion: web usage, automated data collec?on tools, mature database

More information

Project Alloca,on and Guest Lecture BMS353

Project Alloca,on and Guest Lecture BMS353 Project Alloca,on and Guest Lecture Today s Outline Part A : Summary of the module Alloca,on of projects Project Discussion Break Fes.ve treat -- Part B : Discussion based on your ques,ons from lecture

More information

Analyzing Gene Set Enrichment

Analyzing Gene Set Enrichment Analyzing Gene Set Enrichment BaRC Hot Topics June 20, 2016 Yanmei Huang Bioinformatics and Research Computing Whitehead Institute http://barc.wi.mit.edu/hot_topics/ Purpose of Gene Set Enrichment Analysis

More information

Introduc)on to Databases and Resources Biological Databases and Resources

Introduc)on to Databases and Resources Biological Databases and Resources Introduc)on to Bioinforma)cs Online Course : IBT Introduc)on to Databases and Resources Biological Databases and Resources Learning Objec)ves Introduc)on to Databases and Resources - Understand how bioinforma)cs

More information

Knowledge-Guided Analysis with KnowEnG Lab

Knowledge-Guided Analysis with KnowEnG Lab Han Sinha Song Weinshilboum Knowledge-Guided Analysis with KnowEnG Lab KnowEnG Center Powerpoint by Charles Blatti Knowledge-Guided Analysis KnowEnG Center 2017 1 Exercise In this exercise we will be doing

More information

Genome-Wide Associa/on Studies: History, Current Approaches, and Future Opportuni/es. Addie Thompson Genomics,

Genome-Wide Associa/on Studies: History, Current Approaches, and Future Opportuni/es. Addie Thompson Genomics, Genome-Wide Associa/on Studies: History, Current Approaches, and Future Opportuni/es Addie Thompson Genomics, 11-15-2016 Outline History and terminology Sta5s5cs and breeding Linkage and associa5on analysis,

More information

Exploring genomic databases: Practical session "

Exploring genomic databases: Practical session Exploring genomic databases: Practical session Work through the following practical exercises on your own. The objective of these exercises is to become familiar with the information available in each

More information

Microarray Data Analysis in GeneSpring GX 11. Month ##, 200X

Microarray Data Analysis in GeneSpring GX 11. Month ##, 200X Microarray Data Analysis in GeneSpring GX 11 Month ##, 200X Agenda Genome Browser GO GSEA Pathway Analysis Network building Find significant pathways Extract relations via NLP Data Visualization Options

More information

Sta$s$cs for Genomics ( )

Sta$s$cs for Genomics ( ) Sta$s$cs for Genomics (140.688) Instructor: Jeff Leek Website: http://www.biostat.jhsph.edu/~jleek/teaching/2011/genomics/ Class Times: MW, 10:30AM-11:50AM + R Lab TBA Grading: 20% Reading Assignments,

More information

Александр Предеус. Институт Биоинформатики. Gene Set Analysis: почему интерпретировать глобальные генетические изменения труднее, чем кажется

Александр Предеус. Институт Биоинформатики. Gene Set Analysis: почему интерпретировать глобальные генетические изменения труднее, чем кажется Александр Предеус Институт Биоинформатики Gene Set Analysis: почему интерпретировать глобальные генетические изменения труднее, чем кажется Outline Formulating the problem What are the references? Overrepresentation

More information

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer

CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer CS 5984: Application of Basic Clustering Algorithms to Find Expression Modules in Cancer T. M. Murali January 31, 2006 Innovative Application of Hierarchical Clustering A module map showing conditional

More information

Making Deep Learning Understandable for Analyzing Sequen;al Data about Gene Regula;on. Dr. Yanjun Qi 2017/11/26

Making Deep Learning Understandable for Analyzing Sequen;al Data about Gene Regula;on. Dr. Yanjun Qi 2017/11/26 Making Deep Learning Understandable for Analyzing Sequen;al Data about Gene Regula;on Dr. Yanjun Qi 2017/11/26 Roadmap ² Background of Machine Learning ² Background of Sequen?al Data about Gene Regula?on

More information

Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter

Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter VizX Labs, LLC Seattle, WA 98119 Abstract Oligonucleotide microarrays were used to study

More information

RNA sequencing Integra1ve Genomics module

RNA sequencing Integra1ve Genomics module RNA sequencing Integra1ve Genomics module Michael Inouye Centre for Systems Genomics University of Melbourne, Australia Summer Ins@tute in Sta@s@cal Gene@cs 2016 SeaBle, USA @minouye271 inouyelab.org This

More information

Lecture 8: Predicting and analyzing metagenomic composition from 16S survey data

Lecture 8: Predicting and analyzing metagenomic composition from 16S survey data Lecture 8: Predicting and analyzing metagenomic composition from 16S survey data What can we tell about the taxonomic and functional stability of microbiota? Why? Nature. 2012; 486(7402): 207 214. doi:10.1038/nature11234

More information

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs (3) QTL and GWAS methods By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs Under what conditions particular methods are suitable

More information

A very brief introduc0on to bioinforma0cs. Mikhail Spivakov, PhD European Bioinforma0cs Ins0tute

A very brief introduc0on to bioinforma0cs. Mikhail Spivakov, PhD European Bioinforma0cs Ins0tute A very brief introduc0on to bioinforma0cs Mikhail Spivakov, PhD European Bioinforma0cs Ins0tute What bioinforma0cs does? Cataloguing Mining Modelling For lab biologists to look at favourite genes etc.

More information

Gene Expression Data Analysis

Gene Expression Data Analysis Gene Expression Data Analysis Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu BMIF 310, Fall 2009 Gene expression technologies (summary) Hybridization-based

More information

Suberoylanilide Hydroxamic Acid Treatment Reveals. Crosstalks among Proteome, Ubiquitylome and Acetylome

Suberoylanilide Hydroxamic Acid Treatment Reveals. Crosstalks among Proteome, Ubiquitylome and Acetylome Suberoylanilide Hydroxamic Acid Treatment Reveals Crosstalks among Proteome, Ubiquitylome and Acetylome in Non-Small Cell Lung Cancer A549 Cell Line Quan Wu 1, Zhongyi Cheng 2, Jun Zhu 3, Weiqing Xu 1,

More information

Decoding Chromatin States with Epigenome Data Advanced Topics in Computa8onal Genomics

Decoding Chromatin States with Epigenome Data Advanced Topics in Computa8onal Genomics Decoding Chromatin States with Epigenome Data 02-715 Advanced Topics in Computa8onal Genomics HMMs for Decoding Chromatin States Epigene8c modifica8ons of the genome have been associated with Establishing

More information

Introduc)on to ChIP- seq analysis

Introduc)on to ChIP- seq analysis Introduc)on to ChIP- seq analysis Shamith Samarajiwa Integra/ve Systems Biomedicine Group MRC Cancer Unit University of Cambridge Analysis of High- throughput sequencing data with BioConductor 1-3 June

More information

Pathway Analysis. Min Kim Bioinformatics Core Facility 2/28/2018

Pathway Analysis. Min Kim Bioinformatics Core Facility 2/28/2018 Pathway Analysis Min Kim Bioinformatics Core Facility 2/28/2018 Outline 1. Background 2. Databases: KEGG, Reactome, Biocarta, Gene Ontology, MSigDB, MetaCyc, SMPDB, IPA. 3. Statistical Methods: Overlap

More information

Methods to Study and Control Diseases in Wild Populations

Methods to Study and Control Diseases in Wild Populations Methods to Study and Control Diseases in Wild Populations Steve Bellan, MPH Department of Environmental Sci, Pol & Mgmt University of California at Berkeley USA ASI on Conserva@on Biology Witswatersrand

More information

Annotation and Function of Switch-like Genes in Health and Disease. A Thesis. Submitted to the Faculty. Drexel University. Adam M.

Annotation and Function of Switch-like Genes in Health and Disease. A Thesis. Submitted to the Faculty. Drexel University. Adam M. Annotation and Function of Switch-like Genes in Health and Disease A Thesis Submitted to the Faculty of Drexel University by Adam M. Ertel in partial fulfillment of the requirements for the degree of Doctor

More information

Gene Regulatory Networks Computa.onal Genomics Seyoung Kim

Gene Regulatory Networks Computa.onal Genomics Seyoung Kim Gene Regulatory Networks 02-710 Computa.onal Genomics Seyoung Kim Transcrip6on Factor Binding Transcrip6on Control Gene transcrip.on is influenced by Transcrip.on factor binding affinity for the regulatory

More information

CMSC423: Bioinformatic databases, algorithms and tools

CMSC423: Bioinformatic databases, algorithms and tools CMSC423: Bioinformatic databases, algorithms and tools Héctor Corrada Bravo Dept. of Computer Science Center for Bioinformatics and Computational Biology University of Maryland University of Maryland,

More information

L8: Downstream analysis of ChIP-seq and ATAC-seq data

L8: Downstream analysis of ChIP-seq and ATAC-seq data L8: Downstream analysis of ChIP-seq and ATAC-seq data Shamith Samarajiwa CRUK Bioinformatics Autumn School September 2017 Summary Downstream analysis for extracting meaningful biology : Normalization and

More information

Gene Annotation and Gene Set Analysis

Gene Annotation and Gene Set Analysis Gene Annotation and Gene Set Analysis After you obtain a short list of genes/clusters/classifiers what next? For each gene, you may ask What it is What is does What processes is it involved in Which chromosome

More information

Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs. Bertil Schmidt Christian Hundt

Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs. Bertil Schmidt Christian Hundt Accelerating Gene Set Enrichment Analysis on CUDA-Enabled GPUs Bertil Schmidt Christian Hundt Contents Gene Set Enrichment Analysis (GSEA) Background Algorithmic details cudagsea Performance evaluation

More information

Natural Selection Advanced Topics in Computa8onal Genomics

Natural Selection Advanced Topics in Computa8onal Genomics Natural Selection 02-715 Advanced Topics in Computa8onal Genomics Natural Selection Compara8ve studies across species O=en focus on protein- coding regions Genes under selec8ve pressure Immune- related

More information

scrnaseq clustering tools Åsa Björklund

scrnaseq clustering tools Åsa Björklund scrnaseq clustering tools Åsa Björklund asa.bjorklund@scilifelab.se What is a celltype? What is a cell type? A cell that performs a specific func

More information

Factor Analysis is Your Friend. AnnMaria De Mars, PhD. The Julia Group & 7 Genera>on Games

Factor Analysis is Your Friend. AnnMaria De Mars, PhD. The Julia Group & 7 Genera>on Games Factor Analysis is Your Friend AnnMaria De Mars, PhD. The Julia Group & 7 Genera>on Games WHY? Imagine this What exactly were you planning on doing with that? Let s say you have a massive pile of data

More information

Experience with Weka by Predictive Classification on Gene-Expre

Experience with Weka by Predictive Classification on Gene-Expre Experience with Weka by Predictive Classification on Gene-Expression Data Czech Technical University in Prague Faculty of Electrical Engineering Department of Cybernetics Intelligent Data Analysis lab

More information

WPI. Some Issues in Computa/onal Design Crea/vity. David C. Brown Computer Science Department Worcester Polytechnic Ins7tute Worcester, MA, USA

WPI. Some Issues in Computa/onal Design Crea/vity. David C. Brown Computer Science Department Worcester Polytechnic Ins7tute Worcester, MA, USA WPI Some Issues in Computa/onal Design Crea/vity David C. Brown Computer Science Department Worcester Polytechnic Ins7tute Worcester, MA, USA WPI Private university, mostly science & technology. Founded

More information

Analysis of Microarray Data

Analysis of Microarray Data Analysis of Microarray Data Lecture 3: Visualization and Functional Analysis George Bell, Ph.D. Senior Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Review

More information

CMSC702: Computational systems biology and functional genomics

CMSC702: Computational systems biology and functional genomics CMSC702: Computational systems biology and functional genomics Héctor Corrada Bravo Dept. of Computer Science Center for Bioinformatics and Computational Biology University of Maryland University of Maryland,

More information

Metagenomics Advanced Topics in Computa8onal Genomics

Metagenomics Advanced Topics in Computa8onal Genomics Metagenomics 02-715 Advanced Topics in Computa8onal Genomics Metagenomics Inves8ga8on of the microbes that inhabit oceans, soils, and the human body, etc. with sequencing technologies Coopera8ve interac8ons

More information

Understanding protein lists from proteomics studies. Bing Zhang Department of Biomedical Informatics Vanderbilt University

Understanding protein lists from proteomics studies. Bing Zhang Department of Biomedical Informatics Vanderbilt University Understanding protein lists from proteomics studies Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu A typical comparative shotgun proteomics study IPI00375843

More information

Inferring Cellular Networks Using Probabilis6c Graphical Models. Jianlin Cheng, PhD University of Missouri 2010

Inferring Cellular Networks Using Probabilis6c Graphical Models. Jianlin Cheng, PhD University of Missouri 2010 Inferring Cellular Networks Using Probabilis6c Graphical Models Jianlin Cheng, PhD University of Missouri 2010 Bayesian Network So@ware hap://www.cs.ubc.ca/~murphyk/so@ware/ BNT/bnso@.html Demo References

More information

The Project Management Cer6ficate Program. Project Quality Management

The Project Management Cer6ficate Program. Project Quality Management PMP cross-cutting skills have been updated in the PMP Exam Content Outline June 2015 (PDF of the Examination Content Outline - June 2015 can be found under the Resources Tab). Learn about why the PMP exam

More information

Introduc)on to Chroma)n IP sequencing (ChIP- seq) data analysis

Introduc)on to Chroma)n IP sequencing (ChIP- seq) data analysis Introduc)on to Chroma)n IP sequencing (ChIP- seq) data analysis Introduc)on to Bioinforma)cs using NGS data Linköping, 21 April 2016 Agata Smialowska BILS / NBIS, SciLifeLab, Stockholm University Chroma)n

More information

5/14/12 OUTLINE. A Situa(on Summary Approaches for Describing A Relevant System System Models Influence Diagrams Quan(ta(ve Modelling

5/14/12 OUTLINE. A Situa(on Summary Approaches for Describing A Relevant System System Models Influence Diagrams Quan(ta(ve Modelling OUTLINE A Situa(on Summary Approaches for Describing A Relevant System System Models Influence Diagrams Quan(ta(ve Modelling 1 Systems Modeling We will apply the systems concepts and thinking discussed

More information

Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis. Jenny Wu

Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis. Jenny Wu Introduction to Next Generation Sequencing (NGS) Data Analysis and Pathway Analysis Jenny Wu Outline Introduction to NGS data analysis in Cancer Genomics NGS applications in cancer research Typical NGS

More information

ROAD TO STATISTICAL BIOINFORMATICS CHALLENGE 1: MULTIPLE-COMPARISONS ISSUE

ROAD TO STATISTICAL BIOINFORMATICS CHALLENGE 1: MULTIPLE-COMPARISONS ISSUE CHAPTER1 ROAD TO STATISTICAL BIOINFORMATICS Jae K. Lee Department of Public Health Science, University of Virginia, Charlottesville, Virginia, USA There has been a great explosion of biological data and

More information

Effects of Different Normalization Methods on Gene Set Enrichment Analysis (GSEA)

Effects of Different Normalization Methods on Gene Set Enrichment Analysis (GSEA) Effects of Different Normalization Methods on Gene Set Enrichment Analysis (GSEA) Ming Zhao Tutor: Dr. Mengjin Zhu 12th, Nov, 2011 Key Laboratory of Agricultural Animal Genetics, Breeding and Reproduction

More information

A Prac'cal Guide to NCBI BLAST

A Prac'cal Guide to NCBI BLAST A Prac'cal Guide to NCBI BLAST Leonardo Mariño-Ramírez NCBI, NIH Bethesda, USA June 2018 1 NCBI Search Services and Tools Entrez integrated literature and molecular databases Viewers BLink protein similarities

More information

Failure Predic,on of Jobs in Compute Clouds: A Google Cluster Case Study

Failure Predic,on of Jobs in Compute Clouds: A Google Cluster Case Study Failure Predic,on of Jobs in Compute Clouds: A Google Cluster Case Study Xin Chen, and Karthik Pa0abiraman University of Bri:sh Columbia (UBC) Charng- da Lu, Unaffiliated Compute Clouds } Infrastructure

More information

Lecture 3: Biology Basics Con4nued. Spring 2017 January 24, 2017

Lecture 3: Biology Basics Con4nued. Spring 2017 January 24, 2017 Lecture 3: Biology Basics Con4nued Spring 2017 January 24, 2017 Genotype/Phenotype Blue eyes Phenotype: Brown eyes Recessive: bb Genotype: Dominant: Bb or BB Genes are shown in rela%ve order and distance

More information

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis

BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology. Lecture 2: Microarray analysis BIOINF/BENG/BIMM/CHEM/CSE 184: Computational Molecular Biology Lecture 2: Microarray analysis Genome wide measurement of gene transcription using DNA microarray Bruce Alberts, et al., Molecular Biology

More information

Next- genera*on Sequencing. Lecture 13

Next- genera*on Sequencing. Lecture 13 Next- genera*on Sequencing Lecture 13 ChIP- seq Applica*ons iden%fy sequence varia%ons DNA- seq Iden%fy Pathogens RNA- seq Kahvejian et al, 2008 Protein-DNA interaction DNA is the informa*on carrier of

More information

RNA-Seq Now What? BIS180L Professor Maloof May 24, 2018

RNA-Seq Now What? BIS180L Professor Maloof May 24, 2018 RNA-Seq Now What? BIS180L Professor Maloof May 24, 2018 We have differentially expressed genes, what do we want to know about them? We have differentially expressed genes, what do we want to know about

More information

CS 680: Assembly and Analysis of Sequencing Data. Fall 2012 August 21st, 2012

CS 680: Assembly and Analysis of Sequencing Data. Fall 2012 August 21st, 2012 CS 680: Assembly and Analysis of Sequencing Data Fall 2012 August 21st, 2012 Logis@cs of the Course Logis@cs About the Course Instructor: Chris@na Boucher email: cboucher@cs.colostate.edu Office: CSB 464

More information

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS

More information

Support Vector Machines (SVMs) for the classification of microarray data. Basel Computational Biology Conference, March 2004 Guido Steiner

Support Vector Machines (SVMs) for the classification of microarray data. Basel Computational Biology Conference, March 2004 Guido Steiner Support Vector Machines (SVMs) for the classification of microarray data Basel Computational Biology Conference, March 2004 Guido Steiner Overview Classification problems in machine learning context Complications

More information

Canadian Bioinforma2cs Workshops

Canadian Bioinforma2cs Workshops Canadian Bioinforma2cs Workshops www.bioinforma2cs.ca Module #: Title of Module 2 1 Introduction to Microarrays & R Paul Boutros Morning Overview 09:00-11:00 Microarray Background Microarray Pre- Processing

More information

Gene Expression Data Analysis (I)

Gene Expression Data Analysis (I) Gene Expression Data Analysis (I) Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Bioinformatics tasks Biological question Experiment design Microarray experiment

More information

Shimadzu mul+- omics data analysis Tutorial 2017 June

Shimadzu mul+- omics data analysis Tutorial 2017 June Shimadzu mul+- omics data analysis Tutorial 2017 June 2 What is the Shimadzu mul'- omics data analysis? 3 Shimadzu mul'- omics analysis visualiza'on 4 Shimadzu Mul'- omics analysis Gadgets 5 Installa'on

More information

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow Technical Overview Import VCF Introduction Next-generation sequencing (NGS) studies have created unanticipated challenges with

More information

Introduc)on to Chroma)n IP sequencing (ChIP- seq) data analysis

Introduc)on to Chroma)n IP sequencing (ChIP- seq) data analysis Introduc)on to Chroma)n IP sequencing (ChIP- seq) data analysis Introduc)on to Bioinforma)cs Using NGS Data 27 January 2016 Agata Smialowska BILS, SciLife Lab, Stockholm University Chroma)n state and gene

More information

Introduc)on to ChIP- Seq data analyses

Introduc)on to ChIP- Seq data analyses Introduc)on to ChIP- Seq data analyses 7 th BBBC, February 2014 Outline Introduc8on to ChIP- seq experiment: Biological mo8va8on. Experimental procedure. Method and sohware for ChIP- seq peak calling:

More information

Predictive Analytics Cheat Sheet

Predictive Analytics Cheat Sheet Predictive Analytics The use of advanced technology to help legal teams separate datasets by relevancy or issue in order to prioritize documents for expedited review. Often referred to as Technology Assisted

More information

Package GeneExpressionSignature

Package GeneExpressionSignature Package GeneExpressionSignature December 30, 2017 Title Gene Expression Signature based Similarity Metric Version 1.25.0 Date 2012-10-24 Author Yang Cao Maintainer Yang Cao , Fei

More information

Good morning. I am Eduardo López, the sponsor of this project. I am director of regional opera>ons for our company Movistar, which is the cellphone

Good morning. I am Eduardo López, the sponsor of this project. I am director of regional opera>ons for our company Movistar, which is the cellphone 1 Good morning. I am Eduardo López, the sponsor of this project. I am director of regional opera>ons for our company Movistar, which is the cellphone and mobile internet operator of the Telefonica Group

More information

the frequency of retraction varies among journals and shows a strong correlation with the journal impact factor 1/20/17

the frequency of retraction varies among journals and shows a strong correlation with the journal impact factor 1/20/17 1 The trouble with retrac4ons: Nature News 2011 the frequency of retraction varies among journals and shows a strong correlation with the journal impact factor Fang 2011 Infect. Immun. 2 Publica4ons with

More information

Mining Association Rule Bases from Integrated Genomic Data and Annotations

Mining Association Rule Bases from Integrated Genomic Data and Annotations Mining Association Rule Bases from Integrated Genomic Data and Annotations Ricardo Martinez 1, Nicolas Pasquier 1 and Claude Pasquier 2 1 Laboratoire I3S, Université de Nice / CNRS UMR-6070, Sophia-Antipolis,

More information

Hawaii Hazards Awareness & Resilience Program. Contents. Module 5: Risk Assessment 3/1/17. Vulnerability and Capacity Assessment (VCA)

Hawaii Hazards Awareness & Resilience Program. Contents. Module 5: Risk Assessment 3/1/17. Vulnerability and Capacity Assessment (VCA) Hawaii Awareness & Resilience Program Produced by Hawaii State Civil Defense HAWAII HAZARDS AWARENESS & RESILIENCE PROGRAM: GOAL: To enhance community resilience to mul?ple hazards through a facilitated

More information

Bioinformatics : Gene Expression Data Analysis

Bioinformatics : Gene Expression Data Analysis 05.12.03 Bioinformatics : Gene Expression Data Analysis Aidong Zhang Professor Computer Science and Engineering What is Bioinformatics Broad Definition The study of how information technologies are used

More information

FDA and the Regula/on of Next Genera/on Sequencing

FDA and the Regula/on of Next Genera/on Sequencing FDA and the Regula/on of Next Genera/on Sequencing David Litwack, Ph.D. Personalized Medicine Staff Office of In Vitro Diagnos@cs and Radiological Health, FDA In Vitro Diagnos/cs in the Age of Precision

More information

Smart India Hackathon

Smart India Hackathon TM Persistent and Hackathons Smart India Hackathon 2017 i4c www.i4c.co.in Digital Transformation 25% of India between age of 16-25 Our country needs audacious digital transformation to reach its potential

More information

GeneQuery: A phenotype search tool based on gene co-expression clustering. Alexander Predeus 21-oct-2015

GeneQuery: A phenotype search tool based on gene co-expression clustering. Alexander Predeus 21-oct-2015 GeneQuery: A phenotype search tool based on gene co-expression clustering Alexander Predeus 21-oct-2015 About myself Graduated from Moscow state University (1998-2003) PhD: Michigan State University (2003-2009):

More information

Stefano Monti. Workshop Format

Stefano Monti. Workshop Format Gad Getz Stefano Monti Michael Reich {gadgetz,smonti,mreich}@broad.mit.edu http://www.broad.mit.edu/~smonti/aws Broad Institute of MIT & Harvard October 18-20, 2006 Cambridge, MA Workshop Format Morning

More information

DNASeq: Analysis pipeline and file formats Sumir Panji, Gerrit Boha and Amel Ghouila

DNASeq: Analysis pipeline and file formats Sumir Panji, Gerrit Boha and Amel Ghouila DNASeq: Analysis pipeline and file formats Sumir Panji, Gerrit Boha and Amel Ghouila Bioinforma>cs analysis and annota>on of variants in NGS data workshop Cape Town, 4th to 6th April 2016 DNA Sequencing:

More information

Beyond single genes or proteins

Beyond single genes or proteins Beyond single genes or proteins Marylyn D Ritchie, PhD Professor, Biochemistry and Molecular Biology Director, Center for Systems Genomics The Pennsylvania State University Traditional Approach Genome-wide

More information

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA

advanced analysis of gene expression microarray data aidong zhang World Scientific State University of New York at Buffalo, USA advanced analysis of gene expression microarray data aidong zhang State University of New York at Buffalo, USA World Scientific NEW JERSEY LONDON SINGAPORE BEIJING SHANGHAI HONG KONG TAIPEI CHENNAI Contents

More information

Genome Annotation - 2. Qi Sun Bioinformatics Facility Cornell University

Genome Annotation - 2. Qi Sun Bioinformatics Facility Cornell University Genome Annotation - 2 Qi Sun Bioinformatics Facility Cornell University Output from Maker GFF file: Annotated gene, transcripts, and CDS FASTA file: Predicted transcript sequences Predicted protein sequences

More information

Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models

Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models Inferring Gene-Gene Interactions and Functional Modules Beyond Standard Models Haiyan Huang Department of Statistics, UC Berkeley Feb 7, 2018 Background Background High dimensionality (p >> n) often results

More information

Package pcxn. March 7, 2019

Package pcxn. March 7, 2019 Type Package Version 2.4.0 Package pcxn March 7, 2019 Title Exploring, analyzing and visualizing functions utilizing the pcxndata package Discover the correlated pathways/gene sets of a single pathway/gene

More information

Package pcxn. March 25, 2019

Package pcxn. March 25, 2019 Type Package Version 2.4.0 Package pcxn March 25, 2019 Title Exploring, analyzing and visualizing functions utilizing the pcxndata package Discover the correlated pathways/gene sets of a single pathway/gene

More information

I have my list of differentially expressed genes, now what?

I have my list of differentially expressed genes, now what? I have my list of differentially expressed genes, now what? Mirana Ramialison! Developmental Systems Biology Laboratory! Australian Regenerative Medicine Institute! #UQwinterSchool Outline 1. Background

More information

Process Modeling Best Practices. Raphael Derbier Nicolas Marzin

Process Modeling Best Practices. Raphael Derbier Nicolas Marzin Process Modeling Best Practices Raphael Derbier Nicolas Marzin SAFE HARBOR DISCLOSURE During the course of this presenta1on TIBCO or its representa1ves may make forward-looking statements regarding future

More information

Supplementary Figures Supplementary Figure 1

Supplementary Figures Supplementary Figure 1 Supplementary Figures Supplementary Figure 1 Supplementary Figure 1 COMSOL simulation demonstrating flow characteristics in hydrodynamic trap structures. (a) Flow field within the hydrodynamic traps before

More information

MFMS: Maximal Frequent Module Set mining from multiple human gene expression datasets

MFMS: Maximal Frequent Module Set mining from multiple human gene expression datasets MFMS: Maximal Frequent Module Set mining from multiple human gene expression datasets Saeed Salem North Dakota State University Cagri Ozcaglar Amazon 8/11/2013 Introduction Gene expression analysis Use

More information

Single cell RNA sequencing. Åsa Björklund

Single cell RNA sequencing. Åsa Björklund Single cell RNA sequencing Åsa Björklund asa.bjorklund@scilifelab.se Outline Why single cell gene expression? Library prepara>on methods SeBng up experiments Computa>onal analysis Defining cell types Iden>fying

More information

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology. G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY Methods or systems for genetic

More information

Technology to Job Mapping

Technology to Job Mapping Technology to Job Mapping Dr. Robert McNamee Managing Director, Innova@on & Entrepreneur Ins@tute Fox School of Business, Temple University Dr. Andrew Maxwell Director Bergeron Entrepreneurs in Science

More information

An Overview of Gene Set Enrichment Analysis

An Overview of Gene Set Enrichment Analysis An Overview of Gene Set Enrichment Analysis CHELSEA JUI-TING JU, University of California, Los Angeles Gene set enrichment analysis is a data mining approach designed to facilitate the biological interpretation

More information

Machine Learning in Computational Biology CSC 2431

Machine Learning in Computational Biology CSC 2431 Machine Learning in Computational Biology CSC 2431 Lecture 9: Combining biological datasets Instructor: Anna Goldenberg What kind of data integration is there? What kind of data integration is there? SNPs

More information