Alexander Statnikov, Ph.D.

1 Alexander Statnikov, Ph.D. Director, Computational Causal Discovery Laboratory Benchmarking Director, Best Practices Integrative Informatics Consultation Service Assistant Professor, Department of Medicine, Division of Translational Medicine Center for Health Informatics and Bioinformatics, NYU School of Medicine 04/11/2011

2 Co-authors: Evgeny Shmelkov (First Author) Zuojian Tang Iannis Aifantis Other collaborators: Constantin F. Aliferis Alexander V. Alekseyenko Timothy Cardozo Yuval Kluger Sergey V. Shmelkov NYU Clinical and Translational Science Institute (CTSI)

3 Core missions:
- Advise researchers and undertake all aspects of informatics research design and study execution;
- Review and synthesize the literature and conduct large-scale benchmarking of a wide array of methods in order to base advice to researchers on solid evidence;
- Efficiently connect methods developers with methods consumers.

4 Brings experts together; provides an integrated approach to problem-solving.
[Diagram labels: Survey Research Needs; Research Problems; Seek Solutions; Implement/Port & Benchmark; Impetus for New Methods; New Methods; Best-Practice Recommendations & Methods; Planning Information & User Feedback]

5 There are endless varieties of methods/tools (especially in bioinformatics), and everybody has his/her own preferences.
BPIC conducts systematic benchmarking of methods/tools on a variety of datasets in order to significantly improve the quality of research.

6 Evaluation of classification algorithms for development of molecular signatures from microarray gene expression data
- Statnikov, et al., Bioinformatics 2005 (~330 citations!)
- Statnikov, et al., BMC Bioinformatics 2008 (~60 citations!)
Evaluation of feature/variable selection & causal discovery algorithms
- Aliferis, et al., Journal of Machine Learning Research, 2010
Comparison of protocols to detect predictive signal of prognostic molecular signatures
- Aliferis, et al., PLoS ONE, 2009
Comparison of algorithms for extraction of all maximally predictive and non-redundant molecular signatures
- Statnikov, et al., PLoS Computational Biology, 2010 (Cover Article)
Comprehensive assessment of methods for reverse-engineering of gene regulatory networks
- Narendra, et al., Genomics, 2011
Assessing quality and completeness of human transcriptional regulatory pathways on a genome-wide scale
- Shmelkov, et al., Biology Direct, 2011

7
- As new methods are introduced to the field, we work on revisiting results of our prior evaluation studies.
- New benchmarking efforts:
  - Evaluation of feature/variable selection algorithms for genomics data
  - Evaluation of algorithms for extraction of all maximally predictive and non-redundant feature sets for various application domains
  - Evaluation of methods for next-generation causal orientation from observational data for genomics domains

8
- Store huge amounts of complex biological data
- Allow easy access to the collected data by representing it in a simple and obvious way, through intuitive graphical schemes and notation (biological pathways)
- Often provide a variety of integrated tools for rational data handling and analysis by the user

9
- Basic science hypothesis generation and validation
- Pharmacology and drug discovery
- Governmental regulatory agencies such as FDA (!!!)

10 Freely available:
- KEGG
- Biocarta
- Cell Signaling Technology Pathways
- WikiPathways
Commercial:
- Pathway Studio
- Ingenuity Pathway Analysis
- MetaCore
- Biobase TRANSPATH
- Biobase TRANSFAC
- GeneSpring Pathways

11
- Publicly available vs. commercial
- Different ways of data notation and representation
- Different approaches to data collection and handling (e.g. expert curation, computational data mining of biomedical literature)
  → Often proprietary and not fully disclosed for commercial products

12 Shmelkov et al. Biology Direct (2011)

13 Jaccard Index (J) - normalized measure of similarity between two gene lists (A and B): $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
Shmelkov et al. Biology Direct (2011)
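As a minimal sketch, the Jaccard index of two gene lists is the size of their intersection divided by the size of their union. The function name, the convention for two empty lists, and the example gene lists below are illustrative assumptions, not taken from the study.

```python
def jaccard_index(list_a, list_b):
    """Return |A ∩ B| / |A ∪ B| for two gene lists (duplicates are ignored)."""
    set_a, set_b = set(list_a), set(list_b)
    union = set_a | set_b
    if not union:
        # Assumed convention: two empty lists have similarity 0.
        return 0.0
    return len(set_a & set_b) / len(union)


# Hypothetical target lists for the same TF taken from two databases.
db1_targets = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
db2_targets = ["GENE_A", "GENE_C", "GENE_E"]

# 2 shared genes out of 5 distinct genes overall.
similarity = jaccard_index(db1_targets, db2_targets)
print(f"J = {similarity:.2f}")
```

J ranges from 0 (no genes in common) to 1 (identical lists), which makes it convenient for comparing lists of very different sizes.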

14 Data from different pathway databases/tools differ.
Which pathway database/tool is more reliable?

15
- User-friendly interface
- Advice from colleagues
- Prior experience
- Marketing presentations
- Software availability and cost
To date there have been no prior genome-wide evaluation studies assessing pathway databases and tools!

16 Assess the accuracy and completeness of available data on transcriptional regulation stored in popular pathway databases and tools

17
1. Collect highly reliable genome-wide experimental data for well-studied transcription factors (TFs)
2. Integrate the data in order to create reliable lists of direct targets of these TFs
3. Statistically compare the resulting Gold Standards with the corresponding lists of transcriptional targets derived from 10 commonly used pathway databases

18
1. Select a TF
2. Obtain a list of all genes bound by the TF using data from ChIP-chip or ChIP-seq studies (L1)
3. Using gene expression data (e.g. microarray or RNA-seq data), obtain a list of genes that are differentially expressed in cell lines that have wild-type TF versus knocked-out TF (L2)
4. Obtain a Gold Standard for the TF by integrating L1 and L2
Jothi et al. Nucleic Acids Research (2008)

19
- L1 contains all genes that are bound by the TF. However, not all genes in this list are functionally regulated by the TF!
- L2 contains all genes downstream of the TF that are functionally regulated by it. However, this list contains both direct and indirect downstream targets of the TF!
- Thus, we need to intersect the two lists L1 and L2 to get a Gold Standard.
[Diagram labels: L1: all genes that are bound by the TF; L2: all downstream targets of the TF (functionally regulated)]
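The integration step can be sketched as a plain set intersection: a gene enters the Gold Standard only if it is both bound by the TF (L1) and functionally regulated by it (L2). The function name and the placeholder gene identifiers are assumptions for illustration; the actual study may apply additional filtering that is not shown on these slides.

```python
def build_gold_standard(bound_genes, regulated_genes):
    """Intersect L1 (bound by the TF) with L2 (functionally regulated by the TF)."""
    return set(bound_genes) & set(regulated_genes)


# Hypothetical inputs for a single TF.
l1_bound = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}  # e.g. from ChIP-seq
l2_regulated = {"GENE_B", "GENE_D", "GENE_E"}        # e.g. from expression data

gold_standard = build_gold_standard(l1_bound, l2_regulated)
print(sorted(gold_standard))
```

Genes found only in L1 are bound but not shown to be regulated, and genes found only in L2 may be indirect targets, so both groups are excluded.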

20
TF      Primary activity  Gold Standard  Number of genes  Analyzed to find genes that are
AR      N/A               I              513              differentially expressed (experiment treated for 4 hr)
                          II             712              differentially expressed (experiment treated for 16 hr)
                          III            526              differentially expressed (all treated)
BCL6    Repressor         I              369              differentially expressed
                          II             98               differentially expressed
                          III            271              up-regulated in treated samples
                          IV             76               up-regulated in treated samples
MYC     Activator         I              1887             differentially expressed
                          II             1708             down-regulated in treated samples
                          III            3039             differentially expressed
                          IV             2224             up-regulated in MYC-expressing samples
NOTCH1  Activator         I              414              differentially expressed
                          II             302              down-regulated in treated samples
                          III            637              differentially expressed
                          IV             471              down-regulated in treated samples
                          V              167              differentially expressed
                          VI             166              down-regulated in treated samples
RELA    Activator         I              1864             differentially expressed
                          II             1420             down-regulated in treated samples
                          III            188              differentially expressed
                          IV             136              up-regulated in treated samples
STAT1   Activator         I              2128             differentially expressed
                          II             2967             down-regulated in treated samples
TP53    Activator         I              34               differentially expressed
                          II             37               down-regulated in treated samples
Shmelkov et al. Biology Direct (2011)

21
- Microarrays cannot reliably detect small changes in gene expression and/or genes expressed at very low levels
- ChIP-chip and ChIP-seq transcription factor-DNA binding data are known to have imperfect reproducibility
- Functional gene expression and binding data used in our work often originated from different studies
- Compensatory mechanisms in the cell can cause some number of false negatives

22
- The statistical significance of the overlap between Gold Standards and lists from pathway databases/tools is calculated using the hypergeometric test.
- The null hypothesis of the test is that the two assessed gene sets have no more genes in common than two randomly selected gene sets of the same sizes.
- The null hypothesis is rejected at the 5% α-level.
- False discovery rate (FDR) correction was applied to account for multiple comparisons.
Shmelkov et al. Biology Direct (2011)
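A minimal sketch of such a test, using only the Python standard library: the p-value is the upper tail of the hypergeometric distribution, i.e. the probability of seeing at least the observed overlap when a list of the database's size is drawn at random from the gene universe. The slides do not name the FDR procedure, so the Benjamini-Hochberg adjustment below is an assumption, as are the function names and all numbers in the example.

```python
from math import comb


def hypergeom_overlap_pvalue(n_universe, n_gold, n_db, n_overlap):
    """P(overlap >= n_overlap) for a random list of n_db genes drawn from
    n_universe genes, of which n_gold belong to the Gold Standard."""
    max_overlap = min(n_gold, n_db)
    total = comb(n_universe, n_db)
    tail = sum(
        comb(n_gold, k) * comb(n_universe - n_gold, n_db - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order
    (assumed FDR procedure; the slides only say 'FDR correction')."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical comparisons: (universe, gold standard size, database list size, overlap).
comparisons = [(20000, 500, 40, 10), (20000, 500, 40, 2), (20000, 500, 4, 2)]
raw = [hypergeom_overlap_pvalue(*c) for c in comparisons]
adj = benjamini_hochberg(raw)
for c, p, q in zip(comparisons, raw, adj):
    print(c, f"p={p:.3g}", f"q={q:.3g}", "significant" if q < 0.05 else "not significant")
```

The result depends on the chosen gene universe (here an assumed 20,000 genes), so the same universe should be used for every database being compared.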

23 Shmelkov et al. Biology Direct (2011)

24 Even though the hypergeometric test is based on odds ratios, databases with a very small number of targets may not reach statistical significance regardless of the quality of their data.
Enrichment Fold Change (EFC) ratios are calculated as the observed number of genes in the intersection divided by the expected size of the intersection under the null hypothesis.
Notice, however, that larger values of EFC may correspond to databases that are highly incomplete and contain only a few relations.
Shmelkov et al. Biology Direct (2011)
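As a hedged sketch of the EFC calculation: under the hypergeometric null, the expected overlap between a Gold Standard of size K and a database list of size n drawn from a universe of N genes is n·K/N, and EFC is the observed overlap divided by that value. The function name and the example numbers are assumptions for illustration only.

```python
def enrichment_fold_change(n_universe, n_gold, n_db, n_overlap):
    """Observed overlap divided by the overlap expected by chance (n_db * n_gold / n_universe)."""
    expected = n_db * n_gold / n_universe
    if expected == 0:
        raise ValueError("expected overlap is zero; EFC is undefined")
    return n_overlap / expected


# Hypothetical universe of 20,000 genes and a Gold Standard of 500 genes.
larger_db = enrichment_fold_change(20000, 500, 40, 10)  # 40 listed targets, 10 confirmed
tiny_db = enrichment_fold_change(20000, 500, 4, 2)      # 4 listed targets, 2 confirmed
print(f"larger database EFC = {larger_db:.1f}, tiny database EFC = {tiny_db:.1f}")
```

In this made-up example the tiny database gets the higher EFC even though it recovers only 2 of the 500 Gold Standard genes, which illustrates why EFC should be read together with completeness.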

25 Shmelkov et al. Biology Direct (2011)

26 The lists of experimentally derived direct targets obtained in this study can be used to reveal new biological insight into transcriptional regulation and suggest novel putative therapeutic targets.
We and others have previously suggested that induction and maintenance of T-cell acute lymphoblastic leukemia (T-ALL), a devastating pediatric blood cancer, depends on the cross talk of three transcription factors: NOTCH1, MYC, and RELA (NF-κB).
NOTCH1 & MYC → 438 common targets
Two activators of cell cycle entry, CDK4 and CDK6, appear to be induced by both factors. Aifantis et al. have previously shown that silencing of CDK4/6 activity is able to suppress T-ALL, suggesting that NOTCH1 and MYC activities could converge on these CDK genes to initiate expansion of transformed cells.
MYC itself and its interacting partners MYCB and MYCB2 also appear to be targeted by both factors, suggesting an interesting signal amplification mechanism.

27 RELA & MYC → 561 common targets
Several essential T-cell regulators, including RUNX1, BCL2L1 (BCL-xL), ID3, ITCH, JAK3 and NOTCH1, appear to be controlled by both transcription factors.
NOTCH1 is downstream of both RELA and MYC, but at the same time these two factors are targets of oncogenic NOTCH1, suggesting once more an intricate auto-amplification loop that could sustain transformation.
NOTCH1 & RELA → 156 common targets
NOTCH1 & RELA & MYC → 117 common targets
(Lists of genes are given in the paper supplement)

28
- Pathway databases/tools often do not agree with each other and contain different target genes for the assessed transcription factors
- The majority of target gene sets extracted from the selected databases/tools are most likely incomplete and/or inaccurate, as they significantly disagree with experimentally derived Gold Standards (MetaCore is the only exception)

29
- In order to obtain more accurate research hypotheses, the choice of pathway databases has to be informed by solid scientific evidence and rigorous empirical evaluations such as ours
- We recommend that developers of pathway software:
  - Take advantage of high-throughput genome-wide data to refine pathways instead of relying on traditional literature search for single interactions
  - Provide a detailed description of each interaction and node in the pathway (e.g. introduce confidence scores reflecting the reliability level of original reference data; address differences in experimental conditions)

30 Assessing quality and completeness of human transcriptional regulatory pathways on a genome-wide scale
Evgeny Shmelkov, Zuojian Tang, Iannis Aifantis, Alexander Statnikov
Biology Direct (2011), 6:15
6/1/15
Gold Standards can be accessed at:

31 Alexander Statnikov: Evgeny Shmelkov:


Optimal alpha reduces error rates in gene expression studies: a meta-analysis approach Mudge et al. BMC Bioinformatics (2017) 18:312 DOI 10.1186/s12859-017-1728-3 METHODOLOGY ARTICLE Open Access Optimal alpha reduces error rates in gene expression studies: a meta-analysis approach J. F.

More information

Learning Methods for DNA Binding in Computational Biology

Learning Methods for DNA Binding in Computational Biology Learning Methods for DNA Binding in Computational Biology Mark Kon Dustin Holloway Yue Fan Chaitanya Sai Charles DeLisi Boston University IJCNN Orlando August 16, 2007 Outline Background on Transcription

More information

Gene List Enrichment Analysis - Statistics, Tools, Data Integration and Visualization

Gene List Enrichment Analysis - Statistics, Tools, Data Integration and Visualization Gene List Enrichment Analysis - Statistics, Tools, Data Integration and Visualization Aik Choon Tan, Ph.D. Associate Professor of Bioinformatics Division of Medical Oncology Department of Medicine aikchoon.tan@ucdenver.edu

More information

Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter

Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter Identification of biological themes in microarray data from a mouse heart development time series using GeneSifter VizX Labs, LLC Seattle, WA 98119 Abstract Oligonucleotide microarrays were used to study

More information

Top 5 Lessons Learned From MAQC III/SEQC

Top 5 Lessons Learned From MAQC III/SEQC Top 5 Lessons Learned From MAQC III/SEQC Weida Tong, Ph.D Division of Bioinformatics and Biostatistics, NCTR/FDA Weida.tong@fda.hhs.gov; 870 543 7142 1 MicroArray Quality Control (MAQC) An FDA led community

More information

ExPlain Analysis of TAL1 ChIP-seq Intervals

ExPlain Analysis of TAL1 ChIP-seq Intervals ExPlain Analysis of TAL1 ChIP-seq Intervals Introduction Next generation sequencing (NGS) technologies have opened up novel research possibilities in many areas including cancer research, gene regulation,

More information

Gene Network Central (GNC) Pro Tutorial

Gene Network Central (GNC) Pro Tutorial Gene Network Central (GNC) Pro Tutorial.Enhancing Biological Research with Gene Networks Topics to be Discussed What is GNC Pro and what can it do for me? Gene Network Versus Canonical Pathway Entering

More information

Pathway analysis. Martina Kutmon Department of Bioinformatics Maastricht University

Pathway analysis. Martina Kutmon Department of Bioinformatics Maastricht University Pathway analysis Martina Kutmon Department of Bioinformatics Maastricht University Who are we? Department of Bioinformatics @ Maastricht University Martina Kutmon PhD student, 3rd year Anwesha Bohler PhD

More information

CyVerse Overview. National Academies Special Topics Summer Institute on Quantitative Biology

CyVerse Overview. National Academies Special Topics Summer Institute on Quantitative Biology Transforming Science Through Data-driven Discovery CyVerse Overview National Academies Special Topics Summer Institute on Quantitative Biology Jason Williams Lead, CyVerse Education, Outreach, Training

More information

BIOINFORMATICS THE MACHINE LEARNING APPROACH

BIOINFORMATICS THE MACHINE LEARNING APPROACH 88 Proceedings of the 4 th International Conference on Informatics and Information Technology BIOINFORMATICS THE MACHINE LEARNING APPROACH A. Madevska-Bogdanova Inst, Informatics, Fac. Natural Sc. and

More information

Motifs. BCH339N - Systems Biology / Bioinformatics Edward Marcotte, Univ of Texas at Austin

Motifs. BCH339N - Systems Biology / Bioinformatics Edward Marcotte, Univ of Texas at Austin Motifs BCH339N - Systems Biology / Bioinformatics Edward Marcotte, Univ of Texas at Austin An example transcriptional regulatory cascade Here, controlling Salmonella bacteria multidrug resistance Sequencespecific

More information

11/22/13. Proteomics, functional genomics, and systems biology. Biosciences 741: Genomics Fall, 2013 Week 11

11/22/13. Proteomics, functional genomics, and systems biology. Biosciences 741: Genomics Fall, 2013 Week 11 Proteomics, functional genomics, and systems biology Biosciences 741: Genomics Fall, 2013 Week 11 1 Figure 6.1 The future of genomics Functional Genomics The field of functional genomics represents the

More information

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE

AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE ACCELERATING PROGRESS IS IN OUR GENES AGILENT S BIOINFORMATICS ANALYSIS SOFTWARE GENESPRING GENE EXPRESSION (GX) MASS PROFILER PROFESSIONAL (MPP) PATHWAY ARCHITECT (PA) See Deeper. Reach Further. BIOINFORMATICS

More information

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University Machine learning applications in genomics: practical issues & challenges Yuzhen Ye School of Informatics and Computing, Indiana University Reference Machine learning applications in genetics and genomics

More information

Gene Expression Analysis with Pathway-Centric DNA Microarrays

Gene Expression Analysis with Pathway-Centric DNA Microarrays Gene Expression Analysis with Pathway-Centric DNA Microarrays SuperArray Bioscience Corporation George J. Quellhorst, Jr. Ph.D. Manager, Customer Education Topics to be Covered Introduction to DNA Microarrays

More information

Charles Girardot, Furlong Lab. MACS, CisGenome, SISSRs and other peak calling algorithms: differences and practical use

Charles Girardot, Furlong Lab. MACS, CisGenome, SISSRs and other peak calling algorithms: differences and practical use Charles Girardot, Furlong Lab MACS, CisGenome, SISSRs and other peak calling algorithms: differences and practical use ChIP-Seq signal properties Only 5 ends of ChIPed fragments are sequenced Shifted read

More information

Developing an Accurate and Precise Companion Diagnostic Assay for Targeted Therapies in DLBCL

Developing an Accurate and Precise Companion Diagnostic Assay for Targeted Therapies in DLBCL Developing an Accurate and Precise Companion Diagnostic Assay for Targeted Therapies in DLBCL James Storhoff, Ph.D. Senior Manager, Diagnostic Test Development World Cdx, Boston, Sep. 10th Molecules That

More information

Cory Brouwer, Ph.D. Xiuxia Du, Ph.D. Anthony Fodor, Ph.D.

Cory Brouwer, Ph.D. Xiuxia Du, Ph.D. Anthony Fodor, Ph.D. Cory Brouwer, Ph.D. Dr. Cory R. Brouwer is Director of the Bioinformatics Services Division and Associate Professor of Bioinformatics and Genomics at UNC Charlotte. He and his team provide a wide range

More information

Applications of HMMs in Epigenomics

Applications of HMMs in Epigenomics I529: Machine Learning in Bioinformatics (Spring 2013) Applications of HMMs in Epigenomics Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Contents Background:

More information

APA Version 3. Prerequisite. Checking dependencies

APA Version 3. Prerequisite. Checking dependencies APA Version 3 Altered Pathway Analyzer (APA) is a cross-platform and standalone tool for analyzing gene expression datasets to highlight significantly rewired pathways across case-vs-control conditions.

More information

AIT - Austrian Institute of Technology

AIT - Austrian Institute of Technology BIOMARKER DISCOVERY, BIOINFORMATICS, AND BIOSENSOR DEVELOPMENT Technology Experience AIT Austrian Institute of Technology Low-Emission Transport AIT - Austrian Institute of Technology Energy Health & Bioresources

More information

2/19/13. Contents. Applications of HMMs in Epigenomics

2/19/13. Contents. Applications of HMMs in Epigenomics 2/19/13 I529: Machine Learning in Bioinformatics (Spring 2013) Contents Applications of HMMs in Epigenomics Yuzhen Ye School of Informatics and Computing Indiana University, Bloomington Spring 2013 Background:

More information

Multiple Testing in RNA-Seq experiments

Multiple Testing in RNA-Seq experiments Multiple Testing in RNA-Seq experiments O. Muralidharan et al. 2012. Detecting mutations in mixed sample sequencing data using empirical Bayes. Bernd Klaus Institut für Medizinische Informatik, Statistik

More information

MANIFESTO OF STUDIES 2012

MANIFESTO OF STUDIES 2012 MANIFESTO OF STUDIES 2012 1st YEAR Course Teacher Hours Synopsis Evaluation procedure Laboratory Safety Course (Mandatory) Prof. Mancini I. Dr. Provenzani A. 12 General Laboratory Procedures, Equipment

More information

Learning Bayesian Network Models of Gene Regulation

Learning Bayesian Network Models of Gene Regulation Learning Bayesian Network Models of Gene Regulation CIBM Retreat October 3, 2003 Keith Noto Mark Craven s Group University of Wisconsin-Madison CIBM Retreat 2003 Poster Session p.1/18 Abstract Our knowledge

More information

Data Mining for Biological Data Analysis

Data Mining for Biological Data Analysis Data Mining for Biological Data Analysis Data Mining and Text Mining (UIC 583 @ Politecnico di Milano) References Data Mining Course by Gregory-Platesky Shapiro available at www.kdnuggets.com Jiawei Han

More information

Bioinformatics and Life Sciences Standards and Programming for Heterogeneous Architectures

Bioinformatics and Life Sciences Standards and Programming for Heterogeneous Architectures Bioinformatics and Life Sciences Standards and Programming for Heterogeneous Architectures Eric Stahlberg Ph.D. (SAIC-Frederick contractor) stahlbergea@mail.nih.gov SIAM Conference on Parallel Processing

More information

DNA Based Disease Prediction using pathway Analysis

DNA Based Disease Prediction using pathway Analysis 2017 IEEE 7th International Advance Computing Conference DNA Based Disease Prediction using pathway Analysis Syeeda Farah Dr.Asha T Cauvery B and Sushma M S Department of Computer Science and Shivanand

More information

Discovering gene regulatory control using ChIP-chip and ChIP-seq. Part 1. An introduction to gene regulatory control, concepts and methodologies

Discovering gene regulatory control using ChIP-chip and ChIP-seq. Part 1. An introduction to gene regulatory control, concepts and methodologies Discovering gene regulatory control using ChIP-chip and ChIP-seq Part 1 An introduction to gene regulatory control, concepts and methodologies Ian Simpson ian.simpson@.ed.ac.uk http://bit.ly/bio2links

More information

Grundlagen der Bioinformatik Summer Lecturer: Prof. Daniel Huson

Grundlagen der Bioinformatik Summer Lecturer: Prof. Daniel Huson Grundlagen der Bioinformatik, SoSe 11, D. Huson, April 11, 2011 1 1 Introduction Grundlagen der Bioinformatik Summer 2011 Lecturer: Prof. Daniel Huson Office hours: Thursdays 17-18h (Sand 14, C310a) 1.1

More information

Microarray Informatics

Microarray Informatics Microarray Informatics Donald Dunbar MSc Seminar 4 th February 2009 Aims To give a biologistʼs view of microarray experiments To explain the technologies involved To describe typical microarray experiments

More information

SureSilencing sirna Array Technology Overview

SureSilencing sirna Array Technology Overview SureSilencing sirna Array Technology Overview Pathway-Focused sirna-based RNA Interference Topics to be Covered Who is SuperArray? Brief Introduction to RNA Interference Challenges Facing RNA Interference

More information

A WEB-BASED TOOL FOR GENOMIC FUNCTIONAL ANNOTATION, STATISTICAL ANALYSIS AND DATA MINING

A WEB-BASED TOOL FOR GENOMIC FUNCTIONAL ANNOTATION, STATISTICAL ANALYSIS AND DATA MINING A WEB-BASED TOOL FOR GENOMIC FUNCTIONAL ANNOTATION, STATISTICAL ANALYSIS AND DATA MINING D. Martucci a, F. Pinciroli a,b, M. Masseroli a a Dipartimento di Bioingegneria, Politecnico di Milano, Milano,

More information

Progress and Future Directions in Integrated Systems Toxicology. Mary McBride Agilent Technologies

Progress and Future Directions in Integrated Systems Toxicology. Mary McBride Agilent Technologies Progress and Future Directions in Integrated Systems Toxicology Mary McBride Agilent Technologies 1 Toxicity testing tools of the late 20 th century Patchwork approach to testing dates back to the 1930

More information

BIOINFORMATICS Introduction

BIOINFORMATICS Introduction BIOINFORMATICS Introduction Mark Gerstein, Yale University bioinfo.mbb.yale.edu/mbb452a 1 (c) Mark Gerstein, 1999, Yale, bioinfo.mbb.yale.edu What is Bioinformatics? (Molecular) Bio -informatics One idea

More information

Seven Keys to Successful Microarray Data Analysis

Seven Keys to Successful Microarray Data Analysis Seven Keys to Successful Microarray Data Analysis Experiment Design Platform Selection Data Management System Access Differential Expression Biological Significance Data Publication Type of experiment

More information

ACCELERATING GENOMIC ANALYSIS ON THE CLOUD. Enabling the PanCancer Analysis of Whole Genomes (PCAWG) consortia to analyze thousands of genomes

ACCELERATING GENOMIC ANALYSIS ON THE CLOUD. Enabling the PanCancer Analysis of Whole Genomes (PCAWG) consortia to analyze thousands of genomes ACCELERATING GENOMIC ANALYSIS ON THE CLOUD Enabling the PanCancer Analysis of Whole Genomes (PCAWG) consortia to analyze thousands of genomes Enabling the PanCancer Analysis of Whole Genomes (PCAWG) consortia

More information