Bioinformatics: Opportunities and Challenges for Data Recovery, Analysis and Sustainability

Similar documents
Epigenetic control of tomato fruit development. Key Laboratories of Molecular Epigenetics of MOE, Northeast Normal University, Changchun, China

Virus-induced gene complementation reveals a transcription factor network in modulation of tomato fruit ripening

Genetic Regulation of Fruit Development and Ripening

Harnessing Genomics of Edible African Solanaceae Plants For Improved Nutritional and Food Security

Tansley review. A critical evaluation of the role of ethylene and MADS transcription factors in the network controlling fleshy fruit ripening.

MOLECULAR PHYSIOLOGY OF BANANA FRUIT RIPENING: Improvement of fruit quality

Using semantic web technology to accelerate plant breeding.

Agri-Science Art of Application: Bioinformatics and Agricultural Sciences Education. David Francis Horticulture and Crop Sciences

Chapter 1. from genomics to proteomics Ⅱ

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

Characterisation of ethylene pathway components in non-climacteric capsicum

Gene expression analysis. Biosciences 741: Genomics Fall, 2013 Week 5. Gene expression analysis

CHAPTER 21 GENOMES AND THEIR EVOLUTION

MOLECULAR PHYSIOLOGY STUDIES OF BANANA FRUIT RIPENING Identification of candidate genes for improvement of fruit quality traits throughout breeding

Gene Annotation Project. Group 1. Tyler Tiede Yanzhu Ji Jenae Skelton

Genomes: What we know and what we don t know

Tomato Expression Database (TED): a suite of data presentation and analysis tools

ACC Synthase Genes Related to Cold-dependent Ripening in Pear Fruit

Refresher on gene expression - DNA: The stuff of life

Precision engineering of the plant genome. Sjef Smeekens Institute of Environmental Biology Utrecht University

Food Biotechnology: Agustin K Wardani

Pathway approach for candidate gene identification and introduction to metabolic pathway databases.

SUPPLEMENTARY INFORMATION

Tuning LeSPL-CNR expression by SlymiR157 affects tomato fruit

SolCAP. Executive Commitee : David Douches Walter De Jong Robin Buell David Francis Alexandra Stone Lukas Mueller AllenVan Deynze

Orange Fruit Color in Capsicum due to Deletion of Capsanthin-capsorubin Synthesis Gene

CHAPTERS 16 & 17: DNA Technology

11/22/13. Proteomics, functional genomics, and systems biology. Biosciences 741: Genomics Fall, 2013 Week 11

Introduction to Bioinformatics

Introduction to BIOINFORMATICS

Biology 644: Bioinformatics

Supplemental Figure 1.

Large scale genome editing for. Senior Scientist, GenScript

Induced point mutations in the phytoene synthase 1 gene cause differences in carotenoid content during tomato fruit ripening

Genetics/Genomics in Public Health. Role of Clinical and Biochemical Geneticists in Public Health

EECS 730 Introduction to Bioinformatics Sequence Alignment. Luke Huan Electrical Engineering and Computer Science

Proteomics and Cancer

Medicinal Chemistry of Modern Antibiotics

Medicinal Chemistry of Modern Antibiotics

The study of the structure, function, and interaction of cellular proteins is called. A) bioinformatics B) haplotypics C) genomics D) proteomics

Fleshy Fruit Expansion and Ripening Are Regulated by the Tomato SHATTERPROOF Gene TAGL1 W OA

Fleshy Fruit Expansion and Ripening Are Regulated by the Tomato SHATTERPROOF Gene TAGL1 W OA

Genome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013)

The chimeric repressor version of an Ethylene Response Factor (ERF) family member, Sl-ERF.B3, shows contrasting effects on tomato fruit ripening

Transcriptomics. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona

Complete draft sequence 2001

Analysis of Microarray Data

Introduction to Bioinformatics and Gene Expression Technology

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies

Soil invertebrates as a genomic model to study pollutants in the field

Wheat Genome Structural Annotation Using a Modular and Evidence-combined Annotation Pipeline

MICROARRAYS: CHIPPING AWAY AT THE MYSTERIES OF SCIENCE AND MEDICINE

Fundamentals of Bioinformatics: computation, biology, computational biology

Drosophila White Paper 2003 August 13, 2003

CRISPR GENOMIC SERVICES PRODUCT CATALOG

The University of California, Santa Cruz (UCSC) Genome Browser

CodeLink Human Whole Genome Bioarray

Introduction to Plant Genomics and Online Resources. Manish Raizada University of Guelph

What are proteomics? And what can they tell us about seed maturation and germination?

Testing the ABC floral-organ identity model: cloning the genes

ABI3 Controls Embryo De-greening Through Mendel's I locus

How to Characterize Rust Isolates: Genome Sequencing, Molecular Markers and Differentials

Proposed experiments to study SAR

Analysis of Microarray Data

DNA & Protein Synthesis. The source and the process!

Highly efficient genome engineering in flowering plants ~ Development of a rapid method to knockout genes in Arabidopsis thaliana ~

Serial Analysis of Gene Expression

Annotating 7G24-63 Justin Richner May 4, Figure 1: Map of my sequence

Genomics-based approaches to improve drought tolerance of crops

Genetics of Plant-Pathogen Interactions (Plant Immunity) Topics on Systemic Acquired Resistance (SAR)

Marker types. Potato Association of America Frederiction August 9, Allen Van Deynze

Chemical treatments cause cells and nuclei to burst The DNA is inherently sticky, and can be pulled out of the mixture This is called spooling DNA

MCDB /15/13 Working with DNA and Biotechnology

Testing the ABC floral-organ identity model: cloning the genes

RNA-Seq analysis workshop

3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome

Functional Genomics in Plants

What is Bioinformatics?

Mutagenesis for Studying Gene Function Spring, 2007 Guangyi Wang, Ph.D. POST103B

GenMiner: Mining Informative Association Rules from Genomic Data

Announcement Structure Analysis

Course Agenda. Day One

Introduction to Bioinformatics

earray 5.0 Create your own Custom Microarray Design

Genome annotation & EST

Sample to Insight. Dr. Bhagyashree S. Birla NGS Field Application Scientist

Host : Dr. Nobuyuki Nukina Tutor : Dr. Fumitaka Oyama

Motivation From Protein to Gene

Protein Bioinformatics Part I: Access to information

Capabilities & Services

Getting to the Roots FOR PHARMA & LIFE SCIENCES WHITEPAPER

The 150+ Tomato Genome (re-)sequence Project; Lessons Learned and Potential

Microarray Analysis of Gene Expression in Huntington's Disease Peripheral Blood - a Platform Comparison. CodeLink compatible

Bi Lecture 3 Loss-of-function (Ch. 4A) Monday, April 8, 13

Gene-centered resources at NCBI

The Molecular Basis of Bacterial Innate Immunity in Arabidopsis thaliana

Genetically Modified Organisms

Review of previous work: Overview of previous work on DNA methylation and chromatin dynamics

CRISPR and protoplasts: Hand-inhand towards high-throughput gene silencing in cassava

Transcription:

Bioinformatics: Opportunities and Challenges for Data Recovery, Analysis and Sustainability The changing pace of biology in the genomics era The Systems explosion Role of the informatics specialist Challenges of data stability Jim Giovannoni, Boyce Thompson Institute for Plant Research, USDA-ARS, Cornell, Ithaca NY

Giovannoni Plant Cell, 2004 nor/nor Cnr/Cnr Nr/Nr rin/rin Developmental Regulation NOR NR, PG, E8, LeEXP1 LeMADS-RIN Ethylene Signal Transduction NR, LeETR4 CNR LeACS1A LeACS4 Autocatalytic synthesis Ripening: Gr Pathway Softening amplification LeCTR1 Flavor/Aroma LeEINs Nutrition Chlorophyll Loss PSY, E4, TBG4 Carotenoid Accumulation Pathogen Susceptibility r/r WT B/B LeACO1 LeACO3 LeACS2 LeACS6 PME, LBC (+) LeMADS(?) hp1/hp1 HP1, HP2 (LeDET1), L1,L2 PHYs fri/fri, tri/ri, B72/B72 Light hp2/hp2

Normal and ripening-inhibitor (rin) nearly isogenic lines

8 figures 502 data points - all presented - many as primary data

TOM1 cdna Array Total elements (spots): 13,440 Non-redundant tomato sequences: 8,700 Re-sequencing: >80% have been sequence verified from both 3 and 5 ends (France). Access: Arrays www.bti.cornell.edu/cgep/cgep.html Clones http://ted.bti.cornell.edu/order/index.html/ Sequences http://www.sgn.cornell.edu/ Data http://ted.bti.cornell.edu/

Experimental Design for Transcriptome Analysis Time Point: 1 2 3 4 5 67 8 9 10 Stage Name: 7DAP 17 DAP 27 DAP MG B-1 B B+1 B+5 B+10 B+15 Expression Ratio: 17 7 27 17

1 0-1 1 0-1 WT Nr 6 14 22 30 38 46 54 Days Post Anthesis 14 22 30 38 46 54 Transcript Accumulation (Log2 of Fold Difference)

1,296,000 gene expression data points Alba et al., Plant Cell, 2005

www.ted.bti.cornell.edu

Informatics Specialist: - capabilities in computer science, statistical analysis and biology - typically a biologist with training in computer science or a computer scientist with exposure to biology Roles: Experimental design Data analysis (pushing the frontiers) Database development and management

Digital expression analysis of grape and tomato ripening Table IV: Genes induced both by tomato ripening and grape ripening Tomato TC Grape TC Annotation TC125305/TC125359 TC4377 MADS box protein TC124244/TC124112 TC4730 bzip transcription factor TC124196/TC125034 TC9044 zinc finger transcription factor TC116030 TC4282 xyloglucan endo-1,4-beta-d-glucanase TC123883/TC124274 TC4394 alcohol dehydrogenase TC115998 TC4249 Pathogenesis-related protein TC125239 TC4910 calcineurin B like protein TC123982 TC9086 Calmodulin TC116962/TC116318/TC116319 TC4046/TC4559/TC4193/ TC4064/TC4181/TC4034 heat shock protein TC124903/TC126297/TC126413 TC4348 heat shock protein TC124001 TC9134/TC4581 heat shock protein TC115895 TC4236/TC4822 ubiquitin TC123771 TC4209 elongation factor 1-alpha TC124929 TC9284 heavy metal ion transport protein TC124731 TC9962 endoplasmatic reticulum retrieval protein Candidates for RNAi

Challenge: Integration of large scale biological monitoring activities Proteomics Metabalomics 4.0 pi 9.0 kda EST Database Expression Profiling 200 116 97 66 55 36.5 31 21.5 Lycopene concentration (mean) 140.00 120.00 100.00 80.00 60.00 ppm FW 40.00 20.00 0.00 WW 3-2 12-2 12-4 M82 1 9-2-5 2-4 6-1 12-4-1 8-2-1 7-1 1-4-18 4-3-2 9-3-1 2-5 3-5 2-1 10-3 4-1 7-2 9-3 5-1 9-3-2 12-1-1 2-6 6-4 9-2-6 5-3 8-3-1 3-4 1-4 2-1-1 1-1 11-4 11-4-1 5-5 1-3 12-1 10-1 4-1-1 6-2 p < 0.05 IL line GENES mrna Protein Phenotype Informatics KNOWLEDGE Modification

WHY SOLANACEAE? Members of the family share a common genome that led to many diverse outcomes that benefit society Tomato Eggplant Chili pepper Potato Petunia Pachytene chromosomes

Microsynteny amongst solanaceous genomes Wang et al unpublished results pepper 122 kb 2700 Mb 1.85x potato 135 kb 800 Mb 1.42x tomato 105 kb 7 10 15 950 Mb 1.00x eggplant 106 kb 3 8 956 Mb 1.37x 0 50kb 100kb 130kb confirmed by Arabidopsis and EST hits confirmed only Arabidopsis hits confirmed only by EST hits transposon elements genes without Arabidopsis and EST hits

IPP GGPS PSY (r) PDS PDS ZDS + GGPP Phytoene Phytofluene ζ-caortene neurosporene DMAPP CRISTO ZDS lycopene LYCE LYCB (B) α- carotene ß-carotene r/r R/R; b/b B/B Phytoene synthase normal control lycopene-ß-cyclase (knock-out) (over-expresser) deficient in lycopene accumulates lycopene accumulates ß-carotene and ß-carotene and ß-carotene at the expense of lycopene

HPLC isolation of lycopene and β-carotene from IL ripe fruit A 450 β-c 10 15 20 Retention time/min parent IL 1-1 parent IL 6-2 parent IL 8-2 parent IL 4-2.1 cdna Array comparisons of IL and parental ripe fruit cdnas clustered based on expression patterns low high

Multiple lines may have similar phenotypic effects i.e increased B-carotene in comparison to parent IL lines A,B,C A 450 β-c Parental line A 450 β-c 10 15 20 10 15 20 Retention time/min Retention time/min These lines may also have similar expression patterns with respect to a subset of cdnas Candidate cdnas for carotenoid regulation will come from these intersections

1-1-2, 1-1-3, 1-3, 1-4, 1-4-18, 2-1-1, 2-3, 2-5, 2-6, 2-6-5, 3-1, 3-2, 3-4, 3-5, 4-1, 5-1, 5-2, 5-4, 6-2, 6-3, 6-4, 7-5, 8-2-1, 8-3, 8-3-1, 9-1, 9-1-2, 9-2-5, 9-3, 9-3-1, 9-3-2, 10-1, 10-2, 10-3, 11-1, 11-2, 11-3, 11-4, 12-1, 12-1-1, 12-3, 12-3-1, 12-4-1 1-1-2, 1-2, 1-3, 1-4, 2-1-1, 2-2, 2-4, 2-5, 2-6, 2-6-5, 3-1, 3-2, 3-4, 3-5, 4-1, 5-4, 6-3, 7-1, 7-5, 8-3-1, 9-1-2, 9-2-5, 9-3, 9-3-1, 9-3-2, 10-2-2, 11-1, 11-2, 12-1, 12-1-1, 12-2, 12-3, 12-3-1, Geranylgeranyl diphosphate(x2) Psy 1 Phytoene Pds Phytofluene Pds ζ-carotene CrtISO KEY: Negative extreme <75% control >150% control >200% control Positive extreme 1-1-3, 1-3, 1-4, 2-1-1, 2-2, 2-4, 2-6, 2-6-5, 3-2, 4-1, 4-1-1, 4-2, 6-3 6-4, 7-1, 7-4-1, 8-1, 9-1, 9-2, 9-2-6, 9-3-1, 10-1, 10-2-2, 11-1, 11-4-1, 12-1-1, 12-2, 12-3, 12-3-1, 12-4-1 CHY-e CHY-b Lutein CrtL-e δ-carotene CrtL-b α-carotene Zds Lycopene CrtL-b γ-carotene 1-1, 1-1-2, 1-1-3, 1-2, 2-1, 2-1-1, 2-2, 2-6-5, 3-5, 4-1, 4-1-1, 4-2, 4-3, 4-3-2, 5-4, 5-5, 6-1, 6-2, 6-4, 8-1, 8-1-1, 8-2, 8-2-1, 8-3-1, 9-1, 9-1-2, 9-2, 9-2-6, 10-2-2, 10-3, 11-4, 11-4-1, 12-1, 12-2, 12-3-1 2-2, 2-4, 2-6, 2-6-5, 3-2, 3-4, 5-1, 6-3, 9-2-5, 9-3-2, 10-2-2, 11-1, 11-2, 12-1, 12-1-1, 12-2 1-1, 1-1-2, 1-1-3, 1-2, 1-3, 1-4, 2-1-1, 2-3, 2-5, 2-6, 2-6-5, 3-1, 3-2, 3-4, 3-5, 4-1-1, 4-2, 4-3, 4-3-2, 4-4, 5-1, 5-2, 5-4, 5-5, 6-2, 6-3, 7-2, 7-4, 7-4-1, 7-5-5, 8-2-1, 8-3-1, 9-2, 9-3. 10-3, 12-1, 12-2, 12-3-1, 12-4-1, CrtL-b Fox, McQuinn and Giovannoni, unpublished 1-1-2, 1-1-3, 2-1, 2-1-1, 2-2, 2-5, 3-2, 4-2, 4-3, 4-3-2, 5-1, 6-1, 6-2, 6-3, 8-1, 9-1-2, 9-2-6, 10-1, 10-2, 10-2-2, 12-1, 12-1-1, 12-2, 12-4-1 β-carotene Cunningham et al. (2002) Hirschberg et al. (1997)

http://www.ncbi.nlm.nih.gov/

http://www.sgn.cornell.edu/

http://www.gramene.org/

http://www.arabidopsis.org/home.html

www.tigr.org

Systems approaches offer enormous opportunities to understand biology - medicine - food security and sustainability - environmental protection - renewable energy resources - basis of the future bio-based economy Informatics, informatics specialists and data managers and curators hold the key to the kingdom for the realization of these promises. Challeneges - long-term data maintenance and availability - standard methods of data presentation - standard tools for basic data analysis - uniform standards of data quality