Genomic region (ENCODE) Gene definitions

Bioinformatics Methods
From genes to proteins
Iosif Vaisman
Email: ivaisman@gmu.edu

From genes to proteins
[Figure: DNA -> RNA (TRANSCRIPTION; signal: PROMOTER ELEMENTS) -> mRNA (SPLICING; signal: SPLICE SITES) -> PROTEIN (TRANSLATION; signals: START CODON, STOP CODON)]

Encyclopedia of DNA Elements (ENCODE)
Darryl Leja (NHGRI), Ian Dunham (EBI), http://genome.ucsc.edu/encode/

Genomic region (ENCODE)

Gene definitions
Definition 1910s: Gene as a distinct locus
Definition 1940s: Gene as a blueprint for a protein
Definition 1950s: Gene as a physical molecule
Definition 1960s: Gene as transcribed code
Definition 1970s-1980s: Gene as open reading frame (ORF) sequence pattern
Definition 1990s-2000s: Annotated genomic entity, enumerated in the databanks (current view, pre-ENCODE)
A current computational metaphor: Genes as subroutines in the genomic operating system
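The ORF-based definition from the 1970s-1980s can be illustrated with a short scan for open reading frames. This is a minimal sketch, not a method from the slides: the function name `find_orfs`, the `min_codons` threshold, and the restriction to the forward strand are all assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return sorted (start, end) spans, 0-based and half-open, of ORFs.

    An ORF here is an ATG followed in the same frame by the first stop
    codon. Only the forward strand is scanned (an assumption). min_codons
    counts the codons before the stop codon, including the ATG.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))  # span includes the stop codon
                start = None                   # close the frame and keep scanning
    return sorted(orfs)

# Hypothetical toy sequence with one ORF in the third frame.
toy = "CCATGAAATGGTAACC"
for begin, end in find_orfs(toy):
    print(begin, end, toy[begin:end])
```

A pattern match of this kind finds ORFs but says nothing about splicing, which is one reason the later definitions in the list moved beyond it.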

Chromosome 19 gene map

Gene concept problems

ENCODE definition of gene

Where are genes unlikely to be located?
How do transcription factors know where to bind a region of DNA?
Where are the transcription, splicing, and translation start and stop signals?
What do coding regions do (that non-coding regions do not)?
Can we learn from examples?
Does this sequence look familiar?

LAGAN algorithm

EcoParse
Hidden Markov Model for individual codon
Complete HMM

Spliced Alignment (Procrustes)
New genomic sequence
Selection of candidate exons
    AUG --- GU                      initial exons
    AG --- GU                       internal exons
    AG --- UAA or UAG or UGA        terminal exons
Filtration (based on the codon usage statistics)
Construction of all possible chains of candidate exons
Finding a chain with the maximum global similarity to the target protein

Spliced Alignment (Procrustes)

Predicted Exon Assembly (Procrustes)
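The candidate exon selection step can be sketched as a signal scan over the sequence. This is an illustrative sketch only, not the Procrustes implementation: the names `positions` and `candidate_exons` and the `min_len` cutoff are assumptions, reading frame is not checked, and the later steps (codon usage filtration, chain construction, alignment to the target protein) are left out.

```python
import re

def positions(seq, motif):
    """All 0-based start positions of motif in seq, overlaps included."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]

def candidate_exons(rna, min_len=3):
    """Enumerate candidate exons as 0-based, half-open (begin, end) spans.

    initial:  begins at AUG, ends just before a GU donor site
    internal: begins just after an AG acceptor site, ends just before GU
    terminal: begins just after AG, ends with a stop codon (inclusive)
    """
    rna = rna.upper().replace("T", "U")   # accept DNA input as well
    starts = positions(rna, "AUG")
    donors = positions(rna, "GU")
    acceptors = [p + 2 for p in positions(rna, "AG")]
    stops = [p + 3 for codon in ("UAA", "UAG", "UGA")
             for p in positions(rna, codon)]

    def spans(begins, ends):
        # Every begin/end pairing that is long enough is a candidate.
        return sorted((b, e) for b in begins for e in ends if e - b >= min_len)

    return {
        "initial": spans(starts, donors),
        "internal": spans(acceptors, donors),
        "terminal": spans(acceptors, stops),
    }

# Hypothetical toy sequence, chosen only to exercise the three signal types.
toy = "AUGCCGUAAAGGCUGA"
for kind, found in candidate_exons(toy).items():
    print(kind, [(b, e, toy[b:e]) for b, e in found])
```

Because every signal pairing is kept, the candidate set grows quickly on real sequences, which is why a filtration step comes before the chains are built.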

PCR Primers Prediction (GenePrimer)

Exon          Length   Result
1085..1182        98   hit using first 2 primers
1628..1676        49   missed
1900..2001       102   hit using first 8 primers
2110..2184        75   missed
2516..2722       207   hit using first 4 primers
3385..3472        88   missed
3546..3746       201   hit using first primer...

GRAIL gene identification program
[Figure stages: POSSIBLE EXONS, REFINED EXON POSITIONS, FINAL EXON CANDIDATES]

Suboptimal Solutions for the Human Growth Hormone Gene (GeneParser)

Transposons

Measures of Prediction Accuracy: Nucleotide Level
[Figure: REALITY and PREDICTION tracks aligned along the sequence, with regions labeled TP, FP, TN, FN]

                    REALITY
                    c      nc
PREDICTION    c     TP     FP
              nc    FN     TN

(c = coding, nc = non-coding)

Sensitivity: Sn = TP / (TP + FN)
Specificity: Sp = TP / (TP + FP)

Measures of Prediction Accuracy: Exon Level
[Figure: REALITY and PREDICTION tracks with exons labeled WRONG EXON, CORRECT EXON, MISSING EXON]

Sensitivity: Sn = number of correct exons / number of actual exons
Specificity: Sp = number of correct exons / number of predicted exons
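Both pairs of measures can be computed directly from their definitions. The sketch below is illustrative: the function names, the 'c'/'n' string encoding of coding and non-coding positions, and the toy inputs are assumptions, and an exon counts as correct only when both boundaries match exactly. Note that Sp as defined here, TP / (TP + FP), is the quantity called precision in other fields.

```python
def nucleotide_accuracy(actual, predicted):
    """Nucleotide-level (Sn, Sp) from two equal-length label strings.

    Each character is 'c' (coding) or 'n' (non-coding); this encoding
    is an assumption made for the example.
    """
    if len(actual) != len(predicted):
        raise ValueError("label strings must have equal length")
    pairs = list(zip(actual, predicted))
    tp = sum(a == "c" and p == "c" for a, p in pairs)
    fp = sum(a == "n" and p == "c" for a, p in pairs)
    fn = sum(a == "c" and p == "n" for a, p in pairs)
    sn = tp / (tp + fn) if tp + fn else 0.0   # Sn = TP / (TP + FN)
    sp = tp / (tp + fp) if tp + fp else 0.0   # Sp = TP / (TP + FP)
    return sn, sp

def exon_accuracy(actual_exons, predicted_exons):
    """Exon-level (Sn, Sp); exons are (start, end) tuples."""
    actual, predicted = set(actual_exons), set(predicted_exons)
    correct = len(actual & predicted)         # both boundaries must match
    sn = correct / len(actual) if actual else 0.0
    sp = correct / len(predicted) if predicted else 0.0
    return sn, sp

# Hypothetical toy data, not taken from the slides.
print(nucleotide_accuracy("nncccccnnn", "nnncccccnn"))
print(exon_accuracy([(10, 50), (80, 120), (200, 260)],
                    [(10, 50), (80, 125), (200, 260), (300, 340)]))
```

In the exon-level call, the prediction (80, 125) overlaps a real exon almost entirely yet still counts as wrong, which shows why exon-level scores are usually lower than nucleotide-level scores for the same prediction.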

GeneMark Accuracy Evaluation

Gene Annotation

Errors in genome annotation
Brenner, 1999

Goals of structural genomics
    Sequence-structure correlations
    Provision of enough structural templates to facilitate homology modeling of most proteins
    Structures of all proteins in a complete proteome
    Structural elucidation of a complete biological pathway
    Structural elucidation of a complete disease
Phil Bourne, 2005

Redfern and Orengo, 2005

Model structure coverage in sequence space

Structural Genomics Project
    Organize known protein sequences into families.
    Select family representatives as targets.
    Solve the 3D structure of targets by X-ray crystallography or NMR spectroscopy.
    Build models for other proteins by homology to solved 3D structures.
Adapted from Vitkup et al., 2001

Target selection

Coverage of the Human Genome By Structure
PDB
Structural Genomics Targets
    a) realm of interest
    b) family exclusion - impossible
    c) family exclusion - known
    d) prioritization
    e) selection
    f) analysis and interpretation
S. Brenner, 2000

Ensembl Human Genome Annotation
Superfamily
Xie and Bourne, 2005