The genetic code of gene regulatory elements

Similar documents
Predicting Tissue-Specific Enhancers in the Human Genome

Year III Pharm.D Dr. V. Chitra

Mouse Embryo : 10mM Tris-HCl (ph8.0), 1mM EDTA. : One year from date of receipt under proper storage condition.

Table S1. Primers used in the study

Des cellules-souches dans le poumon : pourquoi faire?

Molecular Medicine. Stem cell therapy Gene therapy. Immunotherapy Other therapies Vaccines. Medical genomics

Beyond Genomics: Detecting Codes and Signals in the Cellular Transcriptome

Lesson 7A Specialized Cells, Stem Cells & Cellular Differentiation

Isolation and Characterization of the Porcine Tissue Kallikrein Gene Family

Supplemental Material TARGETING THE HUMAN MUC1-C ONCOPROTEIN WITH AN ANTIBODY-DRUG CONJUGATE. Supplemental Tables S1 S3. and

3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome

Personal and population genomics of human regulatory variation

Chapter 8 Healthcare Biotechnology

NPTEL Biotechnology Tissue Engineering. Stem cells

CHAPTER 21 LECTURE SLIDES

Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs

From genome-wide association studies to disease relationships. Liqing Zhang Department of Computer Science Virginia Tech

ANNOUNCEMENTS. HW2 is due Thursday 2/4 by 12:00 pm. Office hours: Monday 12:50 1:20 (ECCH 134)

Gene Expression Evolves Faster in Narrowly Than in Broadly Expressed Mammalian Genes

Following text taken from Suresh Kumar. Bioinformatics Web - Comprehensive educational resource on Bioinformatics. 6th May.2005

Stem Cells. Part 1: What is a Stem Cell? STO Stem cells are unspecialized. What does this mean?

The Human Genome Project

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

Computational methods in bioinformatics: Lecture 1

Stem Cells & Neurological Disorders. Said Ismail Faculty of Medicine University of Jordan

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

This place covers: Methods or systems for genetic or protein-related data processing in computational molecular biology.

CLASSIC BREEDING. Artificially select on already-present variation in the trait of interest

Characteristics. Capable of dividing and renewing themselves for long periods of time (proliferation and renewal)

In silico identification of transcriptional regulatory regions

Higher Human Biology Unit 1: Human Cells Pupils Learning Outcomes

Genomes: What we know and what we don t know

Transgenesis. Stable integration of foreign DNA into host genome Foreign DNA is passed to progeny germline transmission

Selective constraints on noncoding DNA of mammals. Peter Keightley Institute of Evolutionary Biology University of Edinburgh

Chapter 8 Cell Diversity

Guide to Clonetics and Poietics Cell Types

Stem cells in Development

Genetics Lecture 19 Stem Cells. Stem Cells 4/10/2012

C57BL/6N Female. C57BL/6N Male. Prl2 WT Allele. Prl2 Targeted Allele **** **** WT Het KO PRL bp. 230 bp CNX. C57BL/6N Male 30.

Cufflinks Scripture Both. GENCODE/ UCSC/ RefSeq 0.18% 0.92% Scripture

Nature Genetics: doi: /ng.3556 INTEGRATED SUPPLEMENTARY FIGURE TEMPLATE. Supplementary Figure 1

Daily Agenda. Make Checklist: Think Time Replication, Transcription, and Translation Quiz Mutation Notes Download Gene Screen for ipad

Comparative Genomics. Page 1. REMINDER: BMI 214 Industry Night. We ve already done some comparative genomics. Loose Definition. Human vs.

SUPPLEMENTARY INFORMATION

Chromosomal Mutations. 2. Gene Mutations

Cloning genes into animals. Transgenic animal carries foreign gene inserted into its genome.

Introduction to BIOINFORMATICS

Introduction to Transcription Factor Binding Sites (TFBS) Cells control the expression of genes using Transcription Factors.

Genomic resources for canine cancer research. Jessica Alföldi (for Kerstin Lindblad-Toh) Broad Institute of MIT and Harvard

Supporting Information

MicroRNAs Modulate Hematopoietic Lineage Differentiation

PV92 PCR Bio Informatics

Complete draft sequence 2001

Stem Cell Research 101

length (amino acids) 5597 Low, GNF Expression Atlas 2 Data from U133A and GNF1H Chips, (Merla et al 2002 Table 3 p. 434)

Meta-analysis discovery of. tissue-specific DNA sequence motifs. from mammalian gene expression data

Stem cells in Development

Il trascrittoma dei mammiferi

Unit 2: Biological basis of life, heredity, and genetics

MCDB 1041 Class 21 Splicing and Gene Expression

Genetic Dark Matter: Functional Medicine Decodes the Force Within

History of the CFTR chase

Interpreting Genome Data for Personalised Medicine. Professor Dame Janet Thornton EMBL-EBI

hpsc Growth Medium DXF Dr. Lorna Whyte

Genetic Basis of Development & Biotechnologies

Review of Transposable elements have rewired the core regulatory network of human embryonic stem cells

Just the Facts: A Basic Introduction to the Science Underlying NCBI Resources

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR

Genomes contain all of the information needed for an organism to grow and survive.

CHAPTER 21 GENOMES AND THEIR EVOLUTION

Introduction to human genomics and genome informatics

Many transcription factors! recognize DNA shape

Unit 1 Human cells. 1. Division and differentiation in human cells

ZBED6 the birth of a transcription factor in placental mammals. Wild Boar X domestic pig intercross initiated 1989

The Human Genome and its upcoming Dynamics

ACGS: Standardisation of variant interpretation and reports

Discovering gene regulatory control using ChIP-chip and ChIP-seq. Part 1. An introduction to gene regulatory control, concepts and methodologies

BLASTing through the kingdom of life

SENIOR BIOLOGY. Blueprint of life and Genetics: the Code Broken? INTRODUCTORY NOTES NAME SCHOOL / ORGANISATION DATE. Bay 12, 1417.

Single-Cell Genomics Deep insights into complex biology

Introduction to the UCSC genome browser

Recent Advances Towards An Intraspecific Theory of Human Variation for Digital Models

Radiation Effects in Tissues & Cells

Tumor Growth Suppression Through the Activation of p21, a Cyclin-Dependent Kinase Inhibitor

Bi8 Lecture 19. Review and Practice Questions March 8th 2016

SUPPLEMENTARY INFORMATION

What is RNA? Another type of nucleic acid A working copy of DNA Does not matter if it is damaged or destroyed

Cell membrane repair in acute lung injury: from bench to bedside. Disclosure. Cell membrane repair is an elemental physiological process

Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk

Discovering gene regulatory control using ChIP-chip and ChIP-seq. An introduction to gene regulatory control, concepts and methodologies

BIOINFORMATICS FOR DUMMIES MB&C2017 WORKSHOP

Thyroid hormone transcriptome and cerebellum development.

Machine Learning. HMM applications in computational biology

Genetics - Problem Drill 19: Dissection of Gene Function: Mutational Analysis of Model Organisms

Introduction to genome biology

What Can the Epigenome Teach Us About Cellular States and Diseases?

Biologia Cellulare Molecolare Avanzata 2.2. Functional & regulatory genomics module

The University of California, Santa Cruz (UCSC) Genome Browser

Genetics Journal Club YoSon Park Feb 11, Announcement: Marco Trizzino s seminar March 25 th, 2016!

Transcription:

The genetic code of gene regulatory elements Ivan Ovcharenko Computational Biology Branch National Center for Biotechnology Information National Institutes of Health October 23, 2008

Outline Gene deserts and distant gene regulation Genetic encryption of gene regulation Heart regulatory code Regulation of regulators: how transcription factors regulate themselves

The Genome Sequence: The Ultimate Code of Life 3 billion letters ~ 45% is junk (repetitive elements) ~ 3% is coding for proteins gene regulatory elements (REs) reside SOMEWHERE in the rest ~50%

Hirschprung disease is associated with a noncoding SNP RET

Comparative Sequence Analysis Biologically functional regions in the genome tend to stay conserved throughout the evolution. Therefore, by aligning homologous sequences from different, but related species we can identify Evolutionary Conserved Regions (ECRs) with a putative functional importance 1880 th 1920 th 1950 th 2000 th

In vivo validation of a regulatory element IL4 E1 IL13 RAD50 IL5 E 1 Pg/ml 0 wild type E1 genome deletion wt wt wt deletion deletion 0 0 IL4 IL13 IL5 deletion Regulatory elements (REs) orchestrate temporal and spatial expression of genes Genetic deletion of E1 element decreased the expression of IL4, IL13, and IL5 3-fold REs were identified for a handful of genes in the human genome Knock out of a single candidate RE can take up to 2 years Loots et al., Science, 2001

Сomparative genomics to predict regulatory elements E1 Functionally important elements in genomes mutate at slower rates than the neutrally evolving background 1% of sequence is conserved between humans and fish 75% of genes are conserved between humans and fish RE E1 is highly conserved in human and mouse genomes. Comparative genomics can be utilized to prioritize functional elements

Gene Deserts in the Human Genome Gene deserts = 25% of the human genome sequence

Human chromosome 13 Gene deserts ~500 gene deserts in the human genome ~50% of the HSA13 consists of gene deserts Gene deserts are NOT enriched in repetitive elements

DACH gene deserts on chromosome 13 1.3Mb 430 kb 0.9Mb Over 1,000 human/mouse ncecrs

Phylogenetic conservation of DACH gene deserts Nobrega et al., Science, 2004 ncecr Prom Reporter gene

Dichotomy in the evolutionary conservation of gene deserts ~200 stable gene deserts - DACH - OTX2 - SOX2 -vs- - Deletion does not lead to a phenotype ~300 variable gene deserts

Function of genes flanking gene deserts stable gene deserts - transcription - regulation of transcription - regulation of metabolism - development etc. variable gene deserts

Distant gene regulation 1. Do stable gene deserts harbor distant regulatory elements? gene RE 2. Does the distance between a RE and a gene matter? gene RE gene RE gene OR RE

Chromosomal stability of gene deserts gene RE

Distant REs are distance-independent gene RE gene RE gene OR RE

Gene deserts :: Summary 25% of the human genome consists of gene deserts There are 2 classes of gene deserts stable and variable gene deserts Stable gene deserts are evolutionarily protected from chromosome rearrangements indicating presence of distant regulatory elements Experimental validation of deeply conserved sequences in some stable gene deserts confirms their enhancer activity Gene regulation based on distant regulatory elements does not probably depend on the distance between a regulatory element and the gene it regulates

Outline Gene deserts and distant gene regulation Genetic encryption of gene regulation Heart regulatory code Regulation of regulators: how transcription factors regulate themselves

Patterns of transcription factor binding sites define biological functions of REs Transcription factor (TF) proteins bind to very short (6 to 10 nucleotides) binding sites (TFBS) Combinatorial binding of multiple TFs to a RE defines a specific pattern of gene expression Correlating patterns of TFBS in REs with biological function will decode gene regulation Protein A Protein B Protein C DNA TFBS TFBS TFBS GENE actgactgaaaactgatattg CTGATATTGacagt acagtttgttgttgttaa REGULATORY ELEMENT (RE)

Limbs regulatory module A Reporter Gene CNS regulatory module B Reporter Gene

Computational identification of TFBS Protein A Protein B Protein C GENE TFBS TFBS TFBS actgact CTGACTgaaaa gaaaactgatattg CTGATATTGacagt acagtttgttgttg TTGTTGTTGttaa REGULATORY ELEMENT (RE) -- Transcription factor bindings sites are very short (~ 6-10 bp) -- Computational predictions of transcription factor binding sites are overwhelmed with false positives

Deciphering the genetic code of enhancers ncecr ncecr ncecr GENE ncecr ncecr

Enhancer Identification (EI) method GNF Expression Atlas 2 79 human tissues Tissue A 300 highly expressed genes Selecting candidate enhancers based on noncoding conservation Mapping conserved TFBS in candidate enhancers 15k human genes 15K genes 3,000 lowly expressed genes X Legend Highly expressed gene Lowly expressed gene Promoter 3+ top noncoding ECR No prediction TFBS TFBS signatures in candidate enhancers defining tissue-specific gene expression Predicting tissue-specific enhancers X X -vs- X X X TF 1 TF 2 TF N Assigning tissue weights to TFs 1.4 tissue 1 0.5 tissue 2-1.2 tissue M

Tissue-specificity noncoding signatures 100% 90% Overexpressed Neutral 80% 70% 60% 50% 40% 30% 20% 10% trachea trigeminal ganglion skin 0% thymus pancreas adrenal gland bone marrow uterus pancreatic islets tongue fetal liver lung skeletal muscle prostate heart adipocyte kidney thyroid cardiac myocytes liver bronchial epithelial cells BM-CD71+ early erythroid whole blood BM-CD34+ testis BM-CD33+ myeloid PB-BDCA4+ dentritic cells smooth muscle placenta PB-CD19+ Bcells PB-CD8+ Tcells spinal cord whole brain fetal brain

Known skeletal muscle enhancer

DACH brain/cns enhancers

EI Performance Precision (the fraction of predicted tissue-specific enhancers that are correct): 79 human tissues liver heart

Associating TFs with tissue specificities

Database of enhancers in the human genome ~4k heart candidate enhancers EI Optimization ~8k cancer candidate enhancers http://www.dcode.org/ei

Gene regulatory code :: Summary Combination of comparative genomics, gene expression data, and TFBS clustering provides a tool to define the sequence code of tissue-specific enhancers Over 7,000 candidate tissue-specific enhancers had been predicted in the human genome Enhancers for several tissues (including heart and liver) were predicted with high precision Prediction of TFs associated with the regulation of tissue-specific expression provides the means to move from microarray expression data directly to the identification of active TFs