The Human Genome: the raw data, the repeat content, and the composition of the human genome (As, Ts, Cs, Gs and Ns)


3,000,000,000 bases

The Human Genome: the raw data

GATCTGATAAGTCCCAGGACTTCAGAAGagctgtgagaccttggccaagt
cacttcctccttcaggaacattgcagtgggcctaagtgcctcctctcggg
ACTGGTATGGGGACGGTCATGCAATCTGGACAACATTCACCTTTAAAAGT
[The slide continues with a few thousand further bases of mixed upper-case and lower-case sequence.]

As, Ts, Cs and Gs, and Ns

Composition of the human genome
- Nearly half the genome is repeats.
- Only approximately 1.5% is known coding genes.
- Unknown functional fraction?!

The repeat content
1. Transposition-derived repeats ("jumping genes")
2. Inactive retroposed cellular genes
3. Simple repeats (microsatellites)
4. Segmental duplications
5. Tandem repeats (telomere, centromere)
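The idea of a genome made of As, Ts, Cs, Gs and Ns can be made concrete with a small base-composition count. The following is a minimal sketch, not part of the original slides: the function name `base_composition` is hypothetical, and treating lower-case letters as repeat-masked sequence is an assumption (a convention used by some genome downloads, not something the slides state).

```python
from collections import Counter


def base_composition(seq: str) -> dict:
    """Summarise the base composition of a DNA string.

    Case and whitespace are ignored when counting bases.
    ASSUMPTION: lower-case letters mark repeat-masked sequence.
    """
    bases = [ch for ch in seq if not ch.isspace()]
    counts = Counter(ch.upper() for ch in bases)
    total = len(bases)
    acgt = sum(counts[b] for b in "ACGT")  # excludes N (unknown bases)
    lower = sum(1 for ch in bases if ch.islower())
    return {
        "counts": {b: counts[b] for b in "ACGTN"},
        "total": total,
        # GC fraction is computed over called bases only, so Ns do not dilute it.
        "gc_fraction": (counts["G"] + counts["C"]) / acgt if acgt else 0.0,
        "lowercase_fraction": lower / total if total else 0.0,
    }


# First 50 bases of the excerpt shown on the slide.
excerpt = "GATCTGATAAGTCCCAGGACTTCAGAAGagctgtgagaccttggccaagt"
summary = base_composition(excerpt)
print(summary["counts"], round(summary["gc_fraction"], 3))
```

Computing the GC fraction over A, C, G and T only is a design choice: runs of N represent gaps in the assembly rather than real bases.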

Fewer genes than expected
- GeneSweep: Ewan Birney (Wellcome Trust Sanger Institute).
- The happy winner: Lee Rowen of the Institute for Systems Biology, with 25,947 genes.

Genome complexity
- Alternative splicing: 56% for humans, 22% for worms.
- Regulatory elements: promoters, enhancers, repressors.
- This is where it gets complicated.

Variation among chromosomes and within chromosomes
- GC content, recombination and gene density all vary.
- Overall recombination rate is dependent on chromosome length.
- There is large variation in gene density between chromosomes.
Source: International Human Genome Sequencing Consortium, "Initial sequencing and analysis of the human genome", Nature 409, 860-921 (15 February 2001).

Difference in organisation
The genome is non-random in its organisation.
- Recombination: high at telomeres.
- GC: variation at many scales (isochores).
- Gene density: organisation by function.
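Variation in GC content at many scales can be explored by computing the GC fraction in windows of different sizes along a sequence. The sketch below is illustrative only: `windowed_gc` and its parameters are hypothetical names, and real isochore analyses use much larger windows (hundreds of kilobases) than a toy string allows.

```python
from typing import List, Optional, Tuple


def windowed_gc(seq: str, window: int, step: Optional[int] = None
                ) -> List[Tuple[int, Optional[float]]]:
    """Return (start, GC fraction) for each full window along seq.

    Windows that contain no A/C/G/T (for example all-N gaps) yield None.
    By default windows do not overlap (step == window).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window
    seq = seq.upper()
    result = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        called = sum(chunk.count(b) for b in "ACGT")
        gc = chunk.count("G") + chunk.count("C")
        result.append((start, gc / called if called else None))
    return result


# Changing the window size changes the scale at which variation is visible.
toy = "GGCCGGCCAATTAATT"
print(windowed_gc(toy, 8))
print(windowed_gc(toy, 4))
```

Running the same function with several window sizes on one chromosome is a simple way to see whether GC-rich and GC-poor stretches persist at larger scales.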

New observations (2001)
- Variation at multiple scales within and between chromosomes.
- Only twice as many genes as flies and worms, but more proteins; some genes have arrived from bacteria and transposable elements.
- Transposons are inactive, and LTR elements probably are too (Alus are found in GC-rich regions).
- Most mutations occur in males (higher mutation rate).
- GC-poor regions correspond to dark bands.
- Recombination rates are higher at telomeres.
- Lots of between-individual variation.

Completing the Human Genome
- Human Genome Project starts: 1990.
- Draft human genome completed: 2001.
- Gene-rich regions completed: 2003.
- Fewer gaps: 147,821 to 341.
- More continuity: 81 kb to 38,500 kb.
- Error rate of ~1 per 100,000 bases.
- 2.85 billion bases, covering ~99% of the euchromatic genome.

2006! Each chromosome compiled and annotated. Go home? Not quite finished.
New builds:
- Build 36, May 2006
- Build 35, May 2004
- Build 34, July 2003
- Build 33, April 2003

Chromosome 1
- Segmental duplications allow genes to diversify and acquire novel functions.
- December 2001 (NCBI 28) compared with July 2003 (NCBI 34): duplication of a gene from one to many positions on the chromosome.
- A pericentric inversion follows a duplication of two genes.
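The draft-versus-finished figures quoted for the completed genome are easier to appreciate as ratios. The following short calculation is a sketch using only the numbers given in the slides; the function name `assembly_improvement` is hypothetical, and reading the two pairs of numbers as "draft, then finished" is an assumption based on the labels "fewer gaps" and "more continuity".

```python
def assembly_improvement() -> dict:
    """Work through the draft-to-finished figures quoted in the slides."""
    draft_gaps, finished_gaps = 147_821, 341          # number of gaps
    draft_kb, finished_kb = 81, 38_500                # continuity in kb
    sequence_bases = 2_850_000_000                    # 2.85 billion bases
    bases_per_error = 100_000                         # ~1 error per 100,000 bases

    return {
        # How many times fewer gaps the finished sequence has.
        "gap_fold": draft_gaps / finished_gaps,
        # How many times longer the continuous stretches are.
        "continuity_fold": finished_kb / draft_kb,
        # Errors expected across the whole sequence at the quoted rate.
        "expected_errors": sequence_bases // bases_per_error,
    }


stats = assembly_improvement()
print(f"gaps reduced about {stats['gap_fold']:.0f}-fold")
print(f"continuity increased about {stats['continuity_fold']:.0f}-fold")
print(f"about {stats['expected_errors']:,} errors expected genome-wide")
```

Even at roughly one error per 100,000 bases, a sequence of 2.85 billion bases would still be expected to contain on the order of tens of thousands of errors, which is one reason new builds kept appearing.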

Chromosomes 2 and 4; chromosome 3
- Gene deserts: megabase-sized genomic segments containing no known coding genes (some show conservation). Role of these regions?
- Lowest rate of segmental duplication.
- Large inversion since our common ancestor with chimps.
- Lowest recombination rates of all the autosomes.

Chromosome 7; chromosome 10
- Complex repeat patterns and fragile locations.
- Lots of repetitive and duplicated DNA. What is the true sequence?
- Williams-Beuren syndrome is associated with a large deletion (1.6 Mb). It is characterized by a distinctive, "elfish" facial appearance, along with a low nasal bridge; an unusually cheerful demeanor and ease with strangers, coupled with unpredictably occurring negative outbursts; mental retardation coupled with an unusual facility with language; a love for music; and cardiovascular problems, such as supravalvular aortic stenosis and transient hypercalcemia.
- Multi-species alignment around a gene involved in cancer: conservation indicates the location of functional elements. Some are known genes. Others aren't, yet show higher levels of conservation!

Chromosome 19
- Very high gene density: 26 genes per megabase.
- Increase in all classes of known genes.
- What is special about this chromosome? It has a high recombination rate, and high repeat density and GC content.

Chromosomes 12 and 3
- Recombination rate variation.
- Knowing the physical positions of variants allows recombination rates to be estimated.
- Male and female rates differ.
- Fine-scale variation.

Where is the data available?
- NCBI: www.ncbi.nlm.nih.gov/genome/guide/human/ Part of the National Institutes of Health. Has a number of important associated projects. "Mr NCBI": David Lipman.
- Ensembl: www.ensembl.org/homo_sapiens/ A joint project between EMBL and the Sanger Institute. Primarily funded by the Wellcome Trust. "Mr Ensembl": Ewan Birney.
- UCSC: genome.ucsc.edu/cgi-bin/hggateway Based at the University of California, Santa Cruz. Largely funded by the NHGRI. "Mr UCSC": David Haussler.

What data is available?
- Compositional: base composition, insertions and deletions, segmental duplications, repeats, transposable elements.
- Functional: regulatory elements, gene expression.
- Evolutionary: species comparison.
- Variation data: population genetic analysis.

[Screenshot: UCSC Genome Browser track controls. "Use drop down controls below and press refresh to alter tracks displayed. Tracks with lots of items will automatically be displayed in more compact modes." Track groups shown include Mapping and Sequencing Tracks, Phenotype and Disease Associations, Gene Prediction Tracks, mRNA and EST Tracks, and Expression and Regulation. Individual tracks shown include Base Position, Chromosome Band, STS Markers, GC Percent, Gap, Recomb Rate, RefSeq, Ensembl, CCDS, Genscan, Human mRNAs, Human ESTs, UniGene and Affy U133.]

Orientation
- Human chromosomes are numbered.
- Arms are labelled p and q.
- Regions are labelled in ascending order from the centromere.
- Bases are numbered from the beginning of the small arm to the end of the long arm.

Annotation: repeats
- Transposable elements make up a large proportion of the genome.
- Microsatellites and repeats are important in many common diseases and are some of the most polymorphic loci.

Annotation: genes
- Different levels of evidence for genes: based on homology, based on expression, based on prediction.
- Kinds of evidence: mRNA evidence, protein evidence, EST evidence, gene prediction.
- Predicted transcripts: known and novel.
- Manually annotated genes.

Annotation: expression and regulation
- Expression levels and tissues; regulatory elements.
- Regulatory elements might be important in complex diseases.
- Microarray technology is generating expression data on a large scale.
- Expression varies in space and time.
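The orientation rules (numbered chromosomes, p and q arms, regions counted outward from the centromere) are what make a cytogenetic band name readable. As a rough sketch, the parser below splits a band name into those parts. The `Band` type and `parse_band` function are hypothetical, the pattern covers only simple names of the form chromosome, arm, region digit, band digit and optional sub-band, and `7q11.23` is used purely as an example name.

```python
import re
from typing import NamedTuple, Optional


class Band(NamedTuple):
    chromosome: str          # "1".."22", "X" or "Y"
    arm: str                 # "p" (short arm) or "q" (long arm)
    region: int              # counted outward from the centromere
    band: int                # band within the region
    sub_band: Optional[str]  # digits after the dot, if any


# Simplified pattern: e.g. 7q11.23 -> chromosome 7, arm q, region 1, band 1, sub-band 23.
_BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)(\d)(?:\.(\d+))?$")


def parse_band(name: str) -> Band:
    """Split a simple cytogenetic band name into its components."""
    match = _BAND_RE.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised band name: {name!r}")
    chrom, arm, region, band, sub = match.groups()
    if chrom.isdigit() and not 1 <= int(chrom) <= 22:
        raise ValueError(f"no such human autosome: {chrom}")
    return Band(chrom, arm, int(region), int(band), sub)


print(parse_band("7q11.23"))
```

Band names describe position relative to the centromere, whereas base coordinates run from the tip of the short arm to the tip of the long arm, so the two systems count in opposite directions along the p arm.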

Annotation: evolutionary
- Cross-species comparison (issues: alignment).
- Within humans (issues: ascertainment).
- Multiple sequence alignment.
- Conservation versus constraint.
- Variation is the most important feature of the genome!?

Encyclopedia of DNA Elements (ENCODE)
- 1% of the genome.
- 14 manually chosen regions (alpha and beta globin, HOXA, FOXP2 and CFTR), plus 30 random regions.
- Variation group: SNPs, indels.
- Function group: promoters, transcription and binding.
- Chromatin group: chromatin modification, replication origins.
- Aim: understand everything possible about these regions.

Human variation: the HapMap Project
- SNPs are the most common variation in the human genome.
- Synonymous and non-synonymous variation.
- Information in the density of SNPs.
- Information in the frequency of SNPs.
- Information in the correlation between SNPs.

HapMap timeline
- 2002: HapMap phase I begins. Three populations, four sample sets: Yoruba in Ibadan, Nigeria (YRI), 90; Utah, USA (CEU), 90; Han Chinese in Beijing (CHB), 45; Japanese in Tokyo (JPT), 44. Approximately 1 million SNPs; 10 million common variants.
- 2005: phase I complete, phase II begins. Increase from 1 million to ~4.6 million SNPs.
- 2006: phase II complete, phase III begins. Additional 6 populations: Kenya, African Americans, Mexican Americans, Italy, India.
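The "information in the frequency of SNPs" and "information in the correlation between SNPs" can be illustrated with two standard quantities: the allele frequency at one site and the r² measure of linkage disequilibrium between two sites. This is a minimal sketch, assuming phased haplotypes coded as 0/1 alleles are already available (HapMap genotypes would first need phasing); the function names are hypothetical.

```python
from typing import Sequence


def allele_freq(haplotypes: Sequence[Sequence[int]], site: int) -> float:
    """Frequency of the allele coded 1 at one SNP across all haplotypes."""
    return sum(h[site] for h in haplotypes) / len(haplotypes)


def r_squared(haplotypes: Sequence[Sequence[int]], i: int, j: int) -> float:
    """Squared correlation (r^2) between the alleles at SNPs i and j.

    D = P(1 at i and 1 at j) - p_i * p_j
    r^2 = D^2 / (p_i (1 - p_i) p_j (1 - p_j))
    Returns NaN when either site is monomorphic in the sample.
    """
    n = len(haplotypes)
    p_i = allele_freq(haplotypes, i)
    p_j = allele_freq(haplotypes, j)
    p_ij = sum(1 for h in haplotypes if h[i] == 1 and h[j] == 1) / n
    d = p_ij - p_i * p_j
    denom = p_i * (1 - p_i) * p_j * (1 - p_j)
    return d * d / denom if denom > 0 else float("nan")


# Toy data: four haplotypes, three SNPs (made-up values for illustration).
haps = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
print(allele_freq(haps, 0))   # frequency of allele 1 at the first SNP
print(r_squared(haps, 0, 1))  # first two SNPs always travel together
print(r_squared(haps, 0, 2))  # first and third SNPs are independent
```

When two SNPs have r² close to 1, genotyping one of them tells you almost everything about the other, which is the property that makes a map of correlated SNPs useful.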

The International HapMap: learning from studies of human variation
- Can learn about how genetic diversity is structured across the globe.
- Identify regions which have been under recent positive selection.
- Identify recombination hotspots.
- Linkage disequilibrium information is an important tool.
- Population genetic annotation is often sample specific.

Hot topics
- MicroRNAs: ~20-mers of RNA that perform a diversity of roles, e.g. regulating mRNA levels.
- Sex chromosomes: chromosomes X and Y.
- Structural variation: the genome is full of polymorphic insertions and deletions, from 1 kb to a megabase.
- Genome-wide association studies: millions are being spent on scanning the genome for loci showing association with disease status.
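Scanning the genome for loci associated with disease status comes down, at each SNP, to asking whether allele counts differ between cases and controls. One common way to do this, though not necessarily the method the slides have in mind, is a chi-square test on a 2x2 table of allele counts. The sketch below uses only the standard library; the function name and the example counts are hypothetical.

```python
import math
from typing import Tuple


def allelic_chi_square(case_a: int, case_b: int,
                       control_a: int, control_b: int) -> Tuple[float, float]:
    """Chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Rows are cases and controls; columns are counts of allele A and allele B.
    Returns (chi-square statistic, p-value).
    """
    n = case_a + case_b + control_a + control_b
    row_case, row_control = case_a + case_b, control_a + control_b
    col_a, col_b = case_a + control_a, case_b + control_b
    denom = row_case * row_control * col_a * col_b
    if denom == 0:
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (case_a * control_b - case_b * control_a) ** 2 / denom
    # For 1 degree of freedom the upper tail is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Made-up allele counts at one SNP: 100 case alleles and 100 control alleles.
chi2, p = allelic_chi_square(30, 70, 50, 50)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

Because a genome-wide scan repeats this test at very many SNPs, a small p-value at a single site has to be judged against the number of tests performed.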