Genomes: What we know and what we don t know

Size: px
Start display at page:

Download "Genomes: What we know and what we don t know"

Transcription

1 Genomes: What we know and what we don t know Complete draft sequence 2001 October 15, 2007 Dr. Stefan Maas, BioS Lehigh U.

2 What we know Raw genome data

3 The range of genome sizes in the animal & plant kingdoms! No correlation between genome size and complexity

4 What accounts for the often massive and seemingly arbitrary differences in genome size observed among eukaryotic organisms? The fruit fly Drosophila melanogaster The mountain grasshopper Podisma pedestris 180 Mb 18,000 Mb The difference in genome size of a factor of 100 is difficult to explain in view of the apparently similar levels of evolutionary, developmental and behavioral complexity of these organisms.

5 Complexity does not correlate with genome size bp Homo sapiens bp Amoeba dubia

6 Complexity does not correlate with gene number ~31,000 genes ~26,000 genes ~50,000 genes

7 Is an Expansion in Gene Number driving Evolution of Higher Organisms? Vertebrata Urochordata Arthropoda Nematoda 30,000 16,000 14,000 21,000 Fungi 2,000 13,000 Vascular plants Unicellular sps. 25,000 60,000 5,000 10,000 Prokaryotes 500-7,000

8 Structure of DNA Watson and Crick in 1953 proposed that DNA is a double helix in which the 4 bases are base paired, Adenine (A) with Thymine (T) and Guanine (G) with Cytosine (C).

9

10

11 Steps in the folding of DNA to create an eukaryotic chromosome 30 nm fiber (6 nucleosomes per turn) Factor of condensation: Ca. 10,000 fold Why?! Facilitates movement during cell division! Decreases error rate

12 Genes code for Proteins with RNA as an intermediate

13 The Genetic Code

14 The human karyotype short arm 23 pairs of chromosomes long arm

15 Gene splicing: Removal of non-coding introns Intron Exon mature splice product PROTEIN

16 Human genome facts ~ 30,000 genes On average: Coding length: 1.4 kb Gene extent: 30 kb 8 Exons (135 bp), 7 introns (2,200 bp) Gene density: 11.5/1 Mb

17 Is the genome empty? 1.5% Exons Introns (junk?) Intergenic regions (junk?)

18 Correlation between complexity and amount of non-coding DNA 1.0

19

20 DNA with low complexity is located in the middle and at the ends of a chromosome = gene poor region

21 Imagine a gene that could duplicate copies of itself within the genome time

22 Repeats dominate the human genome One megabase from chromosome 7 Interspersed repeats UCSC Genome Browser Human genome: ~ 50% repetitive sequences

23 Transposable elements in the human genome

24 Genome vs Transcriptome How much of the genomic DNA is converted into RNA sequences?

25 Transcriptome facts 97-98% of transcriptional output is non coding RNA (ncrna) Only 1.5% of genome are protein coding But: including introns, 30% of genome is transcribed Adding ncrna genes: >50% of genome is transcribed

26 Genotyping

27 Sequence variations between individuals

28 Sequence variations between individuals

29 The Genome Sequence -- an open book. (written in well-known language but poorly understood grammar) How make sense of whole genome sequences? Bioinformatics helps to: " Find regulatory sequences " Find protein coding sequences " Find related sequences " Compare sequences across species Main aim: understand what are the functions of all genes and how they work together

30

31 Functional classification of expressed genes Collection of expressed genes Randomly selected sequences from human cells grouped by function

32 Genes in the Mycoplasma genitalium classified by function Smallest genome of any known free living organism! What is the smallest number of genes needed for survival? 580 kb genome <> 471 genes

33 Functional genomics I Use of DNA microarrays (chips) normal brain vs Example: brain tumor Fluorescently tagged cdna probes are hybridized to DNA spots in the microarray for studying differential expression of thousands of genes at a time in two mrna samples

34 What is Proteomics

35

36 Steps in the creation of a transgenic mouse

37

38 Summary - Conclusions We do know: Complete human genome sequence Function of ca. 50% of protein-coding genes Some interactions and regulation of genes > 2700 disease-causing single gene mutations We don t know: Function of at least 50% of genes How do gene products work together? (within cells and within organism) What makes the human so special? (its not the gene number or genome size)