SUPPLEMENTARY INFORMATION

Similar documents
1.15 Influence of total read coverage on the detection of transcribed proteincoding. 2 Supplementary Note Tables Supplementary Note Figures 40

SUPPLEMENTARY INFORMATION

mrna Sequencing Quality Control (V6)

Figure S1 Correlation in size of analogous introns in mouse and teleost Piccolo genes. Mouse intron size was plotted against teleost intron size for t

SUPPLEMENTARY INFORMATION

GENETICS - CLUTCH CH.15 GENOMES AND GENOMICS.

How does the human genome stack up? Genomic Size. Genome Size. Number of Genes. Eukaryotic genomes are generally larger.

Nature Genetics: doi: /ng.3556 INTEGRATED SUPPLEMENTARY FIGURE TEMPLATE. Supplementary Figure 1

Outline. Computational Genomics and Molecular Biology. Genes Encode Proteins. DNA forms a double stranded helix. DNA replication T C A G

Nature Genetics: doi: /ng Supplementary Figure 1. Neighbor-joining tree of the 183 wild, cultivated, and weedy rice accessions.

SUPPLEMENTARY INFORMATION

Supplementary Figure 1 An overview of pirna biogenesis during fetal mouse reprogramming. (a) (b)

Higher Unit 1: DNA and the Genome Topic 1.1 The Structure and Organisation of DNA

The genome of Leishmania panamensis: insights into genomics of the L. (Viannia) subgenus.

Answer: Sequence overlap is required to align the sequenced segments relative to each other.

Molecular Genetics of Disease and the Human Genome Project

Guided tour to Ensembl

DNA Function: Information Transmission

CS 262 Lecture 14 Notes Human Genome Diversity, Coalescence and Haplotypes

Nature Methods: doi: /nmeth Supplementary Figure 1. Construction of a sensitive TetR mediated auxotrophic off-switch.

Lecture 12. Genomics. Mapping. Definition Species sequencing ESTs. Why? Types of mapping Markers p & Types

Evolutionary dynamics and tissue specificity of human long noncoding RNAs in six mammals

TECH NOTE Pushing the Limit: A Complete Solution for Generating Stranded RNA Seq Libraries from Picogram Inputs of Total Mammalian RNA

Cufflinks Scripture Both. GENCODE/ UCSC/ RefSeq 0.18% 0.92% Scripture

Unit 1: DNA and the Genome. Sub-Topic (1.3) Gene Expression

I. Gene Expression Figure 1: Central Dogma of Molecular Biology

Transcription. The sugar molecule found in RNA is ribose, rather than the deoxyribose found in DNA.

Lecture for Wednesday. Dr. Prince BIOL 1408

Course Information. Introduction to Algorithms in Computational Biology Lecture 1. Relations to Some Other Courses

Transcription and Translation. DANILO V. ROGAYAN JR. Faculty, Department of Natural Sciences

SUPPLEMENTARY INFORMATION

Lecture Summary: Regulation of transcription. General mechanisms-what are the major regulatory points?

Physical Anthropology 1 Milner-Rose

Introduction to Algorithms in Computational Biology Lecture 1

MODULE 5: TRANSLATION

CHAPTER 21 LECTURE SLIDES

Unit 2: Biological basis of life, heredity, and genetics

CHAPTERS , 17: Eukaryotic Genetics

30 Gene expression: Transcription

Hominoid-Specific De Novo Protein-Coding Genes Originating from Long Non-Coding RNAs

FINDING THE PAIN GENE How do geneticists connect a specific gene with a specific phenotype?

Result Tables The Result Table, which indicates chromosomal positions and annotated gene names, promoter regions and CpG islands, is the best way for

Temperature sensitive gene expression: Two rules.

COMPETITOR NAMES: TEAM NAME: TEAM NUMBER:

A primer on the structure and function of genes

Review of Protein (one or more polypeptide) A polypeptide is a long chain of..

Chimp Chunk 3-14 Annotation by Matthew Kwong, Ruth Howe, and Hao Yang

Differential Gene Expression

Supplemental Data. Zhou et al. (2016). Plant Cell /tpc

Genomes summary. Bacterial genome sizes

7.03, 2005, Lecture 20 EUKARYOTIC GENES AND GENOMES I

Identification of individual motifs on the genome scale. Some slides are from Mayukh Bhaowal

SUPPLEMENTARY FIGURES AND TABLES. Exploration of mirna families for hypotheses generation

Supplementary Fig. S1. Building a training set of cardiac enhancers. (A-E) Empirical validation of candidate enhancers containing matches to Twi and

Annotating the Genome (H)

Wednesday, November 22, 17. Exons and Introns

Mammalian non-cg methylations are conserved and cell-type specific and may have been involved in the evolution of transposon elements

The genetic code of gene regulatory elements

CHAPTER 21 GENOMES AND THEIR EVOLUTION

M1 - Biochemistry. Nucleic Acid Structure II/Transcription I

BLASTing through the kingdom of life

Differences between prokaryotes & eukaryotes. Gene function

Figure 1. FasterDB SEARCH PAGE corresponding to human WNK1 gene. In the search page, gene searching, in the mouse or human genome, can be done: 1- By

Transcription is the first stage of gene expression

Enzyme that uses RNA as a template to synthesize a complementary DNA

The Structure of Proteins The Structure of Proteins. How Proteins are Made: Genetic Transcription, Translation, and Regulation

Mapping long-range promoter contacts in human cells with high-resolution capture Hi-C

Genomic resources for canine cancer research. Jessica Alföldi (for Kerstin Lindblad-Toh) Broad Institute of MIT and Harvard

FINDING THE PAIN GENE How do geneticists connect a specific gene with a specific phenotype?

CS313 Exercise 1 Cover Page Fall 2017

03-511/711 Computational Genomics and Molecular Biology, Fall

Chapter 13. From DNA to Protein

SUPPLEMENTARY INFORMATION

T and B cell gene rearrangement October 17, Ram Savan

Differential Gene Expression

Key Area 1.3: Gene Expression

What would this eye color phenomenon be called?

Solutions will be posted on the web.

Nature Genetics: doi: /ng Supplementary Figure 1. ChIP-seq genome browser views of BRM occupancy at previously identified BRM targets.

Predicting Protein Structure and Examining Similarities of Protein Structure by Spectral Analysis Techniques

Solutions will be posted on the web.

DNA. Essential Question: How does the structure of the DNA molecule allow it to carry information?

DNA & Protein Synthesis. The source and the process!

DNA and Biotechnology Form of DNA Form of DNA Form of DNA Form of DNA Replication of DNA Replication of DNA

Molecular Genetics. The flow of genetic information from DNA. DNA Replication. Two kinds of nucleic acids in cells: DNA and RNA.

DNA. Using DNA to solve crimes

SUPPLEMENTARY INFORMATION

9/19/13. cdna libraries, EST clusters, gene prediction and functional annotation. Biosciences 741: Genomics Fall, 2013 Week 3

Chromosomal Mutations. 2. Gene Mutations

PrimePCR Assay Validation Report

BIOLOGY - CLUTCH CH.17 - GENE EXPRESSION.

Bio 101 Sample questions: Chapter 10

Supplementary Figure 1

The evolution of lncrna repertoires and expression patterns in tetrapods

PrimePCR Assay Validation Report

Genome-Wide Survey of MicroRNA - Transcription Factor Feed-Forward Regulatory Circuits in Human. Supporting Information

Systematic evaluation of spliced alignment programs for RNA- seq data

Selective constraints on noncoding DNA of mammals. Peter Keightley Institute of Evolutionary Biology University of Edinburgh

Mate-pair library data improves genome assembly

Introduction to Genome Biology

Transcription:

doi:138/nature10532 a Human b Platypus Density 0.0 0.2 0.4 0.6 0.8 Ensembl protein coding Ensembl lincrna New exons (protein coding) Intergenic multi exonic loci Density 0.0 0.1 0.2 0.3 0.4 0.5 0 5 10 15 20 0 5 10 15 c Total per base read coverage (log2 transform) d Total per base read coverage (log2 transform) Sequence conservation (PhastCons) 0.2 0.4 0.6 0.8 Sequence conservation (PhastCons) 0.2 0.4 0.6 0.8 Intron 5' Exon 3' Intron Intron 5' Exon 3' Intron Position along the gene Position along the gene Supplementary Figure 1. Expression level and sequence conservation of newly annotated regions in the human and platypus genomes. a, b Density plot of the total per-base read coverage (all samples combined) for four categories: exons of Ensemblannotated protein-coding genes (black) and large intervening noncoding RNA (lincrna) genes (green), new exons added to previously known (Ensembl) protein-coding genes (blue), and exons of intergenic multi-exonic transcribed loci (red). The read coverage was log2-transformed (an offset of 1 was added to preserve null values). c, d Variation of the mean PhastCons score around exon boundaries, for the same categories of exons. The PhastCons score was computed for 100 positions, divided into two 50 bp segments centered on the exon boundaries, and was averaged over all exons in a class. Only internal exons with a length of at least 50 bp and with flanking introns larger than 25 bp were considered for this analysis. WWW.NATURE.COM/NATURE 1

RESEARCH SUPPLEMENTARY INFORMATION brain chimp M 4 bonobo F 2 chimp M 5 chimp M 2 chimp M 3 human M 4 human M 5 human M 3 kidney cerebellum heart < 0.9 < 0.7 < 0.5 liver chicken M 2 testis 0.1 platypus M 2 platypus M 3 opossum M 2 Supplementary Figure 2. Mammalian gene expression phylogenies. Neighbor-joining trees based on distance matrices comprising all pairwise expression level distances (1 rho, Spearman s correlation coefficient) for the six different organs. Trees are drawn to the same scale (indicated by the scale bar). Bootstrap values (i.e., proportions of replicate trees that have the branching pattern as in the majority-rule consensus tree shown) are indicated by circles at the corresponding nodes: > 0.9 (white fill), < 0.9 (yellow), < 0.7 (red). Mammalian species names are color-coded according to the three major mammalian lineages: monotremes (i.e., platypus; light blue), marsupials (opossum; dark blue), and eutherians (mice and primates; black). 2 WWW.NATURE.COM/NATURE

RESEARCH brain cerebellum heart < 0.9 < 0.7 < 0.5 chimp M 4 bonobo F 2 chimp M 5 chimp M 2 chimp M 3 human M 4 human M 5 human M 3 kidney liver testis 10 chicken M 2 platypus M 2 platypus M 3 opossum M 2 Supplementary Figure 3. Mammalian gene expression phylogenies. Neighbor-joining trees based on distance matrices comprising all pairwise expression level Euclidean distances for the six different organs. Trees are drawn to the same scale (indicated by the scale bar). Bootstrap values (i.e., proportions of replicate trees that have the branching pattern as in the majority-rule consensus tree shown) are indicated by circles at the corresponding nodes: > 0.9 (white fill), < 0.9 (yellow), < 0.7 (red). Mammalian species names are color-coded according to the three major mammalian lineages: monotremes (i.e., platypus; light blue), marsupials (opossum; dark blue), and eutherians (mice and primates; black). WWW.NATURE.COM/NATURE 3

RESEARCH SUPPLEMENTARY INFORMATION Supplementary Figure 4. Sex-biased expression of the vitellogenin egg yolk genes in platypus and chicken livers. a, Median gene expression levels (based on read counts) of genes in livers of male (M) and female (F) chicken and platypus animals (X-axis) are plotted against expression differences (shown as log2 fold-changes) between males and females (Y-axis). Top and bottom boxes contain genes with zero expression in females or males, respectively. Vitellogenin genes or gene fragments are marked in green: in the chicken plot, the green plus sign indicates the VIT2 gene (further illustrated in subpanel b), the more central green circle indicates the VIT1 gene, and the less central green circle represents VIT3 (VIT2 and VIT3 have significantly higher expression levels in females, see also Supplementary Table 41); in the platypus plot, the plus sign indicates the contig containing the major part of the predicted platypus VIT gene, whereas the three green circles represent contigs containing the remaining platypus VIT exons. b, Ensembl annotation of chicken VIT2 and a major part of platypus VIT and the corresponding read coverage in females and males. 4 WWW.NATURE.COM/NATURE

RESEARCH brain human M 5 human M 3 chimp M 4 bonobo F 2 chimp M 2 chimp M 5 chimp M 3 human M 4 kidney cerebellum liver heart testis < 0.9 < 0.7 < 0.5 0.01 Supplementary Figure 5. Primate gene expression phylogenies. Neighbor-joining trees based on distance matrices comprising all pairwise expression level distances (1 rho, Spearman s correlation coefficient) for the six different organs. Note that the underlying gene expression values were calculated based on constitutive orthologous exons (from the 13,277 1 1 orthologous genes) that are perfectly aligned (no gaps permitted) between the six primate species (see Supplementary Note section 1.6 for details). Thus, we rule out any gene expression variation that might arise when gene expression levels are computed for potentially different sequences (e.g., as a result of annotation or biological/genomic differences) among 1 1 orthologs from different species. Bootstrap values (i.e., proportions of replicate trees that have the branching pattern as in the majority-rule consensus tree shown) are indicated by circles at the corresponding nodes: > 0.9 (white fill), < 0.9 (yellow), < 0.7 (red). WWW.NATURE.COM/NATURE 5

RESEARCH SUPPLEMENTARY INFORMATION a ape b platypus Supplementary Figure 6. Comparisons of s among mammalian lineages. a, b, Bootstrap pseudosamples (1,000) were generated that each comprise 5,636 sets of orthologues from one randomly sampled individual per species or group of species (in the case of apes). For each of these pseudosamples, Neighbor-joining trees of two species and the outgroup (chicken) were constructed; the total length of the branch (or branches) from the last common ancestral node to the respective terminal nodes was assessed for each of these trees. 95% confidence intervals are shown. Statistical significance was assessed by counting the number of trees in which one or the other species/lineage had longer s: [] indicates two-tailed P < (i.e., one lineage with longer branches in more than 97.5% of replicates; after Bonferroni correction for the six tests performed for the six tissues in each comparison). To factor out within-species differences, similar analyses were performed that measure s from the last common ancestral node of a given species-pair to the last common ancestral node of individuals for each of the species in the pair (i.e., terminal branches were not considered in this analysis; Supplementary Fig. 6). 6 WWW.NATURE.COM/NATURE

RESEARCH human macaque gorilla opossum br cb ht kd lv orangutan platypus br ht kd lv Supplementary Figure 7. Comparisons of s among mammalian lineages. Bootstrap pseudosamples (1,000) were generated that each comprise 5,636 sets of orthologues from one randomly sampled individual per species. For each of these pseudosamples, Neighbor-joining trees of two species and the outgroup (chicken) were constructed. To factor out within-species branch length differences, the total length of the branch (or branches) from the last common ancestral node of the given species pair to the most recent common ancestral nodes of individuals for each species in the pair, respectively, was assessed for each of these trees. 95% confidence intervals are shown. Statistical significance was assessed by counting the number of trees in which one or the other species/lineage had longer s: [] indicates two-tailed P < (i.e., one lineage with longer branches in more than 97.5% of replicates; after Bonferroni correction for the six tests performed for the six tissues in each comparison). WWW.NATURE.COM/NATURE 7

RESEARCH SUPPLEMENTARY INFORMATION TEX13B (testis) testis expressed 13B SHBG (testis) sex hormone-binding globulin MYL6B (heart) myosin, light chain 6B, alkali, smooth muscle FAM185A (brain) family with sequence similarity 185, member A ALX4 (cerbellum) ALX homeobox 4 A4GALT (cerebellum) alpha 1,4-galactosyltransferase max-min / log2(rpkm) SPO11 (testis) SPO11 meiotic protein covalently bound to DSB homolog (S. cerevisiae) ZNF512 (brain) zinc finger protein 512 ATRX (testis) alpha thalassemia/mental retardation syndrome X-linked (RAD54 homolog, S. cerevisiae) chicken platypus opossum macaque orangutan gorilla chimpanzee bonobo human Supplementary Figure 8. Genes with shifts in optimal expression values (individual tissues). Examples of genes that evolved new optimal expression levels in a given mammalian lineage or species (as detected using a phylogenetic maximum likelihood procedure see main text and Supplementary Note) are shown (details regarding these genes and all other statistically significant expression changes in mammals are provided in Supplementary Tables 9-24): TEX13B (ENSG00000170925), specifically expressed in spermatogonia (i.e., pre-meiotic cells), was significantly upregulated in human testis; SHBG (ENSG00000129214), a key gene in testosterone metabolism evolved high expression in chimpanzee and bonobo testes; MYL6B (ENSG00000196465), encoding a myosin light chain subunit, became upregulated in gorilla heart; FAM185A (ENSG00000222011), a gene with unknown function evolved significantly higher expression in orangutan brain; A4GALT (ENSG00000128274; involved in glycolipid metabolism) and ALX4 (ENSG00000052850; previously implicated in human cranial phenotypes) shifted to lower expression in great ape cerebellum; SPO11 (ENSG00000054796; key for double strand formation during meiotic recombination) evolved high expression in testes; ZNF512 (ENSG00000243943; zinc finger protein 512, a potential transcription factor) shifted to high expression in opossum brain; and ATRX (ENSG00000085224; various molecular functions associated with e.g. chromosome segregation; involved in mental disorders) became downregulated in therian testis. Information about all genes was derived from the OMIM database (http://www.ncbi.nlm.nih.gov/ omim). - 8 WWW.NATURE.COM/NATURE

RESEARCH Mitogen-activated protein kinase 7 interacting protein 3 (X-linked) chicken platypus opossum macaque orangutan gorilla chimpanzee bonobo human max-min / log2(rpkm) GNAI3 (therians): guanine nucleotide binding protein (G protein), alpha inhibiting activity polypeptide 3 brain ce rebellum heart kidney liver testis Supplementary Figure 9. Genes with expression level shifts (multiple tissues). Examples of genes that evolved new optimal expression levels in multiple therian tissues are shown (details regarding these genes and all other statistically significant expression changes are provided in Supplementary Tables 9-24). MAP3K7IP3 (an X-linked gene; ENSG00000157625) evolved significantly lower expression in therian heart and kidney and tends to have lower expression in eutherian/therian liver, testis, brain and cerebellum. GNAI3 (a Gi alpha-like protein; ENSG00000065135) became significantly downregulated in the brain and cerebellum and tends to show reduced expression in the other therian tissues as well. WWW.NATURE.COM/NATURE 9