Personal and population genomics of human regulatory variation


1 Personal and population genomics of human regulatory variation. Benjamin Vernot, ..., and Joshua M. Akey. Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA. Ting WANG 1

2 Brief. Aim: to provide new insights into the distribution and characteristics of human regulatory variation in individuals and populations. Data: genome-scale maps of regulatory DNA marked by DNase I hypersensitive sites (from 138 cell types) and whole-genome sequences (of 53 geographically diverse individuals).

3 Background. A significant amount of functionally important DNA is located in noncoding regions. Genetic variation in such regions (regulatory variation) likely makes a significant contribution to phenotypic variation and disease susceptibility among individuals. Some examples suggest that noncoding DNA has undergone adaptive evolution driven by positive selection.

4 Accurately localizing functional noncoding elements that regulate transcription. Computationally predicted sites are often not functional. Evolutionary-based methods may miss many functional elements. Large-scale experimental studies of noncoding DNA (the ENCODE Project) are providing a detailed roadmap to the locations of regulatory DNA in the human genome.

5 DHSs and regulator footprints. The binding of sequence-specific transcriptional regulators in place of canonical nucleosomes creates DNase I hypersensitive sites (DHSs). Nucleotide-resolution analysis of DNase I cleavage patterns allows identification of the footprints of DNA-bound regulators. The study analyzes patterns of genetic variation in regulatory DNA marked by DHSs and DNase I footprints.

6 Overview of DNase I and whole-genome sequence data

7 DNase I peaks and footprints. With the genome sequence data: variants in peaks, footprints, and the exome. Variants were filtered for deviations from Hardy-Weinberg equilibrium. Putatively functional variation is defined as GERP >= 3. GERP scores are a measure of evolutionary constraint, with positive values indicating greater conservation.

8 Regulatory variation across the human genome. Peaks and footprints not only have a larger overall number of variants than exomes but also contain more high-GERP variants than protein-coding regions. Note, however, that protein-coding DNA contains proportionally more putatively functional variation than noncoding DNA: 24.6%, 6.1%, and 3.8% of variants are putatively functional in exomes, footprints, and peaks, respectively.
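
The proportions on this slide amount to counting, per annotation class, how many variants meet the GERP >= 3 cutoff. A minimal Python sketch of that bookkeeping follows; the function name and the toy variants are hypothetical and are not data from the study.

```python
from collections import defaultdict

GERP_THRESHOLD = 3.0  # cutoff for "putatively functional" stated on the slides


def functional_fraction(variants):
    """Return {region class: fraction of its variants with GERP >= threshold}.

    `variants` is an iterable of (region_class, gerp_score) pairs.
    """
    totals = defaultdict(int)
    functional = defaultdict(int)
    for region, gerp in variants:
        totals[region] += 1
        if gerp >= GERP_THRESHOLD:
            functional[region] += 1
    return {region: functional[region] / totals[region] for region in totals}


# Hypothetical toy input, for illustration only.
toy_variants = [
    ("exome", 4.2), ("exome", 1.0), ("exome", 3.0), ("exome", -0.5),
    ("footprint", 3.5), ("footprint", 0.2), ("footprint", -1.1), ("footprint", 2.9),
    ("peak", 0.1), ("peak", 5.0), ("peak", 1.9), ("peak", 0.0),
]
fractions = functional_fraction(toy_variants)
```

A class can have the smallest fraction and still the largest absolute count if it has many more variants overall, which is the contrast the slide draws between noncoding and protein-coding DNA.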

9 Distribution of the number of variants per individual

10 Patterns of nucleotide diversity in regulatory DNA sequence motifs. Scan DNase I footprints for 732 known motifs. Calculate nucleotide diversity (π) for each motif and for fourfold synonymous sites (a proxy for neutrally evolving sequence). In the estimator, n is the number of chromosomes, p_i is the frequency of the major allele at the i-th segregating site, and S is the number of segregating sites. π is averaged across all instances of the motif in these regions.
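
A minimal Python sketch of this calculation, assuming the standard sample-size-corrected estimator π = n/(n-1) * sum over segregating sites of 2 p_i (1 - p_i). Dividing by the number of nucleotides scanned to obtain a per-nucleotide value is also an assumption, and the function and parameter names are hypothetical.

```python
def nucleotide_diversity(major_allele_freqs, n_chromosomes, n_sites):
    """Per-nucleotide diversity under the assumed standard estimator.

    major_allele_freqs: p_i for each of the S segregating sites
    n_chromosomes:      n, the number of sampled chromosomes
    n_sites:            total nucleotides scanned (assumed normalization)
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    # Small-sample correction n / (n - 1).
    correction = n_chromosomes / (n_chromosomes - 1)
    # Expected heterozygosity summed over segregating sites.
    heterozygosity = sum(2.0 * p * (1.0 - p) for p in major_allele_freqs)
    return correction * heterozygosity / n_sites


# Illustrative call: 106 chromosomes would correspond to 53 diploid genomes;
# the allele frequencies and the 1000 bp length are made up.
example_pi = nucleotide_diversity([0.9, 0.75, 0.6], n_chromosomes=106, n_sites=1000)
```

Because 2 p (1 - p) is symmetric in p and 1 - p, the result is the same whether the major or the minor allele frequency is supplied.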


12 Heterogeneity in both selective constraint and mutation rate likely contributes to the differences in diversity observed among motifs. Normalizing π (dividing the per-nucleotide estimate by the estimated neutral mutation rate) can mitigate variation in mutation rate.
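
The normalization described here is a single division, sketched below in Python. How the neutral mutation rate is estimated is not stated on the slide, so it is simply passed in as a number; the function name and the example values are hypothetical.

```python
def normalized_diversity(pi_per_site, neutral_mutation_rate):
    """Divide a per-nucleotide diversity estimate by a neutral mutation rate."""
    if neutral_mutation_rate <= 0:
        raise ValueError("neutral mutation rate must be positive")
    return pi_per_site / neutral_mutation_rate


# Two hypothetical motifs with the same raw diversity but different
# estimated neutral mutation rates (values are illustrative only).
motif_a = normalized_diversity(0.0008, 2e-8)
motif_b = normalized_diversity(0.0008, 1e-8)
```

With equal raw diversity, the motif with the higher mutation rate ends up with the lower normalized value, so it would be read as the more constrained of the two.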

13 Heterogeneity of functional constraint across cell types. Calculate normalized π averaged across all DNase I peaks for each cell line. Some cell types show strong functional constraint. A core set of DHSs, present in more than one cell type category, exhibits the lowest levels of normalized diversity, consistent with stronger selective constraint, because these DHSs are necessary for proper transcriptional programs in multiple cell types.

14 Evidence for ectopic activation of DHSs in malignant cell types. Immortalized and malignant cell lines may experience increased ectopic activation of DHSs, seen as cell type-restricted (singleton) DHSs. Malignant cell types are significantly enriched (P < 10^-4) for singleton DHSs. Figure: 92 cell types with high-quality DNase I data. (Triangles) Observed proportion of singleton peaks. (Blue and green lines) Distribution (density histograms) of singleton peaks when randomly sampling 29 (blue) or five (green) cell types; this is the distribution of singleton peaks we would expect if malignant or stem cells, respectively, were similar to normal cells. Note that the malignant category (blue) shows significantly more singleton peaks than expected given its sample size, whereas the stem cell category (green) falls within the expected range.
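
The resampling comparison in this figure can be sketched as follows: draw as many cell types at random as the category contains, record the share of singleton peaks they account for, and compare the observed share with that null distribution. This is a minimal Python sketch under assumptions; the input layout, function name, and toy counts are hypothetical, and the study's exact procedure may differ.

```python
import random


def singleton_enrichment(singletons_per_cell_type, category, n_resamples=10000, seed=0):
    """Return (observed share of singleton peaks, empirical one-sided P value).

    singletons_per_cell_type: {cell type: number of singleton DHS peaks}
    category:                 cell types in the group being tested
    """
    rng = random.Random(seed)
    cell_types = sorted(singletons_per_cell_type)
    total = sum(singletons_per_cell_type.values())
    if total == 0:
        raise ValueError("no singleton peaks to analyze")
    observed = sum(singletons_per_cell_type[c] for c in category) / total
    k = len(category)
    hits = 0
    for _ in range(n_resamples):
        # Null model: a random group of the same size as the category.
        sample = rng.sample(cell_types, k)
        share = sum(singletons_per_cell_type[c] for c in sample) / total
        if share >= observed:
            hits += 1
    # Add-one correction keeps the empirical P value above zero.
    return observed, (hits + 1) / (n_resamples + 1)


# Hypothetical toy counts: seven "normal" and three "malignant" cell types.
toy_counts = {f"normal_{i}": 10 for i in range(7)}
toy_counts.update({"malignant_1": 90, "malignant_2": 80, "malignant_3": 70})
observed_share, p_value = singleton_enrichment(
    toy_counts, ["malignant_1", "malignant_2", "malignant_3"]
)
```

Sampling the same number of cell types as the category under test is what makes the comparison fair "given its sample size", as the caption puts it.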

15 Signatures of positive selection. A signature of geographically restricted selection: DNase I peaks that contain variants with large allele frequency differences between populations: Africans (16), Asians (13), and Europeans (8). Calculate locus-specific branch lengths (LSBLs) for variants in DNase I peaks. The LSBL is a function of pairwise FST between populations and helps isolate the direction of allele frequency change. FST is calculated as 1 - HS/HT, where HS and HT denote average subpopulation heterozygosity and total heterozygosity, respectively. Denote the FST between Africans and Europeans, Africans and Asians, and Europeans and Asians as dab, dac, and dbc, respectively. The LSBL for Africans is (dab + dac - dbc)/2.
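
The FST and LSBL definitions above can be sketched in Python for a single biallelic variant. The sketch assumes heterozygosity 2p(1 - p) and equal weighting of the two populations when pooling; the slide does not say whether sample sizes are used as weights. The European and Asian branch lengths follow from the African formula by symmetry, and all function names are hypothetical.

```python
def heterozygosity(p):
    """Expected heterozygosity at a biallelic site with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def pairwise_fst(p1, p2):
    """FST = 1 - HS/HT for two populations, weighted equally (assumption)."""
    hs = (heterozygosity(p1) + heterozygosity(p2)) / 2.0
    ht = heterozygosity((p1 + p2) / 2.0)
    if ht == 0.0:
        return 0.0  # site is fixed for the same allele in both populations
    return 1.0 - hs / ht


def lsbl(p_african, p_european, p_asian):
    """Locus-specific branch length for each population from pairwise FST."""
    d_ab = pairwise_fst(p_african, p_european)
    d_ac = pairwise_fst(p_african, p_asian)
    d_bc = pairwise_fst(p_european, p_asian)
    return {
        "african": (d_ab + d_ac - d_bc) / 2.0,
        "european": (d_ab + d_bc - d_ac) / 2.0,
        "asian": (d_ac + d_bc - d_ab) / 2.0,
    }


# Illustrative frequencies: an allele common in Africans and rare elsewhere.
branches = lsbl(0.9, 0.1, 0.1)
```

In this example all of the differentiation is assigned to the African branch, which is how the LSBL helps isolate the population in which the allele frequency changed.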

16 Focus on peaks in the 1% tail of the empirical distribution of LSBLs in each population. Pick out genes located within 50 kb of each of these peaks.

17 Develop a list of putative targets of recent positive selection: (1) take the most differentiated 1% of DHSs; (2) keep those that contain one or more highly differentiated variants with GERP >= 3; (3) collect the genes located within 50 kb of each of these peaks. Example targets are shown in panels C and D.
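
The three filtering steps can be sketched in Python as below. Several details are assumptions that the slides do not specify: a peak is scored by the largest LSBL among its variants, a "highly differentiated" variant is one at or above the same 1% cutoff, and distance is measured between interval edges. All class names, function names, gene names, and coordinates are hypothetical.

```python
from dataclasses import dataclass, field

WINDOW_BP = 50_000     # gene search window stated on the slides
GERP_THRESHOLD = 3.0   # constraint cutoff stated on the slides


@dataclass
class Variant:
    lsbl: float
    gerp: float


@dataclass
class Peak:
    chrom: str
    start: int
    end: int
    variants: list = field(default_factory=list)

    @property
    def lsbl(self):
        # Assumption: score a peak by its most differentiated variant.
        return max((v.lsbl for v in self.variants), default=0.0)


@dataclass
class Gene:
    name: str
    chrom: str
    start: int
    end: int


def tail_cutoff(values, tail=0.01):
    """Smallest value inside the top `tail` fraction (at least one value)."""
    ranked = sorted(values, reverse=True)
    k = max(1, int(len(ranked) * tail))
    return ranked[k - 1]


def gap(peak, gene):
    """Distance in bp between two intervals; 0 if they overlap."""
    return max(0, peak.start - gene.end, gene.start - peak.end)


def candidate_genes(peaks, genes):
    cutoff = tail_cutoff([p.lsbl for p in peaks])
    hits = []
    for peak in peaks:
        if peak.lsbl < cutoff:                      # step 1: top 1% of peaks
            continue
        if not any(v.lsbl >= cutoff and v.gerp >= GERP_THRESHOLD
                   for v in peak.variants):         # step 2: constrained variant
            continue
        for gene in genes:                          # step 3: genes within 50 kb
            if gene.chrom == peak.chrom and gap(peak, gene) <= WINDOW_BP:
                hits.append((gene.name, peak.chrom, peak.start))
    return hits


# Hypothetical toy input: 100 peaks, one strongly differentiated and constrained.
toy_peaks = [Peak("chr1", i * 1_000_000, i * 1_000_000 + 150, [Variant(i / 1000, 0.5)])
             for i in range(100)]
toy_peaks[99].variants = [Variant(0.9, 4.0)]
toy_genes = [
    Gene("GENE_A", "chr1", 99_020_000, 99_030_000),   # 19,850 bp from the peak
    Gene("GENE_B", "chr1", 99_100_000, 99_110_000),   # beyond the 50 kb window
    Gene("GENE_C", "chr2", 99_000_000, 99_010_000),   # different chromosome
]
candidates = candidate_genes(toy_peaks, toy_genes)
```

In the study the 1% cutoff is applied separately in each population, so a sketch like this would be run once per population-specific LSBL.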

18 (B) Distribution of the proportion of highly differentiated DNase I peaks found for different categories of cell types. (C) Distribution of African LSBL across intron 1 of VDR. (D) Distribution of European LSBL across intron 4 of FTO.

19 Conclusions. Regulatory variation is pervasive throughout the genome, and individuals likely harbor more functionally important variants in noncoding than in protein-coding DNA. There is significant heterogeneity in the level of functional constraint in regulatory DNA among different cell types. Regulatory DNA present in multiple broad categories of cell types is significantly more constrained.

20 Ectopic activation of noncanonical cis-regulatory sequences contributes to the aberrant transcriptional changes that are observed in many cancers. The study describes a large compendium of DHSs that exhibit unusually high levels of population structure, consistent with the action of geographically restricted selection. Genes adjacent to these highly differentiated regulatory sequences are enriched for a number of biologically interesting pathways. The study identifies several hundred loci that contain signatures of local adaptation.