Effects of cis and trans Genetic Ancestry on Gene Expression in African Americans

Size: px
Start display at page:

Download "Effects of cis and trans Genetic Ancestry on Gene Expression in African Americans"

Transcription

1 Effects of cis and trans Genetic Ancestry on Gene Expression in African Americans Alkes L. Price 1,2 *, Nick Patterson 3, Dustin C. Hancks 4, Simon Myers 5, David Reich 3,6, Vivian G. Cheung 4,7,8,9, Richard S. Spielman 4 * 1 Department of Epidemiology, Harvard School of Public Health, Boston, Massachusetts, United States of America, 2 Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts, United States of America, 3 Program in Medical and Population Genetics, Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America, 4 Department of Genetics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, United States of America, 5 Department of Statistics, University of Oxford, Oxford, United Kingdom, 6 Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America, 7 Department of Pediatrics, University of Pennsylvania School of Medicine, Pennsylvania, United States of America, 8 The Children s Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America, 9 Howard Hughes Medical Institute, Philadelphia, Pennsylvania, United States of America Abstract Variation in gene expression is a fundamental aspect of human phenotypic variation. Several recent studies have analyzed gene expression levels in populations of different continental ancestry and reported population differences at a large number of genes. However, these differences could largely be due to non-genetic (e.g., environmental) effects. Here, we analyze gene expression levels in African American cell lines, which differ from previously analyzed cell lines in that individuals from this population inherit variable proportions of two continental ancestries. We first relate gene expression levels in individual African Americans to their genome-wide proportion of European ancestry. The results provide strong evidence of a genetic contribution to expression differences between European and African populations, validating previous findings. Second, we infer local ancestry (0, 1, or 2 European chromosomes) at each location in the genome and investigate the effects of ancestry proximal to the expressed gene (cis) versus ancestry elsewhere in the genome (trans). Both effects are highly significant, and we estimate that 1263% of all heritable variation in human gene expression is due to cis variants. Citation: Price AL, Patterson N, Hancks DC, Myers S, Reich D, et al. (2008) Effects of cis and trans Genetic Ancestry on Gene Expression in African Americans. PLoS Genet 4(12): e doi: /journal.pgen Editor: Greg Gibson, The University of Queensland, Australia Received July 10, 2008; Accepted November 4, 2008; Published December 5, 2008 Copyright: ß 2008 Price et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This work was supported by a Ruth Kirschstein National Research Service Award from the National Institutes of Health (ALP), by a Burroughs Wellcome Career Development Award in the Biomedical Sciences (DR), by National Institutes of Health grants (to VGC, RSS) and by the W. W. Smith Chair (VGC). Competing Interests: The authors have declared that no competing interests exist. * aprice@hsph.harvard.edu (ALP); spielman@pobox.upenn.edu (RSS) Introduction Admixed populations are uniquely useful for analyzing the genetic contribution to phenotypic differences among humans. Phenotypic differences that are observed among human populations may have systematic non-genetic causes, such as differences in environment [1,2]. However, in an admixed population such as African Americans, such differences are minimized and the only systematic differences among individuals are in the proportion of European ancestry, which can be accurately inferred using genetic data. Several recent epidemiological studies in African Americans have taken advantage of this, showing that many phenotypic traits vary with the proportion of European ancestry [3 5]. Here, we apply this idea to analyze population differences in gene expression. Gene expression is a fundamental determinant of cellular phenotypes, and understanding how gene expression variation is apportioned among human populations is an important aspect of biomedical research, as has been true for apportionment of human genetic variation at the DNA level [6]. Recently, four studies analyzed lymphoblastoid cell lines from HapMap samples and reported that a large number of expressed genes exhibit significant differences in gene expression among continental populations [7 10]. However, results of these studies may be affected by nongenetic factors such as differences in environment, differences in preparation of cell lines, or batch effects [2,9,11,12]. In particular, a recent review article has suggested that much of the expression variation across populations is caused by environmental factors [13]. On the other hand, analyses of expression differences that are correlated to ancestry within an admixed population are robust to all of these concerns. In this study, we analyzed lymphoblastoid cell lines from 89 African-American samples and investigated the relationship between expression levels of,4,200 genes and the proportion of European ancestry. We compared the results with those predicted from the differences in expression levels between 60 European samples (CEU from the International HapMap Project) and 60 African samples (YRI from HapMap) [6]. We confirmed the existence of heritable gene expression differences between CEU and YRI by showing a highly significant correspondence between observed CEU vs. YRI differences (i.e. differences between sample means) and the expression differences predicted by ancestry differences among African Americans. Notably, the correspondence holds regardless of whether differences between CEU and YRI are large or small. This suggests that the effects of heritable population differences on variation in gene expression are widespread across genes, mirroring population differences at the DNA level [6]. PLoS Genetics 1 December 2008 Volume 4 Issue 12 e

2 Author Summary Variation in gene expression is a fundamental aspect of human phenotypic variation, and understanding how this variation is apportioned among human populations is an important aim. Previous studies have compared gene expression levels between distinct populations, but it is unclear whether the differences that were observed have a genetic or nongenetic basis. Admixed populations, such as African Americans, offer a solution to this problem because individuals vary in their proportion of European ancestry while the analysis of a single population minimizes nongenetic factors. Here, we show that differences in gene expression among African Americans of different ancestry proportions validate gene expression differences between European and African populations. Furthermore, by drawing a distinction between an African American individual s ancestry at the location of a gene whose expression is being analyzed (cis) versus at distal locations (trans), we can use ancestry effects to quantify the relative contributions of cis and trans regulation to human gene expression. We estimate that 1263% of all heritable variation in human gene expression is due to cis variants. Heritable variation in gene expression may be due to cis or trans variants. Previous studies in humans have been successful in mapping both cis and trans effects, but the results they provide are far from complete, due to limited sample sizes [14,15,9,16 20]. In particular, the relative number of cis vs. trans associations that were reported varies widely across these studies, perhaps due to differences in power or choices of significance thresholds [13]. Thus, the overall extent of cis vs. trans regulatory variation in human gene expression has not yet been established. Here, by measuring how gene expression levels across all genes vary with local ancestry (0, 1 or 2 European chromosomes) either proximal to the expressed gene (cis) or elsewhere in the genome (trans), we estimate that 1263% of heritable variation in human gene expression is due to cis variants. Materials and Methods Genotype Data 100 African-American (AA) samples from the Coriell HD100AA panel were genotyped on the Affymetrix SNP 6.0 GeneChip. Genotyping was conducted at the Coriell Genotyping and Microarray Center, and the genotype data was obtained from the NIGMS Human Genetic Cell Repository at Coriell (see Web Resources). In addition, genotype data from 60 European (CEU), 60 African (YRI), 45 Chinese (CHB) and 44 Japanese (JPT) samples was obtained from Phase 2 HapMap [6] (see Web Resources). We restricted all analyses to 595,964 autosomal markers with,5% missing data in AA samples and,5% missing data in Phase 2 HapMap samples, with A/T and C/G markers excluded so as to preclude any ambiguity in strand complementarity. Our analyses were not sensitive to the number of markers used. Two AA samples which we identified as cryptically related to other AA samples were excluded from the set of samples used for principal components analysis. Genome-Wide and Local Ancestry Estimates of AA Samples Local ancestry (0, 1 or 2 European chromosomes) at each location in the genome was estimated for each AA sample using the HAPMIX program, a haplotype-based approach that has been shown to attain an r 2 of 0.98 between inferred local ancestry and true local ancestry in simulated African-American data sets (A.L.P., N.P., D.R. & S.M., unpublished data; see Web Resources, specifically The HAPMIX program inputs AA genotype data and phased CEU and YRI data from Phase II HapMap [6], and outputs the estimated probability of 0, 1 or 2 European chromosomes at each location in the genome. The weighted sum of these probabilities (multiplied by 0.00, 0.50 or 1.00, respectively) forms an estimate of local % European ancestry. Genome-wide ancestry was computed as the average of estimated local ancestry throughout the genome. Gene Expression Data Lymphoblastoid cell lines for 60 HapMap CEU, 60 HapMap YRI and the Coriell HD100AA samples were obtained from Coriell Cell Repositories (see Web Resources). Gene expression was assayed using the Affymetrix Genome Focus Array, as described previously [7]. We restricted our analysis to the 4,197 genes on the array that are expressed in lymphoblastoid cell lines [7]. The gene expression data is publicly available (GEO accession number GSE10824) (see Web Resources). For HD100AA samples, we excluded two cryptically related samples (see above), four samples identified as genetic outliers (see Results), and five samples for which gene expression measurements were not obtained, so that 89 AA samples were included in gene expression analyses. Validation Coefficient c of CEU versus YRI Gene Expression Differences in AA Samples For each gene g, we normalized gene expression measurements for CEU and YRI to have mean 0 and variance 1 across 120 CEU+YRI samples, and normalized gene expression measurements for AA by applying the same normalization for consistency. We implicitly assume an additive genetic model in which gene expression has genetic and non-genetic components, with part of the genetic component predicted by ancestry. Let e gs denote normalized gene expression of gene g for sample (i.e. individual) s. Let h s denote the genome-wide European ancestry proportion of sample s, so that h s has value 1 for CEU samples and 0 for YRI samples as above, and fractional values for AA samples. We consider a model in which e gs = a g h s +n gs for CEU and YRI samples and e gs = ca g h s +n gs for AA samples, where c is a global parameter and n gs represents the residual contribution to gene expression that is not predicted by ancestry. Thus, the parameter c represents a validation coefficient measuring the aggregate extent to which the observed gene expression differences a g between CEU and YRI (differences between sample means) are heritable. We implemented two different approaches for fitting the parameters c and a g of this model: (1) Starting with the initial guess c = 1, we alternated computing maximum likelihood estimates for a g (for all g) conditional on c, and computing a maximum likelihood estimate for c conditional on a g (for all g), and iterated to convergence. In each case, the maximum likelihood estimates were obtained via linear regression (with a separate linear regression for each g when estimating a g, and a single linear regression when estimating c). (2) For each g, we estimated values ã g,ceu+yri by regressing e gs against h s using CEU and YRI data only, and ã g,aa by regressing e gs against h s using AA data only. We then regressed ã g,aa against ã g,ceu+yri to obtain an estimate of c. In this computation, we scaled our estimates of ã g,ceu+yri using the sampling error correction j (described below in Computation of Q ST ) to remove the effect of sampling error on the denominator S g (ã g,ceu+yri ) 2 of our estimate of c. (On the other hand, we note that sampling noise in the AA data does not bias our computation of c, whose expected value does not change when noise is added to ã g,aa ). We observed that approaches (1) and (2) produced identical estimates of c, indicating that both approaches are effective in PLoS Genetics 2 December 2008 Volume 4 Issue 12 e

3 finding the best fit to the model. We followed approach (2) to plot ã g,aa vs. ã g,ceu+yri and to compute estimates of c specific to different values of ã g,ceu+yri. Validation Coefficient c using Genotype Data Instead of Gene Expression Data We repeated the above computation using genotype data instead of gene expression data. We restricted the analysis to markers in which the average of CEU and YRI frequencies was between 0.05 and Although AA genotypes at each marker were used twice in this computation both for estimating genome-wide ancestry using all markers and for measuring the effect of genome-wide ancestry on genotype at a specific marker we note that with hundreds of thousands of markers, our estimate of genome-wide ancestry is negligibly impacted by data from a specific marker. Validation Coefficients c cis and c trans We investigated the effects of cis ancestry and trans ancestry on gene expression in AA. Roughly, we define cis ancestry as the local ancestry at the gene whose expression is being analyzed, and trans ancestry as the average ancestry at non-cis regions. We extended our above model by letting e gs = c cis a g c gs +c trans a g h s +n gs for AA samples, where c gs denotes the estimated local ancestry of sample s at the SNP closest to the center of gene g (cis locus; average of transcription start and transcription end positions). We note that although trans ancestry is theoretically defined as the average ancestry at non-cis regions, this quantity is in practice virtually identical to h s because cis regions (regardless of the precise definition of cis) form an extremely small proportion of the genome. Because chromosomal segments of ancestry in AA typically span.10 Mb [21], it is nearly always the case that a gene lies completely within a single ancestry block, so that our analysis is not sensitive to the choice of genomic location used to define cis ancestry c gs. The probabilistic estimates of local ancestry produced by HAPMIX are extremely accurate (see above), so that c gs is typically close to 0.00, 0.50 or 1.00 (corresponding to 0, 1 or 2 copies of European ancestry). To avoid complications in local ancestry analyses on the X chromosome, we restricted this analysis to 4,015 autosomal genes. (Analyses involving global ancestry were not affected by inclusion or exclusion of genes on the X chromosome.) We estimated the global parameters c cis and c trans as above, accounting for the correlation between genome-wide and local ancestry by using residual values of c gs (adjusted for h s )to compute ã cis,g,aa (and conversely for ã trans,g,aa ). Computation of Q ST Let F denote the proportion of total variance in gene expression that is attributable to population differences. For quantitative traits with an additive genetic basis, the quantity that is analogous to single-locus estimates of F ST is not F, but rather Q ST = F/(22F) (reviewed in [22]). This is a consequence of the contributions of genetic variation on two distinct haploid chromosomes, magnifying the effect of population differences under an additive genetic model. We computed both F and Q ST. For each gene g, we normalized gene expression measurements for CEU and YRI to have mean 0 and variance 1 across 120 CEU+YRI samples. We defined the ancestry h s of sample s to be 1 if s is a CEU sample, and 0 if s is a YRI sample. As above, we modeled normalized expression of gene g for sample s as e gs = a g h s +n gs. Equivalently, under this definition, a g is equal to the difference in normalized gene expression between CEU and YRI samples. We defined F to be the quantity such that the true value of a g has mean 0 and variance 2F across genes [23]. For a specific gene, a g h s has variance 0.25a g 2 and n gs has variance a g 2 across CEU+YRI samples (these variances have expected value 0.5F and 1 0.5F, respectively). Due to sampling error, the observed difference ã g in normalized gene expression between CEU and YRI samples (i.e. the coefficient obtained from a regression of e gs on h s ) has variance 2F+(1 0.5F)/30, where 1/30 is the sum of reciprocals of CEU and YRI sample sizes. We thus estimated mean F as (Var g (ã g ) 1/30)/ (2 0.5/30). The ratio between mean F and Var g (ã g )/2 represents a sampling error correction that we call j. We estimated median F as the median value of ã g 2 /2 times j. The value of j was 0.93, indicating that the sampling error correction had only a minor effect on these computations. To account for differences between CEU and YRI due to non-genetic factors, we adjusted F by multiplying it by c. (We note that the scaled population differences ca g have variance that is c 2 times the variance of a g, but explain only the proportion c of the true component of variance that is attributable to ancestry.) We then computed Q ST = F/(22F). We calculated the standard error of our estimate of F via jackknife, repeating the computation of F 120 times with one of the 120 CEU+YRI samples excluded in each computation, and estimating the standard error as the standard deviation of the 120 estimates times the square root of 120. Web Resources N (Coriell Cell Repositories) N (The NIGMS Human Genetic Cell Repository at Coriell) N (International HapMap Project) N (Gene Expression Omnibus) N (HAPMIX program) Results Genetic Data Show that African Americans Are Accurately Modeled using CEU and YRI We analyzed Affymetrix 6.0 genotype data from the African- American panel of 100 samples from Coriell Cell Repositories, together with HapMap samples (see Materials and Methods). We first ran principal components analysis, using the EIGENSOFT software [24]. The top two principal components are displayed in Figure 1, in which most AA samples roughly lie on a straight line running from CEU to YRI (we excluded three genetic outliers with partial East Asian ancestry and one genetic outlier whose ancestry is very close to CEU from subsequent analyses). This suggests that the ancestry of the AA samples might be reasonably approximated as a mixture of varying amounts of CEU and YRI ancestry, as reported previously [21]. However, given the wide range of genetic diversity across Europe and particularly across Africa [23], we sought to test this hypothesis further. We removed related samples, genetic outliers, and samples without valid gene expression measurements to obtain a reduced set of 89 AA samples for subsequent analysis (see Materials and Methods). We computed F ST values between the set of 89 AA samples and possible linear combinations aceu+(12a)yri, adjusting for sample size. The lowest value of F ST = was obtained at a = Thus, the 89 AA samples are extremely well-modeled as a mix of CEU and YRI, with average ancestry proportions of 21% CEU and 79% YRI. Though this justifies our modeling approach using CEU and YRI, we caution against drawing historical inferences from this finding: because F ST scales with the square of admixture proportion, it is possible that PLoS Genetics 3 December 2008 Volume 4 Issue 12 e

4 Figure 1. Principal components analysis of AA samples from Coriell together with HapMap samples. We display the top two principal components. doi: /journal.pgen g001 African Americans inherit a small percentage of their ancestry from a more diverse set of populations. We estimated the genome-wide proportion of European ancestry for each the 89 AA samples (see Materials and Methods). Genome-wide ancestry proportions varied from 1% to 62% with a mean6sd of 21614%; this ancestry distribution is similar to that in other AA data sets [21,25]. Genome-wide ancestry estimates were strongly correlated (r ) with coordinates along the top principal component (eigenvector with largest eigenvalue) (Figure 1). Gene Expression Levels Vary with Genome-Wide Ancestry in African Americans We measured gene expression in lymphoblastoid cell lines from 60 CEU and 60 YRI samples from HapMap and 89 AA samples from Coriell, using the Affymetrix Genome Focus Array (see Materials and Methods). Our basic approach was to validate observed differences between CEU and YRI (differences between sample means) by analyzing the correlation between the genomewide proportion of European ancestry estimated from SNP genotyping and the gene expression levels we measured in the AA cell lines. A caveat is that the proportion of European ancestry in African Americans might in principle be correlated to environmental variables. However, such correlations would not affect our approach unless they specifically tracked environmental differences between CEU and YRI. An additional caveat is that the Coriell panel of AA samples is known to be sampled from several (unknown) cities in the United States; AA samples from different U.S. cities might differ systematically in both the average proportion of European ancestry [21,26] and in the preparation of cell lines. However, ancestry differences among AA populations in different U.S. cities are usually relatively small (standard deviation of 1% in Table 2 of [21]; standard deviation of 6% in Figure 2 of [26]), and in any case would not affect our approach unless differences in cell line preparation specifically tracked differences between CEU and YRI. Using the ancestry estimates and expression data at 4,197 genes for CEU, YRI and AA samples, we fit a model in which the effect of ancestry on gene expression at gene g is equal to a g per unit of European ancestry for CEU and YRI samples (so that a g is equal to the difference in mean expression level between CEU and YRI, which have ancestry 1 and 0 respectively), and equal to ca g per unit of European ancestry for AA samples, where c is constant across genes (see Materials and Methods). Thus, the global parameter c measures the extent to which observed gene expression differences between CEU and YRI are validated in AA, and therefore heritable. If systematic differences observed between CEU and YRI were entirely due to genetic factors, we would expect to see the same ancestry effects in AA samples, so that c = 1. On the other hand, under the hypothesis that observed differences between CEU and YRI are entirely due to non-genetic factors, we would expect c = 0. We note that our procedure for estimating c accounts for both experimental noise and sampling noise in the measurement of gene expression levels. Thus, assuming analogous normalizations for CEU, YRI and AA samples, our estimate of c is not dependent on the accuracy of our measurements; it is also independent of sampling effects. Fitting the above model, we obtained c = 0.43, the slope of the regression line in Figure 2. With 4,197 genes analyzed, this estimate of c is different from zero with overwhelming statistical significance PLoS Genetics 4 December 2008 Volume 4 Issue 12 e

5 Figure 2. Gene expression differences between CEU and YRI are validated in AA samples. The y-axis shows the difference in normalized gene expression due to ancestry estimated from AA samples (ã g,aa ) and the x-axis shows the difference in normalized gene expression due to ancestry estimated from CEU and YRI samples (ã g,ceu+yri ) (see Materials and Methods for details of normalization). (A) We plot each of the 4,197 genes separately. (B) For aid in visualization, the 4,197 genes are averaged into bins of 20 genes according to values of ã g,ceu+yri ; binning does not affect the slope of the plot. The slope of each plot is our estimate 0.43 of the parameter c. doi: /journal.pgen g002 (P-value, ; 95% confidence interval [0.38,0.47]). Thus, gene expression differences among AA samples of varying ancestry strongly confirm that heritable differences contribute to observed gene expression differences between CEU and YRI. Performing the analogous computation with genotype data, we obtained c =0.96, confirming that c is close to 1 for genetic effects (see Figure 3) and that modeling AA as a mix of CEU and YRI is appropriate for our analyses. The deviation between c = 0.96 and the expected value of 1 is discussed in Text S1. We investigated whether the correspondence between observed CEU vs. YRI gene expression differences and expression differences due to ancestry among AA is concentrated in genes with large differences between CEU and YRI. If only a fraction of genes were truly differentiated, as suggested by previous studies, then genes with large observed CEU vs. YRI differences would be more likely to be truly differentiated and would show stronger validation in AA. For example, when we simulated a mixture model in which c = 0.43 for the set of all genes but only 50% of genes are truly differentiated between CEU and YRI, we obtained a larger value of c = 0.53 for genes in the top 10% of observed CEU vs. YRI differences (see Text S1). However, Figure 2 shows no evidence of nonlinear effects. Indeed, we recomputed c using only genes in the top 10% of the magnitude of observed CEU vs. YRI differences, and obtained c = 0.44, which is similar to the value of 0.43 using all genes. These results suggest that population differences in gene expression are not restricted to a fraction of genes but in fact are widespread across genes, mirroring population differences at the DNA level [6]. We considered whether the alternative approach of analyzing the AA data independently, without regard to differences between Figure 3. Genetic differences between CEU and YRI are validated in AA samples. Plots are analogous to Figure 2 except that genetic (SNP) data were used instead of gene expression data. (A) We plot a random subset of 4,197 markers, for visual comparison to Figure 2. (B) We average into bins of 20 markers. The slope of each plot is our estimate 0.96 of the parameter c. doi: /journal.pgen g003 PLoS Genetics 5 December 2008 Volume 4 Issue 12 e

6 CEU and YRI, would be informative about differences in gene expression due to ancestry. We determined that the AA data analyzed separately contains too much sampling noise for that approach to be useful here (see Text S1). A related observation is that efforts to estimate the proportion of genes with population differences in gene expression, for example using the previously described [27] lower bound statistic 1 p 0, may produce substantial underestimates in the case of data sets affected by sampling noise (see Text S1). Effects of cis versus trans Ancestry on Gene Expression in African Americans The effect of ancestry on gene expression in African Americans may be due either to variation in regulatory variants proximal to the gene (cis) or to variants elsewhere in the genome (trans). We inferred the local ancestry of each AA sample at each location in the genome (see Materials and Methods). A description of how local ancestry varies across the genome (either across or within samples) is provided in Text S1. We quantified the extent to which the validation of CEU-YRI expression differences in AA was attributable to cis or trans effects in AA by computing validation coefficients c cis and c trans (see Materials and Methods). We obtained c cis = 0.05 and c trans = As expected, the sum c cis +c trans is very close to the validation coefficient c that was obtained using genome-wide ancestry only (see Text S1). Both c cis (P-value = ; 95% confidence interval [0.03,0.07]) and c trans (Pvalue, ; 95% confidence interval [0.33,0.43]) were significantly different from zero. Thus, only a small fraction of the effect of ancestry on gene expression is due to ancestry at the cis locus. On the other hand, performing the analogous computation with genotype data, we obtained c cis = 0.99 and c trans = 0.03, indicating as expected that the effect of ancestry on genotype is entirely due to ancestry at the cis locus, and confirming the high accuracy of our estimates of local ancestry. We estimate the proportion p cis of heritable gene expression variation between Europeans and Africans that is due to cis variants as c cis /(c cis +c trans ) = 12%, with a standard error of 3%. An important question is whether our estimate of p cis can be extended to all heritable variation in human gene expression. If the relative magnitude of cis vs. trans effects were different for all variation as compared to population variation equivalently, if the relative magnitude of population variation relative to all variation were different for cis vs. trans effects then the answer to this question would be no. To evaluate whether this is the case, we computed F ST (CEU,YRI) for,3,000 unique cis eqtl SNPs and,700 unique trans eqtl SNPs identified in a recent study of gene expression in human liver [20]. We obtained F ST values of for cis eqtls and for trans eqtls, which were not significantly different from for all HapMap SNPs (Pvalues = 0.79 and 0.51 respectively), based on standard errors computed using the EIGENSOFT software [6,24]. Although this analysis involved eqtls for liver tissue rather than lymphoblastoid cell lines, a reasonable assumption is that the same result holds for other tissue types. Thus, population variation does not appear to differ for cis vs. trans effects, implying that our estimate of p cis =1263% applies to all heritable variation in human gene expression. Proportion of Variation in Gene Expression Attributable to Population Differences We estimated both the proportion of gene expression variation attributable to population differences, which we call F, and the quantity Q ST = F/(22F) which is analogous to F ST for genetic (allele-frequency) data (see Materials and Methods). We obtained a mean F = 0.20 and median F = 0.12, similar to the median F = 0.15 from a previous analysis of CEU and YRI gene expression [8]. A jackknife calculation indicated that the standard error in our estimate of mean F was 0.02, corresponding to a 95% confidence interval of [0.15,0.25]. In our initial calculation of F, we ignored the possibility of non-genetic contributions to population differences. However, the fact that c is smaller than 1 implies that not all of the observed CEU vs. YRI differences are reflected in differences due to ancestry among AA. Some of these differences must reflect non-genetic factors. We therefore adjusted our estimates of F by multiplying them by c = 0.43 (see Materials and Methods). After this adjustment, we obtained a mean F = 0.09 and median F = These estimates of F are substantially lower than those reported previously [8]. Our mean F corresponds to a Q ST value of 0.05, which is lower than the F ST of 0.16 that is observed in genetic data [6]. The lower value of Q ST as compared to genetic data is unsurprising since Q ST represents a proportion of total gene expression variation, which is expected to include both genetic and non-genetic components. We also note that if measurement variation is substantial, then the use of technical replicates to correct for the effects of measurement variation would lead to a higher value of Q ST. Discussion We have shown how phenotypic variation in an admixed population can be coupled with variation in ancestry to shed light on differences between ancestral populations; our approach makes no assumptions about the population histories underlying the differences between the ancestral populations. We have applied this approach to gene expression in African Americans and shown that observed population differences (differences in sample means) between CEU and YRI in gene expression correspond, with overwhelming statistical significance, to differences among African Americans of varying ancestry, implying a substantial heritable component to the population differences. In reaching this conclusion via analysis of an admixed population, we eliminate confounding with non-genetic contributions to observed differences between the ancestral populations, which could result from differences in environment, differences in preparation of cell lines, or batch effects. The value of 0.43 for the validation coefficient c implies that both genetic and non-genetic effects contribute to observed population differences between CEU and YRI. Interestingly, the validation coefficient c did not vary appreciably as a function of the magnitude of observed gene expression differences between CEU and YRI. This suggests that the effects of ancestry on gene expression are widespread across genes, as opposed to affecting only a fraction of genes. Although there exist genes for which the observed effect of ancestry on expression levels is close to zero (Figure 2), this does not rule out small ancestry effects at these genes, as similar results are observed in genetic data (Figure 3) in which it is commonly believed that ancestry affects 100% of common SNP frequencies. Indeed, if ancestry affects genotype and genotype affects gene expression- (as indicated by previous studies reporting a substantial heritable component to gene expression [16,17]), then the presence of ancestry differences at almost all expressed genes seems a not unreasonable hypothesis, and one with which our results are entirely consistent. However, just as with DNA variation, it is clear that population differences in gene expression represent only a small fraction of the overall variance, most of which is due to variation within populations. In addition to validating the aggregate effects of ancestry on human gene expression, we were able to partition heritable PLoS Genetics 6 December 2008 Volume 4 Issue 12 e

7 variation into cis and trans effects, which would not be possible in a simple comparison of continental populations. Our admixture approach was fruitful despite the small magnitude of differences between human subpopulations. Our distinction between cis and trans effects is somewhat imprecise, due to the extended length (.10 Mb) of segments of continental ancestry in African Americans, but this has little effect on our conclusions, since a 10 Mb region represents a proportion of the genome that is much smaller than the 12% proportion of heritable variation in gene expression that we attribute to variation at the cis locus. Comparing our results to results obtained in other species, we note that two recent studies of gene expression in Drosophila also reported that cis effects explain a small fraction of heritable variation [28,29], although previous Drosophila studies had suggested a larger role for cis effects [30,31]. Our results have broad ramifications for future efforts to map the genetic regulation of gene expression. However, conclusions drawn from gene expression measured in lymphoblastoid cell lines do not necessarily extend to other tissue types, motivating further investigation. References 1. Holmes E, Loo RL, Stamler J, Bictash M, Yap IK, et al. (2008) Human metabolic phenotype diversity and its association with diet and blood pressure. Nature 453: Idaghdour Y, Storey JD, Jadallah SJ, Gibson G (2008) A genome-wide gene expression signature of environmental geography in leukocytes of Moroccan Amazighs. PLoS Genet 4: e Nalls MA, Wilson JG, Patterson NJ, Tandon A, Zmuda JM, et al. (2008) Admixture mapping of white cell count: genetic locus responsible for lower white blood cell count in the Health ABC and Jackson Heart studies. Am J Hum Genet 82: Reiner AP, Carlson CS, Ziv E, Iribarren C, Jaquish CE, et al. (2007) Genetic ancestry, population sub-structure, and cardiovascular disease-related traits among African-American participants in the CARDIA Study. Hum Genet 121: Wassel Fyr CL, Kanaya AM, Cummings SR, Reich D, Hsueh WC, et al. (2007) Genetic admixture, adipocytokines, and adiposity in Black Americans: the Health, Aging, and Body Composition study. Hum Genet 121: International Hapmap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449: Spielman RS, Bastone LA, Burdick JT, Morley M, Ewens WJ, et al. (2007) Common genetic variants account for differences in gene expression among ethnic groups. Nat Genet 39: Storey JD, Madeoy J, Strout JL, Wurfel M, Ronald J, et al. (2007) Geneexpression variation within and among human populations. Am J Hum Genet 80: Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, et al. (2007) Population genomics of human gene expression. Nat Genet 39: Zhang W, Duan S, Kistner EO, Bleibel WK, Huang RS, et al. (2008) Evaluation of genetic variation contributing to differences in gene expression between populations. Am J Hum Genet 82: Akey JM, Biswas S, Leek JT, Storey JD (2007) On the design and analysis of gene expression studies in human populations. Nat Genet 39: Spielman RS, Cheung VG (2007) Reply to On the design and analysis of gene expression studies in human populations. Nat Genet 39: Gilad Y, Rifkin SA, Pritchard JK (2008) Revealing the architecture of gene regulation: the promise of eqtl studies. Trends Genet 24: Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, et al. (2004) Genetic analysis of genome-wide variation in human gene expression. Nature 430: Cheung VG, Spielman RS, Ewens KG, Weber TM, Morley M, et al. (2005) Mapping determinants of human gene expression by regional and genome-wide association. Nature 437: Going forward, admixed populations will continue to be useful for understanding and mapping gene expression and other phenotypes. Supporting Information Text S1 Supplementary Note. Found at: doi: /journal.pgen s001 (0.02 MB PDF) Acknowledgments We are grateful to J. Neubauer and A. Waliszewska for assistance with SNP genotyping and to W.J. Ewens and R.A. Kittles for helpful comments. Author Contributions Conceived and designed the experiments: ALP NP DR VGC RSS. Performed the experiments: DCH VGC RSS. Analyzed the data: ALP NP SM DR VGC RSS. Wrote the paper: ALP DR VGC RSS. Contributed data: DCH VGC RSS. 16. Dixon AL, Liang L, Moffatt MF, Chen W, Heath S, et al. (2007) A genome-wide association study of global gene expression. Nat Genet 39: Goring HH, Curran JE, Johnson MP, Dyer TD, Charlesworth J, et al. (2007) Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes. Nat Genet 39: Myers AJ, Gibbs JR, Webster JA, Rohrer K, Zhao A, et al. (2007) A survey of genetic human cortical gene expression. Nat Genet 39: Emilsson V, Thorleifsson G, Zhang B, Leonardson AS, Zink F, et al. (2008) Genetics of gene expression and its effect on disease. Nature 452: Schadt EE, Molony C, Chudin E, Hao K, Yang X, et al. (2008) Mapping the genetic architecture of gene expression in human liver. PLoS Biol 6: e Smith MW, Patterson N, Lautenberger JA, Truelove AL, McDonald GJ, et al. (2004) A high-density admixture map for disease gene discovery in African Americans. Am J Hum Genet 74: Merila JC, Crnokrak P (2001) Comparison of genetic differentiation at marker loci and quantitative traits. J Evol Biol 14: Cavalli-Sforza LL, Menozzi P, Piazza A (1994) The history and geography of human genes. Princeton: Princeton University Press. 428 p. 24. Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genet 2: e Parra EJ, Kittles RA, Shriver MD (2004) Implications of correlations between skin color and genetic ancestry for biomedical research. Nat Genet 36: S54 S Kittles RA, Santos ER, Oji-Njideka NS, Bonilla C (2007) Race, skin color and genetic ancestry: implications for biomedical research on health disparities. California Journal of Health Promotion 5: Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100: Wittkopp PJ, Haerum BK, Clark AG (2008) Regulatory changes underlying expression differences within and between Drosophila species. Nat Genet 40: Wang HY, Fu Y, McPeek MS, Lu X, Nuzhdin S, et al. (2008) Complex genetic interactions underlying expression differences between Drosophila races: analysis of chromosome substitutions. Proc Natl Acad Sci USA 105: Hughes KA, Ayroles JF, Reedy MM, Drnevich JM, Rowe KC, et al. (2006) Segregating variation in the transcriptome: cis regulation and additivity of effects. Genetics 173: Osada N, Kohn MH, Wu CI (2006) Genomic inferences of the cis-regulatory nucleotide polymorphisms underlying gene expression differences between Drosophila melanogaster mating races. Mol Biol Evol 23: PLoS Genetics 7 December 2008 Volume 4 Issue 12 e

Human Population Differentiation is Strongly Correlated With Local Recombination Rate

Human Population Differentiation is Strongly Correlated With Local Recombination Rate Human Population Differentiation is Strongly Correlated With Local Recombination Rate The Harvard community has made this article openly available. Please share how this access benefits you. Your story

More information

Supplementary Note: Detecting population structure in rare variant data

Supplementary Note: Detecting population structure in rare variant data Supplementary Note: Detecting population structure in rare variant data Inferring ancestry from genetic data is a common problem in both population and medical genetic studies, and many methods exist to

More information

Genome-wide analyses in admixed populations: Challenges and opportunities

Genome-wide analyses in admixed populations: Challenges and opportunities Genome-wide analyses in admixed populations: Challenges and opportunities E-mail: esteban.parra@utoronto.ca Esteban J. Parra, Ph.D. Admixed populations: an invaluable resource to study the genetics of

More information

Supplementary Figures

Supplementary Figures Supplementary Figures 1 Supplementary Figure 1. Analyses of present-day population differentiation. (A, B) Enrichment of strongly differentiated genic alleles for all present-day population comparisons

More information

Human Population Differentiation Is Strongly Correlated with Local Recombination Rate

Human Population Differentiation Is Strongly Correlated with Local Recombination Rate Human Population Differentiation Is Strongly Correlated with Local Recombination Rate Alon Keinan 1,2,3 *, David Reich 1,2 1 Department of Genetics, Harvard Medical School, Boston, Massachusetts, United

More information

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1 Human SNP haplotypes Statistics 246, Spring 2002 Week 15, Lecture 1 Human single nucleotide polymorphisms The majority of human sequence variation is due to substitutions that have occurred once in the

More information

Robust Prediction of Expression Differences among Human Individuals Using Only Genotype Information

Robust Prediction of Expression Differences among Human Individuals Using Only Genotype Information Robust Prediction of Expression Differences among Human Individuals Using Only Genotype Information Ohad Manor 1,2, Eran Segal 1,2 * 1 Department of Computer Science and Applied Mathematics, Weizmann Institute

More information

Haplotypes, linkage disequilibrium, and the HapMap

Haplotypes, linkage disequilibrium, and the HapMap Haplotypes, linkage disequilibrium, and the HapMap Jeffrey Barrett Boulder, 2009 LD & HapMap Boulder, 2009 1 / 29 Outline 1 Haplotypes 2 Linkage disequilibrium 3 HapMap 4 Tag SNPs LD & HapMap Boulder,

More information

Lab 1: A review of linear models

Lab 1: A review of linear models Lab 1: A review of linear models The purpose of this lab is to help you review basic statistical methods in linear models and understanding the implementation of these methods in R. In general, we need

More information

Understanding genetic association studies. Peter Kamerman

Understanding genetic association studies. Peter Kamerman Understanding genetic association studies Peter Kamerman Outline CONCEPTS UNDERLYING GENETIC ASSOCIATION STUDIES Genetic concepts: - Underlying principals - Genetic variants - Linkage disequilibrium -

More information

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011 EPIB 668 Genetic association studies Aurélie LABBE - Winter 2011 1 / 71 OUTLINE Linkage vs association Linkage disequilibrium Case control studies Family-based association 2 / 71 RECAP ON GENETIC VARIANTS

More information

Statistical Tools for Predicting Ancestry from Genetic Data

Statistical Tools for Predicting Ancestry from Genetic Data Statistical Tools for Predicting Ancestry from Genetic Data Timothy Thornton Department of Biostatistics University of Washington March 1, 2015 1 / 33 Basic Genetic Terminology A gene is the most fundamental

More information

Population Genetics II. Bio

Population Genetics II. Bio Population Genetics II. Bio5488-2016 Don Conrad dconrad@genetics.wustl.edu Agenda Population Genetic Inference Mutation Selection Recombination The Coalescent Process ACTT T G C G ACGT ACGT ACTT ACTT AGTT

More information

AN EVALUATION OF POWER TO DETECT LOW-FREQUENCY VARIANT ASSOCIATIONS USING ALLELE-MATCHING TESTS THAT ACCOUNT FOR UNCERTAINTY

AN EVALUATION OF POWER TO DETECT LOW-FREQUENCY VARIANT ASSOCIATIONS USING ALLELE-MATCHING TESTS THAT ACCOUNT FOR UNCERTAINTY AN EVALUATION OF POWER TO DETECT LOW-FREQUENCY VARIANT ASSOCIATIONS USING ALLELE-MATCHING TESTS THAT ACCOUNT FOR UNCERTAINTY E. ZEGGINI and J.L. ASIMIT Wellcome Trust Sanger Institute, Hinxton, CB10 1HH,

More information

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros DNA Collection Data Quality Control Suzanne M. Leal Baylor College of Medicine sleal@bcm.edu Copyrighted S.M. Leal 2016 Blood samples For unlimited supply of DNA Transformed cell lines Buccal Swabs Small

More information

SNPs - GWAS - eqtls. Sebastian Schmeier

SNPs - GWAS - eqtls. Sebastian Schmeier SNPs - GWAS - eqtls s.schmeier@gmail.com http://sschmeier.github.io/bioinf-workshop/ 17.08.2015 Overview Single nucleotide polymorphism (refresh) SNPs effect on genes (refresh) Genome-wide association

More information

Summary. Introduction

Summary. Introduction doi: 10.1111/j.1469-1809.2006.00305.x Variation of Estimates of SNP and Haplotype Diversity and Linkage Disequilibrium in Samples from the Same Population Due to Experimental and Evolutionary Sample Size

More information

Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip

Designing Genome-Wide Association Studies: Sample Size, Power, Imputation, and the Choice of Genotyping Chip : Sample Size, Power, Imputation, and the Choice of Genotyping Chip Chris C. A. Spencer., Zhan Su., Peter Donnelly ", Jonathan Marchini " * Department of Statistics, University of Oxford, Oxford, United

More information

Supplementary Methods Illumina Genome-Wide Genotyping Single SNP and Microsatellite Genotyping. Supplementary Table 4a Supplementary Table 4b

Supplementary Methods Illumina Genome-Wide Genotyping Single SNP and Microsatellite Genotyping. Supplementary Table 4a Supplementary Table 4b Supplementary Methods Illumina Genome-Wide Genotyping All Icelandic case- and control-samples were assayed with the Infinium HumanHap300 SNP chips (Illumina, SanDiego, CA, USA), containing 317,503 haplotype

More information

Supplementary Methods 2. Supplementary Table 1: Bottleneck modeling estimates 5

Supplementary Methods 2. Supplementary Table 1: Bottleneck modeling estimates 5 Supplementary Information Accelerated genetic drift on chromosome X during the human dispersal out of Africa Keinan A, Mullikin JC, Patterson N, and Reich D Supplementary Methods 2 Supplementary Table

More information

A genome wide association study of metabolic traits in human urine

A genome wide association study of metabolic traits in human urine Supplementary material for A genome wide association study of metabolic traits in human urine Suhre et al. CONTENTS SUPPLEMENTARY FIGURES Supplementary Figure 1: Regional association plots surrounding

More information

H3A - Genome-Wide Association testing SOP

H3A - Genome-Wide Association testing SOP H3A - Genome-Wide Association testing SOP Introduction File format Strand errors Sample quality control Marker quality control Batch effects Population stratification Association testing Replication Meta

More information

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill Introduction to Add Health GWAS Data Part I Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill Outline Introduction to genome-wide association studies (GWAS) Research

More information

Population differentiation analysis of 54,734 European Americans reveals independent evolution of ADH1B gene in Europe and East Asia

Population differentiation analysis of 54,734 European Americans reveals independent evolution of ADH1B gene in Europe and East Asia Population differentiation analysis of 54,734 European Americans reveals independent evolution of ADH1B gene in Europe and East Asia Kevin Galinsky Harvard T. H. Chan School of Public Health American Society

More information

Nature Genetics: doi: /ng.3254

Nature Genetics: doi: /ng.3254 Supplementary Figure 1 Comparing the inferred histories of the stairway plot and the PSMC method using simulated samples based on five models. (a) PSMC sim-1 model. (b) PSMC sim-2 model. (c) PSMC sim-3

More information

High-density SNP Genotyping Analysis of Broiler Breeding Lines

High-density SNP Genotyping Analysis of Broiler Breeding Lines Animal Industry Report AS 653 ASL R2219 2007 High-density SNP Genotyping Analysis of Broiler Breeding Lines Abebe T. Hassen Jack C.M. Dekkers Susan J. Lamont Rohan L. Fernando Santiago Avendano Aviagen

More information

Genotype Prediction with SVMs

Genotype Prediction with SVMs Genotype Prediction with SVMs Nicholas Johnson December 12, 2008 1 Summary A tuned SVM appears competitive with the FastPhase HMM (Stephens and Scheet, 2006), which is the current state of the art in genotype

More information

Office Hours. We will try to find a time

Office Hours.   We will try to find a time Office Hours We will try to find a time If you haven t done so yet, please mark times when you are available at: https://tinyurl.com/666-office-hours Thanks! Hardy Weinberg Equilibrium Biostatistics 666

More information

ARTICLE. Ke Hao 1, Cheng Li 1,2, Carsten Rosenow 3 and Wing H Wong*,1,4

ARTICLE. Ke Hao 1, Cheng Li 1,2, Carsten Rosenow 3 and Wing H Wong*,1,4 (2004) 12, 1001 1006 & 2004 Nature Publishing Group All rights reserved 1018-4813/04 $30.00 www.nature.com/ejhg ARTICLE Detect and adjust for population stratification in population-based association study

More information

Comparative eqtl analyses within and between seven tissue types suggest mechanisms underlying cell type specificity of eqtls

Comparative eqtl analyses within and between seven tissue types suggest mechanisms underlying cell type specificity of eqtls Comparative eqtl analyses within and between seven tissue types suggest mechanisms underlying cell type specificity of eqtls, Duke University Christopher D Brown, University of Pennsylvania November 9th,

More information

Computational Workflows for Genome-Wide Association Study: I

Computational Workflows for Genome-Wide Association Study: I Computational Workflows for Genome-Wide Association Study: I Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 16, 2014 Outline 1 Outline 2 3 Monogenic Mendelian Diseases

More information

Experimental Design and Sample Size Requirement for QTL Mapping

Experimental Design and Sample Size Requirement for QTL Mapping Experimental Design and Sample Size Requirement for QTL Mapping Zhao-Bang Zeng Bioinformatics Research Center Departments of Statistics and Genetics North Carolina State University zeng@stat.ncsu.edu 1

More information

Potential of human genome sequencing. Paul Pharoah Reader in Cancer Epidemiology University of Cambridge

Potential of human genome sequencing. Paul Pharoah Reader in Cancer Epidemiology University of Cambridge Potential of human genome sequencing Paul Pharoah Reader in Cancer Epidemiology University of Cambridge Key considerations Strength of association Exposure Genetic model Outcome Quantitative trait Binary

More information

On the Power to Detect SNP/Phenotype Association in Candidate Quantitative Trait Loci Genomic Regions: A Simulation Study

On the Power to Detect SNP/Phenotype Association in Candidate Quantitative Trait Loci Genomic Regions: A Simulation Study On the Power to Detect SNP/Phenotype Association in Candidate Quantitative Trait Loci Genomic Regions: A Simulation Study J.M. Comeron, M. Kreitman, F.M. De La Vega Pacific Symposium on Biocomputing 8:478-489(23)

More information

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls BMI/CS 776 www.biostat.wisc.edu/bmi776/ Colin Dewey cdewey@biostat.wisc.edu Spring 2012 1. Understanding Human Genetic Variation

More information

Analysis of genome-wide genotype data

Analysis of genome-wide genotype data Analysis of genome-wide genotype data Acknowledgement: Several slides based on a lecture course given by Jonathan Marchini & Chris Spencer, Cape Town 2007 Introduction & definitions - Allele: A version

More information

Lecture 6: GWAS in Samples with Structure. Summer Institute in Statistical Genetics 2015

Lecture 6: GWAS in Samples with Structure. Summer Institute in Statistical Genetics 2015 Lecture 6: GWAS in Samples with Structure Timothy Thornton and Michael Wu Summer Institute in Statistical Genetics 2015 1 / 25 Introduction Genetic association studies are widely used for the identification

More information

Population stratification. Background & PLINK practical

Population stratification. Background & PLINK practical Population stratification Background & PLINK practical Variation between, within populations Any two humans differ ~0.1% of their genome (1 in ~1000bp) ~8% of this variation is accounted for by the major

More information

Exploring the Genetic Basis of Congenital Heart Defects

Exploring the Genetic Basis of Congenital Heart Defects Exploring the Genetic Basis of Congenital Heart Defects Sanjay Siddhanti Jordan Hannel Vineeth Gangaram szsiddh@stanford.edu jfhannel@stanford.edu vineethg@stanford.edu 1 Introduction The Human Genome

More information

Linking Genetic Variation to Important Phenotypes

Linking Genetic Variation to Important Phenotypes Linking Genetic Variation to Important Phenotypes BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under

More information

QTL Mapping Using Multiple Markers Simultaneously

QTL Mapping Using Multiple Markers Simultaneously SCI-PUBLICATIONS Author Manuscript American Journal of Agricultural and Biological Science (3): 195-01, 007 ISSN 1557-4989 007 Science Publications QTL Mapping Using Multiple Markers Simultaneously D.

More information

ESTIMATING GENETIC VARIABILITY WITH RESTRICTION ENDONUCLEASES RICHARD R. HUDSON1

ESTIMATING GENETIC VARIABILITY WITH RESTRICTION ENDONUCLEASES RICHARD R. HUDSON1 ESTIMATING GENETIC VARIABILITY WITH RESTRICTION ENDONUCLEASES RICHARD R. HUDSON1 Department of Biology, University of Pennsylvania, Philadelphia, Pennsylvania 19104 Manuscript received September 8, 1981

More information

Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron

Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron Genotype calling Genotyping methods for Affymetrix arrays Genotyping

More information

Genetics of human gene expression: mapping DNA variants that influence gene expression

Genetics of human gene expression: mapping DNA variants that influence gene expression Nature Reviews Genetics AOP, published online 28 July 2009; doi:10.1038/nrg2630 REVIEWS Genetics of human gene expression: mapping DNA variants that influence gene expression Vivian G. Cheung* and Richard

More information

Genome Wide Association Studies

Genome Wide Association Studies Genome Wide Association Studies Liz Speliotes M.D., Ph.D., M.P.H. Instructor of Medicine and Gastroenterology Massachusetts General Hospital Harvard Medical School Fellow Broad Institute Outline Introduction

More information

Genetic Analysis of Human Traits In Vitro: Drug Response and Gene. Expression in Lymphoblastoid Cell Lines

Genetic Analysis of Human Traits In Vitro: Drug Response and Gene. Expression in Lymphoblastoid Cell Lines Genetic Analysis of Human Traits In Vitro: Drug Response and Gene Expression in Lymphoblastoid Cell Lines The Harvard community has made this article openly available. Please share how this access benefits

More information

Algorithms for Genetics: Introduction, and sources of variation

Algorithms for Genetics: Introduction, and sources of variation Algorithms for Genetics: Introduction, and sources of variation Scribe: David Dean Instructor: Vineet Bafna 1 Terms Genotype: the genetic makeup of an individual. For example, we may refer to an individual

More information

Supplementary Text. eqtl mapping in the Bay x Sha recombinant population.

Supplementary Text. eqtl mapping in the Bay x Sha recombinant population. Supplementary Text eqtl mapping in the Bay x Sha recombinant population. Expression levels for 24,576 traits (Gene-specific Sequence Tags: GSTs, CATMA array version 2) was measured in RNA extracted from

More information

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer Multi-SNP Models for Fine-Mapping Studies: Application to an association study of the Kallikrein Region and Prostate Cancer November 11, 2014 Contents Background 1 Background 2 3 4 5 6 Study Motivation

More information

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics S G ection ON tatistical enetics Design and Analysis of Genetic Association Studies Hemant K Tiwari, Ph.D. Professor & Head Section on Statistical Genetics Department of Biostatistics School of Public

More information

Huijuan Feng, Shining Ma,Chao Ye & Zhixing Feng

Huijuan Feng, Shining Ma,Chao Ye & Zhixing Feng Huijuan Feng, Shining Ma,Chao Ye & Zhixing Feng Background-Author introduction Research interest: Methods for gene mapping of complex traits Inference of population structure from genetic data Genome variation

More information

http://genemapping.org/ Epistasis in Association Studies David Evans Law of Independent Assortment Biological Epistasis Bateson (99) a masking effect whereby a variant or allele at one locus prevents

More information

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Traditional QTL approach Uses standard bi-parental mapping populations o F2 or RI These have a limited number of

More information

QTL mapping in mice. Karl W Broman. Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA.

QTL mapping in mice. Karl W Broman. Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA. QTL mapping in mice Karl W Broman Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA www.biostat.jhsph.edu/ kbroman Outline Experiments, data, and goals Models ANOVA at marker

More information

Introduction to Quantitative Genomics / Genetics

Introduction to Quantitative Genomics / Genetics Introduction to Quantitative Genomics / Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics September 10, 2008 Jason G. Mezey Outline History and Intuition. Statistical Framework. Current

More information

Genetics of dairy production

Genetics of dairy production Genetics of dairy production E-learning course from ESA Charlotte DEZETTER ZBO101R11550 Table of contents I - Genetics of dairy production 3 1. Learning objectives... 3 2. Review of Mendelian genetics...

More information

PLINK gplink Haploview

PLINK gplink Haploview PLINK gplink Haploview Whole genome association software tutorial Shaun Purcell Center for Human Genetic Research, Massachusetts General Hospital, Boston, MA Broad Institute of Harvard & MIT, Cambridge,

More information

Genetic data concepts and tests

Genetic data concepts and tests Genetic data concepts and tests Cavan Reilly September 21, 2018 Table of contents Overview Linkage disequilibrium Quantifying LD Heatmap for LD Hardy-Weinberg equilibrium Genotyping errors Population substructure

More information

QTL mapping in mice. Karl W Broman. Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA.

QTL mapping in mice. Karl W Broman. Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA. QTL mapping in mice Karl W Broman Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA www.biostat.jhsph.edu/ kbroman Outline Experiments, data, and goals Models ANOVA at marker

More information

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls

Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls Linking Genetic Variation to Important Phenotypes: SNPs, CNVs, GWAS, and eqtls BMI/CS 776 www.biostat.wisc.edu/bmi776/ Mark Craven craven@biostat.wisc.edu Spring 2011 1. Understanding Human Genetic Variation!

More information

Questions we are addressing. Hardy-Weinberg Theorem

Questions we are addressing. Hardy-Weinberg Theorem Factors causing genotype frequency changes or evolutionary principles Selection = variation in fitness; heritable Mutation = change in DNA of genes Migration = movement of genes across populations Vectors

More information

DETERMINING SIGNIFICANT FOLD DIFFERENCES IN GENE EXPRESSION ANALYSIS

DETERMINING SIGNIFICANT FOLD DIFFERENCES IN GENE EXPRESSION ANALYSIS DETERMINING SIGNIFICANT FOLD DIFFERENCES IN GENE EXPRESSION ANALYSIS A. J. BUTTE 1, J. YE, G. NIEDERFELLNER 3, K. RETT 3, H. U. HÄRING 3, M. F. WHITE, I. S. KOHANE 1 1 Children s Hospital Informatics Program,

More information

Statistical Methods for Quantitative Trait Loci (QTL) Mapping

Statistical Methods for Quantitative Trait Loci (QTL) Mapping Statistical Methods for Quantitative Trait Loci (QTL) Mapping Lectures 4 Oct 10, 011 CSE 57 Computational Biology, Fall 011 Instructor: Su-In Lee TA: Christopher Miles Monday & Wednesday 1:00-1:0 Johnson

More information

Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Supplementary information

Fast and accurate genotype imputation in genome-wide association studies through pre-phasing. Supplementary information Fast and accurate genotype imputation in genome-wide association studies through pre-phasing Supplementary information Bryan Howie 1,6, Christian Fuchsberger 2,6, Matthew Stephens 1,3, Jonathan Marchini

More information

Statistical Methods for Network Analysis of Biological Data

Statistical Methods for Network Analysis of Biological Data The Protein Interaction Workshop, 8 12 June 2015, IMS Statistical Methods for Network Analysis of Biological Data Minghua Deng, dengmh@pku.edu.cn School of Mathematical Sciences Center for Quantitative

More information

heritability problem Krishna Kumar et al. (1) claim that GCTA applied to current SNP

heritability problem Krishna Kumar et al. (1) claim that GCTA applied to current SNP Commentary on Limitations of GCTA as a solution to the missing heritability problem Jian Yang 1,2, S. Hong Lee 3, Naomi R. Wray 1, Michael E. Goddard 4,5, Peter M. Visscher 1,2 1. Queensland Brain Institute,

More information

Crash-course in genomics

Crash-course in genomics Crash-course in genomics Molecular biology : How does the genome code for function? Genetics: How is the genome passed on from parent to child? Genetic variation: How does the genome change when it is

More information

Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573

Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573 Bioinformatic Analysis of SNP Data for Genetic Association Studies EPI573 Mark J. Rieder Department of Genome Sciences mrieder@u.washington washington.edu Epidemiology Studies Cohort Outcome Model to fit/explain

More information

Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by

Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by Appendix 5: Details of statistical methods in the CRP CHD Genetics Collaboration (CCGC) [posted as supplied by author] Statistical methods: All hypothesis tests were conducted using two-sided P-values

More information

Supplementary Figure 1. Study design of a multi-stage GWAS of gout.

Supplementary Figure 1. Study design of a multi-stage GWAS of gout. Supplementary Figure 1. Study design of a multi-stage GWAS of gout. Supplementary Figure 2. Plot of the first two principal components from the analysis of the genome-wide study (after QC) combined with

More information

Phasing of 2-SNP Genotypes based on Non-Random Mating Model

Phasing of 2-SNP Genotypes based on Non-Random Mating Model Phasing of 2-SNP Genotypes based on Non-Random Mating Model Dumitru Brinza and Alexander Zelikovsky Department of Computer Science, Georgia State University, Atlanta, GA 30303 {dima,alexz}@cs.gsu.edu Abstract.

More information

Wu et al., Determination of genetic identity in therapeutic chimeric states. We used two approaches for identifying potentially suitable deletion loci

Wu et al., Determination of genetic identity in therapeutic chimeric states. We used two approaches for identifying potentially suitable deletion loci SUPPLEMENTARY METHODS AND DATA General strategy for identifying deletion loci We used two approaches for identifying potentially suitable deletion loci for PDP-FISH analysis. In the first approach, we

More information

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs (3) QTL and GWAS methods By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs Under what conditions particular methods are suitable

More information

Leveraging admixture analysis to resolve missing and crosspopulation. Noah Zaitlen

Leveraging admixture analysis to resolve missing and crosspopulation. Noah Zaitlen Leveraging admixture analysis to resolve missing and crosspopulation heritability. Noah Zaitlen Outline Background Yes, those three slides again. Local-ancestry based heritability estimation Cross-Population

More information

Joint Study of Genetic Regulators for Expression

Joint Study of Genetic Regulators for Expression Joint Study of Genetic Regulators for Expression Traits Related to Breast Cancer Tian Zheng 1, Shuang Wang 2, Lei Cong 1, Yuejing Ding 1, Iuliana Ionita-Laza 3, and Shaw-Hwa Lo 1 1 Department of Statistics,

More information

A Bayesian Framework to Account for Complex Non- Genetic Factors in Gene Expression Levels Greatly Increases Power in eqtl Studies

A Bayesian Framework to Account for Complex Non- Genetic Factors in Gene Expression Levels Greatly Increases Power in eqtl Studies A Bayesian Framework to Account for Complex Non- Genetic Factors in Gene Expression Levels Greatly Increases Power in eqtl Studies Oliver Stegle 1,2 *., Leopold Parts 3., Richard Durbin 3, John Winn 4

More information

Lecture 23: Causes and Consequences of Linkage Disequilibrium. November 16, 2012

Lecture 23: Causes and Consequences of Linkage Disequilibrium. November 16, 2012 Lecture 23: Causes and Consequences of Linkage Disequilibrium November 16, 2012 Last Time Signatures of selection based on synonymous and nonsynonymous substitutions Multiple loci and independent segregation

More information

Genome-Wide Association Studies. Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey

Genome-Wide Association Studies. Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey Genome-Wide Association Studies Ryan Collins, Gerissa Fowler, Sean Gamberg, Josselyn Hudasek & Victoria Mackey Introduction The next big advancement in the field of genetics after the Human Genome Project

More information

Association studies (Linkage disequilibrium)

Association studies (Linkage disequilibrium) Positional cloning: statistical approaches to gene mapping, i.e. locating genes on the genome Linkage analysis Association studies (Linkage disequilibrium) Linkage analysis Uses a genetic marker map (a

More information

Haplotype Association Mapping by Density-Based Clustering in Case-Control Studies (Work-in-Progress)

Haplotype Association Mapping by Density-Based Clustering in Case-Control Studies (Work-in-Progress) Haplotype Association Mapping by Density-Based Clustering in Case-Control Studies (Work-in-Progress) Jing Li 1 and Tao Jiang 1,2 1 Department of Computer Science and Engineering, University of California

More information

Quantitative Genetics

Quantitative Genetics Quantitative Genetics Polygenic traits Quantitative Genetics 1. Controlled by several to many genes 2. Continuous variation more variation not as easily characterized into classes; individuals fall into

More information

Themes. Homo erectus. Jin and Su, Nature Reviews Genetics (2000)

Themes. Homo erectus. Jin and Su, Nature Reviews Genetics (2000) HC70A & SAS70A Winter 2009 Genetic Engineering in Medicine, Agriculture, and Law Tracking Human Ancestry Professor John Novembre Themes Global patterns of human genetic diversity Tracing our ancient ancestry

More information

Improving the power of GWAS and avoiding confounding from. population stratification with PC-Select

Improving the power of GWAS and avoiding confounding from. population stratification with PC-Select Genetics: Early Online, published on April 9, 014 as 10.1534/genetics.114.16485 Improving the power of GWAS and avoiding confounding from population stratification with PC-Select George Tucker, Alkes L

More information

Genome-Wide Association Studies (GWAS): Computational Them

Genome-Wide Association Studies (GWAS): Computational Them Genome-Wide Association Studies (GWAS): Computational Themes and Caveats October 14, 2014 Many issues in Genomewide Association Studies We show that even for the simplest analysis, there is little consensus

More information

QTL Mapping, MAS, and Genomic Selection

QTL Mapping, MAS, and Genomic Selection QTL Mapping, MAS, and Genomic Selection Dr. Ben Hayes Department of Primary Industries Victoria, Australia A short-course organized by Animal Breeding & Genetics Department of Animal Science Iowa State

More information

Why do we need statistics to study genetics and evolution?

Why do we need statistics to study genetics and evolution? Why do we need statistics to study genetics and evolution? 1. Mapping traits to the genome [Linkage maps (incl. QTLs), LOD] 2. Quantifying genetic basis of complex traits [Concordance, heritability] 3.

More information

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics

Genetic Variation and Genome- Wide Association Studies. Keyan Salari, MD/PhD Candidate Department of Genetics Genetic Variation and Genome- Wide Association Studies Keyan Salari, MD/PhD Candidate Department of Genetics How many of you did the readings before class? A. Yes, of course! B. Started, but didn t get

More information

Genome-wide association studies (GWAS) Part 1

Genome-wide association studies (GWAS) Part 1 Genome-wide association studies (GWAS) Part 1 Matti Pirinen FIMM, University of Helsinki 03.12.2013, Kumpula Campus FIMM - Institiute for Molecular Medicine Finland www.fimm.fi Published Genome-Wide Associations

More information

Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk

Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk Summer Review 7 Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk Jian Zhou 1,2,3, Chandra L. Theesfeld 1, Kevin Yao 3, Kathleen M. Chen 3, Aaron K. Wong

More information

Digital RNA allelotyping reveals tissue-specific and allelespecific gene expression in human

Digital RNA allelotyping reveals tissue-specific and allelespecific gene expression in human nature methods Digital RNA allelotyping reveals tissue-specific and allelespecific gene expression in human Kun Zhang, Jin Billy Li, Yuan Gao, Dieter Egli, Bin Xie, Jie Deng, Zhe Li, Je-Hyuk Lee, John

More information

Analysis of Microarray Data

Analysis of Microarray Data Analysis of Microarray Data Lecture 1: Experimental Design and Data Normalization George Bell, Ph.D. Senior Bioinformatics Scientist Bioinformatics and Research Computing Whitehead Institute Outline Introduction

More information

Cross Haplotype Sharing Statistic: Haplotype length based method for whole genome association testing

Cross Haplotype Sharing Statistic: Haplotype length based method for whole genome association testing Cross Haplotype Sharing Statistic: Haplotype length based method for whole genome association testing André R. de Vries a, Ilja M. Nolte b, Geert T. Spijker c, Dumitru Brinza d, Alexander Zelikovsky d,

More information

Allele specific expression: How George Casella made me a Bayesian. Lauren McIntyre University of Florida

Allele specific expression: How George Casella made me a Bayesian. Lauren McIntyre University of Florida Allele specific expression: How George Casella made me a Bayesian Lauren McIntyre University of Florida Acknowledgments NIH,NSF, UF EPI, UF Opportunity Fund Allele specific expression: what is it? The

More information

Integrative Genomics 1a. Introduction

Integrative Genomics 1a. Introduction 2016 Course Outline Integrative Genomics 1a. Introduction ggibson.gt@gmail.com http://www.cig.gatech.edu 1a. Experimental Design and Hypothesis Testing (GG) 1b. Normalization (GG) 2a. RNASeq (MI) 2b. Clustering

More information

Iterated Conditional Modes for Cross-Hybridization Compensation in DNA Microarray Data

Iterated Conditional Modes for Cross-Hybridization Compensation in DNA Microarray Data http://www.psi.toronto.edu Iterated Conditional Modes for Cross-Hybridization Compensation in DNA Microarray Data Jim C. Huang, Quaid D. Morris, Brendan J. Frey October 06, 2004 PSI TR 2004 031 Iterated

More information

Nature Genetics: doi: /ng.3143

Nature Genetics: doi: /ng.3143 Supplementary Figure 1 Quantile-quantile plot of the association P values obtained in the discovery sample collection. The two clear outlying SNPs indicated for follow-up assessment are rs6841458 and rs7765379.

More information

The Whole Genome TagSNP Selection and Transferability Among HapMap Populations. Reedik Magi, Lauris Kaplinski, and Maido Remm

The Whole Genome TagSNP Selection and Transferability Among HapMap Populations. Reedik Magi, Lauris Kaplinski, and Maido Remm The Whole Genome TagSNP Selection and Transferability Among HapMap Populations Reedik Magi, Lauris Kaplinski, and Maido Remm Pacific Symposium on Biocomputing 11:535-543(2006) THE WHOLE GENOME TAGSNP SELECTION

More information

POPULATION GENETICS Winter 2005 Lecture 18 Quantitative genetics and QTL mapping

POPULATION GENETICS Winter 2005 Lecture 18 Quantitative genetics and QTL mapping POPULATION GENETICS Winter 2005 Lecture 18 Quantitative genetics and QTL mapping - from Darwin's time onward, it has been widely recognized that natural populations harbor a considerably degree of genetic

More information

Linkage Disequilibrium. Adele Crane & Angela Taravella

Linkage Disequilibrium. Adele Crane & Angela Taravella Linkage Disequilibrium Adele Crane & Angela Taravella Overview Introduction to linkage disequilibrium (LD) Measuring LD Genetic & demographic factors shaping LD Model predictions and expected LD decay

More information

Human Genetics and Gene Mapping of Complex Traits

Human Genetics and Gene Mapping of Complex Traits Human Genetics and Gene Mapping of Complex Traits Advanced Genetics, Spring 2015 Human Genetics Series Thursday 4/02/15 Nancy L. Saccone, nlims@genetics.wustl.edu ancestral chromosome present day chromosomes:

More information