A strategy for multiple linkage disequilibrium mapping methods to validate additive QTL. Abstracts

Similar documents
Multiple Linkage Disequilibrium Mapping Methods to Validate Additive Quantitative Trait Loci in Korean Native Cattle (Hanwoo)

QTL Mapping, MAS, and Genomic Selection

QTL Mapping Using Multiple Markers Simultaneously

Application of Whole-Genome Prediction Methods for Genome-Wide Association Studies: A Bayesian Approach

Introduction to Quantitative Genomics / Genetics

QTL Mapping, MAS, and Genomic Selection

EFFICIENT DESIGNS FOR FINE-MAPPING OF QUANTITATIVE TRAIT LOCI USING LINKAGE DISEQUILIBRIUM AND LINKAGE

Genomic Selection Using Low-Density Marker Panels

Effects of Marker Density, Number of Quantitative Trait Loci and Heritability of Trait on Genomic Selection Accuracy

Genomic Selection using low-density marker panels

Genomic Selection with Linear Models and Rank Aggregation

Whole genome analyses accounting for structures in genotype data

Strategy for Applying Genome-Wide Selection in Dairy Cattle

Genetics Effective Use of New and Existing Methods

The use of multi-breed reference populations and multi-omic data to maximize accuracy of genomic prediction

GenSap Meeting, June 13-14, Aarhus. Genomic Selection with QTL information Didier Boichard

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

Traditional Genetic Improvement. Genetic variation is due to differences in DNA sequence. Adding DNA sequence data to traditional breeding.

The use of multi-breed reference populations and multi-omic data to maximize accuracy of genomic prediction

High-density SNP Genotyping Analysis of Broiler Breeding Lines

Optimizing Traditional and Marker Assisted Evaluation in Beef Cattle

Application of MAS in French dairy cattle. Guillaume F., Fritz S., Boichard D., Druet T.

Multi-SNP Models for Fine-Mapping Studies: Application to an. Kallikrein Region and Prostate Cancer

MULTIVARIATE GENOME-WIDE ASSOCIATION STUDIES AND GENOMIC PREDICTIONS IN MULTIPLE BREEDS AND CROSSBRED ANIMALS

EAAP : 29/08-02/ A L I M E N T A T I O N A G R I C U L T U R E E N V I R O N N E M E N T

The promise of genomics for animal improvement. Daniela Lourenco

Strategy for applying genome-wide selection in dairy cattle

Using RNAseq data to improve genomic selection in dairy cattle

Obiettivi e recenti acquisizioni del progetto di ricerca SelMol

Genome wide association and genomic selection to speed up genetic improvement for meat quality in Hanwoo

Understanding genomic selection in poultry breeding

Introduction to Add Health GWAS Data Part I. Christy Avery Department of Epidemiology University of North Carolina at Chapel Hill

Authors: Yumin Xiao. Supervisor: Xia Shen

BLUP and Genomic Selection

HISTORICAL LINGUISTICS AND MOLECULAR ANTHROPOLOGY

Yniv Palti USDA-ARS National Center for Cool and Cold Water Aquaculture, Leetown, WV, USA

Genomic selection applies to synthetic breeds

Fundamentals of Genomic Selection

Lecture 2: Height in Plants, Animals, and Humans. Michael Gore lecture notes Tucson Winter Institute version 18 Jan 2013

Association studies (Linkage disequilibrium)

Marker Assisted Selection Where, When, and How. Lecture 18

Genetics Selection Evolution. Open Access RESEARCH ARTICLE. Saeed Hassani 1,2, Mahdi Saatchi 2, Rohan L. Fernando 2 and Dorian J.

Technical Challenges to Implementation of Genomic Selection

Lecture 23: Causes and Consequences of Linkage Disequilibrium. November 16, 2012

MAS using a dense SNP markers map: Application to the Normande and Montbéliarde breeds

The basis of genetic relationships in the era of genomic selection

Genome-wide association study for feed efficiency traits using SNP and haplotype models

Question. In the last 100 years. What is Feed Efficiency? Genetics of Feed Efficiency and Applications for the Dairy Industry

Improvement of Association-based Gene Mapping Accuracy by Selecting High Rank Features

Sequencing Millions of Animals for Genomic Selection 2.0

Genome-Wide Association Studies (GWAS): Computational Them

Efficient genomic prediction based on whole genome sequence data using split and merge Bayesian variable selection

Gene Mapping in Natural Plant Populations Guilt by Association

Power and precision of QTL mapping in simulated multiple porcine F2 crosses using whole-genome sequence information

SNP calling and Genome Wide Association Study (GWAS) Trushar Shah

Cross Haplotype Sharing Statistic: Haplotype length based method for whole genome association testing

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Impact of Genotype Imputation on the Performance of GBLUP and Bayesian Methods for Genomic Prediction

Identifying Genes Underlying QTLs

Genomic Selection in Breeding Programs BIOL 509 November 26, 2013

Workshop on Data Science in Biomedicine

Implementation of Genomic Selection in Pig Breeding Programs

Genomic Selection in Dairy Cattle

Overview. Methods for gene mapping and haplotype analysis. Haplotypes. Outline. acatactacataacatacaatagat. aaatactacctaacctacaagagat

2010 Annual Research Report

Genetic dissection of complex traits, crop improvement through markerassisted selection, and genomic selection

A Dissertation presented to. the Faculty of the Graduate School. at the University of Missouri-Columbia. In Partial Fulfillment

Accuracy of whole genome prediction using a genetic architecture enhanced variance-covariance matrix

Accuracy and Training Population Design for Genomic Selection on Quantitative Traits in Elite North American Oats

DESIGNS FOR QTL DETECTION IN LIVESTOCK AND THEIR IMPLICATIONS FOR MAS

Analysis of genome-wide genotype data

EPIB 668 Genetic association studies. Aurélie LABBE - Winter 2011

Impact of QTL minor allele frequency on genomic evaluation using real genotype data and simulated phenotypes in Japanese Black cattle

Computational Workflows for Genome-Wide Association Study: I

High density marker panels, SNPs prioritizing and accuracy of genomic selection

The Accuracy of Genomic Selection in Norwegian Red. Cattle Assessed by Cross Validation

QTL mapping in mice. Karl W Broman. Department of Biostatistics Johns Hopkins University Baltimore, Maryland, USA.

The what, why and how of Genomics for the beef cattle breeder

A Whole Genome Association Study to Detect Single Nucleotide Polymorphisms for Carcass Traits in Hanwoo Populations

1b. How do people differ genetically?

A combined linkage disequilibrium and cosegregation method for fine mapping of QTL and approaches to study the long-term accuracy of genomic selection

Course Announcements

Genomic Selection in Sheep Breeding Programs

Whole Genome-Assisted Selection. Alison Van Eenennaam, Ph.D. Cooperative Extension Specialist

Whole-Genome Genetic Data Simulation Based on Mutation-Drift Equilibrium Model

Marker-assisted selection in livestock case studies

General aspects of genome-wide association studies

Domestic animal genomes meet animal breeding translational biology in the ag-biotech sector. Jerry Taylor University of Missouri-Columbia

Variation Chapter 9 10/6/2014. Some terms. Variation in phenotype can be due to genes AND environment: Is variation genetic, environmental, or both?

Proceedings, The Range Beef Cow Symposium XXII Nov. 29, 30 and Dec. 1, 2011, Mitchell, NE. IMPLEMENTATION OF MARKER ASSISTED EPDs

Genome-wide association studies (GWAS) Part 1

Module 1 Principles of plant breeding

The effect of genomic information on optimal contribution selection in livestock breeding programs

Why do we need statistics to study genetics and evolution?

Practical integration of genomic selection in dairy cattle breeding schemes

Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk

Proceedings of the World Congress on Genetics Applied to Livestock Production 11.64

Comparison of genomic predictions using genomic relationship matrices built with different weighting factors to account for locus-specific variances

From Genotype to Phenotype

The Dimensionality of Genomic Information and Its Effect on Genomic Prediction

Transcription:

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS03) p.3947 A strategy for multiple linkage disequilibrium mapping methods to validate additive QTL Li Yi 1,4, Jong-Joo Kim, Kwan-Suk Kim 3 1 School of Statistics, Shanxi University of Finance & Economics, Taiyuan, Shanxi, China School of Biotechnology, Yeungnam University, Gyeongsan, Gyeongbuk, Korea 3 Department of Animal Science,Chungbuk National University, Cheongju, Korea 4 Corresponding author: Li Yi, e-mail: liyi334450@hotmail.com Abstracts The efficiency of genome-wide association analysis (GWAS) depends on power of detection for quantitative trait loci (QTL) and precision for QTL mapping. In this study, three different strategies for GWAS were applied to detect QTL for carcass quality traits in the Korean cattle, Hanwoo; a linkage disequilibrium single locus regression (LDRM), a combined linkage and linkage disequilibrium analysis (LDVCM) and a BayesCπ approach. Phenotypes of 486 steers were collected for carcass weight, backfat thickness, longissimus dorsi muscle area, and marbling score, and genotype data were also scored with the Illumina bovine 50k SNP chips for the steers. For the former two methods, threshold values were set FDR <0.01 on chromosome-wise level, while a cut-off such that top 5 variance of 10-SNP windows that were included in the Bayes Cπ model were determined for the latter model. A total of 1 and 10 QTL were detected by LDRM and LDVCM respectively, and after BayesCπ refining four consistent QTL were obtained, which corresponded to QTL regions previously reported. Some well-known candidate genes for the traits of interest were located close to these QTL. Our result suggests the use of combined different LD mapping approaches can provide more reliable chromosome regions to further pinpoint DNA makers or causative genes in these regions. Key words: QTL, Linkage Disequilibrium, SNP, Hanwoo 1.Introduction The availability of genome-wide SNP panels enables detection of statistical associations between a trait and any SNP in terms of a genome-wide association study (GWAS), enhancing the possibility of mapping QTL across the genome. In the literature, several statistical methods have been suggested and applied (Bayesian vs. frequentist methods) (Sun et al.011;zhao et al.007; Cierco-Ayrolles et al.010; Erbe et al.011;uleberg et al.010). These methods are based on assumption that saturated marker have increased the feasibility of QTL detection and mapping using historical population-wide linkage disequilibrium, which requires a marker allele to be in LD with the QTL allele across the entire population. The term linkage disequilibrium seems to imply disequilibrium of two loci being in physical linkage, but disequilibrium can also exist between two physically unlinked loci. Such associations may lead to collinearity among these SNPs and may impair reliability of LD-based QTL mapping as they may generate significant signals far from the real QTL position. With such a large number of tests being performed, we expected a large number of false positives.

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS03) p.3948 The aim of this study was to minimize the chance of collinearity and overcome the tendency to find significant signals from a causal variant by using the strategy for multiple linkage disequilibrium mapping methods to validate additive QTL. With this approach, Significant results were set FDR <0.01 on chromosome-wise level for the marker-trait association (LDRM) to be regarded as significant. Furthermore, we can use information from linkage to filter spurious associations in the GWAS and further refine intervals containing QTL by using combined linkage disequilibrium and linkage (LDVCM). Finally, we demonstrate this by refining the QTL position for predicting the effects of the SNPs which fits all the SNPs simultaneously (BayesCπ)..Methods Three complementary approaches were used: (i) linkage disequilibrium single locus analysis using the snp_a option of Qxpak (5.03 version) software(perez-enciso and Misztal 011); (ii) combined linkage disequilibrium and linkage analysis using LDVCM; (iii) Bayesian analysis using the BayesCπ option of GenSel (http://bigs.ansci.iastate.edu/bigsgui/login.html). LDRM First, linkage disequilibrium single locus regression (Zhao et al.007;grapes et al.004)were performed. Where y y X Z u e u is the phenotypic record, is the average phenotypic performance, X is the design matrix of SNP genotype (e.g. individuals with marker genotypes 11, 1 and are assumed to have genetic values μ kaa, 0 and μ kbb ), a is the fixed substitution effect for the SNP, Z u is the incidence matrix for animal effects, u is the infinitesimal genetic effect, which is distributed as N(0,A u ) (the numerator relationship matrix A and the additive genetic variance u ) and e is a random residual for animal i, which is distributed as N(0, I e ) (the identity matrix I and residual variance e ). Likelihood ratio tests were performed by removing the single locus SNP genotypic effects, and P-values were obtained assuming a distribution of the likelihood ratio test with one degree of freedom. Association with false discovery rate <0.01 on chromosome-wise level were considered significant. LDLA The LDVCM software has been essentially described (Kim and Georges 00; Blott et al.003). Briefly, the linkage phases of all sires and sons were determined by the approach of Druet and Georges (010). Then, identity-by-descent(ibd) probabilities( ) at the midpoint of each SNP interval (p) were computed for all pairs of haplotypes conditional on the identity-by-state status of flanking markers (Meuwissen and Goddard, 001). A dendrogram was generated by using the unweighted pair group method with arithmetic mean (UPGMA) hierarchical clustering algorithm with 1- as the distance measure at QTL location (p).starting at the ancestral node and sequentially descending into the dendrogram, all possible combinations of haplotype clusters were analyzed in place of individual haplotypes. This process identified the set of nodes at which the likelihood of the data were p p

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS03) p.3949 maximized. To jointly exploit linkage disequilibrium (female meiosis) and linkage (male meiosis) information, the following mixed linear was used: y Z h Z u e h Where h is the vector of random QTL effects corresponding to the defined haplotype clusters. Z h is an incidence matrix relating maternal haplotypes of sons and sire haplotypes to individual sons. Likelihood ratio tests were performed by removing the haplotype cluster u effects, and P-values were obtained assuming a distribution of the likelihood ratio test with one degree of freedom. Association with FDR <0.01on chromosome-wise level were considered significant. BayesCπ BayesCπ were developed from the BayesB and GBLUP approaches (Meuwissen et al.001) and was described in detail by Habier et al.(011). K y X e j 1 j j j Where K is the number of SNPs, X j is the vector of genotypes at SNP j, a j is the random substitution effect for the SNP j, which condition on the variance, is assumed normally distributed N(0, ) when δ j =1, while =0, when δ j =0, is a random 0/1 variable indicating the absence (with probability π) or presence (with probability 1-π) of SNP j in the model. The prior for π was treated as unknown with uniform (0, 1).Gibbs sampling was applied to calculate the posterior means of model parameters, a j, e,, and π. The MCMC algorithms were run for 50,000 samples, with the first 0,000 samples discarded as burn in. A window size of 10 was used and the variance of each 10-SNP window was used as the criterion to detect QTL. Several windows that shared the same SNP with a large effect were considered to identify the same QTL region. Within each region, because windows were overlapping, the window with the highest variance of GEBV was used and the SNP within this window that explained the largest proportion of genetic variance was used to denote the position of the QTL. To select which positions should be major QTL, we first ranked them by estimated variance. We then chose a cut-off such that top5 QTL were detected. Information about particular genes, located near SNP significantly associated with each trait, was extracted from online sources (http://www.ensembl.org/index.html, http://www.genecards.org/cgi-bin/cardsearch.pl#top, and http://www.uniprot.org). 3. Application to the carcass quality in Hanwoo We applied this strategy to the carcass quality in Hanwoo (486 steers, Six traits and 39,506 SNPs).The 17 QTL detected by LDRM or LDVCM were integrated using the BayesCπ to remove possible redundancy among those QTL. Four QTL from these three methods had high concordance including 64.1-64.9Mb for BTA7 for WWT, 4.3-5.4Mb for BTA14 for CWT, 0.5-1.5Mb for BTA6 for BFT and 6.3-33.4Mb for BTA9 for BFT (Figure 1). The focus of our study was the detection of major QTL position rather than the precise estimation of their

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS03) p.3950 effects. Previous work has shown that these three methods belong to three distinct analytical, i.e. LDRM test the single marker at a time and regards the markers as independent of all other markers (Zhao et al.007; Grapes et al.004; MacLeod et al.010), LDVCM test the mid-point of the marker brackets, which corresponded best when the QTL was masked between analyzed markers (Kim and Georges 00; Blott et al.003), and BayesCπ test the effects of all markers are fitted simultaneously. However, QTL with large effects can be detected by both the bayesian shrinkage and linear regression mapping methods. By specifying proper prior distributions for SNP effects, the ignorable small SNP effects are coerced to zero and only SNPs with larger effects on phenotype are fitted in the model, hence Bayesian shrinkage analysis could reduce possible spurious QTL effects by adjusting all other QTL effects. This was also explained by Xu (003) and Sun et al. (011). Therefore, here we attempted to minimize the number of false positives by combined these three methods result. The establishment of a threshold is a complicated issue and has a profound influence on the results, especially when methods with different nature of scores are compared. False discovery rate derived threshold takes into account the number of tests that are performed as well as how significant one test is relative to the others in multiple comparison procedures. Two levels of significant controls were used in LDRM and LDVCM analysis based on genome-wise (FDR<0.05) and chromosome-wise (FDR<0.01) type I errors, which were computed for all SNP(Fernando et al.004). Usually, significance tests are not required in Bayesian analysis; only frequentists emphasize significance tests. However, to facilitate comparison with other methods, we accumulated the effects of adjacent SNPs together into a genomic window and then chose a cut-off such that top 5 as the criterion to detect QTL in BayesCπ. Figure 1 clearly shows the major peaks in BayesCπ coincide with LDRM and LDVCM. Therefore, These present findings together with those of Sun et al.(011) suggest that the genetic variances estimated by BayesCπ including all markers without polygenic effects can exploit the gene discovery knowledge generated. Our Study uncovered successfully many variants related to QTLs. However, the traits YWT, CWT and BFT measured in Hanwoo have QTL with larger additive effects than WWT, LMA and Marb. Several explanations have been proposed for our scant outcomes from genetic markers. First, the currently identified SNPs might not fully describe genetic diversity. For instance, these SNPs may not capture some forms of genetic variability that are due to copy number variation. Second, genetic mechanisms might involve complex interactions among genes and between genes and environmental conditions, or epigenetic mechanisms which are not fully captured by additive models. However, opportunities may exist for improving predictions by exploiting additive genetic variation. A third explanation the one we focus on here lies in the limitations posed by the genetic models and statistical methods. 4. Conclusions In this work, no linkage disequilibrium mapping method tested in this study outperformed the others, but it is interesting to that this position of the SNP selected by the different models (LDRM, LDVCM, BayesCπ) were close. Some well-known candidate genes for the traits of interest were located close to these QTL. We suggest the use of combined different LD mapping approaches can provide more reliable chromosome regions to further pinpoint DNA makers or causative genes in these regions.

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS03) p.3951 5. Acknowledgements This research was supported by Shanxi Scholarship Council of China(013-07). Figure 1.QTL profile (upper) and annotation (lower) of the detected SNPs associated with BFT on BTA6, WWT on BTA7, CWT on BTA14 and BFT on BTA9

Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS03) p.395 References Blott, S., et al. (003), "Molecular Dissection of a Quantitative Trait Locus: A Phenylalanine-to-Tyrosine Substitution in the Transmembrane Domain of the Bovine Growth Hormone Receptor Is Associated with a Major Effect on Milk Yield and Composition," Genetics, 163, 53-66. Cierco-Ayrolles, C., et al. (010), "Does Probabilistic Modelling of Linkage Disequilibrium Evolution Improve the Accuracy of Qtl Location in Animal Pedigree?," Genet Sel Evol, 4, 38. Druet, T., and Georges, M. (010), "A Hidden Markov Model Combining Linkage and Linkage Disequilibrium Information for Haplotype Reconstruction and Quantitative Trait Locus Fine Mapping," Genetics, 184, 789-798. Erbe, M., Ytournel, F., Pimentel, E. C., Sharifi, A. R., and Simianer, H. (011), "Power and Robustness of Three Whole Genome Association Mapping Approaches in Selected Populations," J Anim Breed Genet, 18, 3-14. Fernando, R. L., et al. (004), "Controlling the Proportion of False Positives in Multiple Dependent Tests," Genetics, 166, 611-619. Grapes, L., Dekkers, J. C., Rothschild, M. F., and Fernando, R. L. (004), "Comparing Linkage Disequilibrium-Based Methods for Fine Mapping Quantitative Trait Loci," Genetics, 166, 1561-1570. Habier, D., Fernando, R. L., Kizilkaya, K., and Garrick, D. J. (011), "Extension of the Bayesian Alphabet for Genomic Selection," BMC Bioinformatics, 1, 186. Kim, J. J., and Georges, M. (00), "Evaluation of a New Fine-Mapping Method Exploiting Linkage Disequilibrium: A Case Study Analysing a Qtl with Major Effect on Milk Composition on Bovine Chromosome 14," Asian-Australasian Journal of Animal Sciences, 15, 150-156. MacLeod, I. M., et al. (010), "Power of a Genome Scan to Detect and Locate Quantitative Trait Loci in Cattle Using Dense Single Nucleotide Polymorphisms," J Anim Breed Genet, 17, 133-14. Meuwissen, T. H., and Goddard, M. E. (001), "Prediction of Identity by Descent Probabilities from Marker-Haplotypes," Genet Sel Evol, 33, 605-634. Meuwissen, T. H., Hayes, B. J., and Goddard, M. E. (001), "Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps," Genetics, 157, 1819-189. Perez-Enciso, M., and Misztal, I. (011), "Qxpak.5: Old Mixed Model Solutions for New Genomics Problems," BMC Bioinformatics, 1, 0. Sun, X., Habier, D., Fernando, R. L., Garrick, D. J., and Dekkers, J. C. (011), "Genomic Breeding Value Prediction and Qtl Mapping of Qtlmas010 Data Using Bayesian Methods," BMC Proc, 5 Suppl 3, S13. Uleberg, E., and Meuwissen, T. H. (010), "Fine Mapping and Detection of the Causative Mutation Underlying Quantitative Trait Loci," J Anim Breed Genet, 17, 404-410. Xu, S. (003), "Estimating Polygenic Effects Using Markers of the Entire Genome," Genetics, 163, 789-801. Zhao, H. H., Fernando, R. L., and Dekkers, J. C. (007), "Power and Precision of Alternate Methods for Linkage Disequilibrium Mapping of Quantitative Trait Loci," Genetics, 175, 1975-1986.