Haplotyping and imputation provide novel sources for innovative breeding strategies beyond genomic selection

From the Institute of Animal Breeding and Husbandry of the Faculty of Agricultural and Nutritional Sciences of the Christian-Albrechts-Universität zu Kiel

Haplotyping and imputation provide novel sources for innovative breeding strategies beyond genomic selection

Dissertation submitted for the doctoral degree of the Faculty of Agricultural and Nutritional Sciences of the Christian-Albrechts-Universität zu Kiel by M.Sc. agr. Dierck Segelke from Darmstadt

Kiel, 2015

Dean: Prof. Dr. Eberhard Hartung
First referee: Prof. Dr. Georg Thaller
Second referee: Prof. Dr. Bernt Guldbrandtsen
Date of oral examination: 8 July 2015

This dissertation was prepared with the gratefully acknowledged financial support of the Förderverein Bioökonomieforschung e.V. (FBF).

Printed with the approval of the Faculty of Agricultural and Nutritional Sciences of the Christian-Albrechts-Universität zu Kiel

For my children

Table of Contents
General Introduction
Chapter 1: Reliability of genomic prediction for German Holsteins using imputed genotypes from low density chips
Chapter 2: Opportunities and limits of breeding for polledness in the German Holstein breed
Chapter 3: Considering genetic characteristics in German Holstein breeding programs
Chapter 4: Prediction of expected genetic variation within groups of offspring for innovative mating schemes
General Discussion
General Summary
Summary (in German)

General Introduction

In recent years, animal and plant breeding schemes worldwide have been substantially improved by the introduction of genomically enhanced breeding values. Meuwissen et al. (2001) were the first to demonstrate that such genomically enhanced breeding values can be estimated from around 50,000 markers and a limited number of phenotypes, because of the linkage disequilibrium (LD) between quantitative trait loci and surrounding markers. Genomic breeding value estimation is based on prediction equations derived from a reference population with genotype and phenotype information. These equations are then applied to genotyped individuals, or even embryos, for which no phenotype information is available (Bouquet and Juga, 2013; Meuwissen et al., 2001). Calculations by Schaeffer (2006) supported the conclusion of Meuwissen et al. (2001) that selection decisions based on genomically enhanced breeding values can substantially increase genetic gain in animals and plants and shorten the generation interval. The availability of a whole-genome assembly of cattle (Zimin et al., 2009) and simultaneous improvements in single nucleotide polymorphism (SNP) array technologies enabled high-throughput genotyping and permitted a practical implementation of the approach of Meuwissen et al. (2001) in cattle. Using the Illumina BovineSNP50 chip (~54,000 SNPs; Matukumalli et al., 2009), many countries implemented genomically enabled breeding value estimation, especially for Holstein cattle, within a short period of time (Hayes et al., 2009; Reinhardt et al., 2009; Van Doormaal et al., 2009; VanRaden et al., 2009). The reliabilities of all routinely evaluated complex traits increased when genomic breeding values were used instead of the pedigree index (Liu et al., 2011). This gain in reliability makes conventional bull testing schemes obsolete.
This decreases the overall costs of breeding schemes, and artificial insemination companies have successfully implemented genomic selection in their breeding programs. High-throughput genotyping platforms produce numerous missing calls (Fu et al., 2009). Human geneticists introduced methods that statistically estimate genotypes missing on a genotyping platform from the correlation between SNPs within an LD block, so-called imputation (Scheet and Stephens, 2006). In 2008, Kong et al. published a method to impute long haplotypes across multiple LD blocks using genotype information from relatives. Subsequently, various imputation and phasing methods and software tools were developed, mainly based on family information, LD information, or a combination of both (reviewed by Calus et al., 2014). These tools allow animals to be genotyped with a reduced marker density (e.g. Illumina Bovine3K chip), after which the missing markers are imputed to the gold-standard Illumina BovineSNP50 chip with the aid of a 50K-based imputation reference population. Because the imputation error rate is low and the reliability of genomically enhanced breeding values decreases only slightly, this procedure enables breeders to genotype more animals for the same cost (Wiggans et al., 2012). The expected further increase in the reliability of genomic breeding values from imputing the genomic prediction reference population to higher-density chips (e.g. Illumina BovineHD, around 777K SNPs) was not realized (Erbe et al., 2012; Ertl et al., 2013; VanRaden et al., 2013a). However, several research groups showed that imputation from 50K to HD or even to sequence level is possible within and across breeds with a low imputation error rate (Brøndum et al., 2012; Daetwyler et al., 2014; Erbe et al., 2012; Ertl et al., 2013; Pausch et al., 2013; Schrooten et al., 2014; VanRaden et al., 2013a). Besides genomic evaluation and imputation, genotype information can be used to check pedigree information. The method described by Hayes (2011) can easily be applied to confirm and discover the parentage of an animal. Haplotypes can help to verify the maternal grandsire and/or great-grandsires when close relatives are not genotyped (VanRaden et al., 2013b). VanRaden et al. (2011a) were the first to use the routinely collected genotypes to discover haplotypes that never occur in the homozygous state. These missing homozygous haplotypes are clearly associated with non-return rates, indicating that the haplotypes may cause embryonic loss in the homozygous state.
Subsequent studies confirmed these findings and found additional deficient haplotypes in Holstein populations and in other breeds (Fritz et al., 2013; Sahana et al., 2013; Cooper et al., 2013). For some of these harmful haplotypes, a fertility defect was verified by identifying the causal mutation (Adams et al., 2012; Fritz et al., 2013; Daetwyler et al., 2014; McClure et al., 2014). Pausch et al. (2015) used the missing-homozygote approach to successfully identify genetic disorders causing calf mortality in Fleckvieh cattle.
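The logic behind the missing-homozygote approach can be illustrated with a simplified calculation. Assuming Hardy-Weinberg proportions (an assumption made here for illustration; VanRaden et al. (2011a) instead derive the expected count from the haplotypes of sires and maternal grandsires), a haplotype with frequency p should appear in homozygous form in about n·p² of n genotyped animals, and a binomial tail probability indicates how unlikely the observed count is. The helper name and the numbers are hypothetical.

```python
import math


def homozygote_deficiency(hap_freq: float, n_animals: int, observed: int):
    """Return (expected homozygotes, P(X <= observed)) under Hardy-Weinberg.

    Hypothetical helper: X ~ Binomial(n_animals, hap_freq**2), i.e. animals
    are treated as independent draws from a randomly mating population.
    """
    q = hap_freq ** 2                      # probability of being homozygous
    expected = n_animals * q
    observed = min(observed, n_animals)
    if q == 0.0:
        return expected, 1.0
    if q == 1.0:
        return expected, 1.0 if observed >= n_animals else 0.0
    tail = 0.0
    for k in range(observed + 1):
        # binomial log-probability, computed in log space to avoid overflow
        log_pmf = (math.lgamma(n_animals + 1) - math.lgamma(k + 1)
                   - math.lgamma(n_animals - k + 1)
                   + k * math.log(q) + (n_animals - k) * math.log1p(-q))
        tail += math.exp(log_pmf)
    return expected, min(tail, 1.0)


# A haplotype carried at 5% frequency among 10,000 genotyped animals,
# of which none is homozygous:
expected, p_value = homozygote_deficiency(0.05, 10_000, observed=0)
print(f"expected homozygotes: {expected:.1f}, P(none observed): {p_value:.2e}")
```

About 25 homozygotes would be expected here, so observing none is extremely improbable under the assumed model, which is the kind of signal that points to a lethal recessive haplotype.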

The use of haplotypes can also improve mating schemes. Cole and VanRaden (2011) showed how matings can be optimized by considering haplotypes when estimating Mendelian sampling effects. Haplotypes can be used to simulate matings between potential sires and dams to identify the matings that increase the probability of the desired progeny genotype. This enables specific matings with outstanding average breeding values and high Mendelian sampling variances that maximize the probability of obtaining extreme positive candidates, which is of interest for breeders and artificial insemination organizations, especially in combination with embryo transfer. Cole and Null (2013) visualized the transmission of direct genomic values for paternal and maternal haplotypes for different traits and breeds. The present study demonstrates the application of haplotyping and imputation as novel tools beyond genomic selection in Holstein breeding strategies. Chapter 1 focuses on low-density chips, which help breeders to evaluate their herds genomically at lower cost. To investigate imputation accuracy and the effect of imputation on the reliability of genomic breeding values, two low-density marker chips (Illumina Bovine3K and BovineLD) and two imputation software packages (Beagle version 3.3 (Browning and Browning, 2007) and Findhap version 2 (VanRaden et al., 2011b)) were compared. Chapter 2 shows that imputation algorithms can be used to determine the polled status of animals. Using simulation, it shows the consequences of different breeding strategies for the frequency of the desired polled allele, the genetic level of males and females, and the development of inbreeding. Chapter 3 extends this research to deriving genetic characteristics and their impact on breeding schemes. In recent years, several research groups showed that some haplotypes may cause embryonic loss in the homozygous state.
Carriers of genetic disorders were excluded from mating, resulting in a reduced number of sires available for the breeding program and a decrease in genetic gain. The number of known defects will increase in the near future. New methods to consider genetic defects and positive traits such as polledness in dairy breeding schemes are therefore needed. Chapter 4 presents an approach to estimate the variation within groups of offspring, which can be used to predict the variation in offspring breeding values. On the one hand, artificial insemination (AI) organizations can select candidates with high variation to increase the probability of extreme positive candidates. On the other hand, average farmers can choose AI bulls with more uniform progeny for simplified management.

References:
Adams, H. A., T. Sonstegard, P. M. VanRaden, D. J. Null, C. Van Tassell, and H. Lewin. 2012. Identification of a nonsense mutation in APAF1 that is causal for a decrease in reproductive efficiency in dairy cattle. Proc. Plant Anim. Genome XX Conf., abstr. P0555.
Bouquet, A., and J. Juga. 2013. Integrating genomic selection into dairy cattle breeding programmes: a review. Animal 7:
Brøndum, R. F., P. Ma, M. S. Lund, and G. Su. 2012. Short communication: Genotype imputation within and across Nordic cattle breeds. J. Dairy Sci. 95:
Browning, S. R., and B. L. Browning. 2007. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81:
Calus, M. P., A. C. Bouwman, J. M. Hickey, R. F. Veerkamp, and H. A. Mulder. 2014. Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications. Animal 11:
Cole, J. B., and P. M. VanRaden. 2011. Use of haplotypes to estimate Mendelian sampling effects and selection limits. J. Anim. Breed. Genet. 128:
Cole, J. B., and D. J. Null. 2013. Visualization of the transmission of direct genomic values for paternal and maternal chromosomes for 15 traits in US Brown Swiss, Holstein, and Jersey cattle. J. Dairy Sci. 96:
Cooper, T. A., G. R. Wiggans, P. M. VanRaden, J. L. Hutchison, J. B. Cole, and D. J. Null. 2013. Genomic evaluation of Ayrshire dairy cattle and new haplotypes affecting fertility and stillbirth in Holstein, Brown Swiss and Ayrshire breeds. ADSA-ASAS Joint Annual Meeting, poster T206.
Daetwyler, H. D., A. Capitan, H. Pausch, P. Stothard, R. van Binsbergen, R. F. Brøndum, X. Liao, A. Djari, S. C. Rodriguez, C. Grohs, D. Esquerré, O. Bouchez, M. N. Rossignol, C. Klopp, D. Rocha, S. Fritz, A. Eggen, P. J. Bowman, D. Coote, A. J. Chamberlain, C. Anderson, C. P. Van Tassell, I. Hulsegge, M. E. Goddard, B. Guldbrandtsen, M. S. Lund, R. F. Veerkamp, D. A. Boichard, R. Fries, and B. J. Hayes. 2014. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nature Genet. 46:
Erbe, M., B. J. Hayes, L. K. Matukumalli, S. Goswami, P. J. Bowman, C. M. Reich, B. A. Mason, and M. E. Goddard. 2012. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J. Dairy Sci. 95:
Ertl, J., C. Edel, R. Emmerling, H. Pausch, R. Fries, and K. U. Götz. 2013. On the limited increase in validation reliability using high-density genotypes in genomic best linear unbiased prediction: Observations from Fleckvieh cattle. J. Dairy Sci. 97:
Fritz, S., A. Capitan, A. Djari, S. C. Rodriguez, A. Barbat, A. Baur, C. Grohs, B. Weiss, M. Boussaha, D. Esquerré, C. Klopp, D. Rocha, and D. Boichard. 2013. Detection of haplotypes associated with prenatal death in dairy cattle and identification of deleterious mutations in GART, SHBG and SLC37A2. PLoS ONE 8:e
Fu, W., Y. Wang, Y. Wang, R. Li, R. Lin, and L. Jin. 2009. Missing call bias in high-throughput genotyping. BMC Genomics 10:106.
Hayes, B. J., P. J. Bowman, A. J. Chamberlain, and M. E. Goddard. 2009. Invited review: Genomic selection in dairy cattle: Progress and challenges. J. Dairy Sci. 92:
Hayes, B. J. 2011. Technical note: Efficient parentage assignment and pedigree reconstruction with dense single nucleotide polymorphism data. J. Dairy Sci. 94:
Kong, A., G. Masson, M. L. Frigge, A. Gylfason, P. Zusmanovich, G. Thorleifsson, P. I. Olason, A. Ingason, S. Steinberg, T. Rafnar, P. Sulem, M. Mouy, F. Jonsson, U. Thorsteinsdottir, D. F. Gudbjartsson, H. Stefansson, and K. Stefansson. 2008. Detection of sharing by descent, long-range phasing and haplotype imputation. Nature Genet. 40:
Liu, Z., F. R. Seefried, F. Reinhardt, S. Rensing, G. Thaller, and R. Reents. 2011. Impacts of both reference population size and inclusion of a residual polygenic effect on the accuracy of genomic prediction. Genet. Sel. Evol. 43:19.

Matukumalli, L. K., C. T. Lawley, R. D. Schnabel, J. F. Taylor, M. F. Allan, M. P. Heaton, J. O'Connell, S. S. Moore, T. P. L. Smith, T. S. Sonstegard, and C. P. Van Tassell. 2009. Development and characterization of a high density SNP genotyping assay for cattle. PLoS ONE 4:e5350.
McClure, M. C., D. Bickhart, D. Null, P. M. VanRaden, L. Xu, G. Wiggans, G. Liu, S. Schroeder, J. Glasscock, J. Armstrong, J. B. Cole, C. P. Van Tassell, and T. S. Sonstegard. 2014. Bovine exome sequence analysis and targeted SNP genotyping of recessive fertility defects BH1, HH2, and HH3 reveal a causative mutation in SMC2 for HH3. PLoS ONE 9:e
Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard. 2001. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:
Pausch, H., B. Aigner, R. Emmerling, C. Edel, K. U. Götz, and R. Fries. 2013. Imputation of high-density genotypes in the Fleckvieh cattle population. Genet. Sel. Evol. 45:3.
Pausch, H., H. Schwarzenbacher, J. Burgstaller, K. Flisikowski, C. Wurmser, S. Jansen, S. Jung, A. Schnieke, T. Wittek, and R. Fries. 2015. Homozygous haplotype deficiency reveals deleterious mutations compromising reproductive and rearing success in cattle. BMC Genomics 16:312.
Reinhardt, F., Z. Liu, F. Seefried, and G. Thaller. 2009. Implementation of genomic evaluation in German Holsteins. Interbull Bull. 40:
Sahana, G., U. S. Nielsen, G. P. Aamand, M. S. Lund, and B. Guldbrandtsen. 2013. Novel harmful recessive haplotypes identified for fertility traits in Nordic Holstein cattle. PLoS ONE 12:e
Schaeffer, L. R. 2006. Strategy for applying genome-wide selection in dairy cattle. J. Anim. Breed. Genet. 123:
Scheet, P., and M. Stephens. 2006. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78:
Schrooten, C., R. Dassonneville, V. Ducrocq, R. F. Brøndum, M. S. Lund, J. Chen, Z. Liu, O. González-Recio, J. Pena, and T. Druet. 2014. Error rate for imputation from the Illumina BovineSNP50 chip to the Illumina BovineHD chip. Genet. Sel. Evol. 46:10.

Van Doormaal, B. J., G. J. Kistemaker, P. G. Sullivan, M. Sargolzaei, and F. S. Schenkel. 2009. Canadian implementation of genomic evaluations. Interbull Bull. 40:
VanRaden, P. M., C. P. Van Tassell, G. R. Wiggans, T. S. Sonstegard, R. D. Schnabel, J. F. Taylor, and F. Schenkel. 2009. Invited review: Reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92:
VanRaden, P. M., K. M. Olson, D. J. Null, and J. L. Hutchison. 2011a. Harmful recessive effects on fertility detected by absence of homozygous haplotypes. J. Dairy Sci. 94:
VanRaden, P. M., J. R. O'Connell, G. R. Wiggans, and K. A. Weigel. 2011b. Genomic evaluations with many more genotypes. Genet. Sel. Evol. 43:10.
VanRaden, P. M., D. J. Null, M. Sargolzaei, G. R. Wiggans, M. E. Tooker, J. B. Cole, T. S. Sonstegard, E. E. Connor, M. Winters, J. B. C. H. M. van Kaam, A. Valentini, B. J. Van Doormaal, M. A. Faust, and G. A. Doak. 2013a. Genomic imputation and evaluation using high-density Holstein genotypes. J. Dairy Sci. 96:
VanRaden, P. M., T. A. Cooper, G. R. Wiggans, J. R. O'Connell, and L. R. Bacheller. 2013b. Confirmation and discovery of maternal grandsires and great-grandsires in dairy cattle. J. Dairy Sci. 96:
Wiggans, G. R., T. A. Cooper, P. M. VanRaden, K. M. Olson, and M. E. Tooker. 2012. Use of the Illumina Bovine3K BeadChip in dairy genomic evaluation. J. Dairy Sci. 95:
Zimin, A. V., A. L. Delcher, L. Florea, D. R. Kelley, M. C. Schatz, D. Puiu, F. Hanrahan, G. Pertea, C. P. Van Tassell, T. S. Sonstegard, G. Marcais, M. Roberts, P. Subramanian, J. A. Yorke, and S. L. Salzberg. 2009. A whole genome assembly of the domestic cow, Bos taurus. Genome Biol. 10:R42.

Chapter 1

Reliability of genomic prediction for German Holsteins using imputed genotypes from low density chips

D. Segelke*, J. Chen*, Z. Liu*, F. Reinhardt*, G. Thaller†, and R. Reents*

*vit w.v., Heideweg 1, Verden, Germany
†Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Kiel, Germany

Published in Journal of Dairy Science, Vol. 95, Issue 9

ABSTRACT
With the availability of single nucleotide polymorphism (SNP) chips such as the Illumina BovineSNP50 BeadChip (50K), genomic evaluation has been routinely implemented in dairy cattle breeding. However, for an average dairy producer, the total costs associated with the 50K chip are still too high to have all cows genotyped and genomically evaluated. To study the accuracy of cheaper low-density chips, genotypes were simulated for two low-density chips, the Illumina Bovine3K BeadChip (3K) and the BovineLD BeadChip (6K), according to their original marker maps. The simulated missing 50K genotypes were imputed using the programs Beagle and Findhap. Three genotype data sets were used to study imputation accuracy: a EuroGenomics data set with 14,405 reference bulls (data set II), a smaller EuroGenomics data set with 11,670 older reference bulls, and a data set of all genotyped German Holsteins with 31,597 reference animals. Imputed genotypes were compared with the original ones to calculate the allele error rate for the validation animals of the three data sets. To evaluate the loss in accuracy of genomic prediction using imputed genotypes, a genomic evaluation was conducted for EuroGenomics data set II only. Furthermore, genome-enhanced breeding values (GEBV) calculated from original and imputed genotypes were compared. For EuroGenomics data set II, the allele error rate was highest for Findhap on the 3K chip (3.3%) and lowest for Beagle on the 6K chip (0.6%). Beagle was about twice as accurate as Findhap. Compared with the real 50K genotypes, the reduction in reliability of genomic prediction using imputed genotypes, averaged over the 12 evaluated traits, was highest for Findhap 3K (5.3%) and lowest for Beagle 6K (1%). Differences between GEBV from original and imputed genotypes were largest for Findhap on the 3K chip and smallest for Beagle on the 6K chip.
The 6K chip gave markedly higher imputation accuracy and more accurate genomic prediction than the 3K chip. Given the relatively small reduction in accuracy of genomic prediction, we recommend the BovineLD 6K chip for large-scale genotyping as long as its costs are acceptable for breeders.

Keywords: genomic evaluation, low density chip, imputation, dairy cattle

INTRODUCTION
Since the release of the Illumina 50K SNP chip, genomic evaluation and selection have been or are being implemented for dairy cattle breeding in an ever-increasing number of countries (Liu et al., 2011; Lund et al., 2011; VanRaden, 2008). As a result, many potential AI bulls and bull dams around the world have been genotyped with the standard 50K chip (Wiggans et al., 2012). Because of its relatively high costs, widespread large-scale genotyping has not yet been realized for average dairy producers in Germany. Thus, dairy producers and breeders clearly demand cheaper low-density chips, with which they can screen their herds to identify genetically superior cows or bull calves for breeding. The 50K chip can also be used to identify haplotypes and gene defects (VanRaden et al., 2011). Meanwhile, chips of higher density and even complete genome sequences have become available for dairy cattle breeding, so dairy geneticists are challenged to work with increasingly diverse SNP chips (VanRaden et al., 2011). Using low-density marker panels for genomic evaluation based on, e.g., the standard 50K chip requires statistical methods that transfer genotype information from individuals genotyped at a higher density. Imputation can be used to fill in the SNP genotypes of a higher-density panel for animals genotyped with a smaller panel (Druet et al., 2010; VanRaden et al., 2011; Weigel et al., 2009; Zhang and Druet, 2010). Several statistical methods for imputing SNP genotypes have been proposed (Browning and Browning, 2007; Druet et al., 2010; VanRaden et al., 2011). Imputation programs such as Findhap version 2 (VanRaden et al., 2011) or Beagle version 3.3 (Browning and Browning, 2007, 2011) rely on linkage, based on family information, or on linkage disequilibrium (LD), based on population information (Browning and Browning, 2007; Scheet and Stephens, 2006).
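The basic idea of population-based imputation can be sketched with a deliberately simple rule: missing alleles on a phased low-density haplotype are copied from the reference haplotype that agrees best at the observed loci. This nearest-haplotype rule is an illustrative assumption, not the algorithm of either program; Beagle's localized haplotype clustering and Findhap's combined population and pedigree haplotyping are considerably more elaborate. The function name, the 0/1 allele coding and the data are hypothetical, and the target haplotype is assumed to be phased already.

```python
from typing import List, Optional

# Alleles coded 0/1; None marks an SNP that is not on the low-density chip.
Haplotype = List[Optional[int]]


def impute_haplotype(target: Haplotype, reference: List[List[int]]) -> List[int]:
    """Fill missing alleles of a phased haplotype from the best-matching reference haplotype."""
    observed = [i for i, allele in enumerate(target) if allele is not None]

    def mismatches(ref: List[int]) -> int:
        # number of observed loci at which this reference haplotype disagrees
        return sum(1 for i in observed if ref[i] != target[i])

    best = min(reference, key=mismatches)  # ties: the first haplotype in the list wins
    return [best[i] if allele is None else allele for i, allele in enumerate(target)]


reference_haplotypes = [
    [0, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 0, 0, 1, 1],
]
low_density = [1, None, 0, None, 1, None]  # only loci 0, 2 and 4 genotyped
print(impute_haplotype(low_density, reference_haplotypes))  # [1, 0, 0, 1, 1, 0]
```

In this toy example the second and third reference haplotypes match equally well at the observed loci but differ at every missing one, so the result depends on an arbitrary tie-break. Resolving such ambiguity is where longer flanking haplotypes, haplotype frequencies or pedigree information come in, and where the denser 6K chip can be expected to help.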
For population-based imputation algorithms, Weigel et al. (2010) showed that using 4,000 SNPs can provide approximately 95% of the predictive ability achieved with the real 50K chip. By exploiting both family and population information, Findhap was designed to combine population and pedigree haplotyping (VanRaden et al., 2011) to achieve higher imputation accuracy. A study by Dassonneville et al. (2011) showed that using the Illumina Bovine3K BeadChip (3K) gave a mean imputation error rate of 5.5% for a Nordic data set and 3.9% for a French data set. Prediction of GEBV based on genotypes imputed with a smaller national reference data set gave an average loss of 0.05 in the reliability of GEBV in the French study, whereas a loss of 0.03 was obtained for the reliability of direct genomic values (DGV) in the Nordic study. Chen et al. (2011) investigated the accuracy of genomic prediction for the 3K chip and found a loss in the reliability of genomic prediction for German Holsteins. Because of the relatively high cost of the 3K chip compared with the 50K chip, the 3K chip was not implemented in routine genomic evaluation in Germany. However, the same chip has been widely and routinely used in the US since September 2010 (Wiggans et al., 2012). In September 2011, Illumina released a new low-density chip, the BovineLD BeadChip (6K; Illumina, 2011b; Boichard et al., 2012), with more than twice as many SNPs as the 3K chip (Table 1). The objectives of this study were to compare the imputation accuracy of the programs Beagle and Findhap for both the 3K and 6K chips, and to quantify the loss in reliability of genomic prediction using imputed 50K genotypes for German Holsteins.

MATERIALS AND METHODS
Genotype data sets for imputation
A total of 33,802 German Holstein animals from the routine genomic evaluation of February 2011, genotyped with the standard 50K chip, were selected for the 3K and 6K imputation study. 3K and 6K genotypes were simulated for all genotyped animals using the original Illumina marker maps. For the commercial 3K chip, Illumina selected only 2,900 of the 3,200 SNPs on its marker map. Of the 54,001 SNP markers on the 50K chip, 1,672 were excluded from the imputation study because their chromosomal locations were unknown, although 1,342 of these unassigned SNP markers are used in routine genomic evaluation for German Holsteins (Liu et al., 2011). Table 1 shows the numbers of SNPs per chromosome and those used by the two imputation programs.
In total, Beagle (Browning and Browning, 2010, version 3.3) used 51,882, 6,594 and 3,032 SNPs for the 50K, 6K and 3K imputation, respectively. Because Findhap (VanRaden et al., 2011, version 2) can additionally use SNP markers on the sex chromosomes for genotype imputation, more SNPs could be used: 52,329, 6,763 and 3,208 SNPs for the 50K, 6K and 3K imputation, respectively.
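Simulating a low-density genotype from a real 50K genotype amounts to keeping only the SNPs that are also on the 3K or 6K marker map and setting all others to missing; the masked SNPs are then imputed and compared with their known values. A minimal sketch, assuming genotypes stored as a mapping from SNP name to allele count (0, 1, 2); the SNP names and the tiny "3K" map are placeholders, whereas in practice the map would be read from the chip's marker list.

```python
from typing import Dict, Optional, Set

Genotypes = Dict[str, Optional[int]]  # SNP name -> allele count (0, 1, 2) or None


def mask_to_low_density(genotypes_50k: Dict[str, int], ld_snps: Set[str]) -> Genotypes:
    """Simulate a low-density genotype by keeping only SNPs on the LD marker map."""
    return {snp: (g if snp in ld_snps else None) for snp, g in genotypes_50k.items()}


# Hypothetical SNP names and a hypothetical low-density map
animal_50k = {"snp_a": 2, "snp_b": 1, "snp_c": 0, "snp_d": 1}
map_3k = {"snp_b", "snp_d"}

masked = mask_to_low_density(animal_50k, map_3k)
to_impute = [snp for snp, g in masked.items() if g is None]
print(masked)      # {'snp_a': None, 'snp_b': 1, 'snp_c': None, 'snp_d': 1}
print(to_impute)   # ['snp_a', 'snp_c']
```

Because the original 50K calls are retained for the masked SNPs, every imputed genotype can be checked against the truth, which is what makes this simulation design suitable for measuring imputation error rates.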

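Table 1 SNP markers used for this study (number of SNPs)

Chromosome | 50K chip | DGV calculation | Findhap | Beagle | 3K markers, also on 50K chip | 6K markers, also on 50K chip
1 | 3,343 | 2,816 | 3,343 | | |
2 | 2,764 | 2,286 | 2,764 | | |
3 | 2,566 | 2,169 | 2,566 | | |
4 | 2,541 | 2,117 | 2,541 | | |
5 | 2,181 | 1,807 | 2,181 | | |
6 | 2,535 | 2,140 | 2,535 | | |
7 | 2,294 | 1,883 | 2,294 | | |
8 | 2,362 | 1,999 | 2,362 | | |
9 | 2,036 | 1,701 | 2,036 | | |
10 | 2,179 | 1,838 | 2,179 | | |
11 | 2,267 | 1,903 | 2,267 | | |
12 | 1,683 | 1,391 | 1,683 | | |
13 | 1,802 | 1,477 | 1,802 | | |
14 | 1,722 | 1,442 | 1,722 | | |
15 | 1,688 | 1,421 | 1,688 | | |
16 | 1,606 | 1,342 | 1,606 | | |
17 | 1,585 | 1,355 | 1,585 | | |
18 | 1,351 | 1,141 | 1,351 | | |
19 | 1,378 | 1,154 | 1,378 | | |
20 | 1,564 | 1,363 | 1,564 | | |
21 | 1,419 | 1,147 | 1,419 | | |
22 | 1,299 | 1,076 | 1,299 | | |
23 | | | | | |
24 | 1,294 | 1,080 | 1,294 | | |
25 | | | | | |
26 | | | | | |
27 | | | | | |
28 | | | | | |
29 | | | | | |
X | | | | | |
Pseudoautosomal | | | | | |
Unassigned | 1,672 | 1,342 | | | |
Sum | 54,001 | 45,181 | 52,329 | 51,582 | 3,208 | 6,874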

Three genotype data sets (Table 2) were chosen to study imputation accuracy and the reliability of genomic prediction: the EuroGenomics (Lund et al., 2011) reference population with older bulls for late-measured traits such as days open (data set I), the EuroGenomics reference population (data set II), and all genotyped German Holstein animals (data set III). Because the validation bulls for late-measured traits (data set I) were older than those for the traits in data set II, the reference population for genomic validation and for imputation also consisted of older bulls for the late-measured traits. To validate genotype imputation and genomic prediction, lower-density 3K and 6K genotypes were simulated from the original 50K genotypes of the validation animals. The validation populations for the imputation accuracy and genomic prediction study comprised 534 Holstein bulls born between January 2002 and December 2002 (data set I) and 1,374 bulls born between September 2003 and December 2004 (data set II). For data set III, all genotyped animals born after 30 June 2010 were considered, to simulate a routine imputation scenario in which the youngest candidates are genotyped with a low-density chip. Because genotyped dams contribute to imputation as much as genotyped sires, genotyped female animals were also included in data set III. For genomic validation, the Black-and-White Holstein validation animals were required to have their sire included in the reference population.

Table 2 Genotype data sets for the 3K and 6K imputation

Data set | Reference population | Validation population
Data I: EuroGenomics old bulls | 11,670 Holstein bulls (born before Jan 2002) | 534 bulls (born between Jan 2002 and Dec 2002)
Data II: EuroGenomics bulls | 14,405 Holstein bulls (born before Sep 2003) | 1,374 bulls (born between Sep 2003 and Dec 2004)
Data III: All genotyped Holstein animals | 31,597 animals (born before July 2010) | 2,205 animals (born after 30 June 2010)

Imputed 50K SNP genotypes from the 3K or 6K chips were compared with the original genotypes of the validation animals to assess imputation accuracy. Imputation accuracy was expressed as the allele imputation error rate (Druet and Georges, 2010; Druet et al., 2010; Dassonneville et al., 2011; Chen et al., 2011), i.e., the ratio of wrongly imputed alleles to all alleles. Because most genotype errors involve only one of the two alleles, the allele error rate is approximately half the genotype error rate, which equals 1 minus the genotype imputation accuracy. To compare the accuracy of Findhap and Beagle, the sex chromosomes were not considered. Additionally, we assessed the impact of the relationship of validation animals to the reference population on imputation accuracy by grouping the validation animals into four classes: neither sire nor maternal grandsire (N = 29), only maternal grandsire (N = 138), only sire (N = 149), or both sire and maternal grandsire (N = 870) included in the reference population.

Data set for genomic validation using the imputed genotypes
To evaluate the loss in reliability of genomic prediction using imputed genotypes, a special genomic evaluation using genotype data set II was conducted. Liu et al. (2011) carried out a genomic validation study with real 50K genotypes on the same data set, which allows a direct comparison of genomic prediction using real and imputed genotypes. SNP markers that were not imputed, such as unassigned markers and, for Beagle, SNP markers on the sex chromosomes, were set to missing in the DGV calculation (Table 1). DGV and GEBV were calculated with the routine procedures of the German Holstein genomic evaluation described by Liu et al. (2011). SNP effects were estimated with a BLUP model assuming a trait-specific residual polygenic variance, and DGV were combined with conventional EBV or pedigree indices using a selection index procedure to obtain GEBV for genotyped animals.
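To illustrate how SNP effects and DGV are obtained, the following minimal sketch solves generic ridge-regression SNP-BLUP equations, (Z'Z + λI) b = Z'(y - μ), by Gauss-Seidel iteration. Here Z holds column-centred genotype codes, y are deregressed proofs and λ is the ratio of residual to SNP-effect variance. This is a textbook model with a single assumed λ and invented data; it omits the residual polygenic effect and the selection index step of the routine German Holstein evaluation (Liu et al., 2011), and the function names are hypothetical.

```python
from typing import List


def snp_blup(Z: List[List[float]], y: List[float], lam: float,
             iters: int = 200) -> List[float]:
    """Ridge-regression SNP-BLUP solved by residual-updating Gauss-Seidel.

    Solves (Z'Z + lam*I) b = Z'(y - mean(y)); Z must be column-centred so
    that the overall mean is estimated by mean(y).
    """
    n, m = len(Z), len(Z[0])
    mu = sum(y) / n
    b = [0.0] * m
    e = [yi - mu for yi in y]                       # residuals y - mu - Z b
    zz = [sum(Z[i][j] ** 2 for i in range(n)) for j in range(m)]
    for _ in range(iters):
        for j in range(m):
            # right-hand side for SNP j with its own current effect added back
            rhs = sum(Z[i][j] * e[i] for i in range(n)) + zz[j] * b[j]
            new = rhs / (zz[j] + lam)
            delta = new - b[j]
            for i in range(n):                      # keep residuals in sync with b
                e[i] -= Z[i][j] * delta
            b[j] = new
    return b


def direct_genomic_value(z_row: List[float], b: List[float]) -> float:
    """DGV = sum of an animal's centred genotype codes times the SNP effects."""
    return sum(z * bj for z, bj in zip(z_row, b))


# Toy reference population: 4 animals x 3 SNPs coded 0/1/2, centred by column mean (1.0)
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [1, 2, 1]]
Z = [[g - 1.0 for g in row] for row in genotypes]
y = [10.0, 8.0, 12.0, 9.0]                          # invented deregressed proofs
effects = snp_blup(Z, y, lam=1.0)
print([round(b, 4) for b in effects])               # [0.7143, -0.7619, 0.9048]
print(round(direct_genomic_value([1.0, -1.0, 0.0], effects), 4))  # 1.4762
```

Gauss-Seidel with residual updating is a common choice for SNP-BLUP because it never forms Z'Z explicitly, which matters when tens of thousands of SNPs and reference animals are involved.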
Correlations between DGV from imputed genotypes of the different scenarios were calculated for milk yield, as were correlations of DGV from imputed genotypes with DGV from real 50K genotypes, deregressed EBV, and conventional EBV. Based on the April 2010 German national conventional evaluation, the following traits were evaluated in terms of the coefficient of determination (R²) and the reliability of genomic prediction for the validation bulls: milk production (milk, fat, and protein yield), udder health (somatic cell score, SCS), fertility (days open, non-return rate 56 days in heifers), and conformation (angularity, stature, chest width, front teat length, udder depth, and udder support).
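For a simple linear regression of deregressed EBV on DGV, the R² equals the squared Pearson correlation between the two. A minimal sketch with invented numbers; that R² was obtained from exactly this single-regressor model is an assumption here, and converting R² into a reliability of genomic prediction, which also depends on the reliability of the deregressed EBV, is not shown.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validation_r2(dgv: Sequence[float], drp: Sequence[float]) -> float:
    """R-square of a simple regression of deregressed proofs on DGV (= r squared)."""
    return pearson(dgv, drp) ** 2


# Invented validation data for five bulls
dgv = [1.2, -0.4, 0.8, 2.1, -1.0]
drp = [1.0, -0.1, 0.5, 2.5, -0.7]
print(round(validation_r2(dgv, drp), 3))
```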

Additionally, differences in GEBV between imputed and original 50K genotypes were investigated. The differences were divided by the genetic standard deviation of the respective trait to allow a direct comparison between traits. A 99% quantile range of the GEBV differences was calculated to exclude extreme values possibly caused by genotyping errors. All calculations were conducted on a cluster of 64-bit Linux servers, each with multiple AMD Opteron processors.

RESULTS AND DISCUSSION
Accuracy of imputation
Computing time differed markedly between Beagle and Findhap, with a clear advantage for Findhap. Because of the high computational demands of Beagle, only Findhap was used for the 3K and 6K imputation of data sets I and III, with 11,670 bulls and 31,597 animals, respectively, in the imputation reference population. Table 3 shows the genome-wide allele error rates of the 3K and 6K imputation for the data sets and the two programs, averaged over the validation animals.
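The standardisation and trimming of GEBV differences described in the methods can be sketched as follows. This assumes that the 99% range runs from the 0.5th to the 99.5th percentile and uses linear interpolation between order statistics (the "inclusive" method of Python's `statistics.quantiles`); the exact quantile definition used in the study is not stated, and the GEBV values and genetic standard deviation are invented.

```python
import statistics
from typing import List, Sequence, Tuple


def standardized_differences(gebv_imputed: Sequence[float],
                             gebv_original: Sequence[float],
                             genetic_sd: float) -> List[float]:
    """GEBV differences (imputed - original) in genetic standard deviation units."""
    return [(a - b) / genetic_sd for a, b in zip(gebv_imputed, gebv_original)]


def central_99_range(values: Sequence[float]) -> Tuple[float, float]:
    """0.5th and 99.5th percentiles, i.e. the range holding the central 99% of values."""
    cuts = statistics.quantiles(values, n=200, method="inclusive")  # cut points every 0.5%
    return cuts[0], cuts[-1]


def trim_to_range(values: Sequence[float], low: float, high: float) -> List[float]:
    """Drop values outside [low, high], e.g. outliers caused by genotyping errors."""
    return [v for v in values if low <= v <= high]


# Invented GEBV (trait units) of four animals and a genetic SD of 5.0
diffs = standardized_differences([101.0, 98.5, 110.0, 95.0],
                                 [100.0, 99.0, 108.0, 95.5], genetic_sd=5.0)
low, high = central_99_range(diffs)
print(diffs)   # [0.2, -0.1, 0.4, -0.1]
print(low, high, trim_to_range(diffs, low, high))
```

With only four animals the 99% range is of course nearly the full range; with the thousands of validation animals used in the study, it removes only the most extreme differences.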

Table 3 Genome-wide mean allele error rates of the 3K and 6K imputation for Black-and-White Holsteins with a sire in the reference population

Program | Data set | No. of validation animals | Mean allele error rate (%), 3K | Mean allele error rate (%), 6K
Findhap | Data I: EuroGenomics old bulls | | |
Findhap | Data II: EuroGenomics | | 3.3 | 1.7
Findhap | Data III: All genotyped animals | | |
Beagle | Data II: EuroGenomics | | 1.6 | 0.6

As the number of reference bulls or animals increased from data set I to III, the allele error rate decreased, showing that larger reference populations result in higher imputation accuracy. For Findhap, the allele error rate declined by about one percentage point from data set I to III with the 3K chip, and an almost equal reduction was seen for the 6K chip. The results clearly demonstrate that Beagle is more accurate than Findhap in imputing missing genotypes, even though Beagle does not use pedigree information. The difference in error rates between Beagle and Findhap amounted to 1.7 percentage points for the 3K chip and 1.1 percentage points for the 6K chip. For Findhap, the difference in allele error rates between the 3K and 6K chips was 1.6 percentage points for data sets I and II, and 1.5 percentage points for data set III. The decrease in allele error rate from 3K to 6K imputation was smaller for Beagle than for Findhap, with a reduction of 1.0 percentage point for data set II. Because the higher SNP density of the 6K chip results in higher imputation accuracy, Illumina no longer sells the 3K chip. With imputation error rates of 1.6% for Beagle and 3.3% for Findhap using EuroGenomics data set II, our results for the 3K imputation are comparable to those of Dassonneville et al. (2011), who reported mean imputation error rates of 4.0% and 2.1% for the French and Nordic populations based on the EuroGenomics reference bull population, and higher imputation error rates of 5.5% and 3.9% using their respective national reference populations.

Table 4 demonstrates that imputation accuracy depended on the relationship of validation animals to the reference population in all investigated scenarios. The more relatives a validation bull had in the reference population, the more accurately its missing genotypes tended to be filled in. For example, when neither the sire nor the maternal grandsire (MGS) was included in the reference population, the average allele error rate reached 5.5% using Findhap for the 3K chip. With only the MGS in the reference population, the allele error rate was 0.9 percentage points lower. Similarly, the allele error rate for Findhap and the 3K chip was 1.8 percentage points lower when the sire of the validation bull was a reference animal. Imputation was most accurate, with the lowest error rate of 3.2% for Findhap and the 3K chip, when both sire and MGS were in the reference population. Although Beagle does not explicitly use family information to impute missing genotypes, we observed higher imputation accuracy with the sire or MGS included in the reference population, as found for Findhap. However, the improvement in imputation accuracy with increasing relatedness of the validation bulls to the reference population was much less evident for Beagle than for Findhap. Having more relatives in the reference population resulted in a greater increase in accuracy for the 3K than for the 6K chip. As also found by Druet et al. (2010), imputation was most accurate when both sire and MGS belonged to the reference population, in all four scenarios in Table 4.

Table 4 Genome-wide mean allele error rate for Black and White validation Holstein bulls, by their relationship to the reference population (data II).

Presence of relatives in reference population | Nb. of bulls | 3K Findhap (%) | 3K Beagle (%) | 6K Findhap (%) | 6K Beagle (%)
Neither sire nor maternal grandsire | | | | |
Maternal grandsire only | | | | |
Sire only | | | | |
Both sire and maternal grandsire | | | | |

Figure 1 shows the mean allele error rates of Findhap per chromosome for the three data sets and the two chips. Imputation accuracy clearly differed between chromosomes, data sets and chip densities. Imputation error rates tended to be higher for shorter than for longer chromosomes, which is more evident for the 3K than for the 6K data sets. Another reason for the lower 6K allele error rate is the higher concentration of SNPs at the ends of the chromosomes. Additionally, high allele error rates, as for chromosome 19, might be caused by map errors.

Figure 1 Mean allele error rate by chromosome for the three data sets using Findhap (y-axis: mean allele error rate, %; x-axis: chromosome; series: Data I old bulls, Data II EuroGenomics and Data III all genotyped animals, each for 3K and 6K).

Figure 2 compares allele error rates by chromosome between the two programs Findhap and Beagle. The allele error rates for Beagle with the 3K chip were comparable to those for Findhap with the 6K chip. In contrast to the larger variation and stronger trend of the error rates in the other scenarios, applying Beagle to impute 6K genotypes led to more uniform error rates across chromosomes.

Figure 2 Mean allele error rate by chromosome using EuroGenomics data set II (y-axis: mean allele error rate, %; x-axis: chromosome; series: Findhap 3K, Findhap 6K, Beagle 3K, Beagle 6K).

Accuracy of genomic prediction using imputed genotypes

Table 5 shows observed correlations between DGV of validation bulls of EuroGenomics data set II for milk yield. Among all studied scenarios, Findhap 3K has the lowest correlation with DGV from the real 50K genotypes. DGV of scenarios Findhap 6K and Beagle 6K had equal correlations of 0.98 with DGV from the real 50K genotypes. The DGV correlation between Beagle 6K and Findhap 6K was 0.97, the same as between Beagle 6K and Beagle 3K. The correlations of DGV of scenario Beagle 3K with deregressed EBV (DRP) and conventional EBV were 0.73 and 0.74, respectively. The same correlations were found for scenario Findhap 6K. DGV from the imputed genotypes of scenario Beagle 6K had the same correlations with DRP and EBV as DGV from the real 50K genotypes. Beagle did not consider the sex chromosomes in the imputation, whereas Findhap explicitly took SNP markers on the sex chromosomes into account. Nevertheless, Beagle was more accurate in terms of the correlations with DRP or EBV of the validation bulls. For milk yield, the correlation between DGV from the original 50K genotypes and from Findhap 3K was lower than that found by Wiggans et al. (2012), which may be explained by the smaller reference population in our study.

Table 5 Observed correlations of DGV for validation bulls of EuroGenomics data (data II) for milk yield.

 | Findhap 3K | Beagle 3K | Findhap 6K | Beagle 6K | Real 50K genotypes
Findhap 3K | | | | |
Beagle 3K | | | | |
Findhap 6K | | | | |
Beagle 6K | | | | |
Real 50K genotypes | | | | | 1
Deregressed EBV | | | | |
Conventional EBV | | | | |

Table 6 reports R² values from regressing deregressed EBV of validation bulls on their GEBV using the imputed 50K genotypes. The reference point is the R² gain by genomics with the real 50K genotypes, R²_GEBV - R²_PI (Liu et al., 2011). Relative to this, genomic prediction using imputed genotypes showed a mean R² decrease across the 12 traits of 4.0% for scenario Findhap 3K, 2.0% for Beagle 3K, 1.4% for Findhap 6K and 0.8% for Beagle 6K. For all traits, the R² of genomic prediction based on genotypes imputed with Findhap 3K decreased most. The R² reduction was particularly high for traits with a major gene, such as milk and fat yield. In general, the loss of R² was higher for traits with more accurate genomic prediction, such as SCS, stature and udder depth. Imputation with Beagle gave more accurate genomic prediction than Findhap with both the 3K and the 6K chip. On average, imputation from the 6K chip clearly gave more accurate genomic prediction than imputation from the 3K chip. However, for traits such as milk yield and angularity, imputation with Beagle 3K yielded more accurate genomic prediction than imputation with Findhap 6K. Across all traits, R² decreases were lowest for Beagle 6K, except for front teat length.
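The R² gain used for Table 6 and the reliability gain (R² gain divided by REL) used for Table 7 can be illustrated as follows. This minimal Python/NumPy sketch assumes an unweighted least-squares regression of DRP on the predictor. The study does not say whether the regressions were weighted, so this is an assumption. The function names `regress_drp` and `genomic_gain` are hypothetical. The same regression also gives the slope discussed later in connection with bias.

```python
import numpy as np


def regress_drp(drp, predictor):
    """Unweighted least-squares regression of deregressed proofs (DRP) on a predictor.

    Returns (slope, r_squared). A slope close to 1 suggests the predictor is
    neither inflated nor deflated; R^2 is the share of DRP variance explained.
    """
    y = np.asarray(drp, dtype=float)
    x = np.asarray(predictor, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)  # highest power first: [slope, intercept]
    residuals = y - (intercept + slope * x)
    r_squared = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return slope, r_squared


def genomic_gain(drp, gebv, pedigree_index, mean_rel_ebv):
    """R^2 gain (R2_GEBV - R2_PI) and reliability gain (R2_GEBV - R2_PI) / REL."""
    _, r2_gebv = regress_drp(drp, gebv)
    _, r2_pi = regress_drp(drp, pedigree_index)
    r2_gain = r2_gebv - r2_pi
    return r2_gain, r2_gain / mean_rel_ebv


drp = [3.0, 5.0, 7.0, 9.0]
gebv = [1.0, 2.0, 3.0, 4.0]
pi_values = [1.0, 1.0, 2.0, 2.0]
slope, r2 = regress_drp(drp, gebv)
r2_gain, rel_gain = genomic_gain(drp, gebv, pi_values, mean_rel_ebv=0.8)
```

In this toy data set, DRP is an exact linear function of GEBV, so the slope is 2 and R² is 1. The pedigree index explains 80% of the DRP variance, which gives an R² gain of 0.2 and, with REL = 0.8, a reliability gain of 0.25. The reduction for an imputation scenario is then the difference between this gain computed with real and with imputed genotypes.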

Table 6 R² values (%), expressed as R²_GEBV - R²_PI, of original 50K genotypes and reductions in R² values using imputed 50K genotypes.

Trait | R² with real 50K genotypes | Reduction, Findhap 3K | Reduction, Beagle 3K | Reduction, Findhap 6K | Reduction, Beagle 6K
Milk, kg | | | | |
Fat, kg | | | | |
Protein, kg | | | | |
SCS¹ | | | | |
Days open | | | | |
NR56 heifer² | | | | |
Angularity | | | | |
Stature | | | | |
Chest width | | | | |
Front teat length | | | | |
Udder depth | | | | |
Udder support | | | | |
Average | | | | |
¹SCS = Somatic Cell Score; ²NR = Non-return rate

The reductions in reliability of genomic prediction using the imputed 50K genotypes are shown in Table 7. The R² gain by genomics, R²_GEBV - R²_PI, was divided by the average reliability (REL) of conventional EBV of the validation bulls. This value, (R²_GEBV - R²_PI)/REL, was taken as the reliability gain due to genomics when using real 50K genotypes. Compared with the real 50K genotypes, reliability of genomic prediction decreased by about 5.3% for scenario Findhap 3K, 2.6% for Beagle 3K, 1.9% for Findhap 6K and 1% for Beagle 6K. Reliability loss was highest for SCS using Findhap 3K (8.2%). For milk yield and days open using Beagle 6K, no reliability reduction was observed. For some traits, imputation of 3K genotypes with Beagle gave better results than imputation of 6K genotypes with Findhap. These traits were milk yield, days open, non-return rate 56 in heifers and udder support. For the conformation trait front teat length, scenario Findhap 6K gave more accurate genomic prediction than scenario Beagle 6K.

Table 7 Reliabilities (%), expressed as (R²_GEBV - R²_PI)/REL, of original 50K genotypes and reductions in reliabilities using imputed 50K genotypes.

Trait | Reliability with real 50K genotypes | Reduction, Findhap 3K | Reduction, Beagle 3K | Reduction, Findhap 6K | Reduction, Beagle 6K
Milk, kg | | | | |
Fat, kg | | | | |
Protein, kg | | | | |
SCS¹ | | | | |
Days open | | | | |
NR56 heifer² | | | | |
Angularity | | | | |
Stature | | | | |
Chest width | | | | |
Front teat length | | | | |
Udder depth | | | | |
Udder support | | | | |
Average | | | | |
¹SCS = Somatic Cell Score; ²NR = Non-return rate

Table 8 shows differences in GEBV of the validation bulls based on the original 50K and the imputed genotypes. The mean and the 99% quantile range of the differences were divided by the genetic standard deviation of the respective trait. Genomic prediction based on Findhap 3K clearly gave the largest differences from GEBV based on real 50K genotypes. GEBV also appear to be overestimated for most traits in this scenario. In contrast, average GEBV differences were close to zero for the remaining three scenarios, Beagle 3K, Findhap 6K and Beagle 6K. Across all traits, the 99% quantile range was largest for Findhap 3K, followed by Beagle 3K and Findhap 6K, and smallest for Beagle 6K. For scenario Findhap 3K, imputation accuracy in our study was comparable to that reported by Wiggans et al. (2012). However, Wiggans et al. (2012) did not find GEBV differences deviating from zero as we did here, and they obtained smaller ranges of GEBV differences. These discrepancies may be explained by our smaller data set for genotype imputation and genomic validation.

Table 8 Mean and 99% quantile range of the GEBV difference between original and imputed 50K genotypes of the validation bulls. All values were divided by the genetic standard deviations of the respective traits.

Trait | Findhap 3K | Beagle 3K | Findhap 6K | Beagle 6K
Milk, kg | 0.21 ( ) | ( ) | ( ) | ( )
Fat, kg | 0.15 ( ) | ( ) | ( ) | ( )
Protein, kg | 0.28 ( ) | ( ) | ( ) | ( )
SCS¹ | ( ) | 0.02 ( ) | 0.01 ( ) | 0.01 ( )
Days open | 0.06 ( ) | 0 ( ) | 0 ( ) | 0 ( )
NR56 heifer² | 0.02 ( ) | 0 ( ) | 0 ( ) | 0 ( )
Angularity | 0.16 ( ) | ( ) | 0 ( ) | ( )
Stature | 0.17 ( ) | 0.03 ( ) | 0.01 ( ) | 0 ( )
Chest width | 0.03 ( ) | ( ) | 0 ( ) | ( )
Front teat length | 0.01 ( ) | 0.05 ( ) | 0.01 ( ) | 0.06 ( )
Udder depth | 0.03 ( ) | 0.06 ( ) | 0.01 ( ) | 0.01 ( )
Udder support | 0.04 ( ) | 0.03 ( ) | 0.01 ( ) | 0.01 ( )
¹SCS = Somatic Cell Score; ²NR = Non-return rate
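The standardized GEBV differences summarized in Table 8 can be sketched as follows. This minimal Python/NumPy example makes two assumptions. First, the difference is taken as imputed minus original, so a positive mean indicates overestimation with imputed genotypes. Second, the 99% quantile range runs from the 0.5% to the 99.5% quantile, using NumPy's default linear interpolation. The text does not specify either convention, and the function name is hypothetical.

```python
import numpy as np


def standardized_gebv_difference(gebv_50k, gebv_imputed, genetic_sd, coverage=0.99):
    """Mean and central `coverage` quantile range of (imputed - original) GEBV / genetic SD."""
    diff = (np.asarray(gebv_imputed, dtype=float) - np.asarray(gebv_50k, dtype=float)) / genetic_sd
    tail = (1.0 - coverage) / 2.0  # 0.005 in each tail for a 99% range
    low, high = np.quantile(diff, [tail, 1.0 - tail])
    return diff.mean(), (low, high)


# Toy data: differences of 0, 1, ..., 100 GEBV units with a genetic SD of 10
mean_diff, (low, high) = standardized_gebv_difference(np.zeros(101), np.arange(101), genetic_sd=10.0)
```

Here the standardized differences run from 0 to 10, so the mean is 5.0 and the 99% range is about 0.05 to 9.95. Trimming the two tails in this way keeps a single badly genotyped bull from dominating the reported range.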

The same regression coefficients of deregressed EBV on GEBV were obtained with the imputed genotypes as with the real 54K genotypes, indicating that using imputed genotypes did not lead to biased GEBV. In practice, the reduction in genomic reliability due to the use of imputed 54K genotypes may be smaller than in this study. The 54K reference population used in practice is considerably larger than the one in this study, and candidates with low-density genotypes tend to have more relatives genotyped with the 54K chip.

CONCLUSIONS

We investigated the accuracy of genotype imputation from the low-density Illumina 3K and 6K chips to the standard 50K chip in German Holstein animals. The two imputation programs Beagle and Findhap differed in computing time, with a clear advantage for Findhap in large genotyped populations. However, for data set II and the 6K chip, Beagle was more accurate than Findhap in imputing missing genotypes. Allele error rates depended on the size of the reference population, the imputation program, chip density and the relationship between reference and validation animals. Using the 6K chip roughly halved the imputation error rate compared with the 3K chip. Based on the imputed genotypes, a genomic validation was conducted for scenarios combining the two chip densities and imputation programs. Accuracy of genomic prediction was reduced compared with that from the real 50K genotypes. Scenario Findhap 3K gave the least accurate genomic prediction, with an average reduction in reliability of 5.3% and the largest differences in GEBV. Among the scenarios with imputed genotypes, Beagle 6K gave the most accurate genomic prediction, with an average reduction in reliability of 1%. Based on these results, we conclude that the low-density BovineLD 6K chip could be used routinely for large-scale genotyping.

ACKNOWLEDGMENTS

The German national organization FBF is thanked for financial support. The EuroGenomics consortium is kindly acknowledged for providing genomic data. We thank the University of Göttingen for supporting J. Chen as a postdoctoral fellow on this project. The first two authors, D. Segelke and J. Chen, contributed equally to this study.

REFERENCES

Boichard, D., H. Chung, R. Dassonneville, X. David, A. Eggen, S. Fritz, K. J. Gietzen, B. J. Hayes, C. T. Lawley, T. S. Sonstegard, C. P. Van Tassell, P. M. VanRaden, K. A. Viaud-Martinez, G. R. Wiggans, for the Bovine LD Consortium. Design of a bovine low-density SNP array optimized for imputation. PLoS One 7:e
Browning, S. R., and B. L. Browning. Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81:
Browning, B. L., and S. R. Browning. High-resolution detection of identity by descent in unrelated individuals. Am. J. Hum. Genet. 86:
Chen, J., Z. Liu, F. Reinhardt, and R. Reents. Reliability of genomic prediction using imputed genotypes for German Holsteins: Illumina 3K to 50K bovine chip. Interbull Bulletin 44:
Dassonneville, R., R. F. Brondum, T. Druet, S. Fritz, F. Guillaume, B. Guldbrandtsen, M. S. Lund, V. Ducrocq, and G. Su. Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations. J. Dairy Sci. 94:
Druet, T., and M. Georges. A hidden Markov model combining linkage and linkage disequilibrium information for haplotype reconstruction and quantitative trait locus fine mapping. Genetics 184:
Druet, T., C. Schrooten, and A. P. de Roos. Imputation of genotypes from different single nucleotide polymorphism panels in dairy cattle. J. Dairy Sci. 93:
Illumina. 2011a. BovineSNP50 Genotyping BeadChip. Accessed November 4,
Illumina. 2011b. BovineLD Genotyping BeadChip. Accessed November 4,
Illumina. 2011c. GoldenGate Bovine3K Genotyping BeadChip. Accessed November 4,
Liu, Z., F. Seefried, F. Reinhardt, S. Rensing, G. Thaller, and R. Reents. Impacts of both reference population size and inclusion of a residual polygenic effect on the accuracy of genomic prediction. Genet. Sel. Evol. 43:19.
Lund, M. S., A. P. W. de Roos, A. G. de Vries, T. Druet, V. Ducrocq, S. Fritz, F. Guillaume, B. Guldbrandtsen, Z. Liu, R. Reents, C. Schrooten, F. Seefried, and G. Su. A common reference population from four European Holstein populations increases reliability of genomic predictions. Genet. Sel. Evol. 43:43.
Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard. Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:
Scheet, P., and M. Stephens. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am. J. Hum. Genet. 78:
VanRaden, P. M. Efficient methods to compute genomic predictions. J. Dairy Sci. 91:
VanRaden, P. M., K. M. Olson, D. J. Null, and J. L. Hutchison. Harmful recessive effects on fertility detected by absence of homozygous haplotypes. J. Dairy Sci. 94:
Weigel, K. A., G. de los Campos, O. Gonzalez-Recio, H. Naya, X. L. Wu, N. Long, G. J. Rosa, and D. Gianola. Predictive ability of direct genomic values for lifetime net merit of Holstein sires using selected subsets of single nucleotide polymorphism markers. J. Dairy Sci. 92:
Weigel, K. A., G. de los Campos, A. I. Vazquez, G. J. Rosa, D. Gianola, and C. P. Van Tassell. Accuracy of direct genomic values derived from imputed single nucleotide polymorphism genotypes in Jersey cattle. J. Dairy Sci. 93:
Wiggans, G. R., T. A. Cooper, P. M. VanRaden, K. M. Olson, and M. E. Tooker. Use of the Illumina Bovine3K BeadChip in dairy genomic evaluation. J. Dairy Sci. 95:
Zhang, Z., and T. Druet. Marker imputation with low-density marker panels in Dutch Holstein cattle. J. Dairy Sci. 93:

Chapter 2

Chancen und Grenzen der Hornloszucht für die Rasse Deutsche Holstein (Opportunities and limits of breeding for polledness in German Holsteins)

D. Segelke¹, H. Täubert¹, F. Reinhardt¹ and G. Thaller²
¹ Vereinigte Informationssysteme Tierhaltung w.V. (vit), Heideweg 1, Verden
² Institut für Tierzucht und Tierhaltung, Christian-Albrechts-Universität, Olshausenstraße 40, Kiel

Published in Züchtungskunde, 85, issue 4

Zusammenfassung

The aim of this study was to show the opportunities and limits of breeding for polledness. Imputation, an innovative method, makes it possible to identify, with a low error rate, additional animals that are not recorded as polled in the herdbook. This broadens the genetic base and makes additional animals available for selection. SNP genotyping also allows breeding values and recessive genetic defects to be determined early and taken into account in the selection steps. The most important factor limiting more intensive breeding for polled animals is that only two polled sires and their direct offspring with good breeding values for production and functional traits are available for Holstein Friesian breeding. Several simulation scenarios were used to examine the effects of different mating strategies on three outcomes: the frequency of the desired allele, the genetic gain of the male animals and of the cow population, and the development of inbreeding. For SNP-genotyped animals, selection on the imputed polled genotype proves to be the best way to increase the frequency of the polled allele while maintaining genetic gain. If no genotype is available, selection on the phenotype is advisable.

Keywords: polled, imputation, Holstein Friesian, breeding programs, genomic breeding value estimation

Summary

The goal of this study was to investigate the opportunities and limits of breeding polled cattle. The new method of imputation makes it possible to identify, with a low error rate, additional polled animals that are not recorded as polled in the herdbook. These animals extend the genetic base and become available for selection. Analyzing SNP genotypes provides information on breeding values and recessive genetic defects, which can be considered in the different selection steps. At present, the biggest limitation for intensive breeding of polled animals is that only two polled sires and their direct offspring with high breeding values for production and functional traits are available.

Different simulations were performed to analyze the consequences of different breeding strategies on three outcomes: the frequency of the desired polled allele, the genetic gain in male and female animals, and the development of inbreeding. For animals that are already SNP-genotyped, selection on the homozygous polled genotype proves to be the most effective way to increase the frequency of the polled allele in the population while maintaining genetic gain. For animals without SNP genotypes, selection on the phenotype is sufficient, and genotyping for polled status is not mandatory.

Keywords: polled, imputation, Holstein Friesian, breeding programs, genomic evaluation

Introduction

In public debate, animal welfare and polledness are inseparably linked. The debate has been gaining importance, and not only since the planned amendment of the animal welfare law and the Düsseldorf Declaration on strengthening breeding for polledness in cattle husbandry. The indisputable advantage of polled cattle is the reduced risk of injury to humans and herd mates. Besides dehorning with pain treatment, breeding for polled cattle offers a way to meet the demands of consumers, of policy makers and, first and foremost, of animal welfare. Historical depictions show that polled animals occurred at high frequency in all cattle breeds. The question to be answered is whether, and how quickly, cattle breeds can be bred back to a natural appearance of domestic cattle that has existed for millennia. The targeted spread of polledness is favored by the dominance of the polled allele (Dove, 1935), which is located on chromosome 1 (Medugorac et al., 2012). To accelerate its spread, the allele status of each animal must be identified unambiguously. Conventional methods such as purely phenotypic recording in the herdbook can be error-prone. DNA-based data collection therefore offers an additional, reliable option to determine the horn status of animals unambiguously. Combining conventional and innovative approaches increases the reliability of the information. It also significantly increases the number of animals of known status, and thus the number available for breeding. The aim of this study was to predict horn status from the Illumina BovineSNP50 chip in order to broaden the genetic base for breeding for polledness and to develop optimal mating strategies.

Material and methods

Prediction of horn status is based on a training sample consisting of horned (pp), heterozygous polled (Pp) and homozygous polled (PP) animals. An additional marker was added on chromosome 1 in the region of the polled gene (Medugorac et al., 2012). This marker contained the recoded herdbook entry for PP and Pp animals. For pp animals, pedigree analysis ensured that only horned animals had been present for several generations. As a rule, these animals had no herdbook entry. For animals without a herdbook entry, this missing marker could be predicted with imputation programs. In imputation, haplotypes are derived from the training sample, for example by means of population-wide linkage disequilibrium (LD) (Browning and Browning, 2007, 2010) or family-wide linkage (VanRaden et al., 2011). The information obtained in this way can be transferred to a candidate or validation population. Imputation is of great importance for extrapolating so-called low-density chips, which carry only 6K single nucleotide polymorphisms (SNP), to the routinely used 50K chip. Previous studies showed that this kind of imputation is possible with a low error rate (Segelke et al., 2012).

Statistical analysis and data

The following calculations are based on 69 SNP in the region from 0.7 to 4.0 megabases (BTAU 4.0) of chromosome 1 (Cargill et al., 2008). All of these SNPs are on the Illumina BovineSNP50 chip and are used in vit's routine breeding value estimation (Liu et al., 2011). In total, SNP information from the genomic breeding value estimation (as of August 2012) was available for […] animals. Based on herdbook information, the training sample for predicting horn status consisted of 417 heterozygous and 20 homozygous polled animals. […] genotyped animals could be identified as horned by pedigree analysis. For […] animals, the horn status was unknown. Animals recorded in the herdbook as scurred (S) were not included in the training sample. Animals registered as phenotypically polled (P) were treated as Pp animals in the training sample. The Beagle software package (version 3.3; Browning and Browning, 2010) was used for imputation. To describe the genetic level of the animals identified via herdbook and imputation, genomically enhanced breeding values (GEBV) as of August 2012 were used (Liu et al., 2011; Seefried et al., 2010). Horn status was tested for Hardy-Weinberg equilibrium (Falconer, 1984) in both sexes of the Red and White and the Black and White breeds. The genetic diversity of horned and polled animals was analyzed with pedigree inbreeding coefficients and with genomic inbreeding coefficients. Sufficient numbers of polled animals were available only for Red and White animals born 2008 to 2012 (n = 3,946). The pedigree of these animals was therefore analyzed with the software package PEDIG (Boichard, 2002). The pedigree file consisted of these animals and their ancestors. The animals born 2008 to 2012 traced back to 297 different sires and […] different dams. The sire was unknown for 9 animals and the dam for 3 animals. On average, the animals had a known pedigree over nine generations. The pedigree of 5% of the animals could be traced over more than 12 generations, and the pedigree of 1.5% of the animals spanned fewer than 4 generations. The genomic relationship matrix used to calculate genomic inbreeding coefficients was set up following the method described by VanRaden (2008). Allele frequencies of the base population were estimated following Gengler et al. (2007). The genomic relationship matrix was scaled to the pedigree relationship matrix.
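The genomic relationship matrix and the genomic inbreeding coefficients described above can be sketched following VanRaden's (2008) first method, G = ZZ'/(2 Σ p(1 - p)). This minimal Python/NumPy example makes two simplifications. First, if no base-population allele frequencies are supplied, it uses the observed frequencies, whereas the study estimated them following Gengler et al. (2007). Second, it omits the scaling of G to the pedigree relationship matrix. Genomic inbreeding is taken as F = G_ii - 1, which is one common definition. The function names are hypothetical.

```python
import numpy as np


def vanraden_g(genotypes, base_freq=None):
    """Genomic relationship matrix following VanRaden (2008), first method.

    genotypes: (n_animals, n_snps) array coded 0/1/2 (copies of the counted allele).
    base_freq: base-population frequencies of the counted allele. If omitted,
        observed frequencies are used (simplification; the study estimated base
        frequencies following Gengler et al. (2007)). No scaling to A is applied.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0 if base_freq is None else np.asarray(base_freq, dtype=float)
    Z = M - 2.0 * p  # centre each SNP column by twice its allele frequency
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def genomic_inbreeding(G):
    """Genomic inbreeding coefficient of each animal: F_i = G_ii - 1."""
    return np.diag(G) - 1.0


G = vanraden_g([[0, 2], [2, 0]], base_freq=[0.5, 0.5])
F = genomic_inbreeding(G)
```

For these two animals, which are homozygous for opposite alleles at both SNPs, G has diagonal elements of 2 and off-diagonal elements of -2. With p = 0.5 both animals therefore get F = 1.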
The unintended accumulation of genetic defects, which can result from intensive use of a few dominant sires, was analyzed exemplarily with the haplotypes HH1, HH2 and HH3 (VanRaden et al., 2011). HH1, HH2 and HH3 are three haplotypes that cause embryonic death when homozygous and therefore have a negative effect on fertility. They are located on chromosomes five, one and eight, respectively (VanRaden et al., 2011). To detect the three haplotypes, we checked which haplotype in the respective genome region never occurred in the homozygous state.

Validation of the method

A validation study was carried out to determine the predictive ability of the imputation approach. For this purpose, the training sample of […] animals was split. The oldest 75% of the animals formed the new training sample, and the youngest 25% formed the validation group. About 1% of the animals in the reduced training sample were polled (165 Pp, 11 PP), compared with 4.5% of the validation animals (241 Pp, 9 PP). After imputation, the known herdbook status of the validation animals was compared with the imputation result. From this comparison, the error rate was calculated as the ratio of wrongly imputed animals to all animals.

Simulation of polled breeding programs

Several breeding scenarios were simulated to analyze the effects of a consistent shift toward breeding for polledness. Fig. 1 shows a purely genomic breeding program as described in Täubert et al. (2011).
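The detection rule for HH1, HH2 and HH3 described above flags a haplotype that never occurs homozygous in its genome region. It can be sketched as follows. This minimal Python example assumes that phased haplotypes are already available as one pair of haplotype labels per animal for a given window. It also takes the expected number of homozygotes from Hardy-Weinberg proportions, which need not match the expectation used by VanRaden et al. (2011). The threshold `min_expected` and the function name are hypothetical.

```python
from collections import Counter


def missing_homozygotes(haplotype_pairs, min_expected=5.0):
    """Flag haplotypes never observed homozygous although HWE predicts several.

    haplotype_pairs: list of (hap_a, hap_b) labels, one phased pair per animal,
        for a single genome window.
    Returns {haplotype: expected number of homozygous animals} for flagged haplotypes.
    """
    n_animals = len(haplotype_pairs)
    copies = Counter()
    homozygous = Counter()
    for hap_a, hap_b in haplotype_pairs:
        copies[hap_a] += 1
        copies[hap_b] += 1
        if hap_a == hap_b:
            homozygous[hap_a] += 1
    flagged = {}
    for hap, count in copies.items():
        freq = count / (2 * n_animals)
        expected = n_animals * freq ** 2  # Hardy-Weinberg expectation (assumption)
        if homozygous[hap] == 0 and expected >= min_expected:
            flagged[hap] = expected
    return flagged


window = [("A", "A")] * 30 + [("A", "B")] * 40 + [("B", "C")] * 30
candidates = missing_homozygotes(window)
```

In this toy window, haplotype B has a frequency of 0.35 and about 12 expected homozygotes but none observed, so B is flagged. Haplotype C is too rare to be expected homozygous and is not flagged.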

Fig. 1 Scheme of a purely genomic breeding program; each arrow represents a selection step (elements: milk-recorded dairy cows, bull sires, 500 genotyped bull calves, 30 sires, 1 re-use sire, cow sires; proportions 95% and 5% for bull sires, 94% and 6% for cow sires).

In this program, no waiting bulls are kept. Instead, genomically selected young bulls are used from 15 months of age. The breeding program is based on a population of […] milk-recorded cows. Depending on the scenario, the 500 best of their offspring are selected on breeding value or on genotype. From these, 30 bulls are selected and used as sires in the population. One bull is retained as a re-use sire. To achieve the highest possible genetic gain in all scenarios, 95% of the bull sires are young genomically tested animals and 5% are re-use sires. Cow sires are 94% young genomically selected bulls and 6% re-use bulls. For the cow population in the initial generation, a mean breeding value of 0 and a standard deviation of 1 were assumed. The superiority of the AI bulls was simulated by selecting the 30 best bulls from an initial population of 500 bulls with a mean breeding value of 0 and a standard deviation of 1. Genotype frequencies were taken from animals born 2008 to 2010, as these animals are assumed to be currently used for breeding. For male Red and White animals, the genotype frequencies of the initial population were PP 0.44%, Pp 8.43% and pp 91.13%. For female animals, they were PP 0.47%, Pp 7.67% and pp 91.86%. The reliability of cow breeding values was 50%, that of sire breeding values 89% and that of re-use sires 99%. The breeding value of each animal was composed of the true breeding value, a Mendelian sampling term and a random non-heritable residual. Matings between closely related animals were never allowed. The next generation of sires was always selected on breeding value, because in a practical breeding program sires are assumed to be selected on the total merit index. In total, five scenarios were compared. In the first scenario (A), bulls and cows were mated assortatively to produce the new generation of bull dams. The best bull was thus mated to the best cow regardless of genotype. In the second scenario (A + BM), 100 additional bull dams selected on polled genotype were added to the 500 bull dams chosen on breeding value. These additional dams were predominantly homozygous polled animals. In the next scenario (G), all bull dams were selected on polled genotype. Mainly homozygous polled animals were therefore used for mating, regardless of their breeding value. If not enough homozygous polled animals were available, heterozygous polled animals were added. In the fourth scenario (GA), animals were again selected on genotype, but within genotype they were additionally ranked on breeding value, so that only the genetically best polled animals were used. In the last scenario (P), bull dams were selected on phenotype.
In this scenario the bull dams were polled, but no distinction was made between heterozygous and homozygous polled animals. Within phenotype, the potential bull dams were ranked on breeding value.
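The selection rules of scenarios A and GA can be illustrated with a much-reduced, single-generation simulation. This Python/NumPy sketch is an illustration under stated assumptions and is not the simulation program used in the study. True breeding values have variance 1. Estimated breeding values are simulated with the reliabilities given above: 50% for cows and 89% for sires. The 10,000 cows are a hypothetical population size. Dams are mated to randomly chosen sires instead of assortatively, and inbreeding and re-use sires are ignored. The starting genotype frequencies are the Red and White values given above. All function names are hypothetical.

```python
import numpy as np


def estimated_bv(tbv, reliability, rng):
    """EBV whose squared correlation with the true BV equals `reliability` (TBV variance 1)."""
    noise_sd = np.sqrt((1.0 - reliability) / reliability)
    return tbv + rng.normal(0.0, noise_sd, size=tbv.shape)


def select_dams(tbv, geno, scenario, n_dams, rng):
    """Indices of bull dams; geno counts polled alleles P (2 = PP, 1 = Pp, 0 = pp)."""
    ebv = estimated_bv(tbv, 0.50, rng)
    if scenario == "A":  # breeding value only
        order = np.argsort(-ebv)
    elif scenario == "GA":  # polled genotype first (PP > Pp > pp), then EBV within genotype
        order = np.lexsort((-ebv, -geno))
    else:
        raise ValueError(f"unknown scenario: {scenario}")
    return order[:n_dams]


def transmit_polled(geno, rng):
    """Each parent passes on allele P with probability (number of P alleles) / 2."""
    return (rng.random(geno.shape) < geno / 2.0).astype(int)


def simulate(scenario, n_cows=10_000, n_bulls=500, n_sires=30, n_dams=500, seed=1):
    """One generation; returns (mean true BV of calves, frequency of allele P in calves)."""
    rng = np.random.default_rng(seed)
    cow_tbv = rng.normal(0.0, 1.0, n_cows)
    cow_geno = rng.choice([2, 1, 0], size=n_cows, p=[0.0047, 0.0767, 0.9186])
    bull_tbv = rng.normal(0.0, 1.0, n_bulls)
    bull_geno = rng.choice([2, 1, 0], size=n_bulls, p=[0.0044, 0.0843, 0.9113])
    # In every scenario the sires are chosen on EBV only (reliability 89%)
    sires = np.argsort(-estimated_bv(bull_tbv, 0.89, rng))[:n_sires]
    dams = select_dams(cow_tbv, cow_geno, scenario, n_dams, rng)
    mates = rng.choice(sires, size=n_dams)  # random sire per dam (simplification)
    # Calf TBV = parent average + Mendelian sampling (variance 0.5, no inbreeding)
    calf_tbv = 0.5 * (cow_tbv[dams] + bull_tbv[mates]) + rng.normal(0.0, np.sqrt(0.5), n_dams)
    calf_geno = transmit_polled(cow_geno[dams], rng) + transmit_polled(bull_geno[mates], rng)
    return calf_tbv.mean(), calf_geno.mean() / 2.0


tbv_a, freq_a = simulate("A")
tbv_ga, freq_ga = simulate("GA")
```

Even this crude version shows the basic trade-off between the scenarios. Ranking dams on genotype first (GA) raises the frequency of allele P among the calves far above scenario A, but at the cost of a lower mean true breeding value. The study's 200 replicates over 15 generations quantify this trade-off properly.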

Each scenario was replicated 200 times. From these replicates, the mean frequency of allele P, the mean breeding values of male and female animals and the development of inbreeding over 15 generations were evaluated.

Results

Imputation of validation animals and candidates

The proportion of polled animals was clearly higher in the validation sample than in the training sample. Nevertheless, only 12 of the […] validation animals were predicted incorrectly, which corresponds to an error rate of 0.2%. Tab. 1 compares the herdbook entries with the imputation results. All PP animals were identified correctly. The clearest difference between herdbook entries and imputation results is found for the heterozygous polled animals.

Tab. 1 Comparison of herdbook entries and imputation results of validation animals

Herdbook entry | Imputed pp | Imputed Pp | Imputed PP
pp* | | |
Pp | | |
PP | | |
P | | |
* Determined from pedigree information

Tab. 2 shows the results of the horn-status prediction for candidates. Of […] possible animals, 18 were identified as homozygous polled and 545 as heterozygous polled. Polled animals made up 1.5% of all candidate animals, roughly the same proportion as in the reference data set. Imputation can thus more than double the base for breeding for polledness.
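The comparison in Tab. 1 and the validation error rate can be sketched as a cross-tabulation. This minimal Python example rests on one assumption, not stated in the text: a herdbook entry "P" (phenotypically polled) counts as agreeing with both an imputed Pp and an imputed PP. The error rate is the number of wrongly imputed animals divided by all validation animals, as defined in the methods. The function names are hypothetical.

```python
from collections import Counter


def consistent(herdbook, imputed):
    """Herdbook 'P' (polled, genotype unknown) agrees with Pp and PP (assumption)."""
    if herdbook == "P":
        return imputed in ("Pp", "PP")
    return herdbook == imputed


def validation_summary(herdbook_status, imputed_status):
    """Cross-tabulation {(herdbook, imputed): count} and the error rate
    (wrongly imputed animals / all validation animals)."""
    pairs = list(zip(herdbook_status, imputed_status))
    table = Counter(pairs)
    wrong = sum(n for (hb, imp), n in table.items() if not consistent(hb, imp))
    return table, wrong / len(pairs)


herdbook = ["pp", "pp", "Pp", "PP", "P", "P"]
imputed = ["pp", "Pp", "Pp", "PP", "PP", "pp"]
table, error_rate = validation_summary(herdbook, imputed)
```

In the toy data, two of six animals disagree: a pp animal imputed as Pp, and a P animal imputed as pp. The error rate is therefore one third, and the `table` counter holds exactly the cell counts of a table like Tab. 1.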

Tab. 2 Results of prediction for candidates

Number of reference animals |
Proportion of polled animals among reference animals (%) | 1.9
Number of candidate animals |
Imputed PP | 18
Imputed Pp | 545
Imputed pp |
Proportion of polled animals among candidate animals (%) | 1.5

Description of the population

Because imputation is possible with a very low error rate, the imputation results and herdbook entries were combined, so that a horn status is available for all SNP-genotyped animals. Tab. 3 shows the distribution of animals by sex and breed for each horn status. It also gives the frequency of the recessive allele p and whether the population is in Hardy-Weinberg equilibrium. There are large differences between the Black and White and the Red and White breeds. In Black and White animals, the frequency of the dominant polled allele (P) has fallen below 0.01, and there is no Hardy-Weinberg equilibrium. Red and White animals are in Hardy-Weinberg equilibrium, but the frequency of allele P must also be regarded as very low. The table also shows that in both breeds the proportion of polled animals is higher among females than among males.
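The allele frequency p and the Hardy-Weinberg check reported in Tab. 3 can be sketched from genotype counts. The text does not name the test used, so this minimal Python example assumes a 1-df chi-square goodness-of-fit test. For one degree of freedom, its p-value is erfc(sqrt(chi2 / 2)), which avoids any dependency beyond the standard library. The function name is hypothetical.

```python
import math


def hardy_weinberg(n_PP, n_Pp, n_pp):
    """Frequency of the recessive (horned) allele p, chi-square statistic and p-value (1 df)."""
    n = n_PP + n_Pp + n_pp
    freq_p = (2 * n_pp + n_Pp) / (2 * n)
    freq_P = 1.0 - freq_p
    if freq_p in (0.0, 1.0):
        raise ValueError("locus is monomorphic; HWE test undefined")
    observed = (n_PP, n_Pp, n_pp)
    expected = (n * freq_P ** 2, 2 * n * freq_P * freq_p, n * freq_p ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return freq_p, chi2, p_value


in_hwe = hardy_weinberg(10, 180, 810)     # exactly HWE proportions, p = 0.9
deficit = hardy_weinberg(50, 0, 950)      # no heterozygotes at all
```

The first set of counts matches Hardy-Weinberg proportions exactly, so the p-value is 1. The second lacks heterozygotes entirely, giving a chi-square of 1000 and a p-value far below 0.05.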

Tab. 3 Distribution of polled status, frequency of the recessive allele p and Hardy-Weinberg equilibrium (HWE) by breed and sex
Columns: subpopulation; PP animals; Pp animals; pp animals; allele frequency p; in HWE¹
Male Black and White: in HWE no
Female Black and White: in HWE no
Male Red: in HWE yes
Female Red: in HWE yes
¹ Hardy-Weinberg equilibrium (p < 0.05)

Figure 2 shows, for each birth year and both breeds, the proportion of heterozygous polled animals among all animals. Before 2007, heterozygous animals made up less than 0.5% of each birth year in both breeds, with few exceptions. From 2007 onwards, the proportion of heterozygous animals rose markedly in both breeds, and considerably more so in Red animals. In the most recent birth years, the proportion of heterozygous Red animals is ten times that of heterozygous Black and White animals. The proportion of homozygous polled animals is well below 1% in both breeds, although it is again higher in Red than in Black and White animals.

Fig. 2 Proportion of Pp animals among all genotyped animals by birth year (series: Red, Black and White, total; y-axis: proportion of Pp animals, %)

Relationship and inbreeding
72% of the Red polled animals trace back to six different sires, each with more than 20 genotyped offspring. The heterozygous polled Red bull Lawn Boy, born in the USA in 2002, has 106 Red polled offspring and is currently the most important sire for the polled population. Genomic selection has shortened the generation interval, and the second generation after Lawn Boy has already been born. Two of Lawn Boy's sons each have more than 90 genotyped offspring of their own. These sons are followed by the bull Mitey and a son of Mitey. Magna P, who traces back to the same dam as Mitey, has 21 genotyped offspring.

Lawn Boy was used 104 times as maternal grandsire of polled animals. For the remaining polled animals, mainly horned maternal grandsires were used. Lawn Boy is also the most influential ancestor of the Red polled animals born between 2008 and 2012, contributing 25% of this gene pool. Lawn Boy and his sons likewise dominate among Black and White polled animals. Here, however, the heterozygous polled Black and White bull Mitey, born in Canada in 2007, carries more weight than in Red animals, with 56 offspring.

As in Red animals, Lawn Boy was mainly used on the maternal grandsire side. Table 4 shows the mean relationship of Red animals born 2008 to 2012 by polled status. The relationship among Pp animals is about twice that among pp animals. At the same time, PP and Pp animals are considerably more closely related to each other than polled animals are to horned animals.

Tab. 4 Mean pedigree relationship coefficients between and within PP, Pp and pp Red Holsteins born 2008 to 2012
Polled status   PP      Pp      pp
PP              0.102   0.088   0.048
Pp              0.088   0.085   0.049
pp              0.048   0.049   0.047

For Black and White animals born 2008 to 2012, mean genomic inbreeding was 5.06%. There was no significant difference between horned and polled animals. There was, however, a significant difference (P<0.001) in pedigree inbreeding coefficients between pp and Pp animals. The mean genomic inbreeding coefficient was 5.06% for horned animals and 4.71% for Pp animals.

In Red animals, mean pedigree inbreeding of both Pp (4.37%) and PP (5.34%) animals was significantly (P<0.001) higher than that of pp animals (3.97%). For genomic inbreeding, a significant difference (P<0.001) was found only between pp and Pp animals. Here pp animals had a mean inbreeding of 4.14%, whereas Pp animals were even lower, at 3.19%.
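Table 4 and the inbreeding figures above rest on pedigree relationships. This section does not say how they were computed. A standard way is the tabular method, sketched below in Python as an illustration rather than as the study's procedure (the function name is hypothetical). The inbreeding coefficient of an animal is half the additive relationship between its parents.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix A by the tabular method.

    pedigree: list of (sire, dam) index pairs, one per animal, ordered so that
    parents precede their offspring; None marks an unknown parent.
    Inbreeding of animal i is A[i][i] - 1.
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (s, d) in enumerate(pedigree):
        # Diagonal: 1 + F_i, with F_i = half the relationship of the parents
        inbreeding = 0.5 * A[s][d] if s is not None and d is not None else 0.0
        A[i][i] = 1.0 + inbreeding
        for j in range(i):
            # Off-diagonal: mean relationship of j with the two parents of i
            a_js = A[j][s] if s is not None else 0.0
            a_jd = A[j][d] if d is not None else 0.0
            A[i][j] = A[j][i] = 0.5 * (a_js + a_jd)
    return A


# Two founders, two full sibs, and an offspring of the full-sib mating
A = relationship_matrix([(None, None), (None, None), (0, 1), (0, 1), (2, 3)])
print(A[4][4] - 1.0)   # inbreeding of the last animal
```

Mean values like those in Table 4 are then averages of the relevant off-diagonal elements for all pairs within or between the polled-status groups.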

Breeding values
Figure 3 shows the spread and means of the genomically enhanced RZG (total merit index) for Red and Black and White animals, split into horned and polled animals. Among Black and White animals born 2008 to 2012, horned animals (n=25,302) have a significantly (P<0.001) higher mean RZG than polled animals (n=279): 124 versus 121.

Breeding values also vary less among polled than among horned animals, as the extremes show. The best horned animal has an RZG of 163 and the best polled animal an RZG of 142, almost two genetic standard deviations lower.

In Red animals, polled animals are significantly superior (P<0.05) in RZG, with a mean of 122 compared with 120 for horned animals. As in Black and White animals, the spread of breeding values is smaller in polled than in horned animals, even when the horned sample is repeatedly subsampled at random. The difference between the best horned and the best polled Red animal is about half a genetic standard deviation in RZG.
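The comparison of spread above relies on repeatedly drawing random subsamples from the much larger horned group, so that standard deviations are compared at equal sample size. A minimal Python sketch of that idea follows. The function name, the number of draws and the seed are illustrative assumptions, not the settings used in the study.

```python
import random
import statistics


def subsample_spread(horned, polled, n_draws=1000, seed=1):
    """Share of random horned subsamples (size = polled group, drawn without
    replacement) whose standard deviation exceeds that of the polled group."""
    rng = random.Random(seed)
    k = len(polled)
    sd_polled = statistics.stdev(polled)
    larger = sum(
        statistics.stdev(rng.sample(horned, k)) > sd_polled
        for _ in range(n_draws)
    )
    return larger / n_draws
```

A value close to 1 means that almost every equal-sized horned subsample is more variable than the polled group. For the range comparison, assume the usual scaling of German relative breeding values to 12 points per genetic standard deviation (an assumption, not stated in this section). The 21-point gap between 163 and 142 then corresponds to 21/12 = 1.75 genetic standard deviations.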

Fig. 3 Spread and means of the genomically enhanced RZG for Red and Black and White Holsteins by polled status, candidates born 2008 to 2012

For the three fertility haplotypes HH1, HH2 and HH3, a significant association with polled status was found for HH1 (P<0.001) and HH2 (P<0.05) in both breeds for birth years 2008 to 2012. For HH2, polled animals of both breeds show a lower frequency of suspected carriers; for HH1 the opposite holds. In Red animals, the proportion of suspected HH1 carriers among Pp and PP animals is four times that among pp animals. In Black and White animals, the proportion of suspected carriers among Pp animals is twice that among horned animals. For HH3, neither breed shows an association between haplotype and polled status.

Results of the simulation study
Four output parameters were analysed to describe the effect of the simulated breeding programs:
- the mean allele frequency of P;
- the mean breeding value of the bulls used;
- the mean breeding value of the cows in the whole population;
- the increase in inbreeding per generation.

The development of the allele frequency in the whole population is shown in Figure 4.

Fig. 4 Development of the allele frequency of P in the whole population over 15 generations for the five scenarios (A, A + BM, AG, G, P)

The two scenarios that consider genotype in bull dam selection (G, AG) show the steepest increase in the frequency of the favoured allele P, with nearly complete fixation after about 12 generations. Scenario P achieves a moderate increase in the frequency of the desired allele. In the scenarios with almost pure selection on breeding value (A, A+BM), the allele frequency hardly improves. In scenario A+BM, the additional use of genotyped bull dams has a small positive effect. Because of the mating strategies, almost only heterozygous animals are produced in the second generation. Genotypes segregate and allele frequencies change only in the following selection steps.

Figure 5 shows the effect of selection on the breeding values of bulls (bulls selected for further use and sires). Selection on breeding value (A, A+BM) yields the highest genetic gain, followed by scenario P. Pure selection on polled genotype (G) yields the lowest genetic gain, and scenario AG lies in between.

Fig. 5 Increase of the mean breeding value of selected bulls over the 15 generations for the five scenarios (A, A + BM, AG, G, P)

Across scenarios, cow breeding values (Figure 6) develop in the same way as those of the bulls. The simulated cow breeding values in the base population have a mean of zero, whereas the bulls were set to a higher level from the start. The cow curves therefore look somewhat different at the beginning. The breeding value level reached after 15 generations is also lower for cows than for bulls.

Fig. 6 Trend of the mean breeding values of the cow population over 15 generations for the five scenarios (A, A + BM, AG, G, P)

Figure 7 shows the development of inbreeding in the simulated population. The relationship structure within generations first has to settle in the simulation. In the second generation there are 30 large half-sib groups descending from the sires used first. As selection and mating proceed, the initially high average relationship declines to a minimum in generation 5. Inbreeding within generations then rises again, to a degree that depends on the scenario. Scenarios that rank bull dams by breeding value show a higher level of inbreeding than strategy G, which ranks animals by polled genotype only.

Fig. 7 Development of mean inbreeding per generation over 15 generations for the five scenarios (A, A + BM, AG, G, P)

Discussion
In principle, imputation is suitable for predicting unobserved markers, and several studies (Chen et al., 2011; Segelke et al., 2012) show that Beagle predicts unobserved SNPs most accurately. Besides small imputation errors, errors in recording herdbook information are a second source of error. Animals recorded in the herdbook as phenotypically polled are a particular risk in the imputation process, because this record does not identify them unambiguously as heterozygous or homozygous polled. Imputing polled status not only reveals previously unidentified polled animals but also makes it possible to check herdbook records. Recording errors or laboratory errors such as sample mix-ups can thus be detected and returned to breeders for clarification.

For Black and White animals, the observed genotype distribution did not match the expected one, so the population is not in Hardy-Weinberg equilibrium. This indicates that selection against polledness has taken place over past decades or even centuries.

Lamminger et al. (2000) likewise found that polled animals tend to be genetically inferior to horned animals. In their study, polled animals were inferior mainly in the production breeding values for milk, fat and protein yield and for fat content, but superior in fertility and calving traits. In this study, too, polled Black and White animals are superior to horned animals in the relative breeding value for fertility (RZR): their mean gBV for this trait is four points higher. One explanation is the antagonism between production and reproduction traits, which means that animals with higher production breeding values tend to perform worse in fertility.

In general, the analyses are always based on the pool of SNP genotypes, which mainly represents elite bulls and bull dams. Little is known about the polled status of the cow population in the field. Given the very low frequency in the SNP-genotyped population, the figures can nevertheless be assumed to be representative of the whole population.

In Red animals, the frequencies of the fertility haplotypes HH1 and HH2 (4.3% and 4.8%) are almost identical to those of VanRaden et al. (2011) (4.5% and 4.6%). For HH3, however, the carrier frequency in the Red population differs by 3.5 percentage points from that in the American population. In Black and White animals, the carrier frequencies of HH1 and HH2 are only about half those of the American population and of the Red population. Their HH3 carrier frequency, at 7%, is higher than in VanRaden et al. (2011). These differences between the American and German populations, and between the breeds, can be explained by the differing use of individual influential carrier ancestors.

The marked difference in HH1 between horned and polled animals arises because a frequently used son of Lawn Boy carries this haplotype. Because he is polled and has good breeding values, he was used as a sire of sons. He therefore transmits the unfavourable HH1 haplotype along with the polled allele. This bull was used less in the Black and White population, and there are otherwise few carriers in the German population, so the increase does not occur there.

The heavy use of individual bulls must be regarded as critical, because it increases the probability that genetic defects accumulate. Genetic defects such as CVM, BLAD (Schütz et al., 2008) and Brachyspina (Agerholm et al., 2006) became enriched in the population mainly through the massive use of individual bulls. Recessive defects are particularly problematic, because the problem becomes apparent only after a delay of several generations. The probability of mating two carriers depends mainly on the allele frequency in the population. The genetic base of polled animals, however, comprises only very few animals, so excluding individual superior animals is difficult. It is therefore advisable to avoid matings between two carriers. SNP genotyping also helps here: recessive defects could in principle be imputed as well, which would generate information on further traits and defects cost-effectively.

The simulations clearly show that the selection strategy strongly influences the development of allele frequencies in the whole population. The more consistently selection is on a genotype, the faster the frequency of allele P increases.
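Two quantities in this discussion can be checked with a few lines of Python:
- the association between HH1 carrier status and polled status (the "four times" and "twice as high" carrier shares in the results);
- the dependence of carrier × carrier matings on carrier frequency.

This is a minimal sketch with hypothetical function names. The chapter does not state which association test was used, so a plain 2×2 chi-square test without continuity correction is assumed. Random mating is also assumed.

```python
import math


def association_test(carriers_polled, n_polled, carriers_horned, n_horned):
    """2x2 chi-square test (1 df) of carrier status versus polled status.
    Returns (chi2, p_value, carrier rate of polled / carrier rate of horned)."""
    table = [[carriers_polled, n_polled - carriers_polled],
             [carriers_horned, n_horned - carriers_horned]]
    rows = [n_polled, n_horned]
    cols = [table[0][0] + table[1][0], table[0][1] + table[1][1]]
    n = n_polled + n_horned
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    ratio = (carriers_polled / n_polled) / (carriers_horned / n_horned)
    return chi2, p_value, ratio


def carrier_mating_risk(carrier_freq_sires, carrier_freq_dams):
    """Random mating: probability that a mating pairs two carriers of a
    recessive, and the expected share of homozygous (affected) embryos,
    which is one quarter of the carrier x carrier matings."""
    both = carrier_freq_sires * carrier_freq_dams
    return both, 0.25 * both
```

As an example, treat the 4.3% HH1 frequency found in Red animals as a carrier frequency. A carrier sire mated at random to Red dams would then be in a carrier × carrier mating about 4.3% of the time (`carrier_mating_risk(1.0, 0.043)`). About 1.1% of his embryos would be expected to be homozygous.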
Because polledness is inherited dominantly, even selection on the phenotype produced by the genotype successfully increases the frequency of P. As expected, this scenario lies in the middle of the simulated scenarios (Figure 4). In practice, however, other selection strategies are preferred, because bulls are marketed mainly for their high breeding values and not for a single genotype. In the simulation, the bulls used were therefore selected on breeding value only, and polledness was taken into account in the selection of bull dams. On this path, a breeding organization has direct influence on the choice and can adapt its breeding strategy.
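The ranking of the scenarios in Figure 4 can be reproduced qualitatively with a deterministic single-locus recurrence. This is a deliberately crude sketch, not the stochastic simulation of this chapter. It rests on the following assumptions:
- sires are chosen on breeding value only, so they carry the horned allele p at the population frequency q;
- a fixed fraction f of females is needed as bull dams;
- the next generation is treated as if it descended from these dams;
- genotypes are in Hardy-Weinberg proportions;
- the value f = 0.01 and the names `next_q` and `run` are hypothetical.

```python
def next_q(q, scenario, f=0.01):
    """Frequency of the horned allele p after one generation.

    scenario "A": dams chosen without regard to polled status;
    "P": dams drawn from phenotypically polled (PP + Pp) females;
    "G": dams drawn from PP females first, then Pp.
    Any shortfall of dams is filled with the next class, ending with pp.
    """
    P = 1.0 - q
    freq_PP, freq_Pp = P * P, 2.0 * P * q
    carriers = freq_PP + freq_Pp
    if scenario == "A":
        q_dam = q
    elif scenario == "P":
        if carriers >= f:
            q_dam = q / (1.0 + q)            # p frequency among PP + Pp females
        else:
            q_dam = (P * q + (f - carriers)) / f
    elif scenario == "G":
        if freq_PP >= f:
            q_dam = 0.0
        elif carriers >= f:
            q_dam = 0.5 * (f - freq_PP) / f  # only the Pp part carries p
        else:
            q_dam = (P * q + (f - carriers)) / f
    else:
        raise ValueError("unknown scenario")
    # Offspring receive one allele from the sire and one from the bull dam
    return 0.5 * (q + q_dam)


def run(scenario, P0=0.01, generations=15, f=0.01):
    """Frequency of the polled allele P after a number of generations."""
    q = 1.0 - P0
    for _ in range(generations):
        q = next_q(q, scenario, f)
    return 1.0 - q


for s in ("A", "P", "G"):
    print(s, round(run(s), 3))
```

Under these assumptions, genotype selection (G) raises P fastest and phenotype selection (P) lies in between. Selection that ignores polled status (A) leaves P unchanged. The sketch says nothing about genetic gain or inbreeding, which need the full simulation.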

Monitoring the development of breeding values in the simulated scenarios is particularly important, because only scenarios with successful genetic progress can be applied in practice. On this basis the scenarios have to be reassessed.

Pure selection for homozygous polled animals (scenario G) achieves the success at the target locus described above, but yields the lowest genetic gain in the simulation. Scenario A serves throughout as the reference for the attainable genetic gain without additional consideration of polledness. It produces the highest genetic gain on both the bull and the cow side, but no increase in the frequency of P. The goal should be an optimal combination of attainable genetic gain and improvement in the frequency of P. Three additional selection strategies were therefore simulated.

Scenario A+BM assumes continued selection on breeding value, but additionally selects cows genotyped for polledness. A breeding organization could achieve this, for example, by paying a premium for targeted testing of daughters of genetically polled bulls, so that these daughters can be used as additional bull dams. The mean breeding value of the bulls is slightly higher than in scenario A, because the number of bull dams, and with it the selection intensity on this path, has increased (Figure 5). However, the additional genotyping raises the frequency of P only slightly.

Scenario AG assumes exact knowledge of the genotype of all cows. Among cows with genotype PP, those with the highest breeding values are selected. If fewer than 500 PP cows are available, the shortfall is filled with the best Pp cows.
Because selection is first on genotype, scenario AG reaches the same allele frequency as scenario G. The subsequent within-group selection on breeding value gives a higher genetic gain than in scenario G, but a lower one than in scenario A. Preselecting on polled genotype misses many cows with particularly high breeding values, so the breeding value level of the selected animals is lower.

Figure 5 shows a decline in mean breeding values in generation 2 for scenario G. Selection on genotype initially picks out a subgroup of PP cows, and bull dams are then chosen from this subgroup. Because the subgroup is not selected on breeding value, it has no genetic superiority. Only after targeted matings from this generation are PP bull dams genetically better than the population mean, so that genetic gain arises on the bull side. This effect is weaker in scenario AG, because only the genetically best cows of the PP subgroup are used.

Scenario P assumes no knowledge of the exact genotype of cows or bulls. Because inheritance is dominant, allele carriers (PP, Pp) can already be recognized by their phenotype. Potential bull dams can thus be preselected, and the genetically best ones chosen from this subgroup. Heterozygous cows are then used unknowingly, but bull dams can be selected from a larger subgroup than in scenario AG. This increases the selection intensity for production traits, and the effect of this sharper selection is visible in Figure 6. Genetic gain in scenario P is higher than in scenarios G and AG but lower than in A and A+BM, and the gap to G is much larger than the gap to A. In return, matings of heterozygous animals are allowed, which clearly affects the development of the frequency of P. In Figure 4, the allele frequency in scenario P lies between the two extremes.

Taking genetic gain and allele frequency together, only scenario P offers a balanced combination of genetic gain and increase in allele frequency. Scenarios A and A+BM achieve the highest genetic gain but no improvement in allele frequency. Scenarios G and AG increase the allele frequency but achieve clearly lower genetic gain. Beyond these parameters, the additional costs of performance testing must also be taken into account.
In scenario A+BM, costs arise for determining the genetic polled status of the 100 additional bull dams; in scenarios G and AG, they arise for all cows in the population. Financing is still realistic in the first case, but the costs of G and AG are not sustainable. Scenario P incurs no laboratory costs, only the organizational cost of recording phenotypic polled status, which could easily be reported to the breeding organization online. This selection strategy is therefore also preferable from an economic point of view.

In practice, SNP genotypes are becoming available for a growing number of cows, and polled genotypes can be imputed from them cheaply and reliably. This information source is an obvious choice for bull dam selection. The imputed genotypes must be combined with cow breeding values in selection, which amounts to selection according to scenario AG. The remaining bull dams are selected according to scenario P. In this way, the allele frequency can be raised to values between those of scenarios AG and P (Figure 4) without neglecting genetic gain, which is similar in scenarios AG and P (Figures 5 and 6).

The development of inbreeding is always somewhat difficult to assess in simulations, because all animals in the first generation are unrelated. Inbreeding therefore rises steeply in the first generations, as few but closely related families are created. Only in later generations do these founder families mix enough to give a realistic course of inbreeding. The scenarios hardly differ from one another. Only scenario G produces less relationship in the population, because animals are not mated on breeding value. The other scenarios rely on assortative mating, which, because of additive genetic inheritance, favours matings between more closely related animals.

The inbreeding trend of the simulation cannot be compared with the current situation in practice. So far, only very few genetically polled bulls have been used in practice (e.g. Lawn Boy), but these are used disproportionately often. With the current use of genetically polled bulls, a stronger increase in inbreeding than in the simulation is therefore to be expected. Moreover, mean inbreeding rises as breeding for polledness intensifies (Table 4).

Conclusions and outlook
This study aimed to show the opportunities and limits of breeding for polledness.

The main opportunity is routine SNP genotyping of as many animals as possible. Besides genomic breeding values, it yields information such as recessive defects, specific fertility haplotypes, genomic inbreeding coefficients or polled status at low cost. This broadens the genetic base for polled breeding, because additional polled animals not previously recorded in the herdbook are identified. At the same time, the influence of unfavourable genetic characteristics can be included in selection decisions at an early stage.

The current limits of breeding for polledness lie in the fact that there are only two dominant founders of polledness. Their descendants are repeatedly mated with each other in the course of selection, which carries several risks:
- increasing inbreeding and the associated accumulation of recessive alleles (e.g. HH1);
- lower selection intensity and reduced genetic gain;
- a reduction of genetic variance.

To improve polledness in the whole population while managing these risks, the trait must also be recorded on the female side. Both genotyped and imputed genotypes can be used directly for this. To keep the reference population for imputation up to date, selected bull dams imputed as polled should additionally receive a laboratory test for polledness. Additional genotyping solely to determine polled status is not recommended. In the cow population, general recording of the phenotype is recommended, because it already provides sufficiently reliable information at low cost.

The simulated breeding programs are designed so that selection for polledness takes place primarily via the bull dam path. This preserves the genetic variation in economically relevant traits on the bull side for selection, so that AI bulls can be selected intensely. As a result, there is hardly any loss of genetic gain and the increase in inbreeding can be kept moderate. The polled allele can nevertheless be introduced into the population quickly.

Acknowledgements
(1) We thank the Förderverein Biotechnologieforschung e.V. (FBF, Bonn) for financial support. (2) We thank the EuroGenomics consortium for providing the genotypes.

References
Agerholm, J. S., F. McEvoy and J. Arnbjerg (2006): Brachyspina syndrome in a Holstein calf. J. Vet. Diagn. Invest. 18,
Boichard, D. (2002): Pedig: a Fortran package for pedigree analysis suited to large populations. 7th World Congress on Genetics Applied to Livestock Production, Montpellier, August 2002, paper
Browning, S. R. and B. L. Browning (2007): Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81,
Browning, B. L. and S. R. Browning (2010): High-resolution detection of identity by descent in unrelated individuals. Am. J. Hum. Genet. 86,
Cargill, E. J., N. J. Nissing and M. D. Grosz (2008): Single nucleotide polymorphisms concordant with the horned/polled trait in Holsteins. BMC Res. Notes 1:128.
Chen, J., Z. Liu, F. Reinhardt and R. Reents (2011): Reliability of genomic prediction using imputed genotypes for German Holsteins: Illumina 3K to 50K bovine chip. Interbull Bull. 44,
Dove, W. F. (1935): The physiology of horn growth: a study of the morphogenesis, the interaction of tissues and the evolutionary processes of a Mendelian recessive character by means of transplantation of tissues. J. Exp. Zool. 69,
Falconer, D. S. (1984): Einführung in die quantitative Genetik. Verlag Eugen Ulmer, Stuttgart.
Gengler, N., P. Mayeres and M. Szydlowski (2007): A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal 1,
Lamminger, A., H. Hamann, G. Röhrmoser, E. Rosenberger, H. Kräusslich and O. Distl (2000): Beziehungen zwischen der Hornlosigkeit und den Zuchtzielmerkmalen beim Deutschen Fleckvieh. Züchtungskunde 72,
Liu, Z., F. R. Seefried, F. Reinhardt, S. Rensing, G. Thaller and R. Reents (2011): Impacts of both reference population size and inclusion of a residual polygenic effect on the accuracy of genomic prediction. Genet. Sel. Evol. 43:19.
Medugorac, I., D. Seichter, A. Graf, I. Russ, H. Blum, K. H. Göpel, S. Rothammer, M. Förster and S. Krebs (2012): Bovine polledness: an autosomal dominant trait with allelic heterogeneity. PLoS ONE 7:e
Schütz, E., M. Scharfenstein and B. Brenig (2008): Implication of complex vertebral malformation and bovine leukocyte adhesion deficiency DNA-based testing on disease frequency in the Holstein population. J. Dairy Sci. 91,
Seefried, F., Z. Liu, G. Thaller and F. Reinhardt (2010): Die Genomische Zuchtwertschätzung bei der Rasse Deutsche Holstein. Züchtungskunde 82,
Segelke, D., J. Chen, Z. Liu, F. Reinhardt, G. Thaller and R. Reents (2012): Reliability of genomic prediction for German Holsteins using imputed genotypes from low-density chips. J. Dairy Sci. 95,
Täubert, H., S. Rensing and F. Reinhardt (2011): Zuchtplanung mit ZPLAN+ am Beispiel genomischer Zuchtprogramme bei Holsteins. Züchtungskunde 83,
VanRaden, P. M. (2008): Efficient methods to compute genomic predictions. J. Dairy Sci. 91,
VanRaden, P. M., K. M. Olson, D. J. Null and J. L. Hutchison (2011): Harmful recessive effects on fertility detected by absence of homozygous haplotypes. J. Dairy Sci. 94,

Chapter 3
Considering genetic characteristics in German Holstein breeding programs
D. Segelke*1, H. Täubert*, F. Reinhardt* and G. Thaller
*Vereinigte Informationssysteme Tierhaltung w.V. (vit), Heideweg 1, Verden, Germany
Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Kiel, Germany
Submitted for publication in Journal of Dairy Science

ABSTRACT
In recent years, several research groups have shown that certain haplotypes may cause embryonic loss in the homozygous state. Until now, carriers of genetic disorders have been excluded from mating, which reduces genetic gain and the number of sires available for the breeding program. Ongoing research, based on genotyping a large proportion of the female cattle population and sequencing key ancestors, will most probably identify further genetic defects causing embryonic loss and calf mortality. There is therefore a clear need for a method that combines recessive defects (e.g. HH1 to HH5) with economically beneficial traits (e.g. polled) in mating decisions.

Our proposed method is a genetic index that accounts for the allele frequencies in the population and the economic value of each genetic characteristic, without excluding carriers from breeding schemes. Fertility phenotypes from routine genetic evaluation were used to determine the economic value per embryo loss. Embryo loss caused by HH1 and HH2 occurs later than that caused by HH3, HH4 and HH5. Therefore, an economic value of 93 euro was used against HH1 and HH2, and of 70 euro against HH3, HH4 and HH5. For polled, a value of 7 euro per polled calf was used.

A genomic breeding program was simulated to study the impact of changing the selection criterion from assortative mating based on breeding values to selecting females on the genetic index. Selection on a genetic index in the female path is a useful method to control allele frequencies. It reduces undesirable alleles and increases economically beneficial characteristics while maintaining most of the genetic gain in production and functional traits. Further investigation is nevertheless needed to better understand the biology and to determine the correct timing of embryo loss and the economic value of fertility disorders.
Keywords: genetic index, lethal recessive, genomic evaluation, dairy cattle
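The abstract does not give the formula of the genetic index. As a hypothetical illustration of how allele frequencies and economic values could be combined for one female, the sketch below computes the expected value of a single random mating. For each haplotype, it subtracts the 93 or 70 euro loss weighted by the probability of a homozygous embryo. It then adds 7 euro weighted by the probability of a polled calf. The function name, the allele-count encoding and the random-mating assumption are illustrative choices, not the chapter's method.

```python
# Economic values quoted in the abstract (euro per lost embryo / per polled calf)
LOSS_PER_EMBRYO = {"HH1": 93.0, "HH2": 93.0, "HH3": 70.0, "HH4": 70.0, "HH5": 70.0}
VALUE_PER_POLLED_CALF = 7.0


def expected_mating_value(dam, sire_freqs):
    """Hypothetical expected value (euro) of one random mating of a female.

    dam: copies (0, 1 or 2) the female carries of each haplotype and of the
         polled allele, e.g. {"HH1": 1, "polled": 2}; missing keys mean 0.
    sire_freqs: frequency of each haplotype and of the polled allele among
         the gametes of the sires used (random mating assumed).
    """
    value = 0.0
    for hap, loss in LOSS_PER_EMBRYO.items():
        # Embryo is homozygous only if both parents transmit the haplotype
        p_homozygous = (dam.get(hap, 0) / 2.0) * sire_freqs.get(hap, 0.0)
        value -= loss * p_homozygous
    # Calf is horned only if both parents transmit the recessive allele p
    p_dam_transmits_p = 1.0 - dam.get("polled", 0) / 2.0
    p_sire_transmits_p = 1.0 - sire_freqs.get("polled", 0.0)
    value += VALUE_PER_POLLED_CALF * (1.0 - p_dam_transmits_p * p_sire_transmits_p)
    return value


sires = {"HH1": 0.02, "polled": 0.01}
print(expected_mating_value({"HH1": 1, "polled": 2}, sires))
```

With a value of this kind, a carrier is penalized in proportion to how common her haplotype is among the sires used, rather than being excluded outright. This mirrors the intent stated in the abstract.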

INTRODUCTION
Routine genotyping of a large proportion of the dairy population for genomic evaluation, together with sequencing of key ancestors, makes it possible to discover and monitor recessive genetic disorders. VanRaden et al. (2011a) were among the first to use the genotype pool from routine genomic evaluation to screen for recessive fertility haplotypes. They showed that three Holstein haplotypes (HH1, HH2 and HH3) may cause embryonic loss in the homozygous state. Studies from Denmark (Sahana et al., 2013), the United States (Cooper et al., 2013) and France (Fritz et al., 2013) confirmed these results. They also identified additional haplotypes (among others HH4 and HH5) associated with reduced fertility due to potential embryonic loss.

Using whole-genome re-sequencing data of the population's key ancestors (Jansen et al., 2013), researchers have already found the causal mutations for HH1 (Adams et al., 2012), HH3 (Daetwyler et al., 2014; McClure et al., 2014) and HH4 (Fritz et al., 2013). The causal mutations for HH2 and HH5, however, are still unknown (McClure et al., 2014).

With the introduction of genomic selection, the average inbreeding level increases more than linearly, which raises the risk of potentially many new genetic disorders. For mating decisions and for publication, there will therefore be a clear need to combine this potentially large number of disorders with economically beneficial genetic characteristics. Detecting and managing genetic disorders will affect genetic gain in fertility traits, animal welfare and the overall image of the breed (Egger-Danner et al., 2014). Furthermore, there are economically beneficial traits such as polled (Medugorac et al., 2012; Rothammer et al., 2014). Polledness should be expanded in the German Holstein population to avoid dehorning young calves.
In the past, the allele frequencies of genetic disorders such as complex vertebral malformation (CVM; Agerholm et al., 2001), bovine leukocyte adhesion deficiency (BLAD; Shuster et al., 1992), deficiency of uridine monophosphate synthase (DUMPS; Shanks et al., 1984) and brachyspina (Agerholm et al., 2006; Charlier et al., 2012) were drastically reduced by excluding carrier bulls from artificial insemination. Superior bulls were excluded from mating irrespective of their genetic merit and of the frequency of the genetic disorder in the population. This resulted in lower genetic gain for production and functional traits (Van Eenennaam and Kinghorn, 2014). VanRaden et al. (2014) gave the example of the famous key ancestor PAWNEE FARM ARLINDA CHIEF, who carries the HH1 haplotype. This sire contributed 14% of the gene variants in the current Holstein population and thereby increased the value of milk yield by 25 billion dollars. In contrast, the costs of mid-term abortions due to HH1 were only 0.4 billion dollars. For each genetic characteristic, a comprehensive investigation of allele frequencies, mode of inheritance, economic value and causal mutation is needed to find the best way of handling it in breeding programs. In most cases, the carrier frequency can be managed by choosing an appropriate mating partner (e.g. mating a carrier sire only to non-carrier dams). The aim of this study was to derive economic values for the most important genetic characteristics segregating in German Holsteins. A further aim was to develop an index of genetic properties that summarizes these characteristics and their economic values.

MATERIALS AND METHODS
Identification of the considered genetic characteristics
In this study, we used information on five recessively inherited fertility disorders (HH1 to HH5) and one economically beneficial trait (polled) segregating in the German Holstein population. BLAD and DUMPS were not considered because their carrier frequencies are very low in the current German Holstein population (BLAD 0.2%; DUMPS <0.1%). As a result, the probability of mating two carriers is very low. CVM and brachyspina could not be considered because of their patent protection (Bendixen et al., 2014; Georges et al., 2010). To investigate how carrier frequencies developed over time, 143,511 Holstein animals from the routine German Holstein genomic evaluation (February 2015) were chosen. The genotype states for HH1 to HH5 were derived by the haplotype-based missing homozygosity approach of VanRaden et al. (2011a). Table 1 shows the locations used and the mean minor allele frequency for birth years 2012 to 2014. Since the end of 2014, the EuroGenomics10KV4 chip has been used for routine genotyping.
This chip contains the known causal mutations for HH1, HH3 and HH4, each represented twice. This made it possible to validate the genotyped status against the haplotype-derived status for 7,032 animals and to estimate the reproducibility of each mutation. The polled status was derived by the method described by Segelke et al. (2013). Briefly, this approach uses the polled entries from the herdbook as an additional marker within the polled region (Table 1) for the reference population. For animals with unknown polled status, the additional marker is set to missing and then imputed with the Beagle software package (Browning & Browning, 2007). Segelke et al. (2013) showed that the allele error rate for the imputed polled status, compared with the herdbook entry, was 0.2%.

Table 1. Location and allele frequency of the considered characteristics.
Characteristic   Chromosome   Map location (bp)          Minor allele frequency (%) by year of birth
HH1              5            62,394,447 - 63,983,…      …
HH2              1            93,172,083 - 98,133,…      …
HH3              8            95,003,606 - 96,266,…      …
HH4              1            2,128,924 - 2,942,…        …
HH5              9            92,350,052 - 93,910,…      …
Polled           1            845,494 - 4,052,…          …

Definition of the economic values
The economic values of the fertility defects were quantified indirectly by analyzing the fertility and calving records from the routine genetic evaluation for German Holsteins (April 2015; vit, 2015). Non-return rates at 56 and 90 days (NRR56 and NRR90) were analyzed as measures of an increased calving interval in order to estimate the time of embryonic loss. The NRR is defined as the proportion of inseminations for which no subsequent insemination is recorded within the specified period, assuming that the first insemination was successful. If a cow was culled during the lactation, the insemination records of that lactation were ignored. The final data set contains 12,439,144 insemination records. For all of these records, the sire and the maternal grandsire of the embryo were genotyped, so a carrier status could be derived for each genetic characteristic. Cows and heifers were born between 2000 and …. Mean NRR56 and NRR90 were 57.5% and 48.5% for heifers and 52.5% and 42.7% for cows, respectively. The phenotypic correlation between NRR56 and NRR90 was 0.83 for heifers and 0.82 for cows. The NRR of risk matings was compared with the NRR of non-risk matings. In a risk mating, both the embryo's sire and its maternal grandsire carry a specific characteristic. In a non-risk mating, both are non-carriers. In risk matings, on average 12.5% of all embryos will die because they are homozygous for the defect. The carrier sire transmits the defective haplotype with probability 1/2. The dam, as a daughter of a carrier maternal grandsire (assuming a non-carrier maternal granddam), is a carrier with probability 1/2 and then transmits the haplotype with probability 1/2. This gives 1/2 × 1/2 × 1/2 = 1/8. In contrast, if the sire and/or the maternal grandsire do not carry a specific fertility defect, no embryos will die because of that defect. The economic value of an embryo loss was estimated as the cost of an additional insemination plus the cost of the increased calving interval. The economic value of polled was estimated at 7 euro per calf (5 euro for labor and 2 euro for drugs). The societal value and the currently higher market price of a polled calf were not considered. The economic value of 7 euro per polled calf agrees with Widmar et al. (2013), who compared different dehorning scenarios. Their expected costs of dehorning ranged from 5.84 to … dollars, with a mean of … dollars. No extra costs for direct gene tests were considered, because causal mutations can easily be added to the customized chips used in routine genomic evaluation.

Genetic index
Table 2 illustrates the average economic effect of a genetic characteristic in a given population (Falconer, 1996).

Table 2. Effect of a genetic characteristic on a given population (Falconer, 1996).
Genotype of animal   Average economic effect in the population
AA                   2qα
AB                   (q - p)α
BB                   -2pα

In general, the effect on the population depends on the genotype of the animal, the allele frequencies (p and q, the frequencies of alleles A and B) and the economic value (α). The genetic index (GI) is the sum over all considered characteristics:

$$GI = \sum_{k=1}^{n} AV_k$$

where AV_k is the average economic effect of genetic characteristic k and n is the total number of considered genetic characteristics.
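The index defined above can be sketched in a few lines of Python. The sketch uses the fact that the three rows of the Falconer table collapse to a single expression, (n_A - 2p)α, where n_A is the number of copies of allele A that the animal carries. Two conventions are assumptions made for illustration: allele A is coded as the favourable allele (the non-defect allele, or the polled allele), and α is a positive euro value. The class and function names are hypothetical. The frequencies and α values in the example are placeholders, not the values used in this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristic:
    name: str
    p: float      # frequency of allele A (assumed: the favourable allele); q = 1 - p
    alpha: float  # economic value in euro (assumed positive for allele A)


def average_effect(n_a: int, p: float, alpha: float) -> float:
    """Average economic effect of a genotype (Falconer-style table).

    n_a = copies of allele A: 2 -> AA = 2q*alpha, 1 -> AB = (q - p)*alpha,
    0 -> BB = -2p*alpha. All three cases equal (n_a - 2p) * alpha.
    """
    if n_a not in (0, 1, 2):
        raise ValueError("n_a must be 0, 1 or 2")
    return (n_a - 2.0 * p) * alpha


def genetic_index(genotype: dict, characteristics: list) -> float:
    """GI = sum of AV_k over the n considered characteristics."""
    return sum(average_effect(genotype[c.name], c.p, c.alpha) for c in characteristics)


# Hypothetical example: an HH1 carrier that is heterozygous polled.
traits = [
    Characteristic("HH1", p=0.97, alpha=10.0),    # A = non-defect allele
    Characteristic("Polled", p=0.03, alpha=5.0),  # A = polled allele
]
animal = {"HH1": 1, "Polled": 1}
print(round(genetic_index(animal, traits), 2))  # prints -4.7
```

Under Hardy-Weinberg proportions, the genotype effects average to zero in the population. This is why a carrier of a rare defect receives a clearly negative contribution, while a non-carrier receives only a small positive one.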

The influence of the genetic index on production and functional breeding values and on allele frequencies was studied by simulating a genomic breeding program. Additionally, the genetic index was applied to real data by calculating it for each genotyped animal.

Simulation of a genomic breeding program
The genomic breeding program was simulated as reported by Täubert et al. (2012) and Segelke et al. (2013). A schematic overview is given in Figure 1.

Figure 1. Simulation of a genomic breeding program.

The breeding program comprised 250,000 dairy cows under milk recording, of which the best 1% were selected as bull dams based on their breeding values. From these dams, 1,250 male selection candidates were born, and 500 of them were genotyped. Of these 500 bull calves, 29 were selected based on their genomic breeding values and used as service sires as soon as they produced semen. One bull was selected for further second-crop service. To realize as much genetic gain as possible, 95% of the sires of sons are young genomically tested bulls.
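The selection steps above differ strongly in how intensively candidates are culled. As an illustration only, the textbook truncation-selection formula i = z/P converts each proportion selected into a selection intensity. Here z is the standard normal density at the truncation point and P is the proportion selected. This assumes that each step is truncation selection on one normally distributed criterion in a large population. It is not the ZPLAN+ machinery of Täubert et al. (2012), and the function name is hypothetical.

```python
from statistics import NormalDist


def selection_intensity(proportion_selected: float) -> float:
    """Selection intensity i = z / P under truncation selection on a standard
    normal criterion (large-population approximation)."""
    if not 0.0 < proportion_selected < 1.0:
        raise ValueError("proportion_selected must lie strictly between 0 and 1")
    std_normal = NormalDist()
    truncation_point = std_normal.inv_cdf(1.0 - proportion_selected)
    return std_normal.pdf(truncation_point) / proportion_selected


# Proportions taken from the program description; treating each step as
# truncation selection on a single normal criterion is an assumption.
steps = {
    "bull dams (best 1% of 250,000 cows)": 2_500 / 250_000,
    "genotyped bull calves (500 of 1,250)": 500 / 1_250,
    "young sires (29 of 500)": 29 / 500,
}
for label, share in steps.items():
    print(f"{label}: P = {share:.3f}, i = {selection_intensity(share):.2f}")
```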

Only 5% of all sires of sons are daughter proven. Of the cow sires, 94% are genomically tested and 6% are daughter-proven bulls. The cow base population had a mean of 100 and a standard deviation of 20, and the reliability of the estimated breeding values was 50%. Because artificial insemination bulls are superior to cows, 500 bulls with a mean of 130 and a standard deviation of 5 were simulated in the first generation. The best 30 of these were selected as sires. The reliability of the estimated breeding values was 67% for genomically tested bulls and 99% for daughter-proven bulls. Closely related animals were not mated. Ten generations were simulated, and each scenario was repeated 100 times. Starting allele frequencies for the base population were the average allele frequencies of the animals born between 2012 and 2014 (Table 1). Two breeding scenarios for the females were analyzed:
Scenario A: All animals were selected on their breeding values; the genetic index was not considered.
Scenario I: Animals in the dam path were ranked and selected on the genetic index.
In both scenarios, the AI bulls were selected on their estimated breeding values to maintain genetic gain in production and functional traits.

RESULTS AND DISCUSSION
Validation of the missing homozygosity approach
Table 3 compares the HH1, HH3 and HH4 genotypes from the custom SNP array with the genotypes derived from the missing homozygosity approach. Two genotyped embryos were homozygous (BB) for HH3. In total, 7,032 animals were compared, and the SNP-chip-based and haplotype-based results agreed closely (genotype error rate below 0.3%). For all three defects, the call rate was 99.8%. Interestingly, the reproducibility between the two probes targeting the same position differed between mutations. For HH4, no discordant genotypes were recorded. For HH1, however, 0.09% of all animals had discordant genotypes, although the same position was analyzed.
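The agreement statistics above can be sketched as a cross-tabulation of chip-based calls against haplotype-based status, mirroring the layout of Table 3. The sketch rests on two definitions that are assumptions, not necessarily the exact definitions used in this study. The call rate is the share of animals with a valid chip genotype. The genotype error rate is the share of called animals whose chip genotype differs from the haplotype-derived one. The labels "NC" (no call) and "ERR" (the two probes disagree) are also hypothetical.

```python
from collections import Counter

VALID_CALLS = {"AA", "AB", "BB"}  # "NC" = no call, "ERR" = the two probes disagree (assumed labels)


def concordance_summary(chip_calls, haplotype_calls):
    """Compare chip-based genotypes with haplotype-derived genotypes for one defect."""
    if len(chip_calls) != len(haplotype_calls):
        raise ValueError("both lists must cover the same animals in the same order")
    # Cross-tabulation of (chip call, haplotype call) pairs
    table = Counter(zip(chip_calls, haplotype_calls))
    called = [(c, h) for c, h in zip(chip_calls, haplotype_calls) if c in VALID_CALLS]
    discordant = sum(1 for c, h in called if c != h)
    return {
        "table": table,
        "call_rate": len(called) / len(chip_calls),
        "error_rate": discordant / len(called) if called else float("nan"),
    }


chip = ["AA", "AA", "AB", "NC", "AB"]
haplotype = ["AA", "AA", "AB", "AA", "AA"]
summary = concordance_summary(chip, haplotype)
print(summary["call_rate"], summary["error_rate"])  # prints 0.8 0.25
```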

Table 3. Comparison between the carrier status derived from the custom SNP array and the indirect genotypes derived from haplotypes. The chip-based carrier status (AA, AB, BB, no call, error call) is cross-tabulated against the haplotype-based status (AA, AB, BB) for HH1, HH3 and HH4.
AA: homozygous non-carrier; AB: heterozygous carrier; BB: homozygous carrier; No call: one or both SNPs without a result; Error call: the results of the two SNPs were unequal.

Carrier frequencies of the considered traits
Figure 2 illustrates how the carrier frequencies of the six considered traits developed over the years of birth.

Figure 2. Minor allele frequencies of the analyzed recessive fertility defects and polled.

No evidence of selection against the recessive haplotypes was found. Carrier frequencies of HH1, HH2 and HH4 are currently decreasing. In contrast, carrier frequencies of HH3 and HH5 are still increasing. This is because popular sires such as PICSTON SHOTTLE and O-BEE MANFRED JUSTICE-ET carry these haplotypes and have a strong influence on the current German Holstein population. The famous sire JOCKO BESN carries HH4 and has many popular sons, which increased the carrier frequency for birth years 2005 to …. The sons of JOCKO BESN were famous cow sires, and mainly male animals were genotyped at that time. Consequently, the observed allele frequency of HH4 subsequently drops. The frequency of polled has been increasing since 2010 because of social and political pressure to avoid dehorning. As reported by Segelke et al. (2013), this gain is mainly based on the excessive use of the sire AGGRAVATION LAWN BOY P-RED, which may lead to a high inbreeding level in polled animals. Minor allele frequencies for birth years 1990 to 2009 are mainly based on genotyped bulls representing the genomic reference population. However, with the increasing number of genotyped animals and the genotyping of complete herds in the near future, the genotype pool from genomic evaluation can be used as a monitoring system for genetic characteristics. Figure 2 shows that strategies to reduce and manage allele frequencies are needed, especially for HH3 and HH5.
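Using the genotype pool as a monitoring system essentially means tabulating allele and carrier frequencies by birth year. The sketch below is a hypothetical helper that assumes each genotyped animal is given as (birth year, number of copies of the characteristic allele). For the traits considered here, the characteristic allele is the minor allele. As noted above, frequencies for years dominated by genotyped bulls are not representative of the whole population. A tabulation like this inherits that bias.

```python
from collections import defaultdict


def frequencies_by_birth_year(records):
    """records: iterable of (birth_year, n_copies), where n_copies (0, 1 or 2)
    is the number of copies of the characteristic allele (defect or polled).

    Returns {year: (allele_frequency, carrier_frequency)} sorted by year."""
    tally = defaultdict(lambda: [0, 0, 0])  # year -> [allele copies, carriers, animals]
    for year, n_copies in records:
        if n_copies not in (0, 1, 2):
            raise ValueError("n_copies must be 0, 1 or 2")
        counts = tally[year]
        counts[0] += n_copies
        counts[1] += n_copies > 0  # True counts as 1
        counts[2] += 1
    return {
        year: (copies / (2 * animals), carriers / animals)
        for year, (copies, carriers, animals) in sorted(tally.items())
    }


# Hypothetical genotypes of four animals
print(frequencies_by_birth_year([(2012, 0), (2012, 1), (2013, 2), (2013, 0)]))
```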

Impact of missing homozygous haplotypes on fertility and economic values
To determine the time of embryo loss for each recessively inherited fertility disorder, NRR56 and NRR90 of heifers and cows were analyzed to approximate the economic value (Table 4).

Table 4. Impact of recessive defects on fertility phenotypes.
Characteristic   NRR56 heifers (%)   NRR90 heifers (%)   NRR56 cows (%)   NRR90 cows (%)   Economic value per embryo (euro)
HH1              -2.0 ± …            …                   …                …                -93
HH2               0.1 ± 1.6 NS        0.1 ± 1.6 NS        0.2 ± 1.4 NS     0.8 ± 1.4 NS    -93
HH3              -4.1 ± …            …                   …                …                -70
HH4              -2.4 ± …            …                   …                …                -70
HH5              -5.4 ± …            …                   …                …                -70
NRR: non-return rate; NS: not significant (p > 0.05). NRR values are differences between risk matings and non-risk matings.

For HH1, risk matings showed a lower NRR56 in heifers and cows than non-risk matings. Additionally, the negative effect on NRR90 was larger than that on NRR56, indicating that the embryo dies before day 90 of gestation. The economic value per lost embryo was estimated from the average day of the embryo's death. An average death at day 63 implies a calving interval extended by 63 days. Assuming marginal costs of 1.5 euro per day of extended calving interval and 15 euro per insemination results in a total economic value of 93 euro per lost embryo for HH1. The allele frequency of HH2 was very low in recent years (Figure 2), so risk matings rarely occurred, and no significant effect was found in this data set. As for HH1, it was assumed that the embryo dies on average on day 63, implying an economic loss of 93 euro. For HH3, HH4 and HH5, the main effect was on NRR56, implying that the embryo dies before day 56. It was assumed that the embryo dies on average on day 42 (two estrous cycles), yielding an economic value of 70 euro per lost embryo. The estimated effects of HH1 on NRR agree with VanRaden et al. (2011b). They reported effects of -1.1% on NRR60 and -1.6% on NRR100 and concluded that the embryo dies throughout gestation. Fritz et al. (2013) found a larger effect of HH1 on overall heifer and cow calving rates. The non-significant results for HH2 agree with Fritz et al. (2013), who were also unable to establish a significant effect on NRR because of the low number of risk matings. For HH2, VanRaden et al. (2011b) reported effects of -1.7% on NRR60 and -3.0% on NRR100, supporting our hypothesis that the embryo dies between day 60 and day 100. However, VanRaden et al. (2011b) concluded that for HH2 the embryo loss occurs mainly before day 60. The results for HH3 and early embryonic loss agree with VanRaden et al. (2011b) and Fritz et al. (2013). The estimated effects on fertility phenotypes depend mainly on the allele frequency in the population. They also differ between countries because bull sires are used differently in each country. The estimated economic values in this study are very similar because the disorders currently known mainly cause embryonic loss. Genetic disorders occur across breeds and across species (Nicholas and Hobbs, 2014). Recently, many genetic disorders were identified in the Simmental breed (Jung et al., 2014; Pausch et al., 2015). These disorders have diverse allele frequencies, and their different time points of embryo or calf loss lead to different economic values. Egger-Danner et al. (2014) showed that losses due to Fleckvieh haplotype 2, dwarfism and zinc deficiency-like syndrome have an economic value of 350 euro because the calf is not marketable. When veterinary costs are included, an even higher economic loss is realistic. Costs for arachnomelia are approximately 700 euro because the cow may also be harmed (Egger-Danner et al., 2014). The authors expect a 7% loss in annual monetary genetic gain and a 9% reduction in discounted profit if all male carriers of the six considered defects are excluded from mating.
As more genetic defects are identified, excluding all carriers from mating becomes increasingly difficult. Van Eenennaam and Kinghorn (2014) showed that the decrease in genetic gain depends on the number of lethal loci and their allele frequencies. Selection against a small number of defects with intermediate allele frequencies reduced genetic gain to 92.5%. However, when 100 loci were modeled, genetic gain decreased to 86%. When many genetic characteristics with different economic values are known, a genetic index can account for all of them without excluding carriers from mating. Additionally, applying a genetic index that considers economic values in the female path ensures progress in production and functional traits.

Genetic index
Figure 3 shows the development of the breeding values of bulls and dams over 10 generations.

Figure 3. Development of the breeding values over 10 generations. In scenario A (solid line), all animals were selected on their breeding values. In scenario I (dotted line), animals in the dam path were ranked and selected on the genetic index.

A large increase in breeding values is expected over the coming generations. Changing the strategy on the female side from assortative mating to selection on the genetic index (scenario I) results in a loss of genetic progress compared with scenario A. The lower genetic response is caused by the lower selection intensity on breeding values. Figure 4 shows the development of the genetic index over 10 generations for both scenarios. Selecting females according to scenario I instead of on their breeding values (scenario A) results in a large increase of the genetic index in both sexes.
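The link between selection intensity and genetic response can be illustrated with the Rendel-Robertson four-path equation, ΔG = σ_A Σ(i·r) / ΣL. This is a standard quantitative-genetics formula, not the simulation used in this study. All path parameters below (intensities i, accuracies r, generation intervals L and σ_A) are hypothetical. They were chosen only to show the direction of the effect when dams are ranked on the genetic index instead of on breeding values.

```python
def annual_genetic_gain(paths: dict, sigma_a: float) -> float:
    """Rendel-Robertson four-path equation: dG = sigma_a * sum(i * r) / sum(L).

    paths maps a path name to (intensity i, accuracy r, generation interval L in years)."""
    numerator = sum(i * r for i, r, _ in paths.values())
    denominator = sum(interval for _, _, interval in paths.values())
    return sigma_a * numerator / denominator


# Hypothetical parameters (accuracies roughly sqrt(0.67) and sqrt(0.5)).
scenario_a = {
    "sires of sons": (2.0, 0.82, 2.5),
    "sires of daughters": (1.8, 0.82, 2.5),
    "dams of sons": (2.6, 0.71, 2.5),
    "dams of daughters": (0.3, 0.71, 4.5),
}
# Scenario I: dams ranked on the genetic index, so intensity on breeding values drops.
scenario_i = dict(scenario_a)
scenario_i["dams of sons"] = (1.5, 0.71, 2.5)
scenario_i["dams of daughters"] = (0.1, 0.71, 4.5)

for name, paths in (("A", scenario_a), ("I", scenario_i)):
    print(name, round(annual_genetic_gain(paths, sigma_a=12.0), 2))
```

Because the sire paths stay unchanged, only part of the gain is lost. This matches the qualitative pattern in Figure 3, although the numbers here are not intended to reproduce it.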

Figure 4. Development of the genetic index over 10 generations. In scenario A (solid line), all animals were selected on their breeding values. In scenario I (dotted line), animals in the dam path were ranked and selected on the genetic index.

Figure 5 illustrates the development of the allele frequencies of the six analyzed traits. The recessive fertility defects decrease over time because homozygous embryos do not survive. The polled allele frequency increases slightly in scenario A because of its dominant inheritance. In scenario I, the polled allele frequency increases strongly because its positive economic value results in additional selection for polled. Conversely, the negative economic values of the recessive fertility disorders result in additional selection against them. In conclusion, selection on a genetic index that considers economic values is an efficient method to control allele frequencies. It reduces undesirable alleles and simultaneously increases economically beneficial traits.
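The decline caused by embryo loss alone can be quantified with the classical result for a recessive lethal under random mating. Survivors are AA (p²) and AB (2pq), so q' = pq / (p² + 2pq) = q / (1 + q), which gives q_t = q_0 / (1 + t·q_0). The sketch assumes random mating, no other selection, no mutation and an infinite population. It therefore describes only the baseline decline driven by embryo loss, not the simulated breeding program. The starting frequency is hypothetical.

```python
def next_generation_frequency(q: float) -> float:
    """Frequency of a recessive lethal allele after one generation of random
    mating when homozygotes (BB) die: q' = q / (1 + q)."""
    return q / (1.0 + q)


def frequency_after(q0: float, generations: int) -> float:
    q = q0
    for _ in range(generations):
        q = next_generation_frequency(q)
    return q


q0 = 0.05  # hypothetical starting frequency
for t in (1, 5, 10):
    # Iteration agrees with the closed form q_t = q0 / (1 + t * q0)
    print(t, round(frequency_after(q0, t), 4), round(q0 / (1 + t * q0), 4))
```

With q_0 = 0.05, the frequency falls only to about 0.033 after ten generations. This illustrates why additional selection against carriers, for example through the genetic index, speeds up the reduction.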

Figure 5. Development of the mean allele frequencies over 10 generations.

Genetic index applied to real data
The concept of an index for genetic characteristics was applied to the genotype pool from routine genomic evaluation. This index is the sum of the specific attributes, taking into account their allele frequencies in the population (Table 1) and their economic values. An animal-specific genetic index was calculated for each of these individuals. On average, the genetic index decreased by 30 euro over birth years 1990 to 2010 (Figure 6). This indicates that the overall number of genetic defects in the population increased over time. It can be explained by high frequencies of HH1 from …. Afterwards, the frequencies of HH3 and HH5, which have a lower economic value than HH1, increased. Since 2010, the polled frequency has been increasing strongly, which raises the genetic index.

Figure 6. Genetic index of genotyped animals by year of birth.

We analyzed the total number of known recessive defects carried by each animal. Of all genotyped animals, 83.21% were free from all considered genetic defects, …% carried one genetic defect, 1.07% carried two, 0.04% carried three, and one animal carried four fertility defects. We also investigated the relationship between the genetic index and genomically enhanced fertility breeding values. No significant association between fertility breeding values and the genetic index was found, because most animals did not carry any currently known genetic defect. Likewise, no association between the number of defects and fertility breeding values was found. This suggests that the breeding values reflect the known recessive defects only slightly, so no double counting occurs. The additional impact of the fertility disorders on fertility breeding values depends on their allele frequencies in the population. Intermediate allele frequencies would lead to substantial embryo loss. However, most of the considered fertility defects had low and fluctuating minor allele frequencies. Taking the genetic defects into account avoids carrier-by-carrier matings. This eliminates the risk of homozygous affected embryos and allows the allele frequencies to decrease quickly. Mating on fertility breeding values alone cannot limit the increase of the allele frequencies (Figure 2).
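Avoiding carrier-by-carrier matings can be sketched as a simple allocation rule. For each dam, the rule chooses the highest-ranked available sire that does not carry any of the same recessive defects. The greedy rule, the per-sire usage cap and all identifiers are hypothetical. This is not the mating program used in practice and does not find an optimal allocation. It also handles only recessive defects, not dominant characteristics such as polled.

```python
def plan_matings(dams, sires, max_uses_per_sire):
    """Greedy allocation that never creates a risk mating.

    dams:  {dam_id: set of recessive defects the dam carries}
    sires: {sire_id: (breeding_value, set of recessive defects the sire carries)}
    Returns {dam_id: sire_id, or None if no admissible sire is left}."""
    ranked = sorted(sires, key=lambda s: sires[s][0], reverse=True)
    uses = dict.fromkeys(sires, 0)
    plan = {}
    for dam, dam_defects in dams.items():
        choice = None
        for sire in ranked:
            _, sire_defects = sires[sire]
            if uses[sire] >= max_uses_per_sire:
                continue  # sire already used as often as allowed
            if sire_defects & dam_defects:
                continue  # both carry the same recessive: 25% homozygous embryos expected
            choice = sire
            break
        if choice is not None:
            uses[choice] += 1
        plan[dam] = choice
    return plan


sires = {"S1": (130.0, {"HH1"}), "S2": (120.0, set())}
dams = {"D1": {"HH1"}, "D2": set(), "D3": set()}
print(plan_matings(dams, sires, max_uses_per_sire=1))
```

Because dams are processed in order, the result depends on that order. A production mating program would instead optimize over all pairs and weigh breeding values against the economic values of the characteristics.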

Future aspects
Routine genotyping of a large proportion of the population makes it possible to discover, manage and monitor genetic disorders and economically beneficial genetic characteristics. However, even with next-generation sequencing technologies, the causal mutation of some defects is still unknown (McClure et al., 2014). For defects with a known mutation, additional research is needed to understand the biology of the mutation. Analyzing data from embryo transfer stations might allow the time of embryonic loss to be determined more precisely than the indirect use of NRR data sets. Additionally, more research is needed to determine whether the genetic effects of the recessive alleles are already captured by the fertility breeding values. It also remains to be determined how the genetic index can be integrated into total merit indices. Computer-based genomic mating plans are needed that account for all genetic characteristics according to their economic values.

CONCLUSIONS
Ongoing research will detect many new genetic disorders. There is a clear demand to weigh the genetic characteristics by their economic values and to combine them in an index. A genetic index combines different genetic characteristics with different economic values and allele frequencies. For breeding decisions, the index should be used in the female path. Bulls should be selected on breeding values for production and functional traits to maintain genetic gain. Mating recommendations should be calculated with mating programs that take all genetic characteristics of both mating partners into account. However, further investigation is needed to determine the correct time of embryo loss and the economic value.

ACKNOWLEDGMENTS
The German national organization FBF is thanked for financial support.

REFERENCES
Adams, H. A., T. Sonstegard, P. M. VanRaden, D. J. Null, C. Van Tassell, and H. Lewin. 2012. Identification of a nonsense mutation in APAF1 that is causal for a decrease in reproductive efficiency in dairy cattle. Proc. Plant Anim. Genome XX Conf., abstr. P0555.
Agerholm, J. S., C. Bendixen, O. Andersen, and J. Arnbjerg. 2001. Complex vertebral malformation in Holstein calves. J. Vet. Diagn. Invest. 13:…
Agerholm, J. S., F. McEvoy, and J. Arnbjerg. 2006. Brachyspina syndrome in a Holstein calf. J. Vet. Diagn. Invest. 18:…
Bendixen, C., S. Soren, J. Helle, P. Frank, P. A. Aasberg, H. Lars-Erik, H. Per, H. Anette, T. Bo, J. Mette, N. B. Vivi Hunnicke, and J. M. Ssergij. 2014. Genetic test for the identification of carriers of complex vertebral malformations in cattle. US … Accessed April 14, …
Browning, S. R., and B. L. Browning. 2007. Rapid and accurate haplotype phasing and missing data inference for whole genome association studies by use of localized haplotype clustering. Am. J. Hum. Genet. 81:…
Charlier, C., J. S. Agerholm, W. Coppieters, P. Karlskov-Mortensen, W. Li, G. de Jong, C. Fasquelle, L. Karim, S. Cirera, N. Cambisano, N. Ahariz, E. Mullaart, M. Georges, and M. Fredholm. 2012. A deletion in the bovine FANCI gene compromises fertility by causing fetal death and brachyspina. PLoS ONE 7:e…
Cooper, T. A., G. R. Wiggans, P. M. VanRaden, J. L. Hutchison, J. B. Cole, and D. J. Null. 2013. Genomic evaluation of Ayrshire dairy cattle and new haplotypes affecting fertility and stillbirth in Holstein, Brown Swiss and Ayrshire breeds. ADSA-ASAS Joint Annual Meeting, poster T206.

Daetwyler, H. D., A. Capitan, H. Pausch, P. Stothard, R. van Binsbergen, R. F. Brøndum, X. Liao, A. Djari, S. C. Rodriguez, C. Grohs, D. Esquerré, O. Bouchez, M.-N. Rossignol, C. Klopp, D. Richa, S. Fritz, A. Eggen, P. J. Bowman, D. Coote, A. J. Chamberlain, C. Anderson, C. P. Van Tassell, I. Hulsegge, M. E. Goddard, B. Guldbrandtsen, M. S. Lund, R. F. Veerkamp, D. A. Boichard, R. Fries, and B. J. Hayes. 2014. Whole-genome sequencing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nature Genet. 46:…
Egger-Danner, C., H. Schwarzenbacher, C. Fuerst, and A. Willam. 2014. Analysis of breeding strategies against genetic disorders in Austrian Fleckvieh cattle. Proc. World Congr. Genet. Appl. Livest. Prod.
Falconer, D. S., and F. C. MacKay. 1996. Introduction to Quantitative Genetics. 4th ed. John Wiley & Sons, New York.
Fritz, S., A. Capitan, A. Djari, S. C. Rodriguez, A. Barbat, A. Baur, C. Grohs, B. Weiss, M. Boussaha, D. Esquerré, C. Klopp, D. Rocha, and D. Boichard. 2013. Detection of haplotypes associated with prenatal death in dairy cattle and identification of deleterious mutations in GART, SHBG and SLC37A2. PLoS ONE 8:e…
Georges, M., W. Coppieters, C. Charlier, J. S. Agerholm, and M. Fredholm. 2010. A genetic test for brachyspina and fertility in cattle. Patent application WO … Accessed April 12, …
Jansen, S., B. Aigner, H. Pausch, M. Wysocki, S. Eck, A. Benet-Pagès, E. Graf, T. Wieland, T. M. Strom, T. Meitinger, and R. Fries. 2013. Assessment of the genomic variation in a cattle population by re-sequencing of key animals at low to medium coverage. BMC Genomics 14:446.
Jung, S., H. Pausch, M. C. Langenmayer, H. Schwarzenbacher, M. Majzoub-Altweck, N. S. Gollnick, and R. Fries. 2014. A nonsense mutation in PLD4 is associated with a zinc deficiency-like syndrome in Fleckvieh cattle. BMC Genomics 15:623.
McClure, M. C., D. Bickhart, D. Null, P. VanRaden, L. Xu, G. Wiggans, G. Liu, S. Schroeder, J. Glasscock, J. Armstrong, J. B. Cole, C. P. Van Tassell, and T. S. Sonstegard. 2014. Bovine exome sequence analysis and targeted SNP genotyping of recessive fertility defects BH1, HH2, and HH3 reveal a causative mutation in SMC2 for HH3. PLoS ONE 9:e…

Medugorac, I., D. Seichter, A. Graf, I. Russ, H. Blum, K. H. Göpel, S. Rothammer, M. Förster, and S. Krebs. 2012. Bovine polledness: an autosomal dominant trait with allelic heterogeneity. PLoS ONE 7:e…
Nicholas, F., and M. Hobbs. 2014. Mutation discovery for Mendelian traits in non-laboratory animals: a review of achievements up to …. Anim. Genet. 45:…
Pausch, H., H. Schwarzenbacher, J. Burgstaller, K. Flisikowski, C. Wurmser, S. Jansen, S. Jung, A. Schnieke, T. Wittek, and R. Fries. 2015. Homozygous haplotype deficiency reveals deleterious mutations compromising reproductive and rearing success in cattle. BMC Genomics 16:312.
Rothammer, S., A. Capitan, E. Mullaart, D. Seichter, I. Russ, and I. Medugorac. 2014. The 80-kb DNA duplication on BTA1 is the only remaining candidate mutation for the polled phenotype of Friesian origin. Genet. Sel. Evol. 46:44.
Sahana, G., U. S. Nielsen, G. P. Aamand, M. S. Lund, and B. Guldbrandtsen. 2013. Novel harmful recessive haplotypes identified for fertility traits in Nordic Holstein cattle. PLoS ONE 12:e…
Segelke, D., H. Täubert, F. Reinhardt, and G. Thaller. 2013. Chancen und Grenzen der Hornloszucht für die Rasse Deutsche Holstein. Züchtungskunde 85:4.
Shanks, R. D., D. B. Dombrowski, G. W. Harpestad, and J. L. Robinson. 1984. Inheritance of UMP synthase in dairy cattle. J. Hered. 75:…
Shuster, D. E., M. E. Kehrli Jr., M. R. Ackermann, and R. O. Gilbert. 1992. Identification and prevalence of a genetic defect that causes leukocyte adhesion deficiency in Holstein cattle. Proc. Natl. Acad. Sci. USA 89:…
Täubert, H., S. Rensing, and F. Reinhardt. 2012. Comparing conventional and genomic breeding programs with ZPLAN+. Interbull Bulletin 44.
vit. 2015. Estimation of Breeding Values for Milk Production Traits, Somatic Cell Score, Conformation, Productive Life and Reproduction Traits in German Dairy Cattle. Accessed April 4, …

Van Eenennaam, A. L., and B. P. Kinghorn. 2014. Use of mate selection software to manage lethal recessive conditions in livestock populations. Proc. World Congr. Genet. Appl. Livest. Prod.
VanRaden, P. M., K. M. Olson, D. J. Null, and J. L. Hutchison. 2011a. Harmful recessive effects on fertility detected by absence of homozygous haplotypes. J. Dairy Sci. 94:…
VanRaden, P. M., K. M. Olson, D. J. Null, and J. L. Hutchison. 2011b. Reporting of haplotypes with recessive effects on fertility. Interbull Bulletin 44.
VanRaden, P. M., C. Sun, T. A. Cooper, D. J. Null, and J. B. Cole. 2014. Keynote presentation III: Genotypes are useful for more than genomic evaluation. Proc. 39th Int. Commun. Anim. Recording Sess., Berlin, Germany, May 19-23, 4 pp.
Widmar, N. J. O., M. M. Schutz, and J. B. Cole. 2013. Breeding for polled dairy cows versus dehorning: preliminary cost assessments and discussion. J. Dairy Sci. 96(E-Suppl. 1):602 (TH373).

Chapter 4
Prediction of expected genetic variation within groups of offspring for innovative mating schemes
Dierck Segelke 1,2, Friedrich Reinhardt 1, Zengting Liu 1, Georg Thaller 2
1 Vereinigte Informationssysteme Tierhaltung w.v. (vit), Heideweg 1, Verden, Germany
2 Institute of Animal Breeding and Husbandry, Christian-Albrechts-University, Kiel, Germany
Published in Genetics Selection Evolution 2014, 46:42

Abstract
Experience from progeny testing indicates that mating popular bull sires that have high estimated breeding values with excellent dams does not guarantee offspring with superior breeding values. This is partly explained by differences between animals in the standard deviation of gamete breeding values (SDGBV) at the haplotype level. The SDGBV depends on the variance of the true effects of single nucleotide polymorphisms (SNPs) and on the degree of heterozygosity. Haplotypes of Holstein animals were used to predict and investigate the expected SDGBV for fat yield, protein yield, somatic cell score and the direct genetic effect for stillbirth. Differences in SDGBV between animals were detected. This means that groups of offspring of parents with low SDGBV will be more homogeneous than those of parents with high SDGBV, although the expected mean breeding values of the progeny will be the same. SDGBV was negatively correlated with genomic and pedigree inbreeding coefficients, and a small loss of SDGBV over time was observed. Sires with relatively low mean gamete breeding values but high SDGBV had a higher probability of producing extremely positive offspring than sires with a high mean gamete breeding value and low SDGBV. An animal's SDGBV can be estimated from genomic information and used to design specific genomic mating plans. Estimated SDGBV provide an additional tool for mating programs, allowing breeders to identify and match mating partners using specific haplotype information.

Background
In recent years, dairy cattle breeding schemes have changed drastically with the availability of routine dense single nucleotide polymorphism (SNP) chips.
Initially, research focused mainly on the estimation of genomic breeding values (Lund et al., 2011; Liu et al., 2011; VanRaden et al., 2009). More recently, it has focused on imputation from low-density to higher-density marker sets (Erbe et al., 2012; Segelke et al., 2012; Wiggans et al., 2012). Beyond genomic breeding values, other information can also be derived from dense marker data, such as parentage verification (Heaton et al., 2002). In addition, VanRaden et al. (2011) identified haplotypes with lethal genetic effects that may lead to embryonic death in the homozygous state. Moreover, genetic characteristics such as horn status (Segelke et al., 2013) can be predicted from routine SNP information. Genotyping large numbers of animals with dense SNP panels also makes it possible to characterize genetic variation at the chromosome and haplotype levels (Cole and VanRaden, 2011; Cole and Null, 2013). Consequently, SNP haplotype information can be used to estimate the expected variance of breeding values at the gamete level. If the dam and/or the sire are heterozygous, variation between gametes is generated by random sampling of parental haplotypes during meiosis (Cole and VanRaden, 2011). Knowledge of the mean (MGBV) and standard deviation of gamete breeding values (SDGBV) allows specific mating plans to be developed, assuming normally distributed estimated breeding values. For example, the probability that the breeding value of an offspring exceeds a certain threshold can be estimated. It is also possible to predict how many animals must be tested to produce an offspring with an estimated breeding value above a given threshold. Cole and VanRaden (2011) discussed selecting animals whose gamete breeding values vary little, in order to produce more homogeneous progeny and simplify herd management. Conversely, breeding companies may be more interested in heterogeneous progeny to increase the probability of extremely positive offspring. In line with this, experience with progeny testing indicates that using popular sires with high estimated breeding values and many tested offspring does not guarantee male offspring with superior breeding values. In contrast, bulls with fewer tested male offspring sometimes produce more excellent offspring than popular bulls. The objective of this study was to predict and investigate the expected SDGBV using genomic information and to demonstrate its usefulness for improving mating decisions.
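The two uses of MGBV and SDGBV mentioned above, the probability of exceeding a threshold and the number of offspring to test, can be sketched as follows. The sketch assumes that an offspring's breeding value is the sum of one gamete breeding value from each parent, with both gametes normally distributed and independent. The mean is therefore MGBV_sire + MGBV_dam, and the variance is SDGBV_sire² + SDGBV_dam². The function names and all numbers are hypothetical and are expressed in arbitrary breeding-value units.

```python
import math
from statistics import NormalDist


def prob_offspring_above(threshold, mgbv_sire, sd_sire, mgbv_dam, sd_dam):
    """P(offspring breeding value > threshold), assuming the offspring value is
    the sum of two independent, normally distributed gamete breeding values."""
    mean = mgbv_sire + mgbv_dam
    sd = math.sqrt(sd_sire ** 2 + sd_dam ** 2)
    return 1.0 - NormalDist(mean, sd).cdf(threshold)


def offspring_needed(p_single, confidence=0.95):
    """Smallest n such that P(at least one of n offspring exceeds the threshold) >= confidence."""
    if not 0.0 < p_single < 1.0:
        raise ValueError("p_single must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_single))


# Same dam, two hypothetical sires: X has the higher MGBV, Y the higher SDGBV.
p_x = prob_offspring_above(2.5, mgbv_sire=1.0, sd_sire=0.4, mgbv_dam=0.5, sd_dam=0.5)
p_y = prob_offspring_above(2.5, mgbv_sire=0.8, sd_sire=0.9, mgbv_dam=0.5, sd_dam=0.5)
print(round(p_x, 3), offspring_needed(p_x))
print(round(p_y, 3), offspring_needed(p_y))
```

In this example, the sire with the lower mean but larger SDGBV is more likely to produce an offspring above the high threshold, so fewer offspring are needed. This mirrors the pattern described in the Abstract.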
Methods

Data

A total of … Holstein animals genotyped with the Illumina BovineSNP50 BeadChip (Illumina Inc., San Diego, CA, USA) were taken from the February 2013 routine genomic evaluation for German Holsteins (Liu et al., 2011). Of the 50k SNPs on this chip, autosomal SNPs with a minor allele frequency greater than 1% were selected.

The algorithm of Hayes (2011) was used to check agreement between genotypes and pedigree information. Only genotypes with a call rate greater than 98% were used. The software package Beagle (version 3.3; Browning and Browning, 2010) with default settings was used to impute missing marker genotypes and to phase the genotypes; for this purpose, Beagle exploits linkage disequilibrium at the population level. SNPs were ordered on the chromosomes according to the UMD3.1 bovine genome assembly (Zimin et al., 2009). Four traits with different genetic architectures, heritabilities and genomic reliabilities were chosen: fat yield, protein yield, somatic cell score and the direct genetic effect for stillbirth. SNP effects were estimated with a BLUP model that includes a trait-specific residual polygenic variance (for details of the model, see Liu et al., 2011).

Pedigree and genomic relationships

The pedigree contained … genotyped animals (… females and … males) and their ancestors. All sires and dams of the genotyped animals were known. The animals were born between 1960 and 2013 and descended from 2768 different sires and … different dams. Genomic inbreeding coefficients were derived from the diagonal elements of the genomic relationship matrix, as suggested by VanRaden (2008). Allele frequencies in the base population were estimated with the gene content method of Gengler et al. (2007).

Flow of information

The flow of information through the different steps of estimating MGBV and SDGBV is shown in Figure 1. First, Beagle was used to phase the SNP genotypes and construct haplotypes. The haplotypes, the SNP effects and a map of recombination events (used to define haplotype size) were then used to estimate haplotype-specific breeding values (program hapdgv.f90). These results were the inputs for estimating MGBV and SDGBV (program genvar.f90).
The resulting data, together with the pedigree and animal ownership information, were then used in the mating software.
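The genomic inbreeding coefficients described above can be sketched as follows. This is a minimal illustration, not the evaluation code: it assumes the common convention $F_G = G_{ii} - 1$, with the diagonal of the genomic relationship matrix computed as in VanRaden's (2008) first method, $G_{ii} = \sum_k (m_{ik} - 2p_k)^2 / \left(2\sum_k p_k(1-p_k)\right)$, genotypes coded as allele counts 0/1/2, and allele frequencies `p` supplied by the caller (in the study, base-population frequencies from the gene content method). The function name `genomic_inbreeding` and the example numbers are hypothetical.

```python
def genomic_inbreeding(genotypes, p):
    """Genomic inbreeding F_G = G_ii - 1 (VanRaden, 2008, method 1).

    genotypes -- allele counts (0, 1 or 2) of one animal, one per SNP
    p         -- frequencies of the counted allele, one per SNP
    """
    if len(genotypes) != len(p):
        raise ValueError("genotypes and allele frequencies must have equal length")
    # Scaling factor 2 * sum(p_k * (1 - p_k)) puts G on the scale of the pedigree relationship matrix.
    scale = 2.0 * sum(pk * (1.0 - pk) for pk in p)
    # Diagonal element: squared, frequency-centred genotypes summed over SNPs.
    g_ii = sum((m - 2.0 * pk) ** 2 for m, pk in zip(genotypes, p)) / scale
    return g_ii - 1.0


# Illustrative animal with four SNPs and hypothetical allele frequencies.
print(genomic_inbreeding([2, 1, 0, 2], [0.3, 0.5, 0.2, 0.7]))  # roughly 0.49
```

With this scaling, animals that are more heterozygous than expected under the supplied allele frequencies receive negative values; this is a property of the estimator rather than an error.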

Figure 1 Flow of data and programs used to estimate MGBV and SDGBV (elements: genotypes, Beagle, haplotypes, SNP effects, recombination map, hapdgv.f90, haplotype DGV, genvar.f90, MGBV/SDGBV, pedigree, ownership information, mating software)

Prediction of mean and standard deviation of gamete breeding values

MGBV and SDGBV were obtained by sampling different sets of transmitted haplotypes from each animal. With 29 autosomes (ignoring the sex chromosomes), there are in theory $2^{29}$ possible combinations of sampled haplotypes if a haplotype is defined as a whole autosome and recombination is ignored. Assuming, on average, one recombination per Morgan, the number of possible haplotype combinations becomes practically unlimited. Therefore, to make the simulation computationally feasible, the genome was divided into 1856 chromosome segments (C) at positions where a high number of recombination events occurred. These recombination events were identified in a preliminary study (results not shown here), in which a genome-wide map of crossing-over events was derived by identifying phase switches between the haplotypes of sires and the paternal haplotypes of their sons. In the first step of the simulation of the SDGBV within an animal (program hapdgv.f90), the paternal and maternal haplotype breeding values of each animal were calculated as:

$$h_{ij} = \sum_{k=1}^{n} z_{ijk}\,\alpha_k,$$

where $h_{ij}$ is the breeding value of the $i$th haplotype, $j$ indicates the maternal or paternal haplotype, $z_{ijk}$ is the maternal or paternal allele of marker $k$, $\alpha_k$ is half of the estimated effect of the $k$th SNP from the routine genomic evaluation of German Holstein cattle (Liu et al., 2011), and $n$ is the number of SNPs in the $i$th haplotype. Imprinting, dominance and epistasis were not considered. In the second step (program genvar.f90), possible gametes were simulated by selecting either the maternal or the paternal phase of an animal. At the beginning of each chromosome, the maternal and the paternal strand were each selected with a probability of 50%.
Crossover locations were drawn from a uniform distribution over the interval [0, C], C being the number of chromosome segments. The mean recombination rate between the haplotype strands was set to 0.3, in line with the expected number of recombinations assuming one recombination per Morgan. The MGBV of a parent was calculated as:

$$\mathrm{MGBV} = \frac{1}{N}\sum_{r=1}^{N}\sum_{i=1}^{H} h_{ij}^{(r)},$$

where $N$ is the number of replicates of the simulation, $H$ is the number of haplotypes (chromosome segments), and $h_{ij}^{(r)}$ is the breeding value of the $i$th paternal or maternal haplotype transmitted in replicate $r$.
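The two simulation steps described above (haplotype breeding values in hapdgv.f90, gamete sampling in genvar.f90) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not a port of the Fortran programs: alleles are assumed to be coded 0/1, segments are given as SNP index ranges, and crossovers are modelled as independent strand switches at each segment boundary within a chromosome with a hypothetical probability `switch_prob`, instead of the uniform placement over [0, C] used in the study. MGBV and SDGBV are taken as the mean and sample standard deviation of the simulated gamete values. All function names and numbers are hypothetical.

```python
import random
import statistics


def haplotype_breeding_values(alleles, half_effects, segments):
    """h_ij = sum_k z_ijk * alpha_k for every chromosome segment i of one strand j.

    alleles      -- allele codes (assumed 0/1) of one phased strand, one per SNP
    half_effects -- alpha_k, half of the estimated SNP effects
    segments     -- (start, stop) SNP index ranges, stop exclusive
    """
    return [sum(alleles[k] * half_effects[k] for k in range(start, stop))
            for start, stop in segments]


def gamete_mean_sd(h_pat, h_mat, chromosome, switch_prob, n_rep, seed=None):
    """Monte Carlo estimate of MGBV and SDGBV for one parent.

    h_pat, h_mat -- haplotype breeding values per segment (paternal, maternal)
    chromosome   -- chromosome label of each segment, in genome order
    switch_prob  -- hypothetical crossover probability at each segment boundary
    """
    rng = random.Random(seed)
    gametes = []
    for _ in range(n_rep):
        total = 0.0
        strand = 0
        for i in range(len(h_pat)):
            if i == 0 or chromosome[i] != chromosome[i - 1]:
                strand = rng.randrange(2)  # either strand with 50% at chromosome start
            elif rng.random() < switch_prob:
                strand = 1 - strand  # crossover: continue on the other strand
            total += h_pat[i] if strand == 0 else h_mat[i]
        gametes.append(total)
    return statistics.mean(gametes), statistics.stdev(gametes)


# Toy parent: 4 SNPs in 2 segments on 2 chromosomes (illustrative numbers only).
alpha = [0.5, 1.0, -0.25, 2.0]
segments = [(0, 2), (2, 4)]
h_pat = haplotype_breeding_values([1, 0, 1, 1], alpha, segments)  # [0.5, 1.75]
h_mat = haplotype_breeding_values([0, 1, 0, 1], alpha, segments)  # [1.0, 2.0]
mgbv, sdgbv = gamete_mean_sd(h_pat, h_mat, chromosome=[1, 2],
                             switch_prob=0.0, n_rep=10000, seed=1)
print(mgbv, sdgbv)
```

In this toy case each chromosome is transmitted whole, so the expected MGBV is 2.625 and the expected SDGBV is about 0.28; a parent that is homozygous for every segment has an SDGBV of zero, which mirrors the dependence of gamete variation on heterozygosity.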

The SDGBV of a parent was calculated as:

$$\mathrm{SDGBV} = \sqrt{\frac{1}{N-1}\sum_{r=1}^{N}\left(\sum_{i=1}^{H} h_{ij}^{(r)} - \mathrm{MGBV}\right)^{2}}.$$

Correlations among traits were computed for MGBV and for SDGBV. To study whether selection, which should increase inbreeding and homozygosity in each generation, had an antagonistic effect on MGBV and SDGBV, correlations of SDGBV and MGBV with the genomic ($F_G$) and pedigree ($F_P$) inbreeding coefficients were computed for each trait. Furthermore, MGBV and SDGBV were tested for normality.

Validation

The simulation results were validated by reconstructing the paternally transmitted haplotype of each animal. The paternally transmitted haplotype breeding value (here referring to the complete haploid set of chromosomes) was then obtained by multiplying the alleles of this haplotype by half the estimated SNP effects and summing. A sensitivity analysis was performed to determine the size of the progeny groups per sire needed for validation. The observed mean and standard deviation of the estimated breeding values of the offspring were compared with the mean and standard deviation obtained from the simulation, and correlations were computed.

Mating plan

After MGBV and SDGBV had been predicted, specific matings were designed with newly developed mating software, which also includes animal ownership information and pedigree data. The expected mean breeding value of a potential offspring was calculated as:

$$\mathrm{mbv} = \mathrm{MGBV}_s + \mathrm{MGBV}_d,$$

where mbv is the expected breeding value of an offspring of the mating, $\mathrm{MGBV}_s$ is the mean gamete breeding value of the sire, and $\mathrm{MGBV}_d$ is the mean gamete breeding value of the dam. The standard deviation of breeding values among the progeny, assuming no covariance between sire and dam, was calculated as:

$$\mathrm{sbv} = \sqrt{\mathrm{SDGBV}_s^{2} + \mathrm{SDGBV}_d^{2}},$$

where sbv is the expected standard deviation of breeding values among the potential offspring of the same mating, $\mathrm{SDGBV}_s$ is the standard deviation of gamete breeding values of the sire, and $\mathrm{SDGBV}_d$ is that of the dam. In addition, the probability of obtaining offspring with a breeding value above a given threshold was calculated assuming normally distributed breeding values, and the number of matings needed to produce at least one offspring with an estimated breeding value above that threshold was calculated using a binomial distribution.

Results

Mean and standard deviation of gamete breeding values

Figure 2 shows the relationship between MGBV and SDGBV for each trait and animal. Average MGBV were 0.36 genetic standard deviations (σa) for fat yield, 0.54 σa for protein yield, 0.22 σa for somatic cell score and 0.09 σa for the direct genetic effect for stillbirth. A mean SDGBV of 0.47 σa was obtained for somatic cell score, whereas the direct genetic effect for stillbirth had an average SDGBV of 0.25 σa. All plots show animals with equal MGBV but markedly different SDGBV. For example, for protein yield, bulls with an MGBV of 1.8 σa differed in SDGBV by up to 0.22 σa.

Figure 2 Relationship between MGBV and SDGBV for fat yield, protein yield, somatic cell score and the direct genetic effect for stillbirth. Each dot represents an animal; the red lines indicate the means of MGBV and SDGBV.

Table 1 contains the observed correlations among the MGBV of the four traits and with the genomic ($F_G$) and pedigree ($F_P$) inbreeding coefficients. The correlation between MGBV was 0.66 for fat yield with protein yield and 0.15 for somatic cell score with the direct genetic effect for stillbirth. The correlation of SDGBV was lower with $F_G$ than with $F_P$.

Table 1 Correlations between MGBV among traits and with inbreeding coefficients
Rows: MGBV PY, MGBV FY, MGBV SCS, MGBV SBd, $F_G$; columns: MGBV FY, MGBV SCS, MGBV SBd, $F_G$, $F_P$. Correlation between $F_G$ and $F_P$: 0.52.
MGBV PY: mean gamete breeding value for protein yield; MGBV FY: mean gamete breeding value for fat yield; MGBV SCS: mean gamete breeding value for somatic cell score; MGBV SBd: mean gamete breeding value for the direct genetic effect for stillbirth; $F_G$: genomic inbreeding coefficient; $F_P$: pedigree inbreeding coefficient.

Correlations among SDGBV for the four traits are given in Table 2. These correlations were lower than those among MGBV. The correlation between SDGBV was highest for fat yield with protein yield (0.41); correlations between SDGBV for the other traits ranged from 0.05 to …; for all traits, correlations between SDGBV and $F_P$ were negative. Correlations between SDGBV and $F_G$ were also negative for all traits and two to four times larger in magnitude than those between SDGBV and $F_P$.

Table 2 Correlations between SDGBV among traits and with inbreeding coefficients
Rows: SDGBV PY, SDGBV FY, SDGBV SCS, SDGBV SBd, $F_G$; columns: SDGBV FY, SDGBV SCS, SDGBV SBd, $F_G$, $F_P$. Correlation between $F_G$ and $F_P$: 0.52.
SDGBV PY: standard deviation of gamete breeding values for protein yield; SDGBV FY: standard deviation of gamete breeding values for fat yield; SDGBV SCS: standard deviation of gamete breeding values for somatic cell score; SDGBV SBd: standard deviation of gamete breeding values for the direct genetic effect for stillbirth; $F_G$: genomic inbreeding coefficient; $F_P$: pedigree inbreeding coefficient.

The MGBV showed no deviation between theoretical and sampled quantiles of the normal distribution for any of the studied traits (results not shown). Figure 3 shows normal Q-Q plots of SDGBV for the four traits. The values in the centre of the distribution were close to normal for all traits. For the more extreme classes, especially for animals with an SDGBV for fat yield below 0.35 σa, the distribution deviated substantially from normality.

Figure 3 Normal Q-Q plots for SDGBV for fat yield, protein yield, somatic cell score and the direct genetic effect for stillbirth

Changes in SDGBV over time are shown in Figure 4. As in Figure 2, SDGBV was highest for somatic cell score, and the SDGBV for the direct genetic effect for stillbirth was only half of that for somatic cell score. All traits showed a slightly negative trend in SDGBV over the last decades. Regression of SDGBV on birth year indicated that the decline was greatest for somatic cell score (… σa per year), followed by fat yield (… σa per year).

Figure 4 Changes in SDGBV for fat yield, protein yield, somatic cell score and the direct genetic effect for stillbirth for animals born between 1990 and …

Validation of simulated SDGBV

Table 3 shows a sensitivity analysis to determine the size of the progeny groups needed for validation. Restricting validation to sires with more than 150 offspring is a good compromise between progeny group size and the number of sires available. With this restriction, correlations between the observed progeny variation and the simulated SDGBV were highest for fat yield (r = 0.93), followed by protein yield and somatic cell score (r = 0.90), while the direct genetic effect for stillbirth had the lowest correlation (r = 0.78).

Table 3 Correlations (r) between simulated SDGBV and observed progeny variation for different traits, by minimum number of offspring per sire
Columns: minimum number of offspring per sire; number of sires; r FY; r PY; r SCS; r SBd.
PY = protein yield; FY = fat yield; SCS = somatic cell score; SBd = direct genetic effect for stillbirth.

Mating schemes

Table 4 and Figure 5 show results from mating two bulls with extremely different SDGBV for protein yield to a poor, an average and a superior female from the population. In addition, Table 4 contains the probabilities of producing an offspring with a breeding value exceeding 0, 1, 2, 3 and 4 σa and the number of animals to be tested to produce at least one animal with a breeding value exceeding each threshold.

Figure 5 Distribution of the breeding values of offspring for protein yield. Two bulls (MGBV of 1.81 σa and 1.68 σa, SDGBV of 0.29 σa and 0.52 σa, respectively) are mated with an average female of the population (MGBV of 0.55 σa, SDGBV of 0.39 σa).

The resulting distributions of potential offspring differed considerably between the two bulls. Mating bull 1 with an average cow of the population is expected to produce offspring with the higher mbv (… σa). The same mating with bull 2 will generate offspring with a slightly lower expected mbv (… σa). However, the bull with the highest mean does not necessarily have the highest probability of producing offspring with a breeding value greater than 3 or 4 σa. In this case, bull 2 had the higher probability of producing such offspring, but its probability of producing progeny with an extremely negative breeding value was also greater. Similarly, the number of animals to be tested to find at least one with a breeding value higher than 2 σa was higher for bull 2. To produce extreme animals with a breeding value higher than 3 or 4 σa, more progeny had to be tested for bull 1 than for bull 2. Choosing a poor or a superior dam instead of an average cow changed the mean breeding value of the potential offspring but did not substantially change the likelihood of obtaining offspring with extremely low or high breeding values.
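The mating-plan calculations (mbv as the sum of the parental MGBV, sbv as the square root of the summed squared SDGBV, a normal exceedance probability and a binomial count of required matings) can be sketched as follows and applied to the two bulls and the average dam from the Figure 5 caption. This is an illustration, not the mating software: it assumes zero sire-dam covariance, as in the study, and a hypothetical 95% confidence level for the number of matings, which the text does not specify. Function names are hypothetical, and because the inputs are rounded, results may differ slightly from Table 4.

```python
import math
from statistics import NormalDist


def offspring_distribution(mgbv_sire, sdgbv_sire, mgbv_dam, sdgbv_dam):
    """mbv = MGBV_s + MGBV_d; sbv = sqrt(SDGBV_s^2 + SDGBV_d^2), no sire-dam covariance."""
    return mgbv_sire + mgbv_dam, math.hypot(sdgbv_sire, sdgbv_dam)


def prob_exceed(mbv, sbv, threshold):
    """P(offspring breeding value > threshold) under a normal distribution."""
    return 1.0 - NormalDist(mbv, sbv).cdf(threshold)


def matings_needed(p, confidence=0.95):
    """Smallest n with P(at least one of n offspring exceeds the threshold) >= confidence.

    Binomial argument: P(no success in n trials) = (1 - p) ** n.
    The 95% level is a hypothetical default.
    """
    if p <= 0.0:
        return math.inf
    if p >= 1.0:
        return 1
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))


# Parameters from the Figure 5 caption (protein yield, in sigma_a units).
dam = (0.55, 0.39)
bulls = {"bull 1": (1.81, 0.29), "bull 2": (1.68, 0.52)}
for name, (mgbv, sdgbv) in bulls.items():
    mbv, sbv = offspring_distribution(mgbv, sdgbv, *dam)
    for t in (2, 3, 4):
        p = prob_exceed(mbv, sbv, t)
        print(f"{name}: mbv={mbv:.2f} sbv={sbv:.2f} P(>{t})={p:.4f} n={matings_needed(p)}")
```

With these inputs, bull 2 has the higher probability of an offspring above 3 or 4 σa, whereas bull 1 has the higher probability above 2 σa, which mirrors the pattern described in the text.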

Table 4 Results of mating two sires to a poor, an average and a superior female in the population for protein yield
Columns: sire MGBV and SDGBV; dam MGBV and SDGBV; offspring mbv and sbv (all in σa); probability p (%) of an offspring exceeding 0, 1, 2, 3 and 4 σa; minimum number of animals N to test to obtain at least one such offspring.
The table shows the mating of two sires to three cows and the resulting mean and standard deviation of the potential offspring, together with the probability (p) and the minimum number of animals (N) to test to generate at least one offspring above 0, 1, 2, 3 or 4 genetic standard deviations (σa) for protein yield.

Discussion

The objective of this study was to predict the expected genetic standard deviation within groups of offspring using real data. The results indicate that gamete breeding values vary between animals and that this information can be used to make specific mating decisions.

Gamete variation

MGBV and SDGBV for the direct genetic effect for stillbirth were about half as high as for the other three traits (Figure 2 and Figure 4), which is related to differences between traits in the reliabilities of the direct genomic breeding values (DGV). The reliability of DGV is 69% for fat and protein yields and 74% for somatic cell score, but only 44% for the direct genetic effect for stillbirth (Liu et al., 2011). Accordingly, the SNP effects for the direct genetic effect for stillbirth are regressed more strongly toward the mean than those for the other traits. Relative to the reference population used to estimate the SNP effects, the high MGBV for protein and fat yields can be explained by higher selection intensities and genetic gains than for somatic cell score and the direct genetic effect for stillbirth. Among the three traits with similar reliabilities, protein yield had the highest MGBV but the lowest SDGBV. This can be explained by a higher selection intensity for protein yield, which results from its higher weight in the German Total Merit Index (vit, 2014). However, most animals genotyped to date are elite animals, i.e., they are highly preselected. From this point of view, the high MGBV for protein and fat yields may not represent the mean breeding value of the German Holstein population. In contrast, MGBV for somatic cell score and for the direct genetic effect for stillbirth are closer to the population mean, since these traits receive less weight in selection.
Similarly, Cole and Null (2013) pointed out that most genotyped animals are elite animals, which carry more chromosomes with a desirable DGV than with an undesirable DGV. The negative correlations between $F_G$ and SDGBV (Table 2) agree with Cole and VanRaden (2011), who reported a stronger correlation of the Mendelian sampling variance (similar to the square of SDGBV) with $F_G$ than with $F_P$, a difference attributed to pedigree errors.

For animals with a low SDGBV for fat yield, the Q-Q plot (Figure 3) showed a marked divergence between the theoretical normal distribution and the sampled distribution. Cole and Null (2013) indicated that mutations with large effects, such as DGAT1 (Grisart et al., 2004), can explain a higher proportion of the genetic variance than expected from the relative length of the chromosome. To check whether the DGAT1 locus affects the distribution of SDGBV, two scenarios were analyzed (Figure 6).

Figure 6 Distribution of SDGBV for fat yield with and without the DGAT1 haplotype

In the first scenario, the SDGBV for fat yield was predicted using all SNPs. The resulting distribution was bimodal, with SDGBV ranging from 0.25 to 0.6 σa. In the second scenario, haplotypes in a 2.2 Mbp region surrounding the DGAT1 locus were excluded from the SDGBV prediction. Under this scenario, SDGBV was normally distributed, with a lower mean and a narrower range than in scenario 1. This indicates that the SDGBV for a specific trait depends on its genetic architecture: the larger the effect of a mutation and the closer its allele frequency is to 0.5, the greater its influence on SDGBV and the stronger the resulting deviation from the normal distribution. Thaller et al. (2003) reported an allele frequency of 0.55 in Holstein animals for the lysine-encoding variant (K232A) of the DGAT1 gene. Furthermore, for the direct genetic effect for stillbirth, several investigations (Cole et al., 2009; Kühn et al., 2003) have indicated a quantitative trait locus (QTL) on chromosome 18 with a large influence on calving traits. Haplotype analyses showed that a haplotype of 19 SNPs explains 16% of the estimated breeding value variance for the direct genetic effect for stillbirth (results not shown here). However, the influence of this QTL on the SDGBV for the direct genetic effect for stillbirth was smaller than the effect of DGAT1 on the SDGBV for fat yield. Differences between the allele frequencies of DGAT1 and of this QTL might explain these findings.

Validation of simulated gamete variation

Simulated SDGBV can be validated only for sires with large groups of offspring. A validation independent of genomic information is possible only by comparing the SDGBV of a bull with the standard deviation of the phenotype-based estimated breeding values of its sons. However, only a few very popular sires have a large number of offspring with phenotype-based estimated breeding values. With genomic information, many animals can be tested at a relatively low cost compared with progeny testing of bulls, which makes it possible to investigate the standard deviation of genomic breeding values within groups of offspring. Another approach to investigating and validating the standard deviation within groups of offspring is to use daughter yield deviations corrected for the contribution of the dam. One benefit of this approach is that, because of artificial insemination, many sires have very large groups of female offspring. Figure 7 shows the trend over time of the mean haplotype breeding values that progeny inherit from their sire and dam. The trend is nearly linear for fat and protein yields, but the paternal haplotype had a higher intercept and a steeper slope than the maternal haplotype.

Figure 7 Trend over time of observed MGBV for the haplotype inherited from dam and sire

An interesting point is the decrease in paternal MGBV for birth year …. Analysis of the 2002, 2003 and 2004 tested birth cohorts (650 bulls per year) also indicates a decrease in mean breeding values for fat yield (0.33 σa, 0.25 σa, 0.43 σa) and protein yield (0.55 σa, 0.46 σa, 0.71 σa) for the 2003 birth cohort. This decrease is mainly caused by the offspring of three sires that predominated in this birth year. On average, these groups had breeding values for fat and protein yields that were more than one σa lower than those of the predominating groups of offspring in the 2002 and … birth cohorts. In contrast to fat and protein yields, no clear difference in gamete breeding values between maternal and paternal haplotypes was found for somatic cell score until birth year 2010. From birth year 2010 to 2013, the paternal haplotype was superior to the maternal haplotype. One explanation is that more and more genomically selected sires were used to produce animals born between 2010 and …. In contrast, because of genotyping costs, many dams were not genomically selected, which results in lower genetic gain on the female side. For the direct genetic effect for stillbirth, there was no genetic trend in either maternal or paternal haplotype breeding values, apparently because this trait is not under intense selection. However, Figure 7 shows that for fat and protein yields there is a difference between sires and dams, which has to be taken into account in the validation. The gap between estimated sire and dam haplotype breeding values can be reduced by increasing genotyping and selection intensity in the dams-to-bulls and dams-to-cows selection paths. Systematic genotyping of young Holstein Friesian candidates started in …. This implies that animals born before 2010 were selectively genotyped because of their importance for the breeding scheme and their contribution to the reference population, which could affect the within-family variance of older families. Genotyping more animals results in larger groups of offspring from randomly genotyped sires, which should improve future validations. VanRaden et al. (2011) and Fritz et al. (2013) reported that some haplotypes never occur in the homozygous state, because embryos homozygous for these haplotypes are not viable. Such haplotypes and genetic defects like Brachyspina (Agerholm et al., 2007; Charlier et al., 2012), Bovine Leukocyte Adhesion Deficiency (BLAD; Shuster et al., 1992) or Complex Vertebral Malformation (CVM; Agerholm et al., 2001) also influence the SDGBV. However, their effect on variation depends on their allele frequency in the population, because a loss of variation occurs only when sperm and ovum carry the same genetic defect.
This can explain the difference between simulated and observed gamete breeding values, because the simulation did not account for the loss of variation due to genetic defects. Indeed, gamete rather than animal breeding values were simulated, and being a carrier of a genetic defect influences the variation among offspring only if the mating partner carries the same defect.

Mating designs

Figure 2 shows that there are animals with a high mean and low variability, which are relevant for dairy farmers. In contrast, animals with a high mean and a high standard deviation are interesting for AI companies, because selecting these animals will increase the probability of producing animals with extremely positive breeding values in the future. Haplotype information also enables the estimation of selection limits: summing the best haplotype breeding value for each chromosome segment gives the theoretically best animal. The gamete breeding values of such hypothetical animals would reach +30 σa (707 kg) for fat yield, +32 σa (539 kg) for protein yield, +35 σa for somatic cell score and … σa for the direct genetic effect for stillbirth. Cole and VanRaden (2011) reported a selection limit of 1138 kg for protein yield. Although our results were estimated at the haplotype level and theirs at the animal level, they are consistent. Theoretically mating the two best animals for protein yield in our dataset would produce offspring with a mean estimated breeding value of 4.82 σa and a standard deviation of 0.76 σa. The probability of producing an offspring with a breeding value higher than 8 σa, which is only one third of the selection limit, is 0.14%, illustrating that animals from the current population are far from the selection limits. Figure 5 and Table 4 show that two different mating strategies can be designed based on knowledge of MGBV and SDGBV. On the one hand, AI companies are interested in finding extremely positive offspring; from this point of view, mating bull 2 would be the best choice. On the other hand, farmers are more interested in homogeneous groups of offspring with low SDGBV, so mating bull 1 would be better for breeding in these herds. For computational reasons, no covariance between sire and dam was assumed when calculating sbv.
This simplification should be revisited, because the German Holstein population has a small effective population size, which increases the level of relationship and results in a non-zero covariance between sires and dams. Finding the best combination of mating partners in mating programs based on genomic information requires time- and memory-intensive computing because of the large amount of data. A major benefit of the method described in this study is that MGBV and SDGBV need to be computed only once for each animal. After this step, finding mating partners is computationally easy, because mbv is the sum of the paternal and maternal MGBV and the offspring variance (the square of sbv) is the sum of the squared paternal and maternal SDGBV. Calculating the probability that an offspring reaches a defined threshold is then simple using the normal distribution function. Based on this methodology, a software tool for breeding associations was developed, which includes MGBV and SDGBV for a portfolio of bulls of interest and for genotyped cows. Given this information, the association can specify the breeding value threshold that the offspring of a given cow should exceed, and the tool provides a list of bulls expected to meet this criterion.

Future aspects and applications

Decreasing genotyping costs make it possible to genotype whole commercial herds (Weigel et al., 2012). Using MGBV and SDGBV derived from haplotypes and SNP effect estimates is only one example of how additional genomic information can be used in genomic mating programs. Ongoing research will develop new tools, such as the estimation of dominance effects (Toro and Varona, 2010) or more information about haplotypes with specific genomic effects. Software solutions require efficient, high-performance programs that can handle large amounts of data within a reasonable timeframe.

Conclusions

The expected SDGBV of a potential parent can be estimated from genomic information. SDGBV differs between animals and tends to be normally distributed in the absence of QTL with a large effect on the trait. For fat yield, the DGAT1 mutation causes SDGBV to deviate from a normal distribution and to be higher than expected. Furthermore, for all traits, SDGBV decreased slightly in recent years because of an increase in the level of inbreeding. A genomic mating program was developed to find optimal mating partners with respect to expected MGBV and SDGBV. This approach also allows the probability of obtaining an offspring with a breeding value exceeding a chosen threshold to be calculated.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

DS conducted the analyses and wrote the manuscript. FR helped to check the results and suggested improvements. ZL estimated the SNP effects. GT coordinated the project and added valuable comments and suggestions. All authors read and approved the manuscript.

Acknowledgements

The German national organization FBF is thanked for financial support.

References

Agerholm, J. S., C. Bendixen, O. Andersen, and J. Arnbjerg. 2001. Complex vertebral malformation in Holstein calves. J. Vet. Diagn. Invest. 13:
Agerholm, J. S., and K. Peperkamp. 2007. Familial occurrence of Danish and Dutch cases of the bovine brachyspina syndrome. BMC Vet. Res. 3:8.
Browning, S. R., and B. L. Browning. 2010. High-resolution detection of identity by descent in unrelated individuals. Am. J. Hum. Genet. 86:
Charlier, C., J. S. Agerholm, W. Coppieters, P. Karlskov-Mortensen, W. Li, G. de Jong, C. Fasquelle, L. Karim, S. Cirera, N. Cambisano, N. Ahariz, E. Mullaart, M. Georges, and M. Fredholm. 2012. A deletion in the bovine FANCI gene compromises fertility by causing fetal death and brachyspina. PLoS ONE 7:e
Cole, J. B., P. M. VanRaden, J. R. O'Connell, C. P. Van Tassell, T. S. Sonstegard, R. D. Schnabel, J. F. Taylor, and G. R. Wiggans. 2009. Distribution and location of genetic effects for dairy traits. J. Dairy Sci. 92:
Cole, J. B., and D. J. Null. 2013. Visualization of the transmission of direct genomic values for paternal and maternal chromosomes for 15 traits in US Brown Swiss, Holstein, and Jersey cattle. J. Dairy Sci. 96:

Cole, J. B., and P. M. VanRaden. 2011. Use of haplotypes to estimate Mendelian sampling effects and selection limits. J. Anim. Breed. Genet. 128:
Erbe, M., B. J. Hayes, L. K. Matukumalli, S. Goswami, P. J. Bowman, C. M. Reich, B. A. Mason, and M. E. Goddard. 2012. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J. Dairy Sci. 95:
Fritz, S., A. Capitan, A. Djari, S. C. Rodriguez, A. Barbat, A. Baur, C. Grohs, B. Weiss, M. Boussaha, D. Esquerré, C. Klopp, D. Rocha, and D. Boichard. 2013. Detection of haplotypes associated with prenatal death in dairy cattle and identification of deleterious mutations in GART, SHBG and SLC37A2. PLoS ONE 8:e
Gengler, N., P. Mayeres, and M. Szydlowski. 2007. A simple method to approximate gene content in large pedigree populations: application to the myostatin gene in dual-purpose Belgian Blue cattle. Animal 1:
Grisart, B., F. Farnir, L. Karim, N. Cambisano, J. J. Kim, A. Kvasz, M. Mni, P. Simon, J. M. Frere, W. Coppieters, and M. Georges. 2004. Genetic and functional confirmation of the causality of the DGAT1 K232A quantitative trait nucleotide in affecting milk yield and composition. Proc. Natl. Acad. Sci. USA 101:
Hayes, B. J. 2011. Technical note: Efficient parentage assignment and pedigree reconstruction with dense single nucleotide polymorphism data. J. Dairy Sci. 94:
Heaton, M. P., G. P. Harhay, G. L. Bennett, R. T. Stone, W. M. Grosse, E. Casas, J. W. Keele, T. P. L. Smith, C. G. Chitko-Mckown, and W. W. Laegreid. 2002. Selection and use of SNP markers for animal identification and paternity analysis in US beef cattle. Mamm. Genome 13:
Kühn, C., J. Bennewitz, N. Reinsch, N. Xu, H. Thomsen, C. Looft, G. A. Brockmann, M. Schwerin, C. Weimann, S. Hiendleder, G. Erhardt, I. Medjugorac, M. Forster, B. Brenig, F. Reinhardt, R. Reents, I. Russ, G. Averdunk, J. Blümel, and E. Kalm. 2003. Quantitative trait loci mapping of functional traits in the German Holstein cattle population. J. Dairy Sci. 86:

107 Liu, Z., F. R. Seefried, F. Reinhardt, S. Rensing, G. Thaller, and R. Reents Impacts of both reference population size and inclusion of a residual polygenic effect on the accuracy of genomic prediction. Genet. Sel. Evol. 43:19. Lund, M. S., A. P. W. de Ross, A. G. de Vries, T. Druet, V. Ducrocq, S. Fritz, F. Guillaume, B. Guldbrandtsen, Z. Liu, R. Reents, C. Schrooten, F. Seefried, and G. Su A common reference population from four European Holstein populations increases reliability of genomic predictions. Genet. Sel. Evol. 43:43. Segelke, D., J. Chen, Z. Liu, F. Reinhardt, G. Thaller, and R. Reents Reliability of genomic prediction for German Holsteins using imputed genotypes from low-density chips. J. Dairy Sci. 95: Segelke, D., H. Täubert, F. Reinhardt, and G. Thaller Chancen und Grenzen der Hornloszucht für die Rasse Deutsche Holstein. Züchtungskunde 85:4. Shuster, D. E., M. E. Kehrli, M. R. Ackermann, and R.O. Gilbert Identification and prevalence of a genetic defect that causes leukocyte adhesion deficiency in Holstein cattle. Proc. Nat. Acad. Sci. USA 89: Thaller, G., W. Krämer, A. Winter, B. Kaupe, G. Erhardt, and R. Fries Effects of DGAT1 variants on milk production traits in German cattle breeds. J. Anim. Sci. 81: Toro, M. A., and L. Varona A note on mate allocation for dominance handling in genomic selection. Genet. Sel. Evol. 42:33. Weigel, K. A., P. C. Hoffmann, W. Herring, and T. J. Lawlor Potential gains in lifetime net merit from genomic testing of cows, heifers, and calves on commercial dairy farms. J. Dairy Sci. 95: Wiggans, G. R., T. A. Cooper, P. M. VanRaden, K. M. Olson, and M. E. Tooker Use of the Illumina Bovine3K BeadChip in dairy genomic evaluation. J. Dairy Sci. 95: VanRaden P.M Efficient methods to compute genomic predictions. J. Dairy Sci. 91: VanRaden, P. M., C. P. Van Tassell, G. R. Wiggans, T. S. Sonstegard, R. D. Schnabel, J. F. Taylor, and F. S. 
Schenkel Invited review: reliability of genomic predictions for North American Holstein bulls. J. Dairy Sci. 92:

108 VanRaden, P. M., K. M. Olson, D. J. Null, and J. L. Hutchison Harmful recessive effects on fertility detected by absence of homozygous haplotypes. J. Dairy Sci. 94: Vit Estimation of Breeding Values for Milk Production Traits, Somatic Cell Score, Conformation, Productive Life and Reproduction Traits in German Dairy Cattle [ Zimin, A. V, A. L. Delcher, L. Florea, D. R. Kelley, M. C. Schatz, D. Puiu, F. Hanrahan, G. Pertea, C. P. Van Tassek, T. S. Sonstegard, G. Marcais, M. Roberts, P. Subramanian, J. A. Yorke, and Salzberg A whole genome assembly of the domestic cow, Bos taurus. Genome Biol. 10:R

General Discussion

Genomic selection was successfully and rapidly implemented in Holstein breeding schemes. High-throughput genotyping of thousands of animals per year allows additional use of the genomic information. The aim of the present study was to demonstrate that haplotyping and imputation provide a novel source for breeding strategies beyond genomic selection. Imputation makes it possible to screen large numbers of potential selection candidates cost-effectively and thereby to increase selection intensity. Additionally, it is shown that the polled status can be imputed reliably, which is important for expanding the polled genetic breeding base. Haplotypes derived during the imputation process are useful for identifying and managing genetic disorders. Furthermore, they can be used to predict the expected variation within groups of offspring.

Impacts on imputation accuracy

Imputation has been adopted not only in dairy cattle breeding schemes (Calus et al., 2014) but also in pigs (Huang et al., 2012), poultry (Fulton, 2012) and sheep (Hayes et al., 2012). Imputation accuracy is influenced by different factors, which vary across SNP array densities and species. One important factor is the presence of closely related genotyped individuals. The software package Findhap version 2 (VanRaden et al., 2011a) uses family information during imputation. The effect of adding closely related animals to the reference population is therefore larger for Findhap than for the population-based method Beagle (Browning and Browning, 2007), which assumes that animals are unrelated (see Chapter 1). In dairy cattle populations this assumption does not hold, because a few sires produce large numbers of offspring, leading to a small effective population size and consequently to high linkage disequilibrium within the population (Sargolzaei et al., 2014).
However, population-based approaches also benefit from adding related animals, because they capture close relationships between individuals by identifying long shared haplotypes of unusually low frequency (Sargolzaei et al., 2014). In agreement with the findings in Chapter 1, Huang et al. (2012) showed the impact of closely related ancestors on imputation accuracy for family-based imputation packages in pigs.
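The idea of recognising relatives through long shared segments can be illustrated by finding the longest run of consecutive markers at which two phased haplotypes carry identical alleles. The following is a minimal sketch under stated assumptions, not the algorithm of Beagle or Fimpute, which additionally weight segments by haplotype frequency and tolerate genotyping errors; the 0/1 allele coding and the function names `longest_shared_run` and `longest_shared_between_animals` are hypothetical.

```python
def longest_shared_run(hap_a: list[int], hap_b: list[int]) -> tuple[int, int]:
    """Return (start, length) of the longest run of consecutive markers at
    which two phased haplotypes (alleles coded 0/1) carry the same allele.
    Returns (0, 0) if the haplotypes share no allele at all."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same markers")
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, (a, b) in enumerate(zip(hap_a, hap_b)):
        if a == b:
            if run_len == 0:
                run_start = i  # a new shared run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0  # mismatch ends the current run
    return best_start, best_len


def longest_shared_between_animals(animal_1, animal_2) -> tuple[int, int]:
    """Compare all four haplotype pairs of two animals; each animal is a
    (paternal, maternal) tuple of haplotypes. Returns the longest run."""
    return max(
        (longest_shared_run(h1, h2) for h1 in animal_1 for h2 in animal_2),
        key=lambda segment: segment[1],
    )
```

For example, `longest_shared_run([0, 1, 1, 0, 1, 1, 1, 0], [1, 1, 1, 0, 1, 1, 0, 0])` returns `(1, 5)`: the two haplotypes agree at markers 1 to 5. Real imputation software would additionally ask whether such a segment is rare in the population, because a long segment made of common alleles is weaker evidence of close relationship.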

The close relationships between polled animals in the reference and validation populations, which result in long haplotypes shared among polled animals, together with the use of a dense SNP chip, led to the high imputation accuracy for the polled status reported in Chapter 2. The imputation algorithm also influences imputation accuracy (Chapter 1). In contrast to population-based software packages from human genetics (e.g. Beagle (Browning and Browning, 2007) and IMPUTE2 (Howie et al., 2009)), pedigree-based imputation algorithms (e.g. Findhap (VanRaden et al., 2011a) and Fimpute (Sargolzaei et al., 2014)) are designed for large-scale data sets and exploit pedigree records spanning several generations (VanRaden et al., 2013). Imputation and haplotype inference with family-based approaches rely on Mendelian rules (Sargolzaei et al., 2014) and use long haplotype segments for recent relationships and short haplotype segments for key ancestors (VanRaden et al., 2013; Sargolzaei et al., 2014). Pedigree-based imputation packages can therefore be affected by pedigree errors; the pedigree error rate in American Holsteins is approximately 10% (Banos et al., 2001). In all scenarios in Chapter 1, Beagle was superior to Findhap, in agreement with Ma et al. (2013). Sargolzaei et al. (2014) showed that the family-based package Fimpute outperformed Beagle for imputation from 6K to 50K; in their view, shared haplotypes between distant relatives can be found more easily and accurately as sample size increases. For imputation from 3K to 50K, however, Beagle performed better, which implies that Fimpute had difficulty finding shared haplotypes between relatives in this scenario. Besides imputation accuracy, the computational demand and run time of imputation have to be considered, especially in dairy cattle breeding, where large data sets are handled.
Family-based imputation packages were much more time- and memory-efficient than population-based programs (Chapter 1; Ma et al., 2013; Sargolzaei et al., 2014). For Beagle version 3.3, computation time was reported to increase quadratically with sample size (Browning and Browning, 2013). The newly released Beagle version 4 (Browning and Browning, 2013) needs less memory and supports multi-threading, but is still computationally more demanding than Findhap or Fimpute. Therefore, population-based algorithms cannot be used to impute several dense SNP chips within time-critical monthly or weekly evaluations (Alkhoder et al., 2014; Wiggans et al., 2014). They are better suited to imputing single characteristics such as polled (Chapter 2) or to deriving missing homozygous haplotypes (Chapter 3), because population-based methods have higher imputation accuracy and depend less on close relationships.

Marker-specific imputation accuracy is affected by the minor allele frequency (MAF). The genome-wide allele error rate investigated in Chapter 1 does not account for MAF-dependent differences in error rate. Figure 1 shows imputation accuracy in relation to MAF, measured as the correlation between imputed and true genotypes for the IlluminaLD chip, for different reference population sizes and imputation programs. Imputation accuracy depends on MAF, and imputing alleles with very low frequencies is particularly challenging. Furthermore, enlarging the reference population (old bulls: 11,670 bulls; EuroG: 14,405 bulls; ALLG: 31,597 animals; see Chapter 2) improves imputation across all MAF classes. Beagle imputes markers with low MAF notably better than Findhap.

Figure 1 Genome-wide imputation accuracy from the IlluminaLD chip for different validation data sets

Similar to Figure 1, studies of imputation from BovineHD to sequence level showed that imputation accuracy, measured as the correlation between imputation probability and true genotypes, increased with increasing MAF (Brøndum et al., 2014; Van Binsbergen et al., 2014; Daetwyler et al., 2014). In contrast, Ma et al. (2013) found that the number of correctly imputed alleles was higher for lower MAF. Calus et al. (2014) demonstrated that this measure is not independent of MAF, because alleles with low MAF have a higher probability of being imputed correctly by chance.

Effects on phasing accuracy

Both predicting the expected variation between offspring and mapping missing homozygosity rely on in silico inference of haplotypes from genotype data by imputation and phasing algorithms. Several approaches to phasing genotypes have been proposed; Browning and Browning (2011) reviewed existing phasing methods. In livestock species, phasing can be improved by using large pedigree data sets. For example, the algorithm of Hickey et al. (2011) is based on the long-range phasing approach described by Kong et al. (2008). Druet and Georges (2015) use pedigree information to apply Mendelian segregation rules and linkage information from half-sibs; to account for genotyping and mapping errors, a hidden Markov model step refines the haplotypes. Methods that do not use pedigree information, such as Beagle, rely on haplotype frequencies in addition to identity by descent. Erbe et al. (2013) investigated the phasing accuracy of Beagle and Findhap. They showed that for both packages the number of jumps (positions with phase changes across imputation runs) depends on the number of animals and on the number of closely related animals in the reference population. Beagle outperformed Findhap with respect to the number of phase switches within an individual, but Findhap had a higher percentage of identically phased haplotypes across multiple phasing runs. Ferdosi et al. (2014) showed that in some chromosomal regions Beagle incorrectly assigns parental haplotype blocks in the offspring (e.g. haplotype blocks inherited from the dam are assigned to the sire). This results in an erroneously large number of recombinations within a haplotype, in agreement with our own investigations (results not shown).
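The phase switches discussed above can be counted by comparing an inferred haplotype pair with a reference phasing, for instance the true haplotypes in a simulation or a second phasing run. The sketch below uses the common switch-error definition: only heterozygous markers are informative, and a switch is counted whenever the inferred phase flips relative to the reference between consecutive heterozygous markers. It is an illustration under stated assumptions (0/1 allele coding, hypothetical function name `count_switches`), not the exact metric used by Erbe et al. (2013) or Ferdosi et al. (2014).

```python
def count_switches(inferred, reference):
    """Count phase switches between two phasings of one individual.

    Each argument is a (hap1, hap2) tuple of equal-length 0/1 lists.
    At every marker where the reference is heterozygous we record whether
    inferred hap1 carries reference hap1's allele (same orientation) or
    reference hap2's allele (flipped). A switch is a change of orientation
    between consecutive heterozygous markers.
    """
    inf1, inf2 = inferred
    ref1, ref2 = reference
    orientation = []
    for i in range(len(ref1)):
        if ref1[i] == ref2[i]:
            continue  # homozygous in the reference: phase undefined here
        if {inf1[i], inf2[i]} != {ref1[i], ref2[i]}:
            raise ValueError(f"genotypes disagree at marker {i}")
        orientation.append(inf1[i] == ref1[i])
    # count orientation changes between neighbouring informative markers
    return sum(a != b for a, b in zip(orientation, orientation[1:]))
```

Each counted switch moves a block of alleles onto the wrong parental haplotype, so in offspring data it would show up as an apparent extra recombination, which is the kind of inflation described by Ferdosi et al. (2014).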
In addition, Ferdosi et al. (2014) report that genotyping errors strongly affect the phasing accuracy of Beagle. They applied Beagle to a simulated data set of 20 half-sib families, each consisting of 40 individuals, genotyped for 10,000 SNP markers. Including one percent random genotyping errors in the simulated data sets reduced the mean accuracy of Beagle, measured as R² between inferred and true haplotypes, to 0.5, compared with the error-free scenario. Druet and Georges (2015) noted that the reconstruction of haplotypes is sensitive to mapping errors and to structural variation such as copy number variants. They showed that applying a hidden Markov model that accounts for genotyping and mapping errors decreased the mean number of crossovers per gamete from 35 to 27 and the maximum number from 108 to 56. Using this approach, they identified 46 putative mapping errors in the UMD 3.1 assembly.

Harmful recessive haplotypes in the German Holstein population

An increasing inbreeding rate raises the incidence of genetic disorders. In the genomic era, with thousands of animals genotyped per year and important key ancestors being sequenced (Jansen et al., 2014; Daetwyler et al., 2014), the number of known genetic disorders is increasing rapidly. VanRaden et al. (2011b) were among the first to use the large data pool from genomic evaluations to screen for genetic defects at the genomic level. After phasing the genotypes, they identified three Holstein haplotypes (HH1, HH2 and HH3) that may cause embryo loss in the homozygous state. Fritz et al. (2013), Sahana et al. (2014) and Cooper et al. (2013) found additional recessive Holstein haplotypes associated with fertility traits (e.g. HH4 and HH5). The causal mutations for HH1, HH3 and HH4 were identified within the APAF1, SMC2 and GART genes, respectively (Adams et al., 2012; Daetwyler et al., 2014; McClure et al., 2014; Fritz et al., 2013). The missing homozygote approach introduced by VanRaden et al. (2011b) was applied to the genotyped German Holstein population. First, haplotypes were derived by phasing the genotypes of 151,218 animals with Beagle version 3.3 (Browning and Browning, 2007). Haplotypes with missing homozygotes were then identified with a sliding-window approach using a haplotype length of 25 consecutive markers.
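The sliding-window step can be sketched as follows: each phased animal is cut into windows of consecutive markers, and for every window each distinct haplotype is tallied as carried (at least one copy) or homozygous (two copies). Only the window length of 25 markers comes from the text; the step size of one marker, the 0/1 coding, the "at least one copy" carrier definition and the function name `window_haplotype_counts` are assumptions for illustration.

```python
from collections import Counter


def window_haplotype_counts(animals, window=25, step=1):
    """Count carriers and homozygotes of every windowed haplotype.

    animals: list of (hap1, hap2) tuples of equal-length 0/1 lists.
    Returns {window_start: {haplotype: (n_carriers, n_homozygotes)}},
    where a carrier holds at least one copy and a homozygote two copies.
    """
    n_markers = len(animals[0][0])
    result = {}
    for start in range(0, n_markers - window + 1, step):
        carriers, homozygotes = Counter(), Counter()
        for hap1, hap2 in animals:
            h1 = tuple(hap1[start:start + window])
            h2 = tuple(hap2[start:start + window])
            if h1 == h2:
                # two copies of the same haplotype: one carrier, one homozygote
                carriers[h1] += 1
                homozygotes[h1] += 1
            else:
                carriers[h1] += 1
                carriers[h2] += 1
        result[start] = {h: (carriers[h], homozygotes[h]) for h in carriers}
    return result
```

Storing both counts per window keeps the data needed for the next step, comparing the observed number of homozygotes with the number expected from the carrier frequency.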
Assuming random mating, the expected number of homozygous animals was calculated as the number of genotyped animals multiplied by the square of the haplotype's carrier frequency and divided by 4 (VanRaden et al., 2011b): for a rare haplotype with carrier frequency c, the haplotype frequency is approximately c/2, so the expected homozygote frequency is (c/2)² = c²/4. Only candidate haplotypes with a carrier frequency of more than two percent and a significant deviation (p < 10⁻⁵) between the expected and observed numbers of homozygotes were retained. Associations of the candidate haplotypes with fertility traits and stillbirth were then tested. As fertility traits, the non-return rates (NRR) at days 56, 100 and 150 were analyzed; a cow counts as a non-return if no re-insemination was registered within the given period (e.g. 56 days) after the first insemination. For all phenotypes, animals were grouped according to whether their sire and/or maternal grandsire was a carrier of the candidate haplotype. Differences in mean NRR and stillbirth rate between carrier groups were tested with a t-test. The haplotypes with the most significant deficit of homozygotes are shown in Table 1. All listed haplotypes had at least one association with NRR (p < 10⁻⁵) or stillbirth rate (p < 10⁻⁴), and the differences between at-risk matings and control groups exceeded one percent. The analysis revealed 13 haplotypes with a strong deficit of homozygotes (Table 1).

Table 1 List of regions with a deficit of homozygotes and association with fertility and stillbirth
Hap | BTA | Interval (Mb) | Homozygotes, n: Exp. | Homozygotes, n: Obs. | Carrier freq. (%) | Significant effect on phenotype
Exp: Expected; Obs: Observed; SB: Stillbirth; NRR56: Non-return rate day 56; NRR100: Non-return rate day 100; NRR150: Non-return rate day 150
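The screening rule can be written out as a short sketch: the expected number of homozygotes is E = N × c²/4 for N genotyped animals and carrier frequency c, and a haplotype is kept if c exceeds two percent and the observed count is significantly below E at p < 10⁻⁵. The source does not state which test produced the p-values; the Poisson lower-tail probability P(X ≤ observed) with mean E used here is an assumption chosen for illustration, as are the function names.

```python
import math


def expected_homozygotes(n_animals, carrier_freq):
    """Expected homozygotes under random mating: N * c**2 / 4."""
    return n_animals * carrier_freq ** 2 / 4


def poisson_lower_tail(observed, expected):
    """P(X <= observed) for X ~ Poisson(expected), computed in log space
    so that large expected counts do not underflow (assumed test)."""
    if expected == 0:
        return 1.0
    log_terms = [k * math.log(expected) - expected - math.lgamma(k + 1)
                 for k in range(observed + 1)]
    m = max(log_terms)  # log-sum-exp for numerical stability
    return math.exp(m) * math.fsum(math.exp(t - m) for t in log_terms)


def is_candidate(n_animals, n_carriers, observed_homozygotes,
                 min_carrier_freq=0.02, alpha=1e-5):
    """Carrier frequency must exceed 2 % and the homozygote deficit
    must be significant at alpha."""
    c = n_carriers / n_animals
    if c <= min_carrier_freq:
        return False
    e = expected_homozygotes(n_animals, c)
    return poisson_lower_tail(observed_homozygotes, e) < alpha
```

With the 151,218 phased animals of this study, a haplotype carried by 7,561 animals (about 5%) has E ≈ 94.5 expected homozygotes, so observing none gives a lower-tail probability of about e⁻⁹⁴·⁵, far below 10⁻⁵; `is_candidate(151218, 7561, 0)` therefore returns `True`.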

The close agreement of haplotypes 1, 4, 9 and 10 with HH4 (BTA 1; Fritz et al., 2013), HH1 (BTA 5; VanRaden et al., 2011b), HH3 (BTA 8; VanRaden et al., 2011b) and HH5 (BTA 9; Cooper et al., 2013), respectively, shows that these known haplotypes segregate in the German population and are strongly associated with the analyzed phenotypes. Haplotypes 1, 4 and 10 were not observed in the homozygous state, in agreement with Fritz et al. (2013), VanRaden et al. (2011b) and Cooper et al. (2013), respectively. For haplotype 9, which is highly correlated with HH3, no homozygous individuals were reported by VanRaden et al. (2011b), Fritz et al. (2013) or Sahana et al. (2013). In our data set, however, two genotyped embryos homozygous for this haplotype were identified. Haplotype 12 was associated with the haplotype underlying brachyspina (BTA 21; Charlier et al., 2012). For HH2, a homozygote deficit was detected, but owing to its low carrier frequency only a minor effect on fertility phenotypes was observed (see Chapter 3). Eight additional recessive haplotypes on seven chromosomes showed a significant difference between the expected and observed numbers of homozygous individuals. However, homozygous animals were observed for all of these haplotypes, possibly because of imperfect LD between the haplotype and the causal mutation or incomplete penetrance of the unknown mutation in the homozygous state (Fritz et al., 2013). Furthermore, the causal mutation may be located several Mb away from the detected interval. Another possible explanation for the homozygous animals is that the mutation arose in the germline of a recent founder animal, producing two otherwise identical haplotypes that differ only at the harmful allele (Jung et al., 2014). The carrier frequencies of these novel deficient haplotypes ranged from 3.44 to 5.59 percent.
Haplotype 11 showed the strongest association with NRR150; the NRR150 difference between the carrier and non-carrier groups was five percent. Figure 2 illustrates the effect of haplotype 13 on the embryo survival rate in different mating groups. In general, a large decline in embryo survival, derived from NRR data, is visible for all risk mating groups. Only 40% of all cows give birth to a calf from the first insemination. Nearly five percent fewer calves were born in the risk mating group than in the non-risk mating group. Embryo loss becomes apparent around day 30 after insemination. The slight decrease in embryo survival for matings of a carrier sire with a dam whose sire (the maternal grandsire) is a non-carrier, and of a non-carrier sire with a dam whose sire is a carrier, might be caused by incomplete linkage between the underlying causal mutation and the deficient haplotype, or by the maternal granddam being a carrier.

Figure 2 NRR-based embryo survival curves for different haplotype 13 mating groups

Similar to the brachyspina haplotype, haplotypes 3 and 4 affected NRR100 and NRR150 as well as the stillbirth rate, consistent with embryonic death after day 56 of gestation. Haplotypes 3 and 8 were associated with an increased stillbirth rate (difference between carrier and non-carrier mating groups +1.7% and +2.4%, respectively). In the near future, sequence-based fine mapping will be conducted to identify the causal mutations underlying these haplotypes. Confirmed novel haplotypes will subsequently be integrated into the genetic index introduced in Chapter 3 of this study.

Use of recombination for mating decisions

Several studies show that recombination does not occur at random positions throughout the genome. From human genetics it is known that both the locations and the genome-wide numbers of crossovers differ between individuals (Kong et al., 2014). In humans, the number of recombination events is influenced in both sexes by the major genes RNF212, REC8 and PRDM9; PRDM9 in particular plays an important role in determining recombination hotspots (Kong et al., 2010). In females, an inversion on human chromosome 17 is additionally associated with recombination rate. Kong et al. (2014) showed that some variants have opposite effects on recombination rate in the two sexes, so that a change in allele frequency due to drift or selection that increases the recombination rate in one sex can be balanced by a reduced rate in the other sex. The heritability (h²) of recombination rate was estimated at 30% (Kong et al., 2004). Sandor et al. (2012) obtained the same heritability estimate for bulls and found that recombination rate varied between animals. Additionally, they confirmed the effect of the three genes (REC8, RNF212 and PRDM9) on male recombination rate in cattle. The method described in Chapter 4 uses recombination hotspots to divide the genome into haplotype segments. Because of the limited number of genotyped females, a general recombination map, averaged across sexes and individuals, was used to divide the chromosomes into haplotype segments. An improved, sex-specific recombination map could enhance the estimation of expected variation between offspring. Matings that take recombination rate into account would increase the probability of breaking up existing favorable and unfavorable haplotypes and of creating new haplotypes. Moreover, previous studies showed a potential correlation between recombination rate and fertility (Ross-Ibarra, 2004; Kong et al., 2004), opening an interesting new field for genome-enabled mating. In our data set, the average genome-wide recombination number differed between sires (range 14 to 48). The heritability of male recombination rate was estimated with a repeatability model. A genome-wide association study (Figure 3) using deregressed breeding values for recombination rate confirmed the major QTL on BTA 6 and BTA 10, linked to RNF212 and REC8, respectively, which were previously identified by Sandor et al. (2012). The presence of an additional QTL on chromosome 10 needs further investigation.

Figure 3 Genome-wide association study for genome-wide male recombination rate

Figure 4 shows the average genome-wide number of recombinations per sire in relation to the number of offspring per sire. For small families, the genome-wide recombination number varied strongly between sires, owing to unreliable phasing.