Meta-analysis of genetic diversity in goats with microsatellites

1 Meta-analysis of genetic diversity in goats with microsatellites Han Jianlin (International Livestock Research Institute (ILRI) Nairobi, Kenya and CAAS-ILRI Joint Laboratory on Livestock and Forage Genetic Resources, Chinese Academy of Agricultural Sciences (CAAS), Beijing, China) GLOBALDIV Final Workshop 8-9 February 2011 EPFL campus, Lausanne, Switzerland

2 BT02 Improving Characterization of Farm Animal Genetic Resources Susan MacMillan To characterize, quantify and map phenotypic, neutral and functional diversity of FAnGR to inform livestock conservation and utilization

3 Gene Based Technology in Livestock Breeding: Characterization of Small Ruminant Genetic Resources in Asia

4 Map labels: China I, China II, Iran, Pakistan, Saudi Arabia, Bangladesh, Vietnam, Sri Lanka, Indonesia

5 Sampling plan: Jaffna local sheep (JFL); Jaffna goat (SLJ); Indigenous goat (SNC); Kottukachchiya goat (KOT); Sri Lankan Boer goat (SLB); Indigenous goat (SLS)

6 BANGLADESH Map showing the site for sample collection and the site for DNA extraction (map labels: India, North East, MYM, Bay of Bengal, Myanmar)

7 Nearly 2000 samples of 55 goat populations from Bangladesh (5 populations), China (25), Indonesia (4), Iran (2), Pakistan (5), Sri Lanka (5) and Vietnam (5). Genotyping was done at ILRI-Nairobi with 15 microsatellite DNA markers by a group of scientists from all participating countries, who were trained for the purpose.

8 Merging MS data sets - Values Microsatellite merging is the process of standardizing alleles between data sets that have been generated in different laboratories, or by different experimental methods within the same facility. It permits: (1) a meta-analysis of regional and global biodiversity of livestock and poultry species; (2) easier analysis of one accurately merged data set than of several individual data sets; (3) a larger sample size in the merged data set, which gives linkage and association studies more power to detect alleles of small effect at a candidate locus.

9 Merging MS data sets - Difficulties Microsatellite genotype calling procedures differ among laboratories, and alleles may be missing (lumping of two consecutive true alleles into one, mistyping, or absence of particular allele(s) in specific samples); Even within the same laboratory, application of different experimental protocols often leads to ambiguities; Because different data sets may contain different numbers of markers and alleles, merging is not a simple process of matching alleles one to one.

10 Merging MS data sets - Challenges Merging data sets manually is difficult, time-consuming, and error-prone due to differences in (1) genotyping process and hardware, (2) binning methods during and after the genotyping process, (3) molecular weight standards, and (4) curve-fitting algorithms; Merging is particularly difficult if few or no samples occur in common, or if samples are drawn from ethnic groups or genetic backgrounds (e.g. different ancestry or domestication centers) with widely varying allele frequencies; It is dangerous to align alleles simply by adding a constant number of base pairs to the alleles of one of the data sets.

11 Merging MS data sets - Physical options Sequencing selected alleles across the allelic distribution spectrum to match the genotyping results at a particular locus, but this is impossible in practice: in a direct sequencing profile the signal collapses after a certain number of repeat units, while clone sequencing would produce much more variation because of the stutter bands; Genotyping a common set of samples across different laboratories, but this is impractical because suitable samples, facilities and staff are no longer available in some of the laboratories after the projects end.

12 Structural variation in microsatellite sequences

13 Structural variation in microsatellite sequences MS sequence variations link with extended flanking haplotypes!

14 Merging MS data sets - Opportunities Xu (2000) suggests combining logarithm of the odds (LOD) scores from individual data set analyses. This is effective for a linkage meta-analysis, where each family has been entirely typed in one lab, but it does not apply to association studies. That is a real limitation, because association studies are becoming increasingly important and their genotyping is often distributed across different facilities. Daly et al. (1997) and the members of the GAMES (2003) consortium pool data while preserving each lab's alleles and allele frequencies. The same markers typed at different facilities are treated as separate markers with the same map locations in the combined data set. This approach guards against misaligning alleles, but in an association study it adds no power to what can be achieved with a lab-by-lab analysis of the data.
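
As a rough illustration of the LOD-combining idea, the sketch below adds LOD scores from two labs at each shared recombination fraction, which is valid when the families in each data set are independent. It is not taken from Xu (2000), whose exact procedure may differ; the function name `combine_lod` and the numbers are invented for the example.

```python
def combine_lod(lod_by_dataset):
    """Sum LOD scores across data sets at each shared recombination fraction.

    lod_by_dataset: list of dicts mapping theta -> LOD score (hypothetical layout).
    Only thetas evaluated in every data set are combined.
    """
    shared = set.intersection(*(set(d) for d in lod_by_dataset))
    return {theta: sum(d[theta] for d in lod_by_dataset) for theta in sorted(shared)}


# Invented LOD curves for one marker, analysed separately in two labs.
lab_a = {0.0: -1.2, 0.1: 0.8, 0.2: 1.1}
lab_b = {0.0: -0.5, 0.1: 1.4, 0.2: 0.9}

combined = combine_lod([lab_a, lab_b])
best_theta = max(combined, key=combined.get)  # theta with the highest summed LOD
```

Because each lab's alleles are only used within that lab's own analysis, no allele alignment is needed, which is why the approach suits linkage but gives association studies nothing to pool.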

15 MicroMerge (Presson et al. 2006, 2008) Builds on Dorr et al. (1997) for matching allele frequencies and renaming alleles; Applies a Bayesian model, with a Markov chain Monte Carlo (MCMC) algorithm for sampling the posterior distribution under the model, treating all samples as unrelated; Common allele frequencies across labs in the same ethnic group are the single most important cue in the model; Almost always finds the most likely correct alignment accurately and efficiently; Computes the allelic alignments with the greatest posterior probabilities under several merging options; Reports when data sets cannot be confidently merged.
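
To make the allele-frequency cue concrete, here is a deliberately simplified sketch. It is not MicroMerge and does not use a Bayesian model or MCMC: it only scores candidate constant offsets between two labs by how well the shifted allele frequencies agree. A single constant offset is exactly the shortcut that slide 10 warns about, so this shows the cue and is not a usable merging method. All names (`allele_freqs`, `best_offset`, `max_shift`) and the genotypes are invented.

```python
from collections import Counter


def allele_freqs(genotypes):
    """Relative frequency of each allele size in a list of (allele1, allele2) genotypes."""
    counts = Counter(a for g in genotypes for a in g)
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def best_offset(freq_a, freq_b, max_shift=6):
    """Try every offset in [-max_shift, max_shift] bp for lab B and return the
    offset with the smallest squared frequency distance to lab A, plus all scores."""
    scores = {}
    for shift in range(-max_shift, max_shift + 1):
        shifted = {allele + shift: f for allele, f in freq_b.items()}
        alleles = set(freq_a) | set(shifted)
        scores[shift] = sum(
            (freq_a.get(a, 0.0) - shifted.get(a, 0.0)) ** 2 for a in alleles
        )
    return min(scores, key=scores.get), scores


# Invented data: lab B calls every allele 1 bp larger than lab A.
lab_a = [(140, 142), (142, 142), (142, 144), (140, 142), (142, 144)]
lab_b = [(141, 143), (143, 143), (143, 145), (143, 143), (141, 145)]

offset, scores = best_offset(allele_freqs(lab_a), allele_freqs(lab_b))
```

When the two populations differ in allele frequencies, or when several offsets score almost equally, a frequency-only match like this is unreliable, which is the situation in which a tool should report that the data sets cannot be confidently merged.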

16 Models - MicroMerge (Presson et al. 2006, 2008) The difficulty of merging the marker data depends on the number of true alleles, their population frequencies, the number of bins per lab, and the number of samples in common; However, it takes three required and two optional types of input file: Mendel locus file for each data set to be merged (2 or more files); Mendel pedigree file for each data set to be merged (2 or more files); MicroMerge control file (1 file); MicroMerge inclusion status file (1 file); MicroMerge samples in common file (file). Not applicable to MS data sets generated with random sampling for biodiversity studies!

17 Models - COMBI.PL (Täubert & Bradley 2008) Assigns allele sizes across studies using maximum-likelihood theory; Using data overlaps in samples and markers, allele shifts between two studies are calculated for each overlapping marker and a single file containing allele frequencies of consistent alleles is produced; the shifts are afterwards transferred to the alleles of non-overlapping breeds; However, a marker with gaps in its allele lengths, or with a shift accompanied by a change in variance (a sign of other problems in the data), should not be combined but omitted from the data set.
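
The overlap-based shift idea can be sketched as follows. This is not COMBI.PL and it swaps the maximum-likelihood estimator for a much cruder rule: take the most frequent size difference among samples typed in both studies, and drop the marker if too few allele pairs agree with it. The function name `estimate_shift`, the `min_agreement` threshold and the genotypes are all assumptions made for illustration.

```python
from collections import Counter


def estimate_shift(study_a, study_b, min_agreement=0.9):
    """Estimate the bp shift to add to study B alleles for one marker.

    study_a, study_b: dict sample_id -> (allele1, allele2), hypothetical layout.
    Returns the modal difference (A minus B) over shared samples, or None when
    there is no overlap or fewer than min_agreement of allele pairs agree.
    """
    diffs = Counter()
    for sample in study_a.keys() & study_b.keys():
        # Pair alleles by size rank so genotype order does not matter.
        for a, b in zip(sorted(study_a[sample]), sorted(study_b[sample])):
            diffs[a - b] += 1
    if not diffs:
        return None
    shift, count = diffs.most_common(1)[0]
    if count / sum(diffs.values()) < min_agreement:
        return None  # inconsistent marker: omit rather than combine
    return shift


study_a = {"g1": (150, 154), "g2": (152, 152), "g3": (148, 154)}
study_b = {"g1": (152, 156), "g2": (154, 154), "g3": (150, 156), "g4": (150, 150)}
study_b_bad = {"g1": (152, 156), "g2": (153, 154), "g3": (150, 155)}

consistent = estimate_shift(study_a, study_b)        # every pair differs by -2
inconsistent = estimate_shift(study_a, study_b_bad)  # mixed differences
```

Returning `None` for the inconsistent marker mirrors the advice on the slide to omit such markers from the combined data set instead of forcing an alignment.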

18 Econogene and FAO/IAEA CRP Asian goat data sets Econogene: 1426 samples from 45 traditional and local breeds of 15 European and Middle Eastern countries were genotyped with 30 markers; FAO/IAEA CRP Asian goats: 1629 samples from 43 local populations of eight countries (China, Bangladesh, Indonesia, Iran, Saudi Arabia, Sri Lanka, Pakistan, Vietnam) were genotyped with 15 markers; 11 markers and a few populations of Mediterranean goats were in common but genotyped separately; 200 samples from Econogene were genotyped by the CRP.

19 Canon et al. Anim. Genet. 2006, 37,

20 Econogene data set FAO/IAEA CRP Asian goat data set

21 Diversity at 11 MS markers The highest NMA per population is found in Mediterranean goats, followed by East Asian, Central-north European and South Asian goats, and the lowest in Southeast Asian goats. However, it is hard to identify differences in terms of heterozygosity values.
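
For readers unfamiliar with the two summary statistics, the sketch below computes the mean number of alleles per locus and the mean expected heterozygosity (1 minus the sum of squared allele frequencies, without small-sample correction) for one population. The slides do not say which estimator or software was used, so this is only an assumed, minimal version; the function names and genotypes are invented.

```python
from collections import Counter


def locus_stats(genotypes):
    """Return (number of alleles, expected heterozygosity) for one locus.

    genotypes: list of (allele1, allele2) tuples for one population.
    Expected heterozygosity is 1 - sum(p_i ** 2), uncorrected for sample size.
    """
    alleles = [a for g in genotypes for a in g]
    counts = Counter(alleles)
    n = len(alleles)
    he = 1.0 - sum((c / n) ** 2 for c in counts.values())
    return len(counts), he


def population_summary(loci):
    """Average the per-locus statistics over all loci of one population."""
    stats = [locus_stats(genotypes) for genotypes in loci.values()]
    mean_alleles = sum(k for k, _ in stats) / len(stats)
    mean_he = sum(h for _, h in stats) / len(stats)
    return mean_alleles, mean_he


# Invented genotypes for two markers in one population.
population = {
    "M1": [(100, 102), (100, 100), (102, 104), (100, 104)],
    "M2": [(200, 200), (200, 202), (202, 202), (200, 202)],
}

mean_alleles, mean_he = population_summary(population)
```

Rare alleles each add one to the allele count but contribute very little to the sum of squared frequencies, which is one reason populations can differ clearly in number of alleles while their heterozygosity values stay close.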

22 Econogene data (30 vs 11 markers)

23 Asia goats (15 vs 11 markers)

24 All goats (11 markers)

25 Conclusions and Perspectives No pattern of artificial selection, with very limited genetic differentiation among local breeds within most of the countries; Clear genetic partitioning between large geographic regions and continents; More diversified genetic backgrounds, most likely due to genetic drift, leading to a gradient of decreasing diversity in Southeast Asian goats with distance from the centre of domestication; More data from Africa, Asia and Europe to be added; Merging MS data is feasible and promises to provide information for mapping the global pattern of diversity distribution and for understanding deeper biological questions.

26 Thank you!