Genetic Mapping of Complex Discrete Human Diseases by Discriminant Analysis

Size: px
Start display at page:

Download "Genetic Mapping of Complex Discrete Human Diseases by Discriminant Analysis"

Transcription

1 Genetic Mapping of Comple Discrete Human Diseases by Discriminant Analysis Xia Li Shaoqi Rao Kathy L. Moser Robert C. Elston Jane M. Olson Zheng Guo Tianwen Zhang ( Department of Epidemiology and Biostatistics Case Western Reserve University Cleveland Ohio 4409 USA; Department of Computer Science Harbin Institute of Technology Harbin China 5000; Department of Mathematics Harbin Medical University Harbin China 50086; Center for Molecular Genetics Lerner Research Institute Cleveland Clinic Foundation Cleveland Ohio 4495 USA; Department of Medicine University of Minnesota Minneapolis Minnesota USA In: P. R. CHINA Xia Li Professor Department of Mathematics Harbin Medical University Harbin China Phone: ( Fa: ( Liia6@yahoo.com or Liia@ems.hrbmu.edu.cn In: THE UNITED STATES OF AMERICA Shaoqi Rao PhD Center for Molecular genetics Lerner Research Institute /ND40 Cleveland Clinic Foundation 9500 Euclid Avenue Cleveland Ohio 4495 Phone: ( Fa: ( shaoqir@yahoo.com or raos@ccf.org Abstract The objective of the present study is to propose and evaluate a novel multivariate approach for genetic mapping of comple categorical diseases. This approach results from an application of standard stepwise discriminant analysis to detect linkage based on the differential marker identity-by-descent (IBD distributions among the different groups of sib pairs. For a binary human disease the three affection groups of a sib pair are defined as: concordantly affected discordant and concordantly unaffected. Two major advantages of this method is that it allows for simultaneously testing all markers together with other genetic and environmental factors in a single multivariate setting and it avoids eplicitly modeling the comple relationship between the affection status of sib pairs and the underlying genetic determinants. The efficiency and properties of the method are demonstrated via simulations. The proposed multivariate approach has successfully located the true position(s under various genetic scenarios. The more important finding is that using highly densely spaced markers (- cm leads to only a marginal loss of statistical efficiency of the proposed methods in term of gene localization and statistical power. These results have well established its utility and advantages as a fine-mapping tool. A unique property of the proposed method is the ability to map multiple linked trait loci to their precise positions due to its sequential nature as demonstrated via simulations. Key words Categorical traits IBD Linkage Analysis Multivariate 50

2 Introduction A comple disease trait refers to a phenotype that does not follow simple Mendelian segregation attributable to a single gene locus but instead can be caused by multiple disease loci their interactions polygenic inheritance and environmental effects [ ]. It is desirable to take all these factors and their interactions into account in a linkage analysis. Indeed genetic dissection of the roles and functions of various disease-causing factors has been deemed to be a main task in modern human genetic studies. Genetic analysis of a comple disease trait is complicated by its discrete phenotypic nature (usually binary for which a linear relationship between the observable phenotypes and the underlying genetic effects does not eist. Current univariate methods for genetic mapping of comple disease traits analyze one marker (or an off-marker position at a time which may generate two problems. First it does not take into account the correlated structure of multiple linked markers purely due to linkage between them resulting in correlated test statistics. Hence it tends to produce a flat test statistic profile and a wide empirical confidence interval of the estimated trait location. This phenomenon is more obvious when tightly linked marker data (eg. fine mapping data are used. Second a comple trait can be caused by effects of multiple linked trait loci and their interactions. A univariate method might generate a false peak of test statistic at a position between two close trait loci usually called a ghost trait locus. Therefore a multivariate method which is robust to the assumption of the number of trait loci involved and can account of the correlated structure of linked markers is demanding. Current methods for genetic mapping of a categorical disease trait include model-based method [3] model-free method [4] association analysis based on linkage disequilibrium [5] and novel multivariate pattern recognition techniques [6 7]. Although the model-based approaches have been very successful in mapping hundreds of disease-predisposing genes it becomes difficult to do a good model-based linkage analysis of comple human diseases for which the modes of inheritance are usually unknown prior to substantial genetic analyses. Model-free approaches in which no assumption is made about the mode of inheritance of the trait under study are popular in practice due to their simplicity. Typical model-free methods in human genetics are the affected-sib-pair (ASP and the etended affected relative pair tests. This group of methods and the transmission/disequilibrium test (TDT have one disadvantage: they do not make full use all the available data information from non-affected individuals being discarded due to the very nature of the methods. Furthermore it might be very difficult to find linkage disequibrium with comple trait predisposing genes as those genes are often very common thus leading to etreme allelic and nonallelic heterogeneity in the population or pedigrees. According to the ways of modeling the relationship between the outward discrete disease phenotypes and the underlying genetic effects the approaches for genetic mapping of a disease trait can be divided into two classes. A linear model assumes a linear relationship and the analysis is performed as if the discrete phenotype were continuous. A popular eample is (new Haseman-Elston (H-E regression [8 9] in which a second moment quantity (the squared sib pair s phenotypic differences in the old H-E and the mean-corrected product of the sib s phenotypic values in the new H-E is regressed on the proportion of alleles that the sibs share identical by descent (IBD at a marker locus to reveal a potential linkage to an unobservable trait locus. H-E enjoys simplicity robustness of a least squares approach and asymptotic efficiency. It is important to recognize that a linear model often used to analyze linkage between markers and continuous traits is a powerful and robust tool for genetic mapping of discrete disease traits as well [0~3]. A non-linear model such as a generalized linear model takes the non-linear relationship and discrete nature of phenotypes into account. Statistically 5

3 it is desirable but modeling a sophisticated genetic architecture in disease manifestations can be prohibitively comple and computational demand is high especially when random polygenic and common environmental effects are included [4 5]. In this study we propose a novel multivariate approach to mapping disease loci in human genomic studies with an intention to overcome the weakness inherent in current methods. We integrate a standard stepwise discriminant analysis into a sibpair linkage study. The rationale underlying this approach is that if a disease locus is tightly linked to a (some molecular marker(s the differential marker IBD distributions among the affection groups of sibpairs (for a binary human disease trait concordantly affected discordant and concordantly unaffected can be observed because of genetic effects of a disease locus on phenotypic manifestations and its tight linkage to nearby markers. Characteristics of this multivariate approach includes ( it uses information both from affected and unaffected sibs. Hence instead of using a reference population as a control the three affection groups will serve as controls for each other; ( discriminant analysis is a multivariate technique concerned with separating distinct sets of objects (or observations and with allocating new objects to previously defined groups. We search the best discriminants whose numerical values are such that the collections are separated as much as possible; (3 this approach avoids eplicitly modeling relationships between outward ordinal (or binary phenotypes and underlying genetic effects (major gene and polygenic background. In sibpair linkage studies the very act of taking a second moment quantity from marginal phenotypes has prohibited the direct use of a threshold model in non-linearly modeling the relationship between the re-defined phenotype (groups and its components (a quadratic form of the underlying variates; (4 Due to the sequential testing properties of stepwise discriminant analysis it enjoys robustness to an assumption on number of trait loci involved and can control multiple trait loci background. Here we demonstrate its properties using simulations. Statistical Method Consider a sibpair linkage study of a categorical disease trait. Each sib can take any possible ordinal value say c (c = C. We define an affection group (a specific combination of two ordinal values of a sib pair as: G i (i = K. Thus the total number of mutually eclusive groups (K is: C K = C +. C = corresponds to a binary human disease trait (a special case of an ordinal trait with just two categories. We have K = 3 and G i might be defined as: G = concordant affected both sibs in a sib pair are affected G = discordant only one in a sib pair is affected G 0 = concordant unaffected no sibs in a sib pair are affected so that we have a population consisting of three mutually eclusive groups. Net for each sib pair we define a discriminant vector (X which can include the following feature variables:( the estimated proportions of alleles ( πˆ shared IBD by the sib pair at L markers along a chromosome (segment; and ( other covariates (potential confounding factors for a linkage study for eample polygenic inheritance and common environment (aliasing with 5

4 so-called household effect mother effects (including maternal genetic and environmental effects and mitochondrial effects and other epidemiological factors (gender race age and so on. Modelling these covariates serves two purposes: to eliminate their interferences with assessment of linkage and to evaluate their importance in the manifestation of the disease. In a linkage study we are searching for a marker or cluster of markers (tightly linked to an unobserved trait locus whose IBD (the feature variable distributions among the disease affection groups lead to the best-fit partition (grouping to the observed one. Suppose that we have K disease affection groups and M feature variables (L marker IBD variates plus M L covariates. Let N N N K be sample sizes for Gi (i = K. Then the feature vector data for N (N = N i sib pairs can be epressed as: ( ( ( ( ( K ( ( K N ( N ( K N K. Denote µ (i = K the mean vector corresponding to the population Gi (i = i ( i K and assume that follows a multivariate normal distribution with a constant variance-covariance matri i.e. ( i ~ N ( u Σ. i A Wilks s ratio ( equivalent to a likelihood ratio test statistic [6] which is used to test the effects of the feature variates on separation of disease affection groups is constructed as: det W = ( det T where det is determinant W = {w ij } is the within group covariance matri and T = {t ij } is total covariance matri and w t ij ij = = K K a= l = N N a a a= l = where ( ( li li ( i i ( lj lj j j N a j = lj and j = Na l = N i j =... K N a a= l = M lj. A function of (N (M K / ln is a random variable asymptotically following a chi-square distribution with M(K degrees of freedom under the null hypothesis H 0 : µ = µ =... = µ K that is no differences among the mean vectors for K disease affection groups. To assess the contribution from each feature variable we use a stepwise discriminant analysis procedure [7] which combines forward selection and backward elimination by user defined criteria (with the same p-value of 0.05 for inclusion and eclusion in this study. If the feature variable is a marker IBD then assessment of it is equivalent to detection of linkage to a putative trait locus. Suppose now that have been selected from total of M feature variates at the s-th step. To assess the individual contribution of each variable say p 53

5 r from the remaining M p feature variates we partition W and T corresponding to subset p and as: r W W T T W = = W T. W T T where W T and ( i j = have similar meanings as described previously. W ij T ij W and T are p p matrices. W and T are p matrices (vectors. and are matrices (scalars. The Wilks s statistic for p feature variates is: det W p = det T and for p+ variates it is: W T p + W det det W = = W det T T det T W W T T det W det( W = det T det( T = p det( W W det( T T WW W T T T W W. T T Hence p det( T TT T det( W WW W =. det( W W W p+ W F r Under the assumption of multivariate normality we have p N p K = ( ~ F( K N p K. K p+ The variable r with the largest values of the partial F- statistic is added to the current subset consisting of p feature variates provided that it eceeds the specified critical value at α = 0.05 ( F r F. > K N p K We use a similar method in the backward elimination step. First consider deletion a single variable from a set of p feature variables. For each variable the partial F-statistic is computed to test whether it provides additional information over the remaining p- variates. The variate with the smallest partial F-statistic is eliminated first provided that the statistic does not eceed a specified critical value at α = 0.05 ( F r F K N ( p K Stepwise discriminant procedure alternates forward selection and backward elimination. The procedure stops when none of the included variables can be taken out and no further feature variables can be taken in. 54

6 We are cautious about the nominal p values given in SAS stepwise discriminant procedure. Etensive empirical simulations prove that the theoretical p value by SAS is liberal. Hence we resort to a permutation technique to obtain the empirical thresholds. The statistical power is determined by the empirical values instead by the nominal threshold values. Simulation Studies To investigate the efficiency and properties of the proposed discriminant analysis Monte Carlo simulations are carried out. We design five simulation eperiments. Factors considered include ( heritability of the trait locus/loci ( and 0.9 which is defined to be the ratio of segregation variance at the trait locus (loci over total phenotypic variance of the underlying liability (as eplained later for the ordinal disease phenotype; ( marker density ( 5 0 and 0 cm marker spacings; (3 categorical nature of the observations (ordinal versus binary; (4 disease prevalence (5% 0% 0% 30% 50% 70% 90%; (5 sample size ( sib pairs; and (6 two-linked trait loci with various distances between them ( and 90 cm. Only single chromosome segments of different lengths ( and 00 cm for 5 0 and 0 cm marker spacings respectively covered by si evenly spaced codominant markers each having eight equivalent alleles is simulated. A diallelic trait locus is simulated in the middle of the interval between the second and third marker locus in the scenarios of one trait locus. The two alleles of the trait locus are equally frequent. Hence the epected trait and marker heterozygosities are one half and 87.5% respectively. For simplicity simulated pedigrees consist of only nuclear families (two generations each having four full-sibs unless indicated otherwise. The total number of progeny N is fied at 800 ecept for Eperiment 5 in which effects of sample size are investigated. As a result the total number of sib pairs is 00. Under each design the simulation is repeated 00 times. The global statistical power is determined by counting the number of runs that have the largest F statistic among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold at α = 0.05 or 0.0 obtained by simulating 500 replicates under the null model of no trait locus (loci segregating. The local statistical power is determined similarly but now we confine a 0 cm region around the true trait locus (loci. The standard deviations calculated over the 00 replicates represent the standard errors of parameter estimates and test statistics. The simulation of genetic values of the trait locus for a liability starts by randomly assigning two alleles (and their effects to each parent drawn from a random-mating population. A dominant effect is simulated as an interaction between two parental alleles. The liability of each offspring is the sum of its genetic value the overall mean and its residual error sampled from N(0. Effects of various covariates and polygene on disease liability are not simulated. A set of fied thresholds truncate the underlying liability into mutually eclusive and ehaustive intervals which is then translated into observable categorical scores with the desired categorical distributions (prevalence. Eperiment : heritability of the disease locus A single diallelic trait locus is simulated at 5 cm on a chromosome of 50 cm. Si evenly spaced codominant markers with eight equally frequent alleles (one marker at every 0 cm on the chromosome make up the linkage group. Four fied thresholds ( and 0.5 truncate the underlying liability into five categorical scores with incidences of 0.56% 5.3% 4.99% 38.30% and 30.85% respectively. The mean and standard deviations of the location F- statistic and R and statistical power obtained from 00 simulations under each of five trait locus attributed heritabilities are given in Table. 55

7 Table. Effects of heritability of the disease locus on statistical efficiency of the proposed discriminant analysis approach in terms of statistical power and gene localization averaged over 00 replicates. Standard errors are in parentheses. Heritability Location (cm Partial R Test Statistic (F Global statistical power a Local statistical power b α =0.05 α =0.0 α = 0.05 α = ( (0.0.0 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( a Global statistical power is determined by counting the number of runs that have the largest F statistic over the whole region among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold value at α = 0.05 or 0.0. b Local statistical power is determined by counting the number of runs that have the largest F statistic over the 0 cm region around the true trait locus among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold value at α = 0.05 or 0.0. The proposed multivariate approach has successfully found the true position with reasonable biases under low trait locus attributed heritability (h = 0.0 due to interference of false positives. Heritability has a major impact on efficiency of genetic mapping in terms of accuracy of localization and statistical power. With heritability increasing both global and local statistical powers are dramatically increased and the mean of estimated location is closer to the true position. It is obvious that heritability plays an important role in separating disease affection groups via marker IBD information. The proportion of marker IBD variances among sib pairs eplained by disease affection grouping (R increases with the magnitude of heritability. Larger standard errors for the F-statistic under higher heritabilities are due to scaling effects but the coefficients of variation are smaller. Eperiment : Marker density We intend to eplore its potential of the proposed multivariate approach as a fine mapping tool. As increasingly dense marker maps become available a powerful statistical finemapping tool is demanding. The association analysis approaches (for eample TDT and its derivatives are one option. However they consider one marker at a time and are epected to give a large number of false positives due to tight linkage between densely spaced markers. Moreover these approaches suffer from multiple testing problems. The proposed multivariate approach allows for simultaneously testing all markers together with other genetic and environmental factors in single multivariate setting. Theoretically it can remove false positives due to tight linkage between markers which might generate a serious problem in fine mapping situations. Same eperiment design as for Eperiment is used ecept that heritability of a trait locus is fied at Five levels of marker densities from sparsely to densely distributed markers are investigated. As shown in Table marker density influences the performance of the proposed approach in the sibpair linkage analysis of an ordinal trait. There may be an optimal marker density ( 0 cm marker spacings in which genetic mapping by stepwise discriminant analysis approach reaches its maimal power and minimal estimation errors 56

8 occur. If so this would provide a guide to a multi-stage genetic mapping design. The more important finding is that using highly densely spaced markers (- cm leads to little loss of statistical efficiency in term of localization and global statistical power. These results might well establish its utility and advantages as a fine-mapping tool. Table. Effects of marker density on statistical efficiency of the proposed discriminant analysis approach in terms of statistical power and gene localization averaged over 00 replicates. Standard errors are in parentheses. Marker distance a (cm Location (cm Partial R Test Statistic (F Global statistical power b Local statistical power c α = 0.05 α = 0.0 α = 0.05 α = ( ( ( ( (5 5.8 ( ( ( ( ( ( ( (3 3. ( ( ( (.5.7 ( ( ( a The true location of a trait locus is in parentheses for this column. b Global statistical power is determined by counting the number of runs that have the largest F statistic over the whole region among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold value at α = 0.05 or 0.0. c Local statistical power is determined by counting the number of runs that have the largest F statistic over the 0 cm region around the true trait locus among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold value at α = 0.05 or 0.0. Eperiment 3: An binary trait and disease prevalence A comple human disease is often scored as binary (affected and non affected which can be considered as an etreme case of ordinal disease traits with just two categories. The proposed approach utilizes information for both affected and unaffected individuals and classifies phenotypes of a sib pair into three affection groups. Theoretically or suggested by empirical simulations genetic mapping for a binary disease is less efficient than for an ordinal disease trait using generalized linear model based approaches [0]. However this argument may not be applied to a sibpair linkage study using a discriminant analysis approach where we work on a second moment form (group of the marginal phenotypes instead of directly modeling the relationship between the marginal phenotypes and the underlying genetic effects. Moreover in a discriminant analysis affection groups obtained from the marginal ordinal phenotypes of a sib pair are no longer a response variable but instead serve as an eplanatory variable for marker IBD distributions. These characteristics may lead to very different conclusions than other approaches to modelling marginal categorical phenotypes. Table 3. Effects of prevalence of the disease locus on statistical efficiency of the proposed discriminant analysis approach for genetic mapping of comple binary human diseases in terms of statistical power and gene localization averaged over 00 replicates. Standard errors are in parentheses. Prevalence Location (cm Partial R Test Statistic (F Global statistical power a Local statistical power b α =0.05 α = 0.0 α =0.05 α =

9 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( a Global statistical power is determined by counting the number of runs that have the largest F statistic over the whole region among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold value at α = 0.05 or 0.0. b Local statistical power is determined by counting the number of runs that have the largest F statistic over the 0 cm region around the true trait locus among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold value at α = 0.05 or 0.0. We simulate a binary trait with different incidence rates of 0% 0% 30% 50% 70% and 90%. The results are shown in Table 3. Three important conclusions can be made from this simulation eperiment: ( more power can be obtained for a binary disease trait than for an ordinal trait. The proposed multivariate approach for genetic mapping of a binary trait outperforms analyses of an ordinal trait in some situations. Under the same heritability of a trait locus (h = 0.30 the power (94% at α = 0.05 for a binary disease trait with prevalence of 30% is larger than the corresponding ordinal trait (8% see Table for cross reference; ( performance of the proposed approach is not even with respect to symmetry of the marginal binary phenotype. The optimal performance in terms of localization and statistical power is attained at a prevalence of 30%. It is interesting to note that a low prevalence (0% leads to a higher statistical power than a high prevalence (90% which might imply that the proposed method is a powerful tool for genetic mapping of diseases with low prevalences and for randomly sampled data from a general population instead from hospital recruited data. We encourage further studies on this statistical issue. Eperiment 4: Sample size Sampling issues including sample size are important in eperimental design for a genetic mapping study because of a direct association with recruiting costs. Generally a sibpair linkage study requires a larger sample size to achieve an acceptable power of detection of a disease locus (loci than the approaches that eploit the linkage phase information as well for eample the generalized-linear-model-based approaches for plant and animal populations []. However our simulations in the previous section indicate that our novel method favors low disease prevalence in terms of statistical power compared with a high prevalence which implies that data might be recruited from general population instead of from a hospital setting. Hence the requirement of a large sample size in human genomic studies is less restrictive when our proposed multivariate approach is applied because recruiting data from a general population is much easier. To investigate effects of sample size we simulate five levels of sample size: and 50 nuclear families with four sibs each respectively. Hence the corresponding numbers of sib pairs are and 00 respectively. Again a trait with five 58

10 categories is simulated. A single trait locus with a trait attributed heritability of 0.30 situated at 5 cm (between the second and third marker on a 50 cm chromosomal segment covered with si evenly spaced markers is simulated. Logically our simulations show that increasing sample size will increase power and accuracy of localization averaged over 00 replicates. However the partial R the proportion of marker IBD variances among sib pairs eplained by disease affection grouping does not behave as we epected in that a small sample size leads to a large value of it for unknown reasons. Table 4. Effects of sample size on statistical efficiency of the proposed multivariate discriminant analysis approach for genetic mapping of comple ordinal human diseases in terms of statistical power and gene localization averaged over 00 replicates. Standard errors are in parentheses. Sample size Location (cm Partial R Global statistical power a Local statistical power b Test Statistic (F α =0.05 α =0.0 α = 0.05 α = ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( a Global statistical power is determined by counting the number of runs that have the largest F statistic over the whole region among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold value at α = 0.05 or 0.0. b Local statistical power is determined by counting the number of runs that have the largest F statistic over the 0 cm region around the true trait locus among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold value at α = 0.05 or 0.0. Eperiment 5: A binary trait that is controlled by two-linked trait loci A unique property of the proposed method is the ability to map multiple linked trait loci to their precise positions due to its sequential nature. We demonstrate this property by simulating two-linked trait loci with various distances between them. For this purpose a binary trait with an incidence of 30% controlled by two trait loci located on a single chromosome of length of 00 cm covered by eleven evenly spaced codominant markers (each at 0 cm is simulated. For convenience we assume that the two trait loci contribute an equal genetic variance to the variability of the underlying liability. No epistasis between the two loci is simulated. Note that the covariance between the two trait loci due to linkage is m m m m f f f f [( α α ( α α + ( α α ( α α ]( r 4 α q s s where r is the recombination fraction between the two trait loci and ( α represents the effect of gene substitution at the qth (q = trait locus in the sth (s = parent. It is evident that this covariance depends on the recombination fraction between the two loci. The closer the two trait loci the larger the covariance. Under the assumption of no epistasis and gender specific genetic heterogeneity this covariance simplifies to: [( α α ( α α ]( r. 59

11 We fied the total trait locus attributed heritability for the liability to be The gene substitution effects are appropriately scaled under each distance option. Distances between the two trait loci considered are cm respectively. Under the assumption of equal genetic contribution of the two trait loci the corresponding gene substitutions are and respectively which translate the marginal genetic variance of each trait locus to be (h = (h = (h = 0.7 and (h = 0.5. A total of 00 simulations for each distance are conducted for N = 00 sib pairs. The two mean trait-locations are determined by averaging the locations of the two markers with the largest F-statistics (included in the final subset during the stepwise discriminant analysis. The statistical power is obtained by counting the number of replicates with an F-statistic larger than the critical value at α = The averaged estimates of the trait locus parameters (locations the proportion of marker IBD variances among sib pairs eplained by disease affection grouping (R and test statistics and statistical power are listed in Table 5. Their standard errors calculated as deviations of over 00 replicated simulations are given in parentheses. The results prove that the proposed multivariate approach can map multiple linked loci to their precise positions even in the situation that two trait loci are close to each other (0 cm and separated by only two markers. This nice property frees a concern of a ghost quantitative trait locus a wellknown phenomenon for detection of closely linked trait loci by a single trait locus model. This has established its advantages of the proposed method as a genome screening tool as it is robust to the assumption of the number of trait loci involved which is indeed unknown prior to a substantial genetic analysis. Furthermore it saves the comple modeling work for a multiple trait loci model because it is essentially model-free in this aspect. It is interesting to note that the averaged R for each trait locus of two closely linked trait loci (e.g. 0 and 40 cm is higher than that for loosely linked loci although marginal genetic contributions under smaller distances between them are lower. Generally it is more difficult to detect multiple closely linked loci which is demonstrated by a small decrease in statistical power. Table 5. Effects of the distance between two QTLs on statistical efficiency of the proposed discriminant analysis approach for genetic mapping of comple binary human diseases in terms of statistical power and gene localization averaged over 00 replicates. Standard errors are in parentheses. Distance between Location ( cm Partial R Test statistic (F two QTLs a QTL QTL QTL QTL QTL QTL 90 ( ( ( ( ( ( (5.04 Global statistical power b QTL α = 0.05 (α = (8 QTL α = 0.05 (α =0.0 8 (79 Local statistical power c QTL α = 0.05 (α = (77 QTL α = 0.05 (α = (66 60 ( ( ( ( ( ( ( (84 93 (90 68 (66 7 (69 40 ( ( ( ( ( ( ( (8 93 (80 7 (69 80 (69 0 ( ( ( ( ( ( (.45 a The true locations of two trait loci are in parentheses for this column. b Global statistical power is determined by counting the number of runs that have the largest F statistic over the whole region among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold value at α = 0.05 or 0.0. c Local statistical power is determined by counting the number of runs that have the largest F statistic over the 0 cm region around the true trait locus among finally included marker IBD variates in the stepwise discriminant procedure greater than the empirical threshold value at α = 0.05 or (65 7 (67 59 (5 55 (53 60

12 3 Discussion In this work we have demonstrated the potential of a stepwise discriminant analysis as a screening tool in a linkage study. The advantage of using stepwise discriminant analyses is that it allows for the simultaneous testing of all marker and environmental factors in a single setting. Thus the discriminant analysis method identifies not only single gene actions but also combinations of gene actions (for eample epistasis. Because the analysis is carried out in a single run the method does not suffer from the problems associated with multiple testing. Another important feature of the proposed method is that it is essentially model-free in that no specific relationship between the response variable and independent variables is assumed. Hence it is not sensitive to model misspecifications as is a linear model or a generalized linear model. A discriminant analysis has some analogies to a logistic regression. For eample both can be used to analyze discrete data. However we point out a weakness in the application of logistic regression in a sibpair linkage study. The very act of taking cross-product in a sibpair regression has changed the relationship between the model components so that a threshold model implied in a logistic regression that is used to model the relationship between the redefined phenotype (affection state of a sibpair and its components might not valid. Consequently (stepwise logistic regression might have a lower statistical power [7]. It is interesting to compare a discriminant analysis with a logistic regression. Though a bit surprising it is logical that a discriminant analysis performs better than a logistic regression because it is generally inferior to a discriminant analysis when multivariate assumptions hold or are nearly true [6]. In discriminant analysis we need distributional assumptions on the feature vector but not on the grouping variable which is the key difference with logistic models. The sequential test properties of a (stepwise discriminant analysis might suggest that it can be used to refine genetic mapping of a discrete disease trait as well. Our own simulation study suggests that it can efficiently localize a trait locus to within less than a cm interval using fine map data. This advantage and others deserve to receive further investigation as it might provide some great insights on fine mapping of human diseases. We assumed in this study that the feature vectors for sib pairs are independent and follow multivariate normal distributions both of which can be violated in the contet of a sib pair linkage study especially when large sibships are recruited. For eample the IBD vectors of sib pairs within a family tend to be correlated. Also it may happen that some feature variables are not normally distributed. Moreover the p-values reported by the stepwise discriminant analysis procedure of SAS are liberal. These issues deserve attentions in future studies. Acknowledgements XL ZG and TZ are supported partly by Grant NSF from the National Science and Technology Committee of China and by China Scholarship Council. SR RCE and JMO are supported partly by the Grant HG0577 from the National Center of Human Genome Research of USA GM8356 from the National Institute of General Medical Sciences. Some of the results of this paper were obtained by using the program package S.A.G.E. supported by a United States Public Health Service Resources Grant RR03655 from the National Center for Research Resources. 6

13 References Terwilliger J D. Linkage analysis model-based. In Armitage P and Colton T Editors-in-Chief Encyclopedia of Biostatistics. West Susse: John Wiley & Sons Ltd. 998 Vol 3: 79~9 Olson J M Witte J S and Elston R C. Genetic Mapping of Comple Traits. Statist Med 999 8: 96~98 3 Ott J. Analysis of Human Genetic Linkage. Baltimore: John Hopkins University Press 99 4 Olson J M. Linkage analysis model-free. In Armitage P and Colton T Editors-in-Chief Encyclopedia of Biostatistics. West Susse: John Wiley & Sons Ltd. 998 Vol 3: Spielman R S Ewen W J. The TDT and other family-based tests for linkage disequilibrium and association. Am J Human Genet : 983~989 6 Lucek P R Ott J. Neural network analysis of comple traits. Genet Epidemiol 997 4: 0~06 7 Li X Rao S Elston R C et al. Locating the genes underlying a simulated comple disease by discriminant analysis. Genet Epidemiol 00 (in press 8 Elston R C Bubaum S Jacobs K B et al. Haseman and Elston revisited. Genet Epidemiol 000 8: ~7 9 Elston R C et al. S.A.G.E.: Statistical Analysis for Genetic Epidemiology Beta The Department of Epidemiology and Biostatistics Rammelkamp Center for Education and Research Case Western Reserve University Cleveland Rao S Xu S. Mapping quantitative trait loci for ordered categorical traits in four-way crosses. Heredity 998 8: 4~4 Rao S Li X. Strategies for genetic mapping of categorical traits. Genetica : 83~97 Rao S Li X Guo Z et al. Mapping quantitative trait loci of ordinal traits using a structural heteroskedastic threshold model: I. Theory. Mathematia Applicata 00 4(3: 46~5 3 Rao S Zhang Q Li X et al. Mapping quantitative trait loci of ordinal traits using a structural heteroskedastic threshold model: II. Numerical application. Mathematica Applicata 004(: 5~3 4 Yi N Xu S. Bayesian mapping of quantitative loci under the identity-by-descent-based variance component model. Genetics : 4~4 5 Yi N Xu S. Bayesian mapping of quantitative trait loci for comple binary traits. Genetics : 39~403 6 Mclachlan G J. Discriminant Analysis and Statistical Pattern Recognition. New York: Wiley ~400 7 SAS Institute Inc. SAS/STAT@ User s Guide. Version 6 Forth Edition Volume Cary NC: SAS Institute Inc ~509 6

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs (3) QTL and GWAS methods By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs Under what conditions particular methods are suitable

More information

Experimental Design and Sample Size Requirement for QTL Mapping

Experimental Design and Sample Size Requirement for QTL Mapping Experimental Design and Sample Size Requirement for QTL Mapping Zhao-Bang Zeng Bioinformatics Research Center Departments of Statistics and Genetics North Carolina State University zeng@stat.ncsu.edu 1

More information

POPULATION GENETICS Winter 2005 Lecture 18 Quantitative genetics and QTL mapping

POPULATION GENETICS Winter 2005 Lecture 18 Quantitative genetics and QTL mapping POPULATION GENETICS Winter 2005 Lecture 18 Quantitative genetics and QTL mapping - from Darwin's time onward, it has been widely recognized that natural populations harbor a considerably degree of genetic

More information

Genetics of dairy production

Genetics of dairy production Genetics of dairy production E-learning course from ESA Charlotte DEZETTER ZBO101R11550 Table of contents I - Genetics of dairy production 3 1. Learning objectives... 3 2. Review of Mendelian genetics...

More information

Human linkage analysis. fundamental concepts

Human linkage analysis. fundamental concepts Human linkage analysis fundamental concepts Genes and chromosomes Alelles of genes located on different chromosomes show independent assortment (Mendel s 2nd law) For 2 genes: 4 gamete classes with equal

More information

An introductory overview of the current state of statistical genetics

An introductory overview of the current state of statistical genetics An introductory overview of the current state of statistical genetics p. 1/9 An introductory overview of the current state of statistical genetics Cavan Reilly Division of Biostatistics, University of

More information

Quantitative Genetics

Quantitative Genetics Quantitative Genetics Polygenic traits Quantitative Genetics 1. Controlled by several to many genes 2. Continuous variation more variation not as easily characterized into classes; individuals fall into

More information

Exploring the Genetic Basis of Congenital Heart Defects

Exploring the Genetic Basis of Congenital Heart Defects Exploring the Genetic Basis of Congenital Heart Defects Sanjay Siddhanti Jordan Hannel Vineeth Gangaram szsiddh@stanford.edu jfhannel@stanford.edu vineethg@stanford.edu 1 Introduction The Human Genome

More information

Exact Multipoint Quantitative-Trait Linkage Analysis in Pedigrees by Variance Components

Exact Multipoint Quantitative-Trait Linkage Analysis in Pedigrees by Variance Components Am. J. Hum. Genet. 66:1153 1157, 000 Report Exact Multipoint Quantitative-Trait Linkage Analysis in Pedigrees by Variance Components Stephen C. Pratt, 1,* Mark J. Daly, 1 and Leonid Kruglyak 1 Whitehead

More information

Near-Balanced Incomplete Block Designs with An Application to Poster Competitions

Near-Balanced Incomplete Block Designs with An Application to Poster Competitions Near-Balanced Incomplete Block Designs with An Application to Poster Competitions arxiv:1806.00034v1 [stat.ap] 31 May 2018 Xiaoyue Niu and James L. Rosenberger Department of Statistics, The Pennsylvania

More information

Conifer Translational Genomics Network Coordinated Agricultural Project

Conifer Translational Genomics Network Coordinated Agricultural Project Conifer Translational Genomics Network Coordinated Agricultural Project Genomics in Tree Breeding and Forest Ecosystem Management ----- Module 4 Quantitative Genetics Nicholas Wheeler & David Harry Oregon

More information

Report. General Equations for P t, P s, and the Power of the TDT and the Affected Sib-Pair Test. Ralph McGinnis

Report. General Equations for P t, P s, and the Power of the TDT and the Affected Sib-Pair Test. Ralph McGinnis Am. J. Hum. Genet. 67:1340 1347, 000 Report General Equations for P t, P s, and the Power of the TDT and the Affected Sib-Pair Test Ralph McGinnis Biotechnology and Genetics, SmithKline Beecham Pharmaceuticals,

More information

Package powerpkg. February 20, 2015

Package powerpkg. February 20, 2015 Type Package Package powerpkg February 20, 2015 Title Power analyses for the affected sib pair and the TDT design Version 1.5 Date 2012-10-11 Author Maintainer (1) To estimate the power

More information

Genetic association tests: A method for the joint analysis of family and case-control data

Genetic association tests: A method for the joint analysis of family and case-control data Genetic association tests: A method for the joint analysis of family and case-control data Courtney Gray-McGuire, 1,2* Murielle Bochud, 3 Robert Goodloe 4 and Robert C. Elston 1 1 Department of Epidemiology

More information

Introduction to quantitative genetics

Introduction to quantitative genetics 8 Introduction to quantitative genetics Purpose and expected outcomes Most of the traits that plant breeders are interested in are quantitatively inherited. It is important to understand the genetics that

More information

LECTURE 5: LINKAGE AND GENETIC MAPPING

LECTURE 5: LINKAGE AND GENETIC MAPPING LECTURE 5: LINKAGE AND GENETIC MAPPING Reading: Ch. 5, p. 113-131 Problems: Ch. 5, solved problems I, II; 5-2, 5-4, 5-5, 5.7 5.9, 5-12, 5-16a; 5-17 5-19, 5-21; 5-22a-e; 5-23 The dihybrid crosses that we

More information

General aspects of genome-wide association studies

General aspects of genome-wide association studies General aspects of genome-wide association studies Abstract number 20201 Session 04 Correctly reporting statistical genetics results in the genomic era Pekka Uimari University of Helsinki Dept. of Agricultural

More information

Genomic Selection Using Low-Density Marker Panels

Genomic Selection Using Low-Density Marker Panels Copyright Ó 2009 by the Genetics Society of America DOI: 10.1534/genetics.108.100289 Genomic Selection Using Low-Density Marker Panels D. Habier,*,,1 R. L. Fernando and J. C. M. Dekkers *Institute of Animal

More information

Improved Power by Use of a Weighted Score Test for Linkage Disequilibrium Mapping

Improved Power by Use of a Weighted Score Test for Linkage Disequilibrium Mapping Improved Power by Use of a Weighted Score Test for Linkage Disequilibrium Mapping Tao Wang and Robert C. Elston REPORT Association studies offer an exciting approach to finding underlying genetic variants

More information

Strategy for applying genome-wide selection in dairy cattle

Strategy for applying genome-wide selection in dairy cattle J. Anim. Breed. Genet. ISSN 0931-2668 ORIGINAL ARTICLE Strategy for applying genome-wide selection in dairy cattle L.R. Schaeffer Department of Animal and Poultry Science, Centre for Genetic Improvement

More information

An introduction to genetics and molecular biology

An introduction to genetics and molecular biology An introduction to genetics and molecular biology Cavan Reilly September 5, 2017 Table of contents Introduction to biology Some molecular biology Gene expression Mendelian genetics Some more molecular

More information

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University

Machine learning applications in genomics: practical issues & challenges. Yuzhen Ye School of Informatics and Computing, Indiana University Machine learning applications in genomics: practical issues & challenges Yuzhen Ye School of Informatics and Computing, Indiana University Reference Machine learning applications in genetics and genomics

More information

MAS refers to the use of DNA markers that are tightly-linked to target loci as a substitute for or to assist phenotypic screening.

MAS refers to the use of DNA markers that are tightly-linked to target loci as a substitute for or to assist phenotypic screening. Marker assisted selection in rice Introduction The development of DNA (or molecular) markers has irreversibly changed the disciplines of plant genetics and plant breeding. While there are several applications

More information

Using the Noninformative Families in Family-Based Association Tests: A Powerful New Testing Strategy

Using the Noninformative Families in Family-Based Association Tests: A Powerful New Testing Strategy Am. J. Hum. Genet. 73:801 811, 2003 Using the Noninformative Families in Family-Based Association Tests: A Powerful New Testing Strategy Christoph Lange, 1 Dawn DeMeo, 2 Edwin K. Silverman, 2 Scott T.

More information

Genome-wide association studies (GWAS) Part 1

Genome-wide association studies (GWAS) Part 1 Genome-wide association studies (GWAS) Part 1 Matti Pirinen FIMM, University of Helsinki 03.12.2013, Kumpula Campus FIMM - Institiute for Molecular Medicine Finland www.fimm.fi Published Genome-Wide Associations

More information

Case-Control the analysis of Biomarker data using SAS Genetic Procedure

Case-Control the analysis of Biomarker data using SAS Genetic Procedure PharmaSUG 2013 - Paper SP05 Case-Control the analysis of Biomarker data using SAS Genetic Procedure JAYA BAVISKAR, INVENTIV HEALTH CLINICAL, MUMBAI, INDIA ABSTRACT Genetics aids to identify any susceptible

More information

Chapter 14: Mendel and the Gene Idea

Chapter 14: Mendel and the Gene Idea Chapter 4: Mendel and the Gene Idea. The Experiments of Gregor Mendel 2. Beyond Mendelian Genetics 3. Human Genetics . The Experiments of Gregor Mendel Chapter Reading pp. 268-276 TECHNIQUE Parental generation

More information

I See Dead People: Gene Mapping Via Ancestral Inference

I See Dead People: Gene Mapping Via Ancestral Inference I See Dead People: Gene Mapping Via Ancestral Inference Paul Marjoram, 1 Lada Markovtsova 2 and Simon Tavaré 1,2,3 1 Department of Preventive Medicine, University of Southern California, 1540 Alcazar Street,

More information

Mapping Genes Controlling Quantitative Traits Using MAPMAKER/QTL Version 1.1: A Tutorial and Reference Manual

Mapping Genes Controlling Quantitative Traits Using MAPMAKER/QTL Version 1.1: A Tutorial and Reference Manual Whitehead Institute Mapping Genes Controlling Quantitative Traits Using MAPMAKER/QTL Version 1.1: A Tutorial and Reference Manual Stephen E. Lincoln, Mark J. Daly, and Eric S. Lander A Whitehead Institute

More information

Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies

Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies p. 1/20 Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies David J. Balding Centre

More information

The principles of QTL analysis (a minimal mathematics approach)

The principles of QTL analysis (a minimal mathematics approach) Journal of Experimental Botany, Vol. 49, No. 37, pp. 69 63, October 998 The principles of QTL analysis (a minimal mathematics approach) M.J. Kearsey Plant Genetics Group, School of Biological Sciences,

More information

Genomic Selection using low-density marker panels

Genomic Selection using low-density marker panels Genetics: Published Articles Ahead of Print, published on March 18, 2009 as 10.1534/genetics.108.100289 Genomic Selection using low-density marker panels D. Habier 1,2, R. L. Fernando 2 and J. C. M. Dekkers

More information

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros

DNA Collection. Data Quality Control. Whole Genome Amplification. Whole Genome Amplification. Measure DNA concentrations. Pros DNA Collection Data Quality Control Suzanne M. Leal Baylor College of Medicine sleal@bcm.edu Copyrighted S.M. Leal 2016 Blood samples For unlimited supply of DNA Transformed cell lines Buccal Swabs Small

More information

DESIGNS FOR QTL DETECTION IN LIVESTOCK AND THEIR IMPLICATIONS FOR MAS

DESIGNS FOR QTL DETECTION IN LIVESTOCK AND THEIR IMPLICATIONS FOR MAS DESIGNS FOR QTL DETECTION IN LIVESTOCK AND THEIR IMPLICATIONS FOR MAS D.J de Koning, J.C.M. Dekkers & C.S. Haley Roslin Institute, Roslin, EH25 9PS, United Kingdom, DJ.deKoning@BBSRC.ac.uk Iowa State University,

More information

ARTICLE ROADTRIPS: Case-Control Association Testing with Partially or Completely Unknown Population and Pedigree Structure

ARTICLE ROADTRIPS: Case-Control Association Testing with Partially or Completely Unknown Population and Pedigree Structure ARTICLE ROADTRIPS: Case-Control Association Testing with Partially or Completely Unknown Population and Pedigree Structure Timothy Thornton 1 and Mary Sara McPeek 2,3, * Genome-wide association studies

More information

Park /12. Yudin /19. Li /26. Song /9

Park /12. Yudin /19. Li /26. Song /9 Each student is responsible for (1) preparing the slides and (2) leading the discussion (from problems) related to his/her assigned sections. For uniformity, we will use a single Powerpoint template throughout.

More information

Computational Workflows for Genome-Wide Association Study: I

Computational Workflows for Genome-Wide Association Study: I Computational Workflows for Genome-Wide Association Study: I Department of Computer Science Brown University, Providence sorin@cs.brown.edu October 16, 2014 Outline 1 Outline 2 3 Monogenic Mendelian Diseases

More information

Linkage & Genetic Mapping in Eukaryotes. Ch. 6

Linkage & Genetic Mapping in Eukaryotes. Ch. 6 Linkage & Genetic Mapping in Eukaryotes Ch. 6 1 LINKAGE AND CROSSING OVER! In eukaryotic species, each linear chromosome contains a long piece of DNA A typical chromosome contains many hundred or even

More information

Chapter 14. Mendel and the Gene Idea

Chapter 14. Mendel and the Gene Idea Chapter 14 Mendel and the Gene Idea Gregor Mendel Gregor Mendel documented a particular mechanism for inheritance. Mendel developed his theory of inheritance several decades before chromosomes were observed

More information

Use of sib-pair linkage methods for the estimation of the genetic variance at a quantitative trait locus

Use of sib-pair linkage methods for the estimation of the genetic variance at a quantitative trait locus Original article Use of sib-pair linkage methods for the estimation of the genetic variance at a quantitative trait locus H Hamann, KU Götz Bayerische Landesanstalt f3r Tierzv,cht, Prof Diirrwaechter-Platz

More information

Harbingers of Failure: Online Appendix

Harbingers of Failure: Online Appendix Harbingers of Failure: Online Appendix Eric Anderson Northwestern University Kellogg School of Management Song Lin MIT Sloan School of Management Duncan Simester MIT Sloan School of Management Catherine

More information

Gene Linkage and Genetic. Mapping. Key Concepts. Key Terms. Concepts in Action

Gene Linkage and Genetic. Mapping. Key Concepts. Key Terms. Concepts in Action Gene Linkage and Genetic 4 Mapping Key Concepts Genes that are located in the same chromosome and that do not show independent assortment are said to be linked. The alleles of linked genes present together

More information

TEST FORM A. 2. Based on current estimates of mutation rate, how many mutations in protein encoding genes are typical for each human?

TEST FORM A. 2. Based on current estimates of mutation rate, how many mutations in protein encoding genes are typical for each human? TEST FORM A Evolution PCB 4673 Exam # 2 Name SSN Multiple Choice: 3 points each 1. The horseshoe crab is a so-called living fossil because there are ancient species that looked very similar to the present-day

More information

Concepts: What are RFLPs and how do they act like genetic marker loci?

Concepts: What are RFLPs and how do they act like genetic marker loci? Restriction Fragment Length Polymorphisms (RFLPs) -1 Readings: Griffiths et al: 7th Edition: Ch. 12 pp. 384-386; Ch.13 pp404-407 8th Edition: pp. 364-366 Assigned Problems: 8th Ch. 11: 32, 34, 38-39 7th

More information

Adaptive Design for Clinical Trials

Adaptive Design for Clinical Trials Adaptive Design for Clinical Trials Mark Chang Millennium Pharmaceuticals, Inc., Cambridge, MA 02139,USA (e-mail: Mark.Chang@Statisticians.org) Abstract. Adaptive design is a trial design that allows modifications

More information

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 Topics Genetic variation Population structure Linkage disequilibrium Natural disease variants Genome Wide Association Studies Gene

More information

Genomic models in bayz

Genomic models in bayz Genomic models in bayz Luc Janss, Dec 2010 In the new bayz version the genotype data is now restricted to be 2-allelic markers (SNPs), while the modeling option have been made more general. This implements

More information

http://genemapping.org/ Epistasis in Association Studies David Evans Law of Independent Assortment Biological Epistasis Bateson (99) a masking effect whereby a variant or allele at one locus prevents

More information

Genetics - Problem Drill 05: Genetic Mapping: Linkage and Recombination

Genetics - Problem Drill 05: Genetic Mapping: Linkage and Recombination Genetics - Problem Drill 05: Genetic Mapping: Linkage and Recombination No. 1 of 10 1. A corn geneticist crossed a crinkly dwarf (cr) and male sterile (ms) plant; The F1 are male fertile with normal height.

More information

Genome-Wide Association Studies (GWAS): Computational Them

Genome-Wide Association Studies (GWAS): Computational Them Genome-Wide Association Studies (GWAS): Computational Themes and Caveats October 14, 2014 Many issues in Genomewide Association Studies We show that even for the simplest analysis, there is little consensus

More information

Examination of Cross Validation techniques and the biases they reduce.

Examination of Cross Validation techniques and the biases they reduce. Examination of Cross Validation techniques and the biases they reduce. Dr. Jon Starkweather, Research and Statistical Support consultant. The current article continues from last month s brief examples

More information

Mapping and Mapping Populations

Mapping and Mapping Populations Mapping and Mapping Populations Types of mapping populations F 2 o Two F 1 individuals are intermated Backcross o Cross of a recurrent parent to a F 1 Recombinant Inbred Lines (RILs; F 2 -derived lines)

More information

Conifer Translational Genomics Network Coordinated Agricultural Project

Conifer Translational Genomics Network Coordinated Agricultural Project Conifer Translational Genomics Network Coordinated Agricultural Project Genomics in Tree Breeding and Forest Ecosystem Management ----- Module 2 Genes, Genomes, and Mendel Nicholas Wheeler & David Harry

More information

A strategy for multiple linkage disequilibrium mapping methods to validate additive QTL. Abstracts

A strategy for multiple linkage disequilibrium mapping methods to validate additive QTL. Abstracts Proceedings 59th ISI World Statistics Congress, 5-30 August 013, Hong Kong (Session CPS03) p.3947 A strategy for multiple linkage disequilibrium mapping methods to validate additive QTL Li Yi 1,4, Jong-Joo

More information

10. Lecture Stochastic Optimization

10. Lecture Stochastic Optimization Soft Control (AT 3, RMA) 10. Lecture Stochastic Optimization Genetic Algorithms 10. Structure of the lecture 1. Soft control: the definition and limitations, basics of epert" systems 2. Knowledge representation

More information

Chapter 5: Overview. Overview. Introduction. Genetic linkage and. Genes located on the same chromosome. linkage. recombinant progeny with genotypes

Chapter 5: Overview. Overview. Introduction. Genetic linkage and. Genes located on the same chromosome. linkage. recombinant progeny with genotypes Chapter 5: Genetic linkage and chromosome mapping. Overview Introduction Linkage and recombination of genes in a chromosome Principles of genetic mapping Building linkage maps Chromosome and chromatid

More information

KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies

KNN-MDR: a learning approach for improving interactions mapping performances in genome wide association studies Abo Alchamlat and Farnir BMC Bioinformatics (2017) 18:184 DOI 10.1186/s12859-017-1599-7 METHODOLOGY ARTICLE Open Access KNN-MDR: a learning approach for improving interactions mapping performances in genome

More information

GENETIC ALGORITHMS. Narra Priyanka. K.Naga Sowjanya. Vasavi College of Engineering. Ibrahimbahg,Hyderabad.

GENETIC ALGORITHMS. Narra Priyanka. K.Naga Sowjanya. Vasavi College of Engineering. Ibrahimbahg,Hyderabad. GENETIC ALGORITHMS Narra Priyanka K.Naga Sowjanya Vasavi College of Engineering. Ibrahimbahg,Hyderabad mynameissowji@yahoo.com priyankanarra@yahoo.com Abstract Genetic algorithms are a part of evolutionary

More information

Inheritance Biology. Unit Map. Unit

Inheritance Biology. Unit Map. Unit Unit 8 Unit Map 8.A Mendelian principles 482 8.B Concept of gene 483 8.C Extension of Mendelian principles 485 8.D Gene mapping methods 495 8.E Extra chromosomal inheritance 501 8.F Microbial genetics

More information

Runs of Homozygosity Analysis Tutorial

Runs of Homozygosity Analysis Tutorial Runs of Homozygosity Analysis Tutorial Release 8.7.0 Golden Helix, Inc. March 22, 2017 Contents 1. Overview of the Project 2 2. Identify Runs of Homozygosity 6 Illustrative Example...............................................

More information

SAS/STAT 14.1 User s Guide. Introduction to Categorical Data Analysis Procedures

SAS/STAT 14.1 User s Guide. Introduction to Categorical Data Analysis Procedures SAS/STAT 14.1 User s Guide Introduction to Categorical Data Analysis Procedures This document is an individual chapter from SAS/STAT 14.1 User s Guide. The correct bibliographic citation for this manual

More information

Econ 792. Labor Economics. Lecture 6

Econ 792. Labor Economics. Lecture 6 Econ 792 Labor Economics Lecture 6 1 "Although it is obvious that people acquire useful skills and knowledge, it is not obvious that these skills and knowledge are a form of capital, that this capital

More information

Active Learning for Logistic Regression

Active Learning for Logistic Regression Active Learning for Logistic Regression Andrew I. Schein The University of Pennsylvania Department of Computer and Information Science Philadelphia, PA 19104-6389 USA ais@cis.upenn.edu April 21, 2005 What

More information

GENE MAPPING. Genetica per Scienze Naturali a.a prof S. Presciuttini

GENE MAPPING. Genetica per Scienze Naturali a.a prof S. Presciuttini GENE MAPPING Questo documento è pubblicato sotto licenza Creative Commons Attribuzione Non commerciale Condividi allo stesso modo http://creativecommons.org/licenses/by-nc-sa/2.5/deed.it Genetic mapping

More information

using selected samples

using selected samples Original article Mapping QTL in outbred populations using selected samples Mario L. Martinez Natascha Vukasinovic* A.E. Freeman, Rohan L. Fernando Department of Animal Science, Iowa State University, Ames,

More information

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics

S G. Design and Analysis of Genetic Association Studies. ection. tatistical. enetics S G ection ON tatistical enetics Design and Analysis of Genetic Association Studies Hemant K Tiwari, Ph.D. Professor & Head Section on Statistical Genetics Department of Biostatistics School of Public

More information

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1 Human SNP haplotypes Statistics 246, Spring 2002 Week 15, Lecture 1 Human single nucleotide polymorphisms The majority of human sequence variation is due to substitutions that have occurred once in the

More information

APPLICATION OF SEASONAL ADJUSTMENT FACTORS TO SUBSEQUENT YEAR DATA. Corresponding Author

APPLICATION OF SEASONAL ADJUSTMENT FACTORS TO SUBSEQUENT YEAR DATA. Corresponding Author 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 APPLICATION OF SEASONAL ADJUSTMENT FACTORS TO SUBSEQUENT

More information

Genetic Algorithms-Based Model for Multi-Project Human Resource Allocation

Genetic Algorithms-Based Model for Multi-Project Human Resource Allocation Genetic Algorithms-Based Model for Multi-Project Human Resource Allocation Abstract Jing Ai Shijiazhuang University of Applied Technology, Shijiazhuang 050081, China With the constant development of computer

More information

Tutorial Segmentation and Classification

Tutorial Segmentation and Classification MARKETING ENGINEERING FOR EXCEL TUTORIAL VERSION v171025 Tutorial Segmentation and Classification Marketing Engineering for Excel is a Microsoft Excel add-in. The software runs from within Microsoft Excel

More information

Michelle Wang Department of Biology, Queen s University, Kingston, Ontario Biology 206 (2008)

Michelle Wang Department of Biology, Queen s University, Kingston, Ontario Biology 206 (2008) An investigation of the fitness and strength of selection on the white-eye mutation of Drosophila melanogaster in two population sizes under light and dark treatments over three generations Image Source:

More information

7-1. Read this exercise before you come to the laboratory. Review the lecture notes from October 15 (Hardy-Weinberg Equilibrium)

7-1. Read this exercise before you come to the laboratory. Review the lecture notes from October 15 (Hardy-Weinberg Equilibrium) 7-1 Biology 1001 Lab 7: POPULATION GENETICS PREPARTION Read this exercise before you come to the laboratory. Review the lecture notes from October 15 (Hardy-Weinberg Equilibrium) OBECTIVES At the end of

More information

Chapter 6. Linkage Analysis and Mapping. Three point crosses mapping strategy examples. ! Mapping human genes

Chapter 6. Linkage Analysis and Mapping. Three point crosses mapping strategy examples. ! Mapping human genes Chapter 6 Linkage Analysis and Mapping Three point crosses mapping strategy examples! Mapping human genes Three point crosses Faster and more accurate way to map genes Simultaneous analysis of three markers

More information

Genetics and Psychiatric Disorders Lecture 1: Introduction

Genetics and Psychiatric Disorders Lecture 1: Introduction Genetics and Psychiatric Disorders Lecture 1: Introduction Amanda J. Myers LABORATORY OF FUNCTIONAL NEUROGENOMICS All slides available @: http://labs.med.miami.edu/myers Click on courses First two links

More information

White Paper. AML Customer Risk Rating. Modernize customer risk rating models to meet risk governance regulatory expectations

White Paper. AML Customer Risk Rating. Modernize customer risk rating models to meet risk governance regulatory expectations White Paper AML Customer Risk Rating Modernize customer risk rating models to meet risk governance regulatory expectations Contents Executive Summary... 1 Comparing Heuristic Rule-Based Models to Statistical

More information

"Genetics in geographically structured populations: defining, estimating and interpreting FST."

Genetics in geographically structured populations: defining, estimating and interpreting FST. University of Connecticut DigitalCommons@UConn EEB Articles Department of Ecology and Evolutionary Biology 9-1-2009 "Genetics in geographically structured populations: defining, estimating and interpreting

More information

Dr. Mallery Biology Workshop Fall Semester CELL REPRODUCTION and MENDELIAN GENETICS

Dr. Mallery Biology Workshop Fall Semester CELL REPRODUCTION and MENDELIAN GENETICS Dr. Mallery Biology 150 - Workshop Fall Semester CELL REPRODUCTION and MENDELIAN GENETICS CELL REPRODUCTION The goal of today's exercise is for you to look at mitosis and meiosis and to develop the ability

More information

Observing Patterns In Inherited Traits

Observing Patterns In Inherited Traits Observing Patterns In Inherited Traits Ø Where Modern Genetics Started/ Gregor Mendel Ø Law of Segregation Ø Law of Independent Assortment Ø Non-Mendelian Inheritance Ø Complex Variations in Traits Genetics:

More information

Chapter 1. Overview. 1.1 Introduction

Chapter 1. Overview. 1.1 Introduction Chapter 1 Overview 1.1 Introduction 1 1.2 Book Organization 2 1.3 SAS Usage 4 1.3.1 Example of a Basic SAS DATA Step 4 1.3.2 Example of a Basic Macro 5 1.4 References 6 1.1 Introduction Some 100 years

More information

Mendel and the Gene Idea

Mendel and the Gene Idea LECTURE PRESENTATIONS For CAMPBELL BIOLOGY, NINTH EDITION Jane B. Reece, Lisa A. Urry, Michael L. Cain, Steven A. Wasserman, Peter V. Minorsky, Robert B. Jackson Chapter 14 Mendel and the Gene Idea Lectures

More information

Indentification and Mapping of Unknown Mutations in the Fruit Fly in Drosophila melanogaster. By Michael Tekin and Vincent Saraceno

Indentification and Mapping of Unknown Mutations in the Fruit Fly in Drosophila melanogaster. By Michael Tekin and Vincent Saraceno Indentification and Mapping of Unknown Mutations in the Fruit Fly in Drosophila melanogaster By Michael Tekin and Vincent Saraceno Bilology 332 Section 2 December 5, 2012 Abstract The code of the unknown

More information

Genetic drift. 1. The Nature of Genetic Drift

Genetic drift. 1. The Nature of Genetic Drift Genetic drift. The Nature of Genetic Drift To date, we have assumed that populations are infinite in size. This assumption enabled us to easily calculate the expected frequencies of alleles and genotypes

More information

HARDY-WEINBERG EQUILIBRIUM

HARDY-WEINBERG EQUILIBRIUM 29 HARDY-WEINBERG EQUILIBRIUM Objectives Understand the Hardy-Weinberg principle and its importance. Understand the chi-square test of statistical independence and its use. Determine the genotype and allele

More information

Estoril Education Day

Estoril Education Day Estoril Education Day -Experimental design in Proteomics October 23rd, 2010 Peter James Note Taking All the Powerpoint slides from the Talks are available for download from: http://www.immun.lth.se/education/

More information

Journal of Asian Scientific Research

Journal of Asian Scientific Research Journal of Asian Scientific Research journal homepage: http://aessweb.com/journal-detail.php?id=5003 A METAFRONTIER PRODUCTION FUNCTION FOR ESTIMATION OF TECHNICAL EFFICIENCIES OF WHEAT FARMERS UNDER DIFFERENT

More information

COORDINATING DEMAND FORECASTING AND OPERATIONAL DECISION-MAKING WITH ASYMMETRIC COSTS: THE TREND CASE

COORDINATING DEMAND FORECASTING AND OPERATIONAL DECISION-MAKING WITH ASYMMETRIC COSTS: THE TREND CASE COORDINATING DEMAND FORECASTING AND OPERATIONAL DECISION-MAKING WITH ASYMMETRIC COSTS: THE TREND CASE ABSTRACT Robert M. Saltzman, San Francisco State University This article presents two methods for coordinating

More information

Exam 1 Answers Biology 210 Sept. 20, 2006

Exam 1 Answers Biology 210 Sept. 20, 2006 Exam Answers Biology 20 Sept. 20, 2006 Name: Section:. (5 points) Circle the answer that gives the maximum number of different alleles that might exist for any one locus in a normal mammalian cell. A.

More information

Initiating maize pre-breeding programs using genomic selection to harness polygenic variation from landrace populations

Initiating maize pre-breeding programs using genomic selection to harness polygenic variation from landrace populations Gorjanc et al. BMC Genomics (2016) 17:30 DOI 10.1186/s12864-015-2345-z RESEARCH ARTICLE Open Access Initiating maize pre-breeding programs using genomic selection to harness polygenic variation from landrace

More information

We can use a Punnett Square to determine how the gametes will recombine in the next, or F2 generation.

We can use a Punnett Square to determine how the gametes will recombine in the next, or F2 generation. AP Lab 7: The Mendelian Genetics of Corn Objectives: In this laboratory investigation, you will: Use corn to study genetic crosses, recognize contrasting phenotypes, collect data from F 2 ears of corn,

More information

DATA PREPROCESSING METHOD FOR COST ESTIMATION OF BUILDING PROJECTS

DATA PREPROCESSING METHOD FOR COST ESTIMATION OF BUILDING PROJECTS DATA PREPROCESSING METHOD FOR COST ESTIMATION OF BUILDING PROJECTS Sae-Hyun Ji Ph.D. Candidate, Dept. of Architecture, Seoul National Univ., Seoul, Korea oldclock@snu.ac.kr Moonseo Park Professor, Ph.D.,

More information

COMPUTER SIMULATIONS AND PROBLEMS

COMPUTER SIMULATIONS AND PROBLEMS Exercise 1: Exploring Evolutionary Mechanisms with Theoretical Computer Simulations, and Calculation of Allele and Genotype Frequencies & Hardy-Weinberg Equilibrium Theory INTRODUCTION Evolution is defined

More information

CHAPTER 4 STURTEVANT: THE FIRST GENETIC MAP: DROSOPHILA X CHROMOSOME LINKED GENES MAY BE MAPPED BY THREE-FACTOR TEST CROSSES STURTEVANT S EXPERIMENT

CHAPTER 4 STURTEVANT: THE FIRST GENETIC MAP: DROSOPHILA X CHROMOSOME LINKED GENES MAY BE MAPPED BY THREE-FACTOR TEST CROSSES STURTEVANT S EXPERIMENT CHAPTER 4 STURTEVANT: THE FIRST GENETIC MAP: DROSOPHILA X CHROMOSOME In 1913, Alfred Sturtevant drew a logical conclusion from Morgan s theories of crossing-over, suggesting that the information gained

More information

Overview of Statistics used in QbD Throughout the Product Lifecycle

Overview of Statistics used in QbD Throughout the Product Lifecycle Overview of Statistics used in QbD Throughout the Product Lifecycle August 2014 The Windshire Group, LLC Comprehensive CMC Consulting Presentation format and purpose Method name What it is used for and/or

More information

Control Charts for Customer Satisfaction Surveys

Control Charts for Customer Satisfaction Surveys Control Charts for Customer Satisfaction Surveys Robert Kushler Department of Mathematics and Statistics, Oakland University Gary Radka RDA Group ABSTRACT Periodic customer satisfaction surveys are used

More information

1 why study multiple traits together?

1 why study multiple traits together? Multiple Traits & Microarrays why map multiple traits together? central dogma via microarrays diabetes case study why are traits correlated? close linkage or pleiotropy? how to handle high throughput?

More information

Practical integration of genomic selection in dairy cattle breeding schemes

Practical integration of genomic selection in dairy cattle breeding schemes 64 th EAAP Meeting Nantes, 2013 Practical integration of genomic selection in dairy cattle breeding schemes A. BOUQUET & J. JUGA 1 Introduction Genomic selection : a revolution for animal breeders Big

More information

Basic Concepts of Human Genetics

Basic Concepts of Human Genetics Basic Concepts of Human Genetics The genetic information of an individual is contained in 23 pairs of chromosomes. Every human cell contains the 23 pair of chromosomes. One pair is called sex chromosomes

More information

GENETIC DRIFT INTRODUCTION. Objectives

GENETIC DRIFT INTRODUCTION. Objectives 2 GENETIC DRIFT Objectives Set up a spreadsheet model of genetic drift. Determine the likelihood of allele fixation in a population of 0 individuals. Evaluate how initial allele frequencies in a population

More information

Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron

Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron Introduction to Genome Wide Association Studies 2014 Sydney Brenner Institute for Molecular Bioscience/Wits Bioinformatics Shaun Aron Genotype calling Genotyping methods for Affymetrix arrays Genotyping

More information