GENES IN POPULATIONS and MULTIFACTORIAL INHERITANCE Peter D'Eustachio


GOALS OF THIS SEGMENT OF THE COURSE

Understand the use of the Hardy-Weinberg equation to relate allele and genotype frequencies in ideal populations; the factors that cause real human populations to deviate from ideality (limited population size, assortative mating, mutation, migration, and selection); and the kinds of genetic variation that can be found in a human population (key terms: STR, SNP, haplotype, linkage disequilibrium). Be able to apply these qualitative and quantitative concepts to risk calculations.

Understand the concepts of familiality, association analysis, and modifier genes, and their use in determining whether a phenotype has a genetic basis and, if so, in identifying loci that might contribute to determining that phenotype.

(IN)HOMOGENEITY OF POLYMORPHIC POPULATIONS: HARDY-WEINBERG EQUILIBRIUM AND DEVIATIONS FROM IT

You have already used the Hardy-Weinberg equation to estimate the frequency of genotypes involving a recessive allele from the frequencies of observable phenotypes in a population:

p^2 + 2pq + q^2 = 1

where p and q are the frequencies of the dominant and recessive allelic forms of the gene, respectively. The equation can also be generalized to account for multiple alleles, including codominant ones.

Deviations from Hardy-Weinberg Equilibrium. The Hardy-Weinberg equation depends on several unrealistic assumptions about human populations. Looking at these assumptions helps us to understand both how variations in allele frequency between populations arise, and when the Hardy-Weinberg equation can safely be used to estimate allele frequencies. The assumptions are that:
- there are no outside influences such as natural selection;
- the population is large;
- no new mutations occur at the locus of interest; and
- mating within the population occurs at random.

Natural selection.
Allele frequency represents a balance between two factors: the mutation rate of the gene and the effects of selection. Consider selection first. Selection can favor or disfavor a particular genotype. Genetic fitness is the ability to reproduce. If an allele increases fitness, it will gradually become more prevalent in the population, and if it decreases fitness, it will gradually become less prevalent. Some alleles can either increase or decrease fitness, depending on the rest of an individual's genotype and the individual's environment.

The high prevalence of some genetic disorders can be attributed to a balance between a decreased fitness of affected individuals and an increased fitness of heterozygotes. Since most copies of such an allele are present in heterozygotes, a small increase in the fitness of heterozygotes can lead to a large increase in allele frequency. The sickle mutant allele of β-globin is a classic example; another is glucose 6-phosphate dehydrogenase deficiency (below). Changes in the selective pressure on affected individuals (e.g., abortion or new therapies) will alter the allele frequency of a population only very slowly, because most mutant alleles are carried by heterozygous individuals.
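As a rough illustration of the Hardy-Weinberg relationship, and of the point that most copies of a rare recessive allele sit in heterozygotes, the following Python sketch starts from the incidence of an autosomal recessive disorder. The function name and the example incidence of 1 in 10,000 are illustrative assumptions, and the calculation presumes the ideal-population conditions described above.

```python
import math


def hardy_weinberg_from_incidence(incidence):
    """Estimate allele and genotype frequencies from the incidence of an
    autosomal recessive phenotype, assuming Hardy-Weinberg equilibrium."""
    if not 0.0 < incidence <= 1.0:
        raise ValueError("incidence must be greater than 0 and at most 1")
    q = math.sqrt(incidence)       # recessive allele frequency, since q^2 = incidence
    p = 1.0 - q                    # frequency of the other (dominant) allele
    carriers = 2.0 * p * q         # heterozygote (carrier) frequency, 2pq
    # Each carrier holds one copy of the recessive allele and each affected
    # homozygote holds two, so the share of copies found in carriers is
    # 2pq / (2pq + 2q^2), which simplifies to p.
    share_in_carriers = carriers / (carriers + 2.0 * incidence)
    return {
        "p": p,
        "q": q,
        "carrier_frequency": carriers,
        "share_of_alleles_in_carriers": share_in_carriers,
    }


# Hypothetical example: a recessive disorder seen in 1 of every 10,000 births.
result = hardy_weinberg_from_incidence(1 / 10_000)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

For this example incidence the sketch gives q = 0.01, a carrier frequency of about 2%, and about 99% of the recessive alleles in heterozygotes, which is why selection acting only on affected individuals changes allele frequency so slowly.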

Founder effects, genetic drift and bottlenecks. If one of the founders of a small population is a carrier for a rare allele and has a substantial number of children, the frequency of the allele within this sub-population will be much greater than in the population as a whole. If the sub-population remains reproductively isolated, the increased allele frequency means that later generations may experience a high incidence of genetic disorders that are only rarely seen in the population as a whole. Founder effects are seen in many populations throughout the world.

If a population is small, some generations may by chance receive a greater than expected percentage of a particular allele. This increase in allele frequency will then be maintained in subsequent generations. Thus, in small populations, genetic drift can lead to an increase in allele frequency and hence to an increased incidence of the corresponding genetic disorders.

Gene flow. Migrations into and out of a population can lead to changes in allele frequencies. For example, the most common mutant alleles of the PKU gene are of Celtic origin. In European and US Caucasian populations, a good correlation can be seen between Celtic ancestry and the incidence of PKU.

One of the assumptions of the Hardy-Weinberg Law is random mating. People often mate assortatively, however. To the extent that members of sub-groups within a population choose mates from the same sub-group, the population is stratified into genetically distinct sub-groups, allowing for founder effects and genetic drift.

AMOUNTS AND KINDS OF POLYMORPHISM IN THE HUMAN GENOME

Humans are remarkably homogeneous genetically compared to many other mammalian species, but considerable polymorphism has been found in the human genome. By definition, a locus is polymorphic if two or more alleles of it are each present at a frequency of 1% or more. This definition says nothing about whether the alternative alleles have distinguishable phenotypes.
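The 1% definition of a polymorphic locus can be written as a small check. This is only a sketch: the function name and the two sets of allele frequencies are hypothetical, chosen to show one locus on each side of the threshold.

```python
def is_polymorphic(allele_frequencies, threshold=0.01):
    """Return True if two or more alleles are each present at a frequency
    of at least `threshold` (1% by the definition used in the text)."""
    total = sum(allele_frequencies.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("allele frequencies should sum to 1")
    common = [allele for allele, freq in allele_frequencies.items()
              if freq >= threshold]
    return len(common) >= 2


# Hypothetical loci, for illustration only.
rare_variant_locus = {"A": 0.995, "a": 0.005}   # second allele is below 1%
common_variant_locus = {"A": 0.70, "a": 0.30}   # both alleles are above 1%

print(is_polymorphic(rare_variant_locus))
print(is_polymorphic(common_variant_locus))
```

The check looks only at frequencies, in keeping with the definition: it takes no account of whether the alleles differ in phenotype.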
As previously discussed, there are tens of thousands of STR loci, and millions of SNP loci, distributed over the genome.

Short Tandem Repeat loci. A short nucleotide sequence, typically three to six bases in length, is nearly perfectly repeated. These repeat structures are mutable, perhaps due to slippage within the repeats during DNA replication.

Mutation Rates at Human STR Loci, Measured in Paternity Tests

STR System   Maternal Meioses (%)   Paternal Meioses (%)
TH01         5/42100 (0.01)         12/74426 (0.02)
TPOX         2/28766 (0.01)         10/45374 (0.02)
VWA          20/58839 (0.03)        851/ (0.34)
D3S1358      0/4889 (<0.02)         9/8029 (0.11)
D8S1179      5/6672 (0.07)          29/10952 (0.26)
D13S317      33/59500 (0.06)        106/69598 (0.15)

These data for some of the markers used in human identity testing illustrate four general points:
- STR polymorphisms occur both in coding regions of the genome (e.g., VWA) and non-coding regions (e.g., D3S1358);
- mutation rates at STR loci are often very high;
- mutation rates can differ from locus to locus; and
- male and female mutation rates at any one locus can differ significantly.
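The percentages in the mutation-rate table above can be recomputed from the raw counts, which also makes the locus-to-locus and maternal-versus-paternal differences easy to compare. The sketch below is illustrative: the variable and function names are hypothetical, and the one entry whose denominator is missing from the table (VWA, paternal) is left out rather than guessed.

```python
# (mutations observed, meioses examined), copied from the table above.
str_counts = {
    "TH01":    {"maternal": (5, 42100),  "paternal": (12, 74426)},
    "TPOX":    {"maternal": (2, 28766),  "paternal": (10, 45374)},
    "VWA":     {"maternal": (20, 58839)},  # paternal denominator not available
    "D3S1358": {"maternal": (0, 4889),   "paternal": (9, 8029)},
    "D8S1179": {"maternal": (5, 6672),   "paternal": (29, 10952)},
    "D13S317": {"maternal": (33, 59500), "paternal": (106, 69598)},
}


def percent_rate(mutations, meioses):
    """Mutation rate per meiosis, expressed as a percentage."""
    return 100.0 * mutations / meioses


rates = {
    locus: {parent: percent_rate(*counts) for parent, counts in by_parent.items()}
    for locus, by_parent in str_counts.items()
}

for locus, by_parent in rates.items():
    summary = ", ".join(f"{parent} {rate:.2f}%" for parent, rate in by_parent.items())
    print(f"{locus}: {summary}")
```

A zero count, as for maternal meioses at D3S1358, yields a computed rate of 0; the table reports such a case as an upper bound (<0.02) instead.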

This high mutation rate makes these loci extremely polymorphic, as shown by this summary of typing data for a short tandem repeat locus near the G6PD gene on the long arm of the human X chromosome.

Allele frequencies at an STR locus near G6PD in African and Mediterranean populations
[Table: columns are Allele, Population (N), and Het; populations are Bantu (mixed) (78), Mende (46), Temne (36), Ga (46), Tunisian (58), Lebanese (43), and Cypriot (33).]

Single Nucleotide Polymorphisms. About one base per 1000 in the human genome is polymorphic. Such polymorphisms typically have two alleles, and the mutation rates associated with these polymorphisms are low. SNPs are found in both coding and non-coding sequences, as shown here for the region of the G6PD gene.

When a chromosome is transmitted from parent to child, crossing over normally occurs. Nevertheless, large segments are transmitted intact. Even over many generations, a contiguous block of chromosomal material may not be affected by crossing over. Two definitions allow us to talk about such blocks clearly. The alleles at each of a succession of polymorphic sites on a chromosome (SNPs, STRs, or other polymorphisms) together make up a haplotype. When such a haplotype is transmitted over many generations without its alleles getting shuffled by crossing over, this group of markers is said to be in linkage disequilibrium (LD). The Y chromosome, transmitted unchanged from father to son, is an extreme example of linkage disequilibrium.

Because SNP markers are so abundant, because new SNP alleles arise by mutation only very rarely, and because recent technical developments have made large-scale SNP typing reliable and cheap, SNPs are the markers of choice for identifying linkage disequilibrium blocks in the human genome. Compiling a list of these blocks for the entire genome is a goal of the HapMap project.

DISTRIBUTION OF POLYMORPHISMS IN HUMAN POPULATIONS

These data also raise the issue of differences in polymorphism between human populations.
In the case of the STR polymorphism linked to G6PD, above, while allele 198 is present in all groups sampled, it is more common in Mediterranean than African populations, while alleles 201 and 204 are more common in African than in Mediterranean populations. Surveys of genes and kinds of polymorphism in many human populations show that most variation (~85%) is "public": these variants can be found in all human populations, although often at different frequencies. The remaining 15% of variant alleles are "private" and are found only in one population, or in one closely related group of populations.

Nevertheless, even for public variants, numerical differences between groups can be important. For instance, in Finland, the incidence of PKU is ~1/200,000, versus ~1/10,000 for US Caucasians. Also, even when the frequency of a genetic disease is broadly constant over a group such as Caucasians or African-Americans, frequencies of specific mutant alleles can differ sharply between sub-populations within the broad group. Recall, for example, the efficiency of population screening for cystic fibrosis among Ashkenazi Jews versus Caucasians generally, from the second case study.

THE EXAMPLE OF GLUCOSE 6-PHOSPHATE DEHYDROGENASE DEFICIENCY

Glucose 6-phosphate dehydrogenase (G6PD) function is critical in red blood cells. Its biochemistry will be discussed later, as part of carbohydrate metabolism. Here, we review its population genetics. The gene is X-linked, and mutations that reduce its catalytic efficiency are common. A hemizygous male or homozygous female is often asymptomatic, except that drugs that subject red cells to oxidative stress can trigger hemolysis. As the figure shows, such mutant alleles are very common in certain regions of the world. These turn out to be regions where malaria is endemic, as G6PD-mutant individuals are relatively malaria-resistant.
The figure also shows that different mutant alleles are common in different regions of the world, and that the present-day distributions of these alleles correlate well with known human population movements over the past 5,000 years. Recent SNP typing studies have identified the haplotype background on which each of these mutations arose. That is, each of the various mutations occurred spontaneously in an individual human, several thousand years ago, in a G6PD allele that already had a particular set of nearby SNP polymorphisms associated with it. Because of its selective advantage, each of those mutant alleles became common in the location and human population where it arose. Because of linkage disequilibrium, as particular functional variant G6PD alleles have been transmitted from generation to generation, they have been transmitted together with those neighboring SNPs as a haplotype.

PHENOTYPIC POLYMORPHISM: CYP2D6 AND DRUG METABOLISM

Many drugs are inactivated and cleared from the body by specific metabolic reactions. A common first step is hydroxylation of the drug catalyzed by a member of the cytochrome P450 family of enzymes. Each of these enzymes targets multiple related substrates, and both its substrate specificity and overall activity can be affected by polymorphic variation. This variation is shown here for an especially well characterized family member, CYP2D6. The graph shows variation in the ability of members of a British study population to metabolize a model compound, debrisoquine, in vivo.

[Figure: debrisoquine metabolism in the British study population; groups labeled "Extensive metabolizers" and "Poor metabolizers".]

A particularly thorough study was carried out on a German population. This figure shows the results obtained for over 500 individuals, each tested for his or her ability to metabolize dextromethorphan and by PCR assays for his or her CYP2D6 genotype.

[Figure: dextromethorphan metabolism by CYP2D6 genotype; legend: individual value, mean and 95% c.i.]

The key points from this study are 1) that enzymatic activity correlates well with the number of active alleles, but 2) there is substantial variation in drug-metabolizing activity within each group of CYP2D6-identical people, sometimes over a thousandfold.

MULTIFACTORIAL INHERITANCE

This survey of human polymorphism at a single genetic locus also introduces the problem of multifactorial inheritance: individuals with identical genotypes can differ by orders of magnitude in their quantitative phenotypes. How do we distinguish modifying effects due to polymorphism elsewhere in the genome from modifying effects of the environment? Even if we know that an effect is genetic, how do we identify the specific genes and alleles involved? How many versions of human genetics do we need to deal with the full range of human populations?

Genetic factors are thought to play significant roles in determining susceptibility to major diseases such as atherosclerosis and type II diabetes, and in modifying the outcome in affected individuals. Attempts to find single genes responsible for these disease susceptibilities have failed. Indeed, as we have just seen, clear-cut variation in ability to synthesize the relevant enzyme cannot by itself account for variation in people's ability to metabolize drugs. Instead, interactions involving variant alleles at multiple genetic loci and variable environmental effects are all required to determine an individual's phenotype.

Familiality, the increased risk of disease in relatives of an affected proband compared to otherwise similar unrelated individuals, is a useful but risky indicator that a trait has a genetic basis, as illustrated by the data in this table.

Recurrence Risk in Relatives of an Affected Proband

Category of Relative   Schizophrenia   Med School Attendance
Parents                14.1%           11.1%
Siblings               21.9%           10.6%
Uncles, Aunts          2.1%            2.5%
Unrelated              0.8%            0.2%

A genetic model in which being affected is due primarily to inheritance of recessive susceptibility alleles at a single autosomal locus fits phenotypic data quite well both for extended families at risk for medical careers and for ones at risk for schizophrenia. Familial clustering alone therefore cannot distinguish shared genes from shared environment.
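One way to put a number on familiality is to divide the recurrence risk in each class of relatives by the risk in unrelated individuals. The sketch below does this for the table above; the function and variable names are hypothetical, and the ratio is offered only as an illustrative summary, not as a test for a genetic basis.

```python
# Recurrence risks (%) copied from the table above.
recurrence_risk = {
    "Schizophrenia": {
        "Parents": 14.1, "Siblings": 21.9, "Uncles, Aunts": 2.1, "Unrelated": 0.8,
    },
    "Med School Attendance": {
        "Parents": 11.1, "Siblings": 10.6, "Uncles, Aunts": 2.5, "Unrelated": 0.2,
    },
}


def risk_ratios(risks, baseline="Unrelated"):
    """Risk in each class of relatives divided by the baseline risk."""
    base = risks[baseline]
    return {relative: risk / base
            for relative, risk in risks.items() if relative != baseline}


ratios = {trait: risk_ratios(risks) for trait, risks in recurrence_risk.items()}

for trait, by_relative in ratios.items():
    for relative, ratio in by_relative.items():
        print(f"{trait}, {relative}: {ratio:.1f}x the unrelated risk")
```

Siblings come out at roughly 27 times the unrelated risk for schizophrenia and 53 times for medical school attendance, so a large ratio by itself says nothing about whether the familial pattern is genetic.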
Twin studies, comparing concordance rates for a trait between monozygotic and dizygotic twins, are difficult to carry out on the large scale needed for statistical power. However, they are a gold standard for distinguishing genetic and environmental determinants, as illustrated by data for susceptibility to infectious diseases:

Disease Susceptibility in Monozygotic and Dizygotic Twins

                               % Concordance
Disease         Population     MZ     DZ
Tuberculosis    Germany
                USA
                UK
Leprosy         India
H. pylori       Sweden
Poliomyelitis   USA            36     6
Hepatitis B     Taiwan         35     4

Given a compelling reason to believe that the familial pattern of a trait is truly genetic, researchers can then turn to association studies to try to identify loci that contribute to an individual's risk of showing the trait.

An example of an association study is the correlation of CAPN10 genetic variation with NIDDM susceptibility in some populations.

CAPN10 Haplotype Frequencies in Human Populations

Haplotype     Mex.Am.  Finland  Germany  Ch.Am.  Jp.Am.  Pima  Zapoteca
A (G 3x C)
B (G 2x T)
C (A 3x C)

Calpains are processing proteases. They cleave specific substrates at limited numbers of sites to cause activation or inactivation of the substrate protein function. Calpains have been implicated in the regulation of a variety of cellular functions including ones potentially related to glucose uptake and metabolism. Large scale linkage analyses have defined a locus on chromosome 2, NIDDM1, associated with increased susceptibility to NIDDM in some populations. Recent analyses have shown that a calpain gene, CAPN10, maps to this locus, and have allowed the definition of a SNP-based haplotype.

Classification of a Mexican-American population (from Texas) according to CAPN10 SNP haplotype and disease status showed a statistically significant correlation between NIDDM susceptibility and possession of a heterozygous A / B genotype. These are not conventionally dysfunctional alleles: homozygosity for A or for B is associated with no increase in risk, and heterozygosity for A / C is associated with decreased risk. These data leave the molecular basis of the CAPN10 effect unclear, but nicely explain its population specificity. While all of these haplotypes are present in diverse human populations, both haplotypes of a given heterozygous pair are present at a high frequency only in some populations, so a heterozygote effect on disease risk would be expected only there.

Advantages of association analysis are that we don't have to know the molecular basis of the phenotype being mapped, and we don't have to study large extended families, as in conventional linkage analysis.
Disadvantages are that, in highly stratified populations, associations can be found between unlinked genes (false positives), and that linkage disequilibrium blocks are small (sometimes as little as a few kilobases), so SNP markers that are in fact near the target gene can fail to show association (false negatives). Also, even a real association result does not show that the polymorphism actually typed is responsible for the variant phenotype of interest: it could equally well be a neutral polymorphism in a nearby gene or in intergenic DNA a few kilobases away from the gene of interest.

MATERIALS FOR FURTHER STUDY ON YOUR OWN

Basic review
Gelehrter, Collins, Ginsberg textbook, pages

More advanced and optional
The genetics of G6PD and human malaria resistance are reviewed by Luzzatto and Notaro, Science 293: (2001). The issue of population-specific variation in drug metabolism is reviewed systematically in Chapter 9 ("Pharmacogenetics," by W Kalow and DM Grant) in Scriver. The specific issue of applying pharmacogenetic concepts to tailor drug treatments to specific American populations was raised, and vigorously debated, recently in the New England Journal of Medicine 344: and 345: (2001).