DNA pooling identifies QTLs on chromosome 4 for general cognitive ability in children

© 1999 Oxford University Press. Human Molecular Genetics, 1999, Vol. 8, No. 5, 915-922

Paul J. Fisher¹, Dragana Turic¹, Nigel M. Williams¹, Peter McGuffin¹,², Philip Asherson², David Ball², Ian Craig², Thalia Eley², Linzy Hill², Karen Chorney³, Michael J. Chorney³, Camilla P. Benbow⁴, David Lubinski⁴, Robert Plomin² and Michael J. Owen¹,*

¹Department of Psychological Medicine, University of Wales College of Medicine, Cardiff CF4 4XN, UK
²Social, Genetic and Developmental Psychiatry Research Centre, Institute of Psychiatry, De Crespigny Park, London SE5 8AF, UK
³Department of Microbiology and Immunology, Milton S. Hershey Medical Center, Pennsylvania State University, Hershey, PA 17033, USA
⁴Department of Psychology and Human Development, Vanderbilt University, Nashville, TN 37203, USA

Received January 13, 1999; Revised and Accepted February 15, 1999

General cognitive ability (g), which is related to many aspects of brain functioning, is one of the most heritable traits in neuroscience. As with other heritable, quantitatively distributed traits, genetic influence on g is likely to be due to the combined action of many genes of small effect [quantitative trait loci (QTLs)], perhaps several on each chromosome. We used DNA pooling for the first time to search a chromosome systematically with a dense map of DNA markers for allelic associations with g. We screened 147 markers on chromosome 4 such that 85% of the chromosome was estimated to be within 1 cM of a marker. A comparison of pooled DNA from 51 children of high g with pooled DNA from 51 controls of average g yielded 11 significant QTL associations. The associations with three of these 11 markers (D4S2943, MSX1 and D4S1607) were replicated by DNA pooling in independent samples of 50 children of extremely high g and 50 controls.
Furthermore, all three associations were confirmed when each individual was genotyped separately (D4S2943, P = 0.00045; MSX1, P = 0.011; D4S1607, P = 0.019). Identifying the specific genes responsible for such QTL associations will open new windows in cognitive neuroscience through which to observe pathways between genes and learning and memory.

INTRODUCTION

Diverse measures of cognitive ability, such as speed of processing, memory and spatial ability, intercorrelate at a modest level, typically 0.20 to 0.40. This common factor is known as general cognitive ability (g) (1). g is assessed either as a total score across diverse cognitive tests, as in intelligence (IQ) tests, or as an unrotated principal component score that best reflects what the tests have in common (2). Although g has scarcely entered the lexicon of cognitive neuroscience, genetic research suggests that it may provide an important perspective on brain functions such as learning and memory.

It is of considerable interest that variation in g is substantially due to genetic factors. The substantial heritability of g is one of the best documented findings in the behavioural sciences (3). Model-fitting meta-analyses based on dozens of twin and adoption studies estimate that 50% of the total population variance in IQ can be attributed to genetic factors (4,5). A second relevant result comes from multivariate genetic research, which analyses the covariance among traits rather than the variance of each trait considered separately. The results of such analyses indicate that the same genetic factors influence different cognitive abilities, which implies that g reflects the genetic foundation for cognitive functioning (6). Genetic research on g will have its greatest impact on cognitive neuroscience when the specific genes responsible for its heritability are identified.
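To make the phrase "unrotated principal component score" concrete, here is a minimal Python sketch. It assumes a hypothetical correlation matrix for four tests, with intercorrelations in the 0.20 to 0.40 range mentioned above; the matrix, the test labels and the child's scores are illustrative only, not data from this study. Power iteration extracts the first eigenvector of the correlation matrix, and its entries serve as weights for combining a child's standardized test scores.

```python
import math

# Hypothetical correlation matrix for four cognitive tests (for example,
# processing speed, memory, spatial and verbal tests). The values are
# illustrative only, chosen within the 0.20-0.40 range quoted in the text.
R = [
    [1.00, 0.30, 0.25, 0.35],
    [0.30, 1.00, 0.20, 0.40],
    [0.25, 0.20, 1.00, 0.30],
    [0.35, 0.40, 0.30, 1.00],
]


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def first_principal_component(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    v = [1.0] * len(matrix)
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient of the converged unit vector gives the eigenvalue.
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return eigenvalue, v


eigval, weights = first_principal_component(R)
# Unrotated loadings of each test on the first component (the "g" factor).
loadings = [math.sqrt(eigval) * w for w in weights]
print(f"share of test variance on first component: {eigval / len(R):.0%}")
print("loadings:", [round(x, 2) for x in loadings])

# A component score for one hypothetical child: a weighted sum of the
# child's standardized test scores, using the eigenvector as weights.
z_scores = [1.2, 0.8, 0.5, 1.0]
g_score = sum(w * z for w, z in zip(weights, z_scores))
print(f"g score: {g_score:.2f}")
```

Because every intercorrelation is positive, every test loads positively on the first component, which is the pattern the text describes as a common factor.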
Heritability of quantitatively distributed traits such as g is likely to be due to multiple genes of varying effect size, called quantitative trait loci (QTLs) (7). Although traditional linkage methods for identifying single-gene effects cannot identify QTLs of small effect size, allelic association studies do have the statistical power to detect them, for example, by comparing allele frequencies in a selected group and controls (8,9). The major strength of linkage is that it is systematic, in the sense that a few hundred DNA markers can be used to scan the genome. In contrast, because allelic association with a quantitative trait can only be detected if a DNA marker is itself the QTL or very close to it, thousands of DNA markers would need to be genotyped in order to scan the genome. For this reason, allelic association has been used primarily to investigate candidate genes. In our earlier work, we genotyped 100 DNA markers in or near genes involved in brain functioning, mostly genes related to neurotransmitter systems, but found no replicated associations with g (10).

The problem with such a candidate gene approach is that any of the tens of thousands of genes expressed in the brain could be considered a candidate gene for g. Such association studies can be made more systematic by using a dense map of markers with an inter-marker interval of <1 cM, so that no QTL would be more than 0.5 cM from a marker. A first attempt to use a dense map of markers to identify QTL associations with g reported a replicated association for insulin-like growth factor-2 receptor (IGF2R; 11), which has been shown to be especially active in the brain regions most involved in learning and memory (12). The problem with this approach is the amount of genotyping required. To scan the entire genome at 1 cM intervals, one would need to genotype 3500 microsatellite markers, which would require 700 000 genotypings (3500 markers × 200 individuals) in a study of 100 high g individuals and 100 controls. We have developed a technique based on DNA pooling that greatly reduces the need for genotyping: DNA from all individuals in each group is pooled and the pooled groups are compared, so that only 7000 genotypings (3500 markers × 2 pools) are required to scan the genome in the previous example (13).

*To whom correspondence should be addressed. Tel: +44 1222 743058; Fax: +44 1222 746554; Email: owenmj@cardiff.ac.uk

Figure 1. Allele image patterns (AIPs) generated by GENOTYPER for D4S1607 for the original control group (middle) and the original high g group (bottom), and their overlaid images (top). The numbers above the AIPs are peak numbers. The numbers below and to the right of the control and high g AIPs are peak heights in fluorescence units. ΔAIP was calculated from the overlaid images by measuring the total area not shared by the two images, irrespective of how many times the curves from the two pools crossed, and expressing it as a fraction of the total shared and non-shared area (13).
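The ΔAIP calculation described in the Figure 1 legend can be sketched in a few lines of Python. This is a simplification under stated assumptions: each consensus allele image pattern is reduced to peak heights at a common set of allele positions, and each pattern is normalized to sum to 1; the published method works on traced curve areas (13), which this sketch does not reproduce. At each allele, the shared area is taken as the smaller of the two heights and the non-shared area as their absolute difference. The peak heights are hypothetical.

```python
def normalize(heights):
    """Scale peak heights so that the pattern sums to 1."""
    total = sum(heights)
    return [h / total for h in heights]


def delta_aip(pattern_a, pattern_b):
    """Fraction of the overlaid area that is not shared by two patterns.

    Assumes both patterns are peak heights at the same allele positions.
    """
    if len(pattern_a) != len(pattern_b):
        raise ValueError("patterns must cover the same allele positions")
    a, b = normalize(pattern_a), normalize(pattern_b)
    shared = sum(min(x, y) for x, y in zip(a, b))        # area under both curves
    non_shared = sum(abs(x - y) for x, y in zip(a, b))   # area under only one curve
    return non_shared / (shared + non_shared)


# Hypothetical peak heights (fluorescence units) at five allele positions.
control = [120, 340, 560, 300, 80]
high_g = [100, 420, 450, 330, 100]
print(f"delta AIP = {delta_aip(control, high_g):.3f}")
```

Identical patterns give a ΔAIP of 0, and patterns with no allele in common give a ΔAIP of 1.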
The main purpose of the present study was to provide proof of principle for the use of DNA pooling in systematic, large-scale association studies by using it to search for QTLs for g on an entire chromosome. We therefore sought replicable QTLs for g on chromosome 4 using a three-stage strategy: (i) nominate significant QTLs using pooled DNA samples of high g and control individuals (referred to as the original high g and control groups); (ii) replicate these nominated QTLs using pooled DNA from independent samples of high g and control individuals (referred to as the replication high g and control groups); and (iii) confirm these replicated QTLs by genotyping each individual separately.

RESULTS

The samples were restricted to non-Hispanic Caucasian children so that differences in marker allele frequencies between the groups were less likely to be due to ethnic differences. Two samples were obtained that intentionally differed in sampling frame and procedures in order to provide a constructive replication rather than a literal replication (14); i.e. rather than obtaining a large sample using a single measure of g and dividing it into original and replication groups, we chose to obtain a replication sample with extremely high g scores in order to increase our power to replicate QTLs of small effect size. Because individuals of such high g are off the scale of standard IQ tests, a different measure was used to select high g individuals for the replication sample. Constructive replication is a conservative procedure that may increase the rate of false-negative results but permits broader generalizations from positive results.

The original high g and control samples were selected from children between 6 and 15 years of age living in a six-county area around Cleveland, OH. g was assessed by a widely used IQ test, the Wechsler Intelligence Scale for Children (15).
The high g sample included 51 children (mean IQ = 136; SD = 9.3) and the control group included 51 children of average g (mean IQ = 103; SD = 5.6). A replication high g group was obtained from the Study of Mathematically Precocious Youth (SMPY) in the USA, which began in the 1970s as a study of mathematical talent but since the late 1970s has placed as much emphasis on verbal as on mathematical talent (16). The highest-scoring SMPY individuals were selected from the more than one million seventh and eighth graders who performed in the top 3% on a standardized test administered in their schools and who were then invited to take the Scholastic Aptitude Test (SAT) college entrance exam 4 years early, before the age of 13. The SAT correlates highly with g and with standard IQ tests in the normal range [e.g. 0.84 for SAT-Math (M) and 0.89 for SAT-Verbal (V), corrected for unreliability; 17]; using the SAT at 13 instead of the usual age of 17 makes it possible to estimate IQ scores even though standard IQ tests do not cover scores this high. Fifty of the highest-scoring individuals were targeted for the high g replication sample. These participants earned scores of at least SAT-V 630 and SAT-M 630, or SAT-V 550 and SAT-M 700. They were required to have flat SAT profiles, in the sense that their SAT-V and SAT-M scores had to be within one standard deviation of each other. These participants represent a selection intensity of 1 in 30 000, as indicated by scores four standard deviations above the mean (equivalent to an IQ score of 160) estimated from their composite (V + M) SAT scores. A replication control group of 50 individuals (mean IQ = 101; SD = 7.2) was selected in the same manner (same geographical area, same age) as the original control group. Informed consent was obtained from all participants. For all subjects, DNA was extracted from permanent cell lines established from blood.
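The link between "four standard deviations above the mean", an IQ of 160 and a selection intensity of roughly 1 in 30 000 follows from the normal distribution. The sketch below assumes the conventional IQ scale (mean 100, SD 15) and an exactly normal score distribution; both are simplifying assumptions rather than properties of the SAT data.

```python
import math


def upper_tail(z):
    """P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def iq_from_z(z, mean=100.0, sd=15.0):
    """Convert a z-score to the conventional IQ scale (assumed mean 100, SD 15)."""
    return mean + z * sd


z = 4.0
p = upper_tail(z)
print(f"IQ equivalent of z = {z}: {iq_from_z(z):.0f}")
print(f"tail probability: {p:.2e} (about 1 in {1 / p:,.0f})")
```

The upper-tail probability at z = 4 is about 3.2 × 10⁻⁵, or roughly 1 in 31 600, which the text rounds to 1 in 30 000.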
Primers for 179 microsatellite markers on chromosome 4 were purchased from MWG-Biotech (Germany). These DNA markers were selected from the LDB summary map (http://cedar.genetics.soton.ac.uk/public_html) (18; Materials and Methods) with an average interval of 1.2 cM across the chromosome. However, because the available microsatellite markers are unevenly distributed, many adjacent markers are more than 1.2 cM apart; for example, there are eight gaps of >3 cM between adjacent markers (the largest gap is 5.5 cM).

Genotyping pooled DNA using microsatellite markers

Two replicate DNA pools were constructed from individuals in each of the four groups (Materials and Methods). Duplicate PCRs were conducted for each of the pools. Allele image patterns (AIPs) were generated on an ABI DNA sequencer for each group's four PCR products for each marker (Materials and Methods). The four unmodified AIPs for each group (high g or control) were overlaid, and the consensus AIP was taken to represent the relative allele frequencies of the marker. To compare the pooled genotypes of the original high g and control groups, we measured the total area not shared by the two superimposed consensus AIPs and expressed it as a fraction of the total shared and non-shared area, following the method of Daniels et al. (13). This test statistic is called ΔAIP (Fig. 1).

Rather than optimizing each primer pair, standard conditions were used for PCR amplification. Under these standard conditions, 73% (129) of the 179 markers yielded replicable amplification products (where at least three of each group's four replicates gave near-identical overlays). A second amplification protocol (the Taq Gold procedure; Materials and Methods) was attempted for the 50 markers that failed to yield amplimers in the initial PCR. Eighteen of these markers yielded replicable amplification products, bringing the total number of scoreable markers to 147. As a result, 65% of the chromosome was covered at the 1 cM grid level (within 0.5 cM on each side of a marker) or, alternatively, 85% of chromosome 4 was covered at the 2 cM grid level of resolution (within 1 cM of a scored marker). ΔAIP was calculated for each marker (Table 1). An example of overlaid and non-overlaid control and high g AIPs is shown for marker D4S1607 (Fig. 1), which had a ΔAIP of 0.22 (Table 1).

Table 1.
A sample of 147 chromosome 4 markers and their ΔAIPs for the high g and control groups in the original sample

Marker    ΔAIP   Marker    ΔAIP   Marker    ΔAIP   Marker    ΔAIP
D4S3038   0.15   D4S2912   0.04   D4S1534   0.05   D4S3008   0.07
D4S127    0.04   D4S3027a  0.28   D4S2460   0.16   D4S1549   0.09
D4S1614   0.10   D4S2955   0.13   D4S1544   0.02   D4S1588   0.08
D4S3034   0.02   D4S3001a  0.25   D4S3006   0.12   D4S2999   0.07
D4S412    0.04   D4S2995   0.02   D4S3037   0.03   D4S3016   0.16
D4S2957   0.03   D4S1587   0.07   D4S423    0.29   D4S2980   0.05
MSX1a     0.21   D4S2950   0.06   D4S2407   0.18   D4S3033   0.16
D4S3023   0.10   D4S405    0.22   D4S1559a  0.23   D4S3046   0.09
D1S503    0.08   D4S2919   0.00   D4S2973   0.05   D4S2952   0.09
D4S431    0.08   259       0.21   D4S1560   0.12   D4S1636   0.15
D4S2935   0.07   D4S174    0.17   D4S2986a  0.24   D4S1566   0.04
D4S394    0.07   D4S1547   0.16   D4S1591   0.07   D4S1502   0.07
D4S2923   0.05   D4S1536   0.10   D4S2961   0.07   D4S2910   0.10
D4S2928   0.09   D4S2971   0.10   D4S1570   0.10   D4S243    0.15
DRD5      0.19   D4S3002   0.05   D4S3026   0.05   D4S1545   0.05
D4S3009   0.20   D4S1577   0.06   D4S2917   0.13   D4S1617   0.15
D4S1582   0.13   D4S2996   0.07   D4S2940   0.12   D4S622    0.02
D4S2906   0.13   D4S428    0.14   D4S1580   0.03   D4S2977   0.13
D4S2944   0.06   D4S2916   0.12   D4S191    0.16   D4S3030   0.01
D4S1602   0.09   D4S3000   0.20   D4S2392   0.10   D4S1529   0.22
D4S1511   0.13   D4S1592   0.13   D4S1612   0.06   D4S2967a  0.22
D4S3048   0.10   D4S1518   0.12   D4S430    0.24   D4S1607a  0.22
D4S1567   0.17   D4S1569   0.14   D4S3024   0.17   D4S3015   0.10
D4S2926   0.05   D4S1600   0.05   D4S1524a  0.36   D4S2951   0.17
D4S419    0.11   D4S3004   0.09   D4S1527   0.06   D4S3041   0.06
D4S3020   0.19   D4S1541   0.06   D4S1615   0.07   D4S2920   0.11
D4S3017   0.29   D4S1568   0.09   D4S2938   0.06   D4S2943a  0.20
D4S2953   0.14   D4S392    0.14   D4S2286   0.12   D4S1554   0.22
D4S2933   0.05   D4S2931   0.11   D4S3039   0.08   D4S2954   0.05
D4S1590   0.07   D4S1517   0.10   D4S422    0.23   D4S1535   0.12
D4S1551   0.15   D4S2990   0.02   D4S1576   0.11   D4S408    0.17
D4S3044   0.13   D4S2947   0.18   D4S175    0.25   D4S171    0.18
D4S391    0.16   D4S2361   0.12   D4S397    0.08   D4S1540   0.13
D4S1609   0.02   D4S2964   0.24   D4S1565a  0.21   D4S426    0.14
D4S418    0.17   D4S2922   0.15   D4S1561   0.09   D4S2921a  0.24
D4S1618   0.01   D4S2932   0.07   D4S424    0.10   D4S2975   0.23
D4S2408   0.15   D4S1538   0.08   D4S2998   0.13

The 147 markers and their respective ΔAIPs are listed according to their position on the genetic map (see text). The upper left entry is the marker closest to the 4p telomere, and the lower right entry is the marker closest to the 4q telomere. ΔAIP was calculated as detailed in the text. Markers marked a: P < 0.05.
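The coverage figures quoted for the 147 scored markers (65% of the chromosome within 0.5 cM of a marker, 85% within 1 cM) are the fraction of the genetic map lying inside a window of fixed radius around at least one marker. A minimal Python sketch of that calculation follows; the marker positions and chromosome length are hypothetical and do not reproduce the LDB map used in the study.

```python
def coverage(positions_cm, chrom_length_cm, radius_cm):
    """Fraction of [0, chrom_length_cm] lying within radius_cm of some marker."""
    # One window per marker, clipped to the ends of the chromosome.
    intervals = sorted(
        (max(0.0, p - radius_cm), min(chrom_length_cm, p + radius_cm))
        for p in positions_cm
    )
    covered = 0.0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start > cur_end:  # disjoint: close the current merged window
            covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:  # overlapping: extend the current merged window
            cur_end = max(cur_end, end)
    covered += cur_end - cur_start
    return covered / chrom_length_cm


markers = [0.5, 1.2, 2.0, 4.5, 5.0, 9.0]  # hypothetical positions (cM)
print(coverage(markers, 10.0, 0.5))  # 1 cM grid: 0.5
print(coverage(markers, 10.0, 1.0))  # 2 cM grid: 0.75
```

Merging overlapping windows before summing avoids counting the same stretch of map twice when markers lie close together.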

Table 2. Specific alleles from markers with significant ΔAIP values in the original pools were tested for significance in the replication pools

           Original pool      Original pool       Replication pool
Marker     ΔAIP    P-value    AST-χ²   P-value    AST-χ²   P-value
D4S2943    0.20    0.03       3.25     0.07       2.62     <0.05*
D4S1565    0.21    0.03       1.52     0.22       0.11     0.37
MSX1       0.21    0.02       4.68     0.03*      3.04     0.04*
D4S1607    0.22    <0.05      6.34     0.02*      3.29     0.03*
D4S2967    0.22    0.01       1.78     0.18       0.60     0.22
D4S1559    0.23    0.01       4.50     0.03*      N/A      N/A
D4S2921    0.24    0.02       4.61     0.03*      0.47     0.25
D4S2986    0.24    <0.05      1.19     0.28       0.40     0.26
D4S3001    0.25    0.02       4.26     0.04*      0.93     0.17
D4S3027    0.28    0.01       4.51     0.03*      N/A      N/A
D4S1524    0.36    0.01       3.86     0.05       N/A      N/A

This table shows the ΔAIP and P-values for the 11 markers whose ΔAIPs were significant on pooled analysis. For each of these 11 markers, the allele showing the greatest frequency difference in the original sample was identified and tested using an allele-specific χ² test (AST), as described in the text. The χ² and P-values for this most significant allele are given in columns 4 and 5. Each of these alleles was then tested in the replication pools; the χ² and P-values are given in columns 6 and 7. Significant allele-specific P-values (P < 0.05) are marked with an asterisk. The actual P-values for the original-pool ΔAIPs of markers D4S1607 and D4S2986 were just below 0.05 but are shown as 0.05 in this table after rounding up. N/A, not applicable, because the association between the tested high g and control peak of the original group changed direction in the replication group, i.e. an allele positively associated with high g in the original group was negatively associated with high g in the replication group, or vice versa.

Table 3.
P-values of individual genotypings for three markers in the original (O) and replication (R) samples

           P-value     Specifically    P-value   P-value   P-value
Marker     (CLUMP O)   tested allele   (χ² O)    (χ² R)    (CLUMP total)
D4S1607    0.049       Allele 6        0.006     0.026     0.019
D4S2943    0.010       Allele 5        0.024     0.012     0.00045
MSX1       0.12        Allele 3        0.028     0.031     0.011

CLUMP analysis of the original high g and control samples revealed two significant associations at the 0.05 level (markers D4S1607 and D4S2943). A negative association between D4S1607 allele 6 and high g was found in the original sample (column 4) and was tested for significance in the replication group (see text and column 5). Positive associations with high g were found in the original sample for D4S2943 allele 5 and for MSX1 allele 3; these were tested for significance in the replication group (see text and column 5). The CLUMP total analysis included individuals from both the original and replication samples (see text and column 6).

We used a three-stage strategy that provides a better balance between false-positive and false-negative errors by permitting a lenient significance level (P < 0.05) in the first stage, which reduces false negatives but increases false positives, and then removing false positives in the second stage. In the first stage, ΔAIPs were compared for the 147 markers between DNA pooled from the original group of children of high g and DNA pooled from the controls of average g. In the second stage, markers that yielded significant ΔAIPs in the first stage were tested using DNA pooling in an independent sample of children of extremely high g and an independent control group. In the third stage, markers that yielded significant (P < 0.05) differences for DNA pools in both the original and replication samples were genotyped individually for all subjects in order to confirm the results of DNA pooling using traditional methods.
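The allele-specific test in Tables 2 and 3 compares one nominated allele against all other alleles combined, giving a 2 × 2 table of allele counts (high g versus control). The Python sketch below computes the Pearson χ² for such a table without continuity correction and derives a one-tailed P-value for the pre-specified direction. It assumes that the one-tailed value is half the two-sided χ² (1 df) P-value when the observed difference lies in the predicted direction; the function names and counts are hypothetical, not the study's Figure 2 data.

```python
import math


def chi2_1df_p(chi2):
    """Upper-tail P-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def allele_specific_test(case_allele, case_total, ctrl_allele, ctrl_total,
                         expect_higher_in_cases):
    """Pearson chi-square for one allele versus all others (2 x 2 table).

    Counts are allele counts (two per genotyped individual). No continuity
    correction is applied. Returns (chi2, two-sided P, one-tailed P), where
    the one-tailed P is for the pre-specified direction.
    """
    table = [[case_allele, case_total - case_allele],
             [ctrl_allele, ctrl_total - ctrl_allele]]
    n = case_total + ctrl_total
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    p_two = chi2_1df_p(chi2)
    observed_higher = case_allele / case_total > ctrl_allele / ctrl_total
    if observed_higher == expect_higher_in_cases:
        p_one = p_two / 2.0          # difference in the predicted direction
    else:
        p_one = 1.0 - p_two / 2.0    # difference in the opposite direction
    return chi2, p_two, p_one


# Hypothetical counts: the tested allele on 30 of 100 high g chromosomes
# (50 children) and on 18 of 100 control chromosomes, predicted higher in high g.
chi2, p_two, p_one = allele_specific_test(30, 100, 18, 100, expect_higher_in_cases=True)
print(f"chi2 = {chi2:.2f}, two-sided P = {p_two:.3f}, one-tailed P = {p_one:.3f}")
```

As a consistency check, `chi2_1df_p(3.04) / 2` is about 0.041, in line with the one-tailed replication P-value of 0.04 listed for MSX1 in Table 2.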
For the comparison of the original high g and control pools, markers were tested for significance using the simulation program described by Daniels et al. (13) and in Materials and Methods; the program estimates the significance of the ΔAIP for a particular marker. Eleven markers showed significant (P < 0.05) ΔAIPs between the high g and control groups (Table 2). More markers showed a significant ΔAIP (11) than would be expected by chance (7.4), given the lenient criterion of P < 0.05, which does not correct for multiple testing (see Discussion). These 11 markers were selected for pooled genotyping in the replication sample. For each marker, the individual peak (allele) showing the greatest difference between the original high g and control groups was identified (see below). The replication sample was then used to test this allele-specific hypothesis for the 11 markers; i.e. rather than accepting any significant pattern of allelic differences in the replication sample, we required that the same allele yield a significant difference in the replication sample. We also required that the allelic difference in the replication sample be in the same direction as in the original sample; for this reason, we used a one-tailed test of significance in the replication sample. The strength of the multi-stage replication design is that, of the 11 markers significant at P < 0.05 in the original sample, fewer than one (0.6) would be expected to be significant at P < 0.05 in the replication sample by chance alone. The allele-specific test was significant and in the same direction for three markers (D4S2943, MSX1 and D4S1607) in the replication sample. These three markers were then genotyped separately for all individuals in order to confirm the results of DNA pooling.
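The chance expectations quoted above (7.4 of 147 markers at P < 0.05, and 0.6 of the 11 markers carried forward) are simple binomial expectations. The Python sketch below reproduces them and, as an illustration only, computes the probability of 11 or more nominally significant markers among 147 by chance. It assumes independent tests, which is a simplification: adjacent markers on a dense map are correlated. The tail probability is not a figure reported in the study.

```python
import math


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


alpha = 0.05
markers_tested = 147
nominated = 11

# Stage 1: expected number of markers significant by chance among 147.
expected_stage1 = markers_tested * alpha   # 7.35, reported as 7.4
# Stage 2: expected chance replications if all 11 nominations were false positives.
expected_stage2 = nominated * alpha        # 0.55, reported as 0.6
# Illustrative tail probability, assuming independent tests.
p_excess = binomial_tail(markers_tested, nominated, alpha)

print(f"expected false positives, stage 1: {expected_stage1:.2f}")
print(f"expected false positives, stage 2: {expected_stage2:.2f}")
print(f"P(>= {nominated} of {markers_tested} significant by chance): {p_excess:.3f}")
```

The second stage carries most of the weight: requiring the same allele and the same direction in an independent sample reduces the expected number of chance survivors from several to fewer than one.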
Genotyping individuals

Each individual was genotyped separately for the three markers that showed significant differences between high g and control groups in both the original and replication samples on pooled DNA analyses. In the original sample, the CLUMP program was used to determine, by Monte Carlo simulation, whether the overall allele frequencies of the control individuals differed significantly from those of the high g individuals (19). The analysis (Table 3) revealed significant differences for markers D4S2943 (P = 0.01) and D4S1607 (P = 0.049), whereas MSX1 did not reach statistical significance (P = 0.12). For each of the three markers, the allele-specific hypothesis (allele 5 of D4S2943, allele 3 of MSX1 and allele 6 of D4S1607) was also tested for significance in the original groups using Pearson χ² analysis (Table 3). For all three markers, the difference between the high g and control groups was significant at the 0.05 level. More importantly, the allele-specific hypothesis was also significant for all three markers in the replication samples, with P-values of 0.012, 0.031 and 0.026, respectively, using a one-tailed Pearson χ² test (Table 3). Finally, combined analysis of individual genotyping data from both the original and replication samples using CLUMP revealed significant differences between high g and control groups for the three markers (D4S2943, P = 0.00045; MSX1, P = 0.011; D4S1607, P = 0.019). Allele counts for D4S2943, MSX1 and D4S1607 are shown in Figure 2 for the original and replication samples as well as for the combined populations.

DISCUSSION

Systematic screening of the genome for allelic association requires genotyping a dense map of markers, which is facilitated by DNA pooling. Application of DNA pooling to 147 markers on chromosome 4, a larger-than-average chromosome, yielded three alleles (one from each of markers D4S2943, MSX1 and D4S1607) that showed significant associations with g in both the original and replication samples. These DNA pooling results were confirmed when each individual was genotyped separately and the data were analysed by standard statistical procedures.

In addition to applying DNA pooling in a systematic approach to allelic association using a dense map of markers, another novel aspect of the present study is its use of a multi-stage replication strategy, with more lenient criteria in the first stage, in an attempt to strike a balance between false-positive and false-negative results. In view of the large number of markers studied, we accepted marker-QTL associations only when evidence accumulated in their favour (20,21), including: (i) an overall excess of statistically significant results at P < 0.05 over the number expected by chance alone; (ii) concentration of significant results in a few markers; (iii) replication of significant results in a second, independent sample; and (iv) concordance between the original and replication samples with respect to the associated allele and the direction of association. The issue of what is an acceptable level of significance in studies of this kind is complex, and divergent views have been expressed (8).
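CLUMP (19) evaluates a 2 × k table of allele counts by Monte Carlo simulation rather than relying on the asymptotic χ² distribution, which can be unreliable when some alleles are rare. The Python sketch below illustrates the general idea for the Pearson χ² of the raw table only; CLUMP reports more than one statistic, and this is not the CLUMP program itself. Simulated tables are generated by shuffling the pooled alleles and re-splitting them into groups of the original sizes, which keeps both row and column totals fixed. The allele counts and function names are hypothetical.

```python
import random


def pearson_chi2(table):
    """Pearson chi-square for a 2 x k table given as two lists of allele counts."""
    row_sums = [sum(row) for row in table]
    n = sum(row_sums)
    chi2 = 0.0
    for j in range(len(table[0])):
        col = table[0][j] + table[1][j]
        if col == 0:
            continue  # an allele absent from both groups contributes nothing
        for i in range(2):
            expected = row_sums[i] * col / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def monte_carlo_p(table, n_sims=2000, seed=1):
    """Permutation estimate of P(chi2_sim >= chi2_obs) for a 2 x k allele table."""
    rng = random.Random(seed)
    k = len(table[0])
    # Pool every allele copy from both groups, labelled by allele index.
    pool = [j for j in range(k) for _ in range(table[0][j] + table[1][j])]
    n_first = sum(table[0])
    observed = pearson_chi2(table)
    hits = 0
    for _ in range(n_sims):
        rng.shuffle(pool)
        sim = [[0] * k, [0] * k]
        for idx, allele in enumerate(pool):
            sim[0 if idx < n_first else 1][allele] += 1
        if pearson_chi2(sim) >= observed - 1e-12:  # tolerance for float ties
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_sims + 1)


# Hypothetical allele counts (100 chromosomes per group) for a six-allele marker.
high_g = [5, 12, 30, 25, 18, 10]
control = [9, 15, 22, 20, 24, 10]
print(f"chi2 = {pearson_chi2([high_g, control]):.2f}, "
      f"Monte Carlo P = {monte_carlo_p([high_g, control]):.3f}")
```

A fixed seed makes the estimate reproducible; increasing `n_sims` narrows its Monte Carlo error.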
However, we feel that our approach of using a multi-stage design with built-in replication offers the best balance between type I and type II errors, especially in the quest for QTLs of small effect size. It should be emphasized that allelic association using DNA pooling to screen a dense map of markers can identify some, but not all, QTLs. For example, it will only detect old mutations that have spread through many generations. Haplotype analyses of even denser maps would increase the likelihood of detecting other QTLs, leading eventually to saturation mapping of all functional polymorphisms (8). In addition, greater power to detect QTLs of smaller effect size can be obtained by increasing the sample sizes or by selecting even more extreme samples. Although it is reasonable to screen for old and relatively large QTLs because these are likely to be most useful for understanding the links between genes, brain and cognitive functioning, by no means does the approach exclude all other QTLs.

Encouraged by these first results of applying DNA pooling to a systematic allelic association screen with a dense map of markers, we are proceeding with a scan of 3500 markers across the genome to find other QTLs for g. Although none of these QTLs is expected to account for a large amount of the variance, we expect that a systematic genome scan will yield QTLs that together account for a substantial portion of the genetic variance for g. We are doubling our sample sizes in order to increase the power to detect QTLs of even smaller effect size. We are also obtaining DNA from the parents of the high g subjects. These parental data will provide a within-family replication to test our hypothesis that, by limiting the sample to non-Hispanic Caucasian individuals, we have reduced the possibility that QTL associations are due to ethnic differences in marker allele frequencies.
As well as confirming associations with information from parents and larger sample sizes, we intend to test more markers in the close vicinity (within 1 cM) of those that yielded positive results. Finally, we will target genes close to our associated markers in an attempt to find the genes responsible for these associations. Identifying replicable QTLs associated with g will make it possible to address questions about development, multivariate analysis and gene-environment interplay. This can be done using measured genotypes rather than indirect inferences about heritable influence based on familial resemblance (22). Regarding development, the present samples are children, which raises the question of whether similar results will be found with adult samples. Quantitative genetic research shows that the heritability of g increases linearly throughout development (23), which suggests that QTL associations for g may be stronger later in life. Multivariate questions include whether QTLs found by selecting extremes will also be correlated with the normal range of variation, as predicted by QTL theory. Also, QTLs for g assessed using standard psychometric tests may be associated with other types of behavioural tests, such as information-processing tests, and with brain-imaging measures of brain structure and function. QTLs for g will provide discrete windows through which to view pathways in the brain between genes and learning and memory. As with most important advances, identifying genes for g will also raise new ethical issues. These concerns must be taken seriously, but they are largely based on misconceptions about genetic research on complex traits that are influenced by multiple genes as well as multiple environmental factors (24,25).
As well as having implications for research into cognitive ability, our results suggest that DNA pooling can be used to detect group differences in allele frequencies. It is thus potentially important for large-scale genome scanning for allelic association.

MATERIALS AND METHODS

Pooling of DNA for groups

Genomic DNA was extracted from permanent lymphocyte-derived cell lines using a standard protocol. Each individual DNA sample was diluted to 8 ng/µl. DNA quantification prior to pooling was performed in triplicate using the PicoGreen fluorescent assay and a Fluoroskan Ascent fluorometer. Two sets of pools, original and replication, were constructed. Each set consisted of four separately prepared pools: two from the control groups of average g and two from the high g individuals.

Primer selection

Markers containing di-, tri- and tetranucleotide repeats were selected from the Location Database (LDB) composite map (18; http://cedar.genetics.soton.ac.uk/public_html/). Ideally, markers were 1 cM apart, had between five and nine alleles and heterozygosity scores between 0.5 and 0.9, and their genetic map order was confirmed with physical map data. However, owing to a scarcity of markers in some regions, less than ideal (but still informative) markers were chosen. Marker positions were initially determined by averaging the LDB male and female genetic map values. This assumed that the multiple recombination events occurring since any marker-QTL associations arose were independent of sex. In rare cases where the genetic and physical map orders of a marker relative to its neighbours disagreed, the physical map order was used and a new approximate genetic map distance (in cM) was estimated.

Figure 2. Allele counts from individual genotyping of markers MSX1, D4S1607 and D4S2943 for the high g and control groups in the original, replication and combined samples. Control individuals are represented by black bars and high g individuals by white bars. Note that allele 5 of D4S2943 corresponds to peak 4 of the AIP for the original group (see Fig. 1). Small differences in the total number of alleles between markers reflect failed genotypes.

Amplification of pooled DNA samples

Touchdown PCR (26) was carried out to amplify pooled DNA. Each of the two replicate pools from the original set of high g and control groups was amplified in duplicate, resulting in eight PCR products. For markers that gave significant AIP values, the replication set of DNA pools was amplified using the same PCR protocol. Each PCR contained 48 ng of pooled genomic DNA, dNTPs (1.2 mM each), 1× Taq DNA polymerase buffer (Qiagen, Crawley, UK; with 1.5 mM MgCl2), Taq DNA polymerase (Qiagen; 0.6 U), 1.4 pmol of each primer and water to 12 µl. The DNA was initially denatured at 94°C for 5 min. This was followed by five touchdown cycles of 94°C (30 s), annealing (30 s; 56°C in the first cycle, then 1°C lower in each subsequent cycle) and 72°C (30 s). A further 28 cycles of 94°C (30 s), 50°C (30 s) and 72°C (30 s) followed. A final extension was performed at 72°C (10 min). The annealing temperatures for these PCRs were reduced by 10°C when primer pair IFNG was used. PCR using Taq Gold (Perkin-Elmer, Norwalk, CT) included the same reagents as above, except that the Qiagen buffer and enzyme were replaced with 10× Taq Gold buffer, MgCl2 solution (to 2.5 mM final concentration) and Taq Gold polymerase. The cycling parameters were 1 cycle at 95°C for 10 min, followed by 35 cycles of 95°C (45 s) and 50°C (45 s), and a final cycle at 50°C for 10 min.
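The touchdown program for the Qiagen Taq reactions can be written down as data, which makes the stepwise drop in annealing temperature and the IFNG offset explicit. This is a minimal sketch, not software for any particular thermocycler. The names `Step`, `touchdown_program`, `annealing_temps` and `total_hold_seconds` are hypothetical, and ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    temp_c: int   # hold temperature in degrees Celsius
    seconds: int  # hold time


def touchdown_program(start_anneal_c: int = 56, touchdown_cycles: int = 5,
                      step_down_c: int = 1, final_anneal_c: int = 50,
                      main_cycles: int = 28, anneal_offset_c: int = 0):
    """Build the thermal profile as a list of (stage label, steps) pairs.

    anneal_offset_c shifts every annealing temperature, e.g. -10 for the
    IFNG primer pair.
    """
    program = [("initial denaturation", [Step(94, 5 * 60)])]
    for i in range(touchdown_cycles):
        anneal = start_anneal_c - i * step_down_c + anneal_offset_c
        program.append((f"touchdown cycle {i + 1}",
                        [Step(94, 30), Step(anneal, 30), Step(72, 30)]))
    for i in range(main_cycles):
        program.append((f"cycle {touchdown_cycles + i + 1}",
                        [Step(94, 30), Step(final_anneal_c + anneal_offset_c, 30),
                         Step(72, 30)]))
    program.append(("final extension", [Step(72, 10 * 60)]))
    return program


def annealing_temps(program):
    """Annealing temperature of each three-step amplification cycle, in order."""
    return [steps[1].temp_c for _, steps in program if len(steps) == 3]


def total_hold_seconds(program):
    """Sum of all hold times, ignoring ramping between temperatures."""
    return sum(step.seconds for _, steps in program for step in steps)


print(annealing_temps(touchdown_program())[:6])  # [56, 55, 54, 53, 52, 50]
```

Passing `anneal_offset_c=-10` gives the IFNG variant (46°C falling to 42°C, then 40°C).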
Elimination of A-overhang from PCR products

Klenow fragment (United States Biochemical, Amersham, UK) was used to remove A-overhangs from PCR products. PCR products that were to be electrophoresed in the same gel lane (see below) were mixed before Klenow treatment. Klenow (0.25 U) was added to 12.5 µl of mixed PCR product, 1 µl of water and 1.5 µl of reaction buffer (United States Biochemical), and the mixture was incubated at 30°C for 1 h.

Gel electrophoresis

Up to four fluorescently labelled (Hex, Fam or Tet) markers were electrophoresed in each gel lane. Only markers whose products did not overlap in size (regardless of which dye they carried) were put in the same lane. Because the three dyes differ in intensity, Fam-labelled markers were diluted 10-fold, Tet-labelled markers 5-fold and Hex-labelled markers 3-fold. Diluted pools of PCR products (1.5 µl) were mixed with loading dye (1.5 µl) and a GS350 size ladder (0.5 µl; Perkin-Elmer). These mixes were loaded, typically on denaturing gels containing at least 8% acrylamide, and run at 12 W on ABI 373A sequencers.

AIP analysis of PCR products from pooled samples

ABI 373A image patterns of the pooled PCR products were overlaid using GENOTYPER software and imported into DeBabelizer, and AIPs were calculated as described (13). The statistical significance of an AIP depends on the number of marker alleles and their frequencies (13). We therefore estimated the P-value by simulating case and control samples. These were drawn from a population whose allele frequencies were estimated from the peak heights of the control sample, as described (13).

Allele-specific analysis of pooled samples

The height of each peak in both groups was expressed as a proportion of the sum of all AIP peak heights and scaled by the number of alleles in the pooled sample, so that the score represented the number of alleles in that peak. This procedure can be illustrated using the AIPs from the original group pools of D4S1607 shown in Figure 1.
In this case, the sixth peak (from left to right) has a height of 811 in the control group and 461 in the high g group. The score for peak 6 in the control group, scaled to the number of alleles in the pooled sample (100), is

$$\frac{811}{230 + 1037 + 80 + 61 + 231 + 811 + 105 + 230 + 269 + 394 + 88 + 79} \times 100 = \frac{811}{3615} \times 100 = 22.4$$

A Pearson χ² test on a 2 × 2 contingency table was used to compare the high g and control values. For D4S1607 peak 6, a two-tailed Pearson χ² analysis gave χ² = 5.91 (P = 0.02).

Individual genotyping

PCR was performed using the same protocol as for pooled DNA, except that only 30 ng of DNA was used per PCR. For the original sample, the significance of the overall pattern of allelic differences between the control and high g individuals was determined using the CLUMP program, which is based on Monte Carlo simulations (19). The significance of the allele showing the greatest difference in the original sample was tested using a Pearson χ² test comparing the frequency of that allele against all other alleles combined.

ACKNOWLEDGEMENTS

The research is supported by a grant from the US National Institute of Child Health and Human Development (HD27694).

REFERENCES

1. Jensen, A.R. (1998) The g Factor: The Science of Mental Ability. Praeger, London.
2. Brody, N. (1992) Intelligence, 2nd edn. Academic Press, New York.
3. Plomin, R., DeFries, J.C., McClearn, G.E. and Rutter, M. (1997) Behavioral Genetics, 3rd edn. W.H. Freeman, New York.
4. Chipuer, H.M., Rovine, M.J. and Plomin, R. (1990) LISREL modeling: genetic and environmental influences on IQ revisited. Intelligence, 14, 11–29.
5. Devlin, B., Daniels, M. and Roeder, K. (1997) The heritability of IQ. Nature, 388, 468–471.
6. Plomin, R. and Petrill, S.A. (1997) Genetics and intelligence: what's new? Intelligence, 24, 53–77.
7. Plomin, R., Owen, M.J. and McGuffin, P. (1994) The genetic basis of complex human behaviors. Science, 264, 1733–1739.
8. Risch, N. and Merikangas, K. (1996) The future of genetic studies of complex human diseases. Science, 273, 1516–1517.
9. Risch, N. and Teng, J. (1998) The relative power of family-based and case-control designs for linkage disequilibrium studies of complex human diseases. I. DNA pooling. Genome Res., 8, 1273–1288.

10. Plomin, R., McClearn, G.E., Smith, D.L., Skuder, P., Vignetti, S., Chorney, M.J., Chorney, K., Kasarda, S., Thompson, L.A., Detterman, D.K., Petrill, S.A., Daniels, J., Owen, M.J. and McGuffin, P. (1995) Allelic associations between 100 DNA markers and high versus low IQ. Intelligence, 21, 31–48.
11. Chorney, M.J., Chorney, K., Seese, N., Owen, M.J., McGuffin, P., Daniels, J., Thompson, L.A., Detterman, D.K., Benbow, C.P., Lubinski, D., Eley, T.C. and Plomin, R. (1998) A quantitative trait locus (QTL) associated with cognitive ability in children. Psychol. Sci., 9, 159–166.
12. Wickelgren, I. (1998) Tracking insulin to the mind. Science, 280, 517–519.
13. Daniels, J., Holmans, P., Williams, N., Turic, D., McGuffin, P., Plomin, R. and Owen, M.J. (1998) A simple method for analyzing microsatellite allele image patterns generated from DNA pools and its application to allelic association studies. Am. J. Hum. Genet., 62, 1189–1197.
14. Lykken, D. (1968) Statistical significance in psychological research. Psychol. Bull., 70, 151–159.
15. Wechsler, D. (1974) Wechsler Intelligence Scale for Children-Revised. The Psychological Corporation, New York.
16. Lubinski, D. and Benbow, C.P. (1994) The Study of Mathematically Precocious Youth (SMPY): the first three decades of a planned 50-year study of intellectual talent. In Subotnik, R. and Arnold, K. (eds), Beyond Terman: Longitudinal Studies in Contemporary Gifted Education. Ablex, Norwood, NJ, pp. 255–281.
17. Brodnick, R.J. and Ree, M.J. (1995) A structural model of academic performance, socioeconomic status, and Spearman's g. Educ. Psychol. Measurement, 55, 583–594.
18. Collins, A., Frezal, J., Teague, J. and Morton, N.E. (1996) A metric map of humans: 23,500 loci in 850 bands. Proc. Natl Acad. Sci. USA, 93, 14771–14775.
19. Sham, P.C. and Curtis, D. (1995) Monte Carlo tests for associations between disease and alleles at highly polymorphic loci. Ann. Hum. Genet., 59, 97–105.
20. Lander, E. and Kruglyak, L. (1995) Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nature Genet., 11, 241–247.
21. Lipkin, E., Mosig, M.O., Darvasi, A., Ezra, E., Shalom, A., Friedmann, A. and Soller, M. (1998) Quantitative trait locus mapping in dairy cattle by means of selective milk DNA pooling using dinucleotide microsatellite markers: analysis of milk protein percentage. Genetics, 149, 1557–1567.
22. Plomin, R. and Rutter, M. (1999) Child development, molecular genetics, and what to do with genes once they are found. Child Dev., in press.
23. McGue, M., Bouchard, T.J., Jr, Iacono, W.G. and Lykken, D.T. (1993) Behavioural genetics of cognitive ability: a life-span perspective. In Plomin, R. and McClearn, G.E. (eds), Nature, Nurture and Psychology. American Psychological Association, Washington, DC, pp. 59–76.
24. Rutter, M. and Plomin, R. (1997) Opportunities for psychiatry from genetic findings. Br. J. Psychiatry, 171, 209–219.
25. Sherman, S.L., DeFries, J.C., Gottesman, I.I., Loehlin, J.C., Meyer, J.M., Pelias, M.Z., Rice, J. and Waldman, I. (1997) Behavioral Genetics '97: ASHG Statement. Recent developments in human behavioral genetics: past accomplishments and future directions. Am. J. Hum. Genet., 60, 1265–1275.
26. Rithidech, K.N., Dunn, J.J. and Gordon, C.R. (1997) Combining multiplex and touchdown PCR to screen murine microsatellite polymorphisms. BioTechniques, 23, 36–44.