Detecting copy number variants and runs of homozygosity on a single array: challenges and applications


Douglas Hurd and Ruth Burton

Abstract

In constitutional genetics research, analysis of single nucleotide polymorphisms (SNPs) provides invaluable insight into a number of conditions. When analysed in conjunction with copy number variation (CNV) data from array comparative genomic hybridisation (aCGH) arrays, this insight can aid in the identification of genetic variants beyond those revealed by the CNV data alone. Protocols for high-resolution SNP arrays can be time consuming, whereas aCGH array protocols are less laborious and, as the gold standard for CNV detection, are well established in laboratory workflows. Recent advances have made it possible to combine CNV probes with probes able to detect SNPs on a single aCGH+SNP array, affording the benefits of shorter processing time and dual data with easy integration into the workflow. Although these combined arrays do not have the resolution capabilities of traditional SNP platforms, they have been research-validated to provide informative SNP data for various genetic aberrations such as uniparental disomy (UPD), mosaic aneuploidy and runs of homozygosity (ROH), without compromising on high-quality CNV data. Researchers commonly report biologically relevant SNP data at lower resolutions, and indeed the argument exists that increased resolution does not necessarily equal an increase in informative data. This review explores the various applications of combined arrays, the challenges faced in their implementation and their many advantages, such as the easy-to-interpret, flexible data they provide.

Introduction

Identifying DNA variants that contribute to a disease or syndrome is a key objective in human genetics. Copy number variants (CNVs) and other forms of structural variation are important in understanding the mechanisms underlying many common diseases.
CNVs are defined as chromosomal segments, at least 1000 bases in length, that vary in copy number (CN) between individuals [1]. A second major contributor to human variation is at the resolution of a single base. Single nucleotide polymorphisms (SNPs) are genome positions at which there are two distinct alleles, each of which appears at high frequency in the population. Array comparative genomic hybridisation (aCGH) is the gold standard for detecting CNV [2]; however, until recently it was not possible to combine the long 60-mer oligonucleotide probes used for CNV detection with probes able to detect SNPs. This review highlights the importance of combined copy number (CN) and SNP platforms in constitutional genetics research and describes the advantages of using such long oligonucleotide aCGH arrays over short oligonucleotide SNP genotyping platforms.

The primary considerations when selecting an array platform are typically integration into existing workflows and the resolution of the array. Array resolution is particularly important when studying uniparental disomy (UPD) and consanguinity. UPD is the presence of a homologous chromosome pair derived from only one parent. The absence of any heterozygous SNPs over an entire chromosome is a clear indication of UPD. Smaller runs of homozygosity (ROH) are common in offspring of consanguineous relationships; these vary considerably in size and frequency.

Workflow

aCGH not only delivers the highest quality CNV data [2] but also provides a more streamlined and rapid workflow when compared to SNP-based array platforms (Figure 1). This is particularly useful for high-throughput research laboratories that require fast access to results.
One of the challenges in combining CN and SNP content is the selection of probes that reliably detect and discriminate between SNP alleles while working under hybridisation conditions developed for CN detection; however, using the standard aCGH protocol greatly reduces total and hands-on time in the lab.

Figure 1: A comparison of two typical array processing workflows. The aCGH +SNP workflow offers considerable time savings when compared to a typical SNP genotyping platform. A: The CGH +SNP protocol as used by the OGT CytoSure arrays. B: A typical protocol for a SNP genotyping platform.

Applications

There are three distinct uses for SNP probes in constitutional genetics research:
- Aiding in the identification of mosaic aneuploidy and chimerism
- Identification of UPD by the detection of runs of homozygosity (ROH)
- Identification of ROH arising from identity by descent and consanguinity

Mosaic aneuploidy and chimerism

Mosaic aneuploidy can be detected in a normal aCGH experiment; however, the B-allele frequency (BAF) of SNP probes can help in the identification of mosaicism [3, 4], as the distribution of homozygous and heterozygous SNPs can reinforce the subtle changes in CN that occur in mosaic samples. The BAF generated using SNP probes can, in addition, help to determine whether chimerism is present [3, 4]. An advantage of using a combined CGH and SNP platform is that complex conditions like mosaicism and chimerism can be studied (Figure 2).

ROH in outbred populations

It is now well known that individuals in many different population groups have ROH in their genome. The natural frequency and size of these ROH in normal outbred populations have been well studied. It is important to consider this when choosing a SNP detection platform, particularly if the goal of the study is to report biologically relevant ROH as well as changes in CN. In normal European populations, ROH covering on average 93 Mb (1.5%) of DNA were present throughout the genome. The ROH can be up to 4 Mb in length [5] and were found in populations from all parts of Europe, with the average number of ROH in a person being approximately 40 and the median length approximately 1.25 Mb [6]. Similar ROH have been reported in other outbred populations.
For example, in a Chinese population the size of the ROH varied from 2.94 to Mb in length [7]. Using HapMap samples, obtained from a diverse population set, DNA obtained from CEPH Utah residents was found to have a mean of 8.3 LOH regions, with the maximum region being 6.48 Mb in length. Meanwhile, samples from Japanese residents of Tokyo had an average of 8.4 regions with a maximum of Mb length [8]. Finally, a large study of a diverse population set reported by Kirin et al (2010) showed that many other populations also contain ROH [9]. However, a ROH of over 10 Mb is considered very rare in cosmopolitan populations [9]. All ROH have the potential to cause an autosomal recessive disease. However, it is the excessively long ROH that are likely to greatly increase the chance of a discernible phenotype. Long ROH are most commonly caused by UPD, but can also be due to consanguinity or shared parental ancestry [9]. A recent report [10] found that the definition of ancestral ROH varied between laboratories but included definitions such as the presence of ROH on only a few chromosomes and ROH blocks of 1 Mb or larger.

Figure 2: An example of a mosaic deletion of 20q analysed using CytoSure Interpret Software. The top panel displays the CN probes in blue and the bottom panel the SNP probes in black and red. The SNP probes are displayed in a BAF plot which clearly shows the mosaic region. The values of 77% and 94% shown on the plot indicate the percentage of cells containing that aberration. Mosaicism is also shown by the CN probes by a shift in the average log ratio away from zero*.

Uniparental disomy

Uniparental disomy occurs when both copies of a chromosome are inherited from a single parent. If only parts of a chromosome are inherited in this way, this is called segmental UPD. It is possible to inherit two identical copies of a single parental chromosome, which is known as isodisomy. With isodisomy, regions of LOH are seen. When both homologous chromosomes from the same parent are inherited, this is known as heterodisomy. Detection of UPD has largely been performed through screening DNA using microsatellite markers. Other methods of UPD detection rely on identifying imprinted genes through changes in methylation patterns.
Both approaches are time consuming and challenging. It is not possible to detect UPD using a traditional CGH array as there are no changes in CN, so a platform containing SNP probes must be used. To distinguish between isodisomy and heterodisomy it is necessary to analyse the inheritance of the ROH. This distinction is important when studying UPD and recessive diseases: unless the mutated gene is carried by both parents, uniparental isodisomy is a prerequisite for a recessive disease to occur. It is important to study UPD using a combined CN and SNP platform because UPD is often associated with chromosomal aberrations. Interestingly, it is not UPD which causes the phenotype per se [11] but the aberration. There are several well-known constitutional diseases that can arise due to UPD, typically by affecting imprinting [11]. The most common imprinting syndromes are shown in Table 1.

Table 1: Common imprinting syndromes

Chromosome                                      Syndrome
Paternally inherited chr6                       Transient neonatal diabetes
Maternally inherited (in 5% of cases) chr7      Silver-Russell syndrome
Paternally inherited chr11                      Beckwith-Wiedemann syndrome
Maternally inherited chr14                      Temple syndrome
Maternally inherited (in 25% of cases) chr15    Prader-Willi syndrome
Paternally inherited (in 2-3% of cases) chr15   Angelman syndrome

Typically the type of UPD in these syndromes is either whole chromosome or segmental isodisomy, or a combination of segmental heterodisomy and isodisomy caused by meiotic recombination events. The segments are typically very large, well over 10 Mb [12, 13]. In cases of Beckwith-Wiedemann syndrome, paternally inherited segmental isodisomy of chromosome 11 is always seen; however, the size of the segments varies. In a study by Cooper et al (2007), the sizes of the segments were shown to vary from less than 3 Mb to whole chromosome UPD, with the majority of samples having segments of greater than 17 Mb. From this study the critical regions could be narrowed down to between Mb [14]. An example of whole chromosome UPD on chromosome 6 is shown in Figure 3.

Consanguinity

In clinical genetics, consanguinity is defined as the union of individuals related as second cousins or closer, and it is estimated that such couples account for 10.4% of the world's population [15]. As discussed above, in a normal outbred population ROH are short, typically under 5 Mb. Samples from consanguineous unions, however, have a significantly increased number and size of ROH, with ROH exceeding 10 Mb [16]. This increases the chance of homozygosity for recessive mutations. It is estimated that the offspring of first cousins have an increased risk of % of congenital malformations. It has been shown that the ROH are present throughout the genome [17].

The number and size of ROH in offspring of consanguineous unions depend on the degree of parental relatedness [16, 17] and can theoretically vary from 25% of the genome (800 Mb) for first degree relatives to 1.56% of the genome (50 Mb) for fifth degree relatives. It has been suggested that the actual ROH might be larger than predicted by the theoretical calculations.
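
The theoretical proportions quoted above are consistent with the expected homozygous fraction halving with each additional degree of parental relatedness, that is (1/2)^(d+1) for parents related in the d-th degree. The following minimal Python sketch reproduces the quoted figures. The function names and the 3,200 Mb genome size (the size implied by 25% corresponding to 800 Mb) are assumptions made for illustration and are not taken from the cited studies.

```python
# Assumed genome size in Mb, chosen so that 25% corresponds to the 800 Mb quoted in the text.
GENOME_MB = 3200.0


def expected_roh_fraction(degree: int) -> float:
    """Expected homozygous-by-descent fraction of the genome in the offspring
    of parents related in the given degree (1 = first degree relatives)."""
    if degree < 1:
        raise ValueError("degree of relatedness must be 1 or greater")
    return 0.5 ** (degree + 1)


def expected_roh_mb(degree: int, genome_mb: float = GENOME_MB) -> float:
    """The same expectation expressed in megabases."""
    return expected_roh_fraction(degree) * genome_mb


if __name__ == "__main__":
    for degree in range(1, 6):
        fraction = expected_roh_fraction(degree)
        print(f"degree {degree}: {fraction:.2%} of the genome, "
              f"about {expected_roh_mb(degree):.0f} Mb")
```

Under this model a first cousin union corresponds to third degree relatives, giving an expectation of 6.25% of the genome.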
A study by Woods et al (2006) showed that an offspring of a first cousin union had ROH covering 11% of the genome; the theoretical calculations predicted that this should only be 6.25% [18]. An example of a consanguineous sample is shown in Figure 4, with multiple long ROH across the genome.

Figure 3: A BAF plot showing the distribution of the individual SNP probes analysed using CytoSure Interpret Software. Shown here is an example of whole chromosome UPD, so the majority of the probes have a BAF value of 1. The left-hand graph shows the overall percentage of homozygous probes for all the chromosomes; here chromosome 6 is selected and is highlighted in red. The centre dial gives the percentage of homozygous SNP probes for the whole chromosome, which is 95%. The right-hand table details the ROH. In this example there are two continuous ROH, one on the p-arm containing 155 SNPs and a second on the q-arm containing 240 SNPs. The score reflects the quality of the ROH, with a higher score indicating increased quality.

Figure 4: A consanguineous sample on an OGT CytoSure ISCA +SNP array analysed using CytoSure Interpret Software. ROH are indicated by the red solid bars to the left-hand side of the chromosome ideograms. The bright red blocks to the right-hand side of the ideograms indicate deletions and the green blocks amplifications*.

Challenges of identifying biologically significant ROH

Identification of homozygosity can be useful for understanding underlying disease mechanisms. As discussed above, normal outbred populations rarely have ROH above 10 Mb but commonly have smaller ROH [9]; these occur across all populations and are termed ancestral ROH. Although the detection of ROH is useful, it raises complex legal and ethical issues, and it is important to be able to distinguish between naturally occurring ancestral ROH and ROH that are biologically relevant. To detect biologically relevant ROH it is necessary to use a cut-off value to exclude ancestral ROH. There is conflicting evidence in the literature regarding what value should be used; the recommendations are summarised in Table 2. The variation in cut-off values reported in the literature is reflected in research laboratories' reporting policies. A recent study [10] found that each laboratory made its own decision regarding the cut-off value for classifying biologically relevant ROH. These values ranged from 5 Mb to 10 Mb. In some laboratories the total percentage of homozygosity across the genome was considered, whereas in other laboratories the frequency of ROH was considered to be important. Overall there was considerable variability in what was considered biologically relevant, which highlights the need for guidelines to standardise the process.

Frequency of biologically significant ROH

There are few reports on the frequency of ROH and UPD found in samples typically analysed by cytogenetics research laboratories, and it is interesting to consider whether using a combined CN and SNP array could increase the discovery of biologically relevant ROH. Approximately 80% of developmental disorder samples of unknown cause have a normal result when a traditional aCGH platform is used. It is estimated that the frequency of UPD in newborns is approximately 1 in 3,500, with not all UPDs causing a phenotypic effect. Around 1,100 cases of whole chromosome UPD and approximately 120 reports of segmental UPD have been described in the literature [11]. In a large study by Papenhausen et al [20] in which 13,000 samples were tested, 92 samples were found to have ROH greater than 13.5 Mb on a single chromosome or multiple ROH amounting to 15 Mb over two chromosomes. These samples were suspected to have UPD.
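
The cut-off based criteria described in this section can be sketched in a few lines of Python. The example is illustrative only: the Roh record, the function names, the 3,200 Mb genome size and the sample coordinates are hypothetical, and the 5 Mb and 13.5 Mb defaults are simply taken from the range of thresholds quoted in the text rather than being recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Roh:
    """A single run of homozygosity call (hypothetical record layout)."""
    chrom: str
    start_mb: float
    end_mb: float

    @property
    def length_mb(self) -> float:
        return self.end_mb - self.start_mb


def summarise_roh(segments, cutoff_mb=5.0, genome_mb=3200.0):
    """Discard ROH shorter than cutoff_mb (treated here as ancestral) and
    report the count, total length and genome percentage of what remains."""
    kept = [s for s in segments if s.length_mb >= cutoff_mb]
    total_mb = sum(s.length_mb for s in kept)
    return {
        "n_segments": len(kept),
        "total_mb": total_mb,
        "percent_genome": 100.0 * total_mb / genome_mb,
    }


def chromosomes_with_long_roh(segments, single_mb=13.5):
    """Chromosomes carrying at least one ROH of single_mb or longer,
    returned as a list of names sorted as strings."""
    return sorted({s.chrom for s in segments if s.length_mb >= single_mb})


if __name__ == "__main__":
    # Made-up calls for illustration; not data from any cited study.
    example_calls = [Roh("6", 10.0, 40.0), Roh("2", 1.0, 3.0), Roh("11", 0.0, 6.0)]
    print(summarise_roh(example_calls))
    print(chromosomes_with_long_roh(example_calls))
```

Keeping the cut-offs as parameters reflects the point made above that laboratories differ in the value they apply, and that some consider the total percentage of homozygosity while others consider the number of ROH.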
From studying the inheritance patterns of the ROH, where available, there was an even mix of complete isodisomy and heterodisomy combined with isodisomy. The ROH varied in size from 13.5 Mb to 127.8 Mb with an average size of Mb. Smaller studies have also reported a low frequency of detection of ROH and a complex range in size and frequency [3, 16, 4].

Table 2: Several recent studies present conflicting recommendations regarding the cut-off value that should be used to distinguish ancestral ROH from biologically relevant ROH.

Study                     ROH threshold
Kearney et al [19]        Suggested a conservative clinical threshold of between 3 Mb and 10 Mb
Conlin et al [3]          20 Mb
Sund et al [16]           10 Mb on two separate chromosomes
Papenhausen et al [20]    13.5 Mb on single chromosome (15 Mb total on two chromosomes)
Bruno et al [4]           5.3 Mb, with most regions not clinically significant

A comparatively small study of 35 samples that had a developmental disorder of unknown cause and a normal aCGH result showed that using a high-resolution SNP array did not detect additional pathogenic CN aberrations. A vast amount of data was generated and changes were identified per sample. More aberrations were detected in samples with reduced technical quality. Stringent filtering had to be applied to identify potentially relevant aberrations. Four samples were identified that had a ROH associated with an OMIM disease gene. Inheritance studies showed that these ROH were not true segmental UPD. This result is not unexpected, as the samples came from a small founder population [21].

Conclusion

The studies reviewed here highlight the current complexities in defining and detecting ROH. Although the frequency of biologically relevant ROH is low, detecting ROH and distinguishing ancestral ROH from biologically relevant ROH is important and can be useful for discerning the underlying cause of a disease. What is clear, though, is that there is little additional benefit in identifying small ROH: this adds to the complexity of the data and does not improve the identification of biologically relevant regions. Combined CN and SNP platforms offer not only gold-standard CNV analysis but also SNP probe resolution that enables accurate detection of biologically relevant ROH.

The CytoSure ISCA +SNP array

After careful optimisation and considerable experimental validation, OGT has identified a number of informative SNP probes that work effectively using the standard aCGH protocol, allowing easy integration into existing workflows. In addition, OGT's CytoSure CGH +SNP arrays allow any reference DNA to be used and no restriction digest of the sample is required.
This means that the labeling and hybridisation steps can be completed in a single day, which is significantly quicker than a typical SNP workflow (Table 3). The OGT workflow is scalable and amenable to automation, particularly when using OGT's CytoSure HT Genomic Labeling Kit. No dedicated PCR areas or specialist equipment, other than the hybridisation oven and chambers, are required, and any standard microarray scanner can be used. The array design itself is flexible and custom CN +SNP designs are straightforward to produce. Each array purchase comes with complimentary access to CytoSure Interpret Software, a powerful, user-friendly CN and SNP data analysis package. Innovative features such as the Accelerate Workflow enable the automation of data analysis workflows, minimising the need for user intervention and maximising the consistency and speed of data interpretation. CytoSure Interpret Software also includes extensive annotation tracks covering syndromes, genes, exons, CNVs and recombination hotspots, each of which links to publicly available databases such as ISCA, Ensembl and the Database of Genomic Variants, providing results in context. OGT's CytoSure ISCA +SNP array has been specifically developed to offer sufficient resolution to detect abnormally long LOH stretches present in consanguineous samples or in samples containing UPD, whilst excluding standard-length ROH that are not biologically relevant, without compromising CN detection.

Table 3: Overview of workflows. The OGT aCGH +SNP workflow offers considerable time savings when compared to a typical SNP genotyping platform.

                                OGT's CGH +SNP protocol        Standard SNP protocol
Total time required             hours (dependent on format)    39 hours 45 min, 41 hours 45 min
Total hands-on time             1 hour 5 min                   6 hours 45 min
Time to hybridisation set-up    1 day                          3 days
Time to results                 3 days                         4 days

To find out more about OGT's CytoSure CGH +SNP arrays, contact products@ogt.com.

References

1. Feuk, L. et al (2006) Structural variation in the human genome. Nat. Rev. Genetics 7.
2. Curtis, C. et al (2009) The pitfalls of platform comparison: DNA copy number array technologies assessed. BMC Genomics 10.
3. Conlin, L.K. et al (2010) Mechanisms of mosaicism, chimerism and uniparental disomy identified by single nucleotide polymorphism array analysis. Human Molecular Genetics 7.
4. Bruno, D.L. et al (2009) Detection of cryptic pathogenic copy number variations and constitutional loss of heterozygosity using high resolution SNP microarray analysis in 117 patients referred for cytogenetic analysis and impact on clinical practice. Journal of Medical Genetics 46.
5. McQuillan, R. et al (2008) Runs of homozygosity in European populations. American Journal of Human Genetics 83.
6. Nothnagel, M. et al (2010) Genomic and geographic distribution of SNP-defined runs of homozygosity in Europeans. Human Molecular Genetics 1.
7. Li, L. et al (2006) Long contiguous stretches of homozygosity in the human genome. Human Mutation 27.
8. Gibson, J. et al (2006) Extended tracts of homozygosity in outbred human populations. Human Molecular Genetics 14.
9. Kirin, M. et al (2010) Genomic runs of homozygosity record population history and consanguinity. PLoS ONE 5(11).
10. Grote, L. et al (2012) Variability in laboratory reporting practices for regions of homozygosity indicating parental relatedness as identified by SNP microarray testing. Genetics in Medicine 14.
11. Liehr, T. et al (2010) Cytogenetic contribution to uniparental disomy (UPD). Molecular Cytogenetics 3.
12. Bruce, S. et al (2005) Global analysis of uniparental disomy using high density genotyping arrays. Journal of Medical Genetics 42.
13. Altug-Teber, Ö. et al (2005) A rapid microarray based whole genome analysis for detection of uniparental disomy. Human Mutation 26.
14. Cooper, W.N. et al (2007) Mitotic recombination and uniparental disomy in Beckwith-Wiedemann syndrome. Genomics 89.
15. Bittles, A.H. and Black, M.L. (2010) Consanguinity, human evolution, and complex diseases. Proceedings of the National Academy of Sciences USA 26.
16. Sund, K.L. et al (2012) Regions of homozygosity identified by SNP microarray analysis aid in the diagnosis of autosomal recessive disease and incidentally detect parental blood relationships. Genetics in Medicine 15.
17. Bennett, R.L. et al (2002) Genetic counselling and screening of consanguineous couples and their offspring: recommendations of the National Society of Genetic Counselors. Journal of Genetic Counseling 11.
18. Woods, C.G. et al (2006) Quantification of homozygosity in consanguineous individuals with autosomal recessive disease. The American Journal of Human Genetics 78.
19. Kearney, H.M. et al (2011) Diagnostic implications of excessive homozygosity detected by SNP-based microarrays: consanguinity, uniparental disomy, and recessive single-gene mutations. Clinics in Laboratory Medicine 31.
20. Papenhausen, P. et al (2011) UPD detection using homozygosity profiling with a SNP genotyping microarray. American Journal of Medical Genetics Part A 155.
21. Siggberg, L. et al (2012) High-resolution SNP array analysis of patients with developmental disorder and normal array CGH results. BMC Med Genet 13:84.

* Data kindly provided by Emory Genetics Laboratory.
Data kindly provided by Dr Deborah J G Mackay and Dr Rebecca Poole, Wessex Regional Genetics Laboratory, Salisbury District Hospital, Salisbury.

Begbroke Science Park, Begbroke Hill, Woodstock Rd, Begbroke, Oxfordshire, OX5 1PF, United Kingdom

CytoSure: This product is provided under an agreement between Agilent Technologies, Inc. and OGT. The manufacture, use, sale or import of this product may be subject to one or more of U.S. patents, pending applications, and corresponding international equivalents, owned by Agilent Technologies, Inc. The purchaser has the non-transferable right to use and consume the product for RESEARCH USE ONLY AND NOT for DIAGNOSTIC PROCEDURES. It is not intended for use, and should not be used, for the diagnosis, prevention, monitoring, treatment or alleviation of any disease or condition, or for the investigation of any physiological process, in any identifiable human, or for any other medical purpose. This document and its contents are © Oxford Gene Technology IP Limited. All rights reserved. OGT and CytoSure are trademarks of Oxford Gene Technology IP Limited.