Structural variation
Marta Puig
Institut de Biotecnologia i Biomedicina, Universitat Autònoma de Barcelona
Genetic variation
- How much genetic variation is there between individuals?
- What types of variants exist and how are they generated?
- What is the genetic basis of phenotypic traits?
Overview
1. Types of structural variants (SVs)
2. Methods for detecting SVs
3. Copy number variants (CNVs)
4. Indels and transposable element (TE) insertions
5. Inversions
6. Mechanisms of generation
7. Functional effects and examples
Types of structural variants (figure, reference segment a b c d e f)
- Deletion: a b e f
- Insertion of new sequence (deletions and insertions together are called indels)
- Inversion: a e d c b f
- Duplication: a b c c d e f
- Translocation: exchange of segments with another chromosome (w x y z), giving a b z and w x y c d e f
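The SV types in the figure can be mimicked with a toy model in which a chromosome is a Python list of segment labels. This is a minimal sketch under that assumption: the function names are hypothetical, and real SVs are defined on nucleotide coordinates, where an inversion also reverse-complements the sequence.

```python
# Toy model: a chromosome is a list of segment labels, as in the figure.
# Coordinates are 0-based and end-exclusive.

def delete(chrom, start, end):
    """Deletion: remove segments [start, end)."""
    return chrom[:start] + chrom[end:]

def insert(chrom, pos, new_segments):
    """Insertion: add new segments before position pos."""
    return chrom[:pos] + list(new_segments) + chrom[pos:]

def duplicate(chrom, start, end):
    """Tandem duplication: repeat segments [start, end) right after themselves."""
    return chrom[:end] + chrom[start:end] + chrom[end:]

def invert(chrom, start, end):
    """Inversion: reverse the order of segments [start, end)."""
    return chrom[:start] + chrom[start:end][::-1] + chrom[end:]

def translocate(chrom1, chrom2, break1, break2):
    """Reciprocal translocation: swap the parts after break1 and break2."""
    return chrom1[:break1] + chrom2[break2:], chrom2[:break2] + chrom1[break1:]

ref = list("abcdef")
print("".join(delete(ref, 2, 4)))      # abef
print("".join(duplicate(ref, 2, 3)))   # abccdef
print("".join(invert(ref, 1, 5)))      # aedcbf
t1, t2 = translocate(ref, list("wxyz"), 2, 3)
print("".join(t1), "".join(t2))        # abz wxycdef
```

Insertions and deletions change the amount of DNA, whereas inversions and reciprocal translocations only rearrange it (balanced events).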
Pang et al. (2010) Genome Biology 11: R52
Structural variation vs SNPs
Structural variants (SVs): genomic alterations that change the organization of the DNA molecule. Compared with SNPs, SVs represent a lower number of mutations but affect a higher number of nucleotides in the genome.
Comparison between the Venter genome and the reference genome:
- 808,346 structural variants: 48.8 Mb (1.5%)
  - 808,179 indels: 39.54 Mb (1.2%)
  - 167 inversions: 9.26 Mb (0.3%)
- 3,213,401 SNPs: 3.2 Mb (0.1%)
Share of mutations: 80% SNPs, 20% SVs. Share of variable bases: 6.15% SNPs, 93.85% SVs.
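The two summary shares can be recomputed from the counts in the comparison. A minimal Python check, assuming 3.2 Mb for the SNP bases (one base per SNP) as on the slide:

```python
# Counts from the Venter vs reference comparison (Pang et al. 2010), as listed above
indels = 808_179
inversions = 167
snps = 3_213_401
sv_mb = 39.54 + 9.26   # bases affected by indels + inversions, in Mb
snp_mb = 3.2           # each SNP changes a single base

svs = indels + inversions
sv_mutation_share = svs / (svs + snps)        # SVs as a fraction of all events
snp_base_share = snp_mb / (snp_mb + sv_mb)    # SNP bases as a fraction of all variable bases

print(f"SVs: {svs:,} events, {sv_mb:.1f} Mb")
print(f"Share of mutations that are SVs: {sv_mutation_share:.0%}")
print(f"Share of variable bases in SNPs: {snp_base_share:.2%}")
```

SVs are about one event in five, yet they account for more than 90% of the variable bases because each SV spans many nucleotides.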
Methods for detection of SVs (ordered from lowest to highest resolution and throughput)
- Cytogenetic techniques
- Comparative genomic hybridization (CGH) arrays
- Paired-end mapping (PEM)
- Sequencing and de novo assembly of complete genomes
Cytogenetic techniques
Figure panels: karyotyping, FISH, chromosome painting and fiber FISH, with examples of deletions, a duplication, a translocation and a copy number variant.
Fluorescence in situ hybridization (FISH)
Figure panels: labelled probe hybridization; final result. Figure 3.32. Brown (2007) Genomes 3, 3rd edition
Inversion detection by FISH in interphase nuclei
- Fixed inversion between humans and chimpanzees
- Polymorphic inversion in humans (standard, STD, and inverted, INV, arrangements)
Figure 3. Feuk et al. (2005) PLoS Genetics 1: e56
Comparative genomic hybridization arrays (aCGH)
The ratio of the fluorescence intensities of the test and the reference DNA indicates differences in copy number at each location in the genome.
Figure 2. Feuk et al. (2006) Nature Reviews Genetics 7: 85-97
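A minimal sketch of how an aCGH intensity ratio is read, assuming background-corrected intensities, a diploid reference and a hypothetical ±0.3 log2 threshold. Real analyses also normalize the signal and segment runs of adjacent probes, and the signal is rarely this linear.

```python
import math

def log2_ratio(test_intensity, ref_intensity):
    """Log2 of the test/reference fluorescence ratio for one probe."""
    return math.log2(test_intensity / ref_intensity)

def estimated_copy_number(ratio_log2, ref_copies=2):
    """Copy number implied by a log2 ratio, assuming a diploid reference
    and a perfectly linear signal."""
    return round(ref_copies * 2 ** ratio_log2)

def call(ratio_log2, threshold=0.3):
    """Hypothetical threshold-based call: gain, loss or no change."""
    if ratio_log2 > threshold:
        return "gain"
    if ratio_log2 < -threshold:
        return "loss"
    return "no change"

# Hypothetical probe intensities (test, reference)
for test, ref in [(1000, 1000), (1500, 1000), (500, 1000)]:
    r = log2_ratio(test, ref)
    print(f"{r:+.2f}", call(r), estimated_copy_number(r))
```

With a diploid reference, one copy gives a log2 ratio of -1 and three copies give about +0.58, which is why gains are harder to detect than losses.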
Construction of a genomic DNA library
Genomic DNA is fragmented by digestion with restriction enzymes, and each fragment is cloned inside a vector; the resulting collection of clones is a genomic library.
Paired-end mapping (PEM)
1. Construction of a DNA library of fragments of a defined size (e.g. 40 kb) from the DNA of interest (test DNA)
2. Sequencing of both ends of a large number of fragments
3. Mapping of both ends to a reference genome and prediction of SVs:
   - Ends map 40 kb apart, as expected: no variant
   - Ends map 30 kb apart: 10-kb insertion in the test DNA
   - Ends map 60 kb apart: 20-kb deletion in the test DNA
   - Ends map X kb apart with one end in the wrong orientation: inversion (the fragment spans breakpoint 1 or breakpoint 2)
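The decision rules in step 3 can be sketched for a single end pair. This assumes the 40-kb library of the figure and a hypothetical ±2 kb tolerance for normal fragment-size spread; actual PEM callers cluster many concordant discordant pairs before predicting an SV.

```python
def classify_pair(ref_distance_kb, orientation_ok=True,
                  expected_kb=40.0, tolerance_kb=2.0):
    """Classify one mapped end pair from a 40-kb PEM library.

    ref_distance_kb: distance between the two mapped ends on the reference.
    orientation_ok: False if one end maps in the unexpected orientation.
    tolerance_kb: hypothetical allowance for fragment-size variation.
    Returns (variant type, estimated size in kb or None).
    """
    if not orientation_ok:
        # One end falls inside an inverted segment, so its orientation flips.
        return "inversion", None
    diff = ref_distance_kb - expected_kb
    if abs(diff) <= tolerance_kb:
        return "no variant", 0.0
    if diff < 0:
        # Ends map closer on the reference: the test DNA carries extra sequence.
        return "insertion", -diff
    # Ends map farther apart on the reference: sequence missing in the test DNA.
    return "deletion", diff


for dist in (40, 30, 60):
    print(dist, classify_pair(dist))
print("X", classify_pair(50, orientation_ok=False))
```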
Copy number variants (CNVs)
CNV: DNA segment present in a variable number of copies compared to a reference genome (e.g. individual 1: 2 copies; individual 2: 3 copies; individual 3: 5 copies).
- 8,599 validated CNVs spanning a total of 112.7 Mb (3.7% of the genome) were detected in 450 individuals of European, African and Asian ancestry
- Two genomes show different copy numbers in 1,098 CNV regions
- Detected CNVs range in size from 443 bp to 1.28 Mb (average size 2.9 kb)
- CNVs can include genes
- Some CNVs do not seem to have any influence on the phenotype, but others have been associated with diseases
Conrad et al. (2010) Nature 464: 704-712
CNVs and segmental duplications
- CNV: DNA segment present in a variable number of copies compared to a reference genome (e.g. individuals 1, 2 and 3 carry 1, 2 and 5 copies)
- Segmental duplication (SD): segment of DNA with very similar sequence present in more than one copy in the genome (e.g. individuals 1, 2 and 3 all carry 2 copies)
Lesson 6. Structural variation. Mario Cáceres
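The two definitions amount to two independent tests on the copy numbers observed across individuals, so a segment can satisfy both (a CNV lying within an SD). A sketch with hypothetical inputs mirroring the figure, assuming an SD is a segment present in at least two copies in every genome examined:

```python
def classify_segment(copies):
    """Label a DNA segment from its copy number in several individuals.

    copies: list of copy numbers, one per individual (hypothetical input).
    """
    labels = set()
    if min(copies) >= 2:
        # More than one copy in every genome examined: segmental duplication.
        labels.add("SD")
    if len(set(copies)) > 1:
        # Copy number differs between individuals: copy number variant.
        labels.add("CNV")
    return labels


print(classify_segment([1, 2, 5]))   # CNV example from the figure
print(classify_segment([2, 2, 2]))   # SD example from the figure
print(classify_segment([2, 3, 5]))   # variable SD: both labels
```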
Redon et al. (2006) Nature 444: 444-454
Copy number variants (CNVs)
Chromosomal distribution of 1,447 regions with CNVs (figure)
- 24% of CNVs are associated with SDs
- 58% of CNVs overlap known genes
Montgomery et al. (2013) Genome Res 23: 749-761
Short indels (<50 bp)
- 1.6 million indels from 179 individuals representing 3 diverse human populations
- Purifying selection acts against indels in functional regions
- 43-48% of indels occur in 4% of the genome (indel hotspots), while in the remaining 96% their prevalence is 16 times lower than that of SNPs
- Polymerase slippage can explain ¾ of all indels
Figure panels: indel density in 6 genic regions; coding indel lengths
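Polymerase slippage produces indels that add or remove a copy of an adjacent repeat unit. A simplified, hypothetical test for whether an indel is slippage-compatible (not the classification used by Montgomery et al.) checks whether the inserted or deleted sequence also occurs immediately next to the event in the reference:

```python
def has_adjacent_copy(ref, start, end, unit):
    """True if `unit` occurs immediately left of `start` or right of `end` in ref."""
    k = len(unit)
    left = ref[max(0, start - k):start]
    right = ref[end:end + k]
    return left == unit or right == unit


def slippage_compatible(ref, pos, indel_seq, kind):
    """pos: 0-based position where a deletion starts, or where an
    insertion is placed (before ref[pos])."""
    if kind == "deletion":
        end = pos + len(indel_seq)
        if ref[pos:end] != indel_seq:
            raise ValueError("deleted sequence does not match the reference")
        return has_adjacent_copy(ref, pos, end, indel_seq)
    if kind == "insertion":
        return has_adjacent_copy(ref, pos, pos, indel_seq)
    raise ValueError(f"unknown indel kind: {kind}")


ref = "GGCACACACATT"   # hypothetical sequence with a CA repeat
print(slippage_compatible(ref, 2, "CA", "deletion"))    # True
print(slippage_compatible(ref, 5, "TTG", "insertion"))  # False
```

Indels that lie in repeats like this one are the ones expected to cluster in hotspots.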
Kidd et al. (2008) Nature 453: 56-64
Large indels
- Fosmid-based (40 kb) PEM in 8 humans found 747 deletions and 724 insertions larger than 5 kb
- Example: 32-kb deletions between 12-kb direct SDs with 94% identity
- Identification of novel sequence not included in the reference genome
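TE insertion polymorphisms (Stewart et al. (2011) PLoS Genetics 7: e1002236)
Detection from paired-end sequencing reads:
- TE insertion present in the reference but absent in the test genome: read pairs with a longer distance between them than expected
- TE insertion present in the test genome but absent in the reference: read pairs with one read mapping into the insertion, and reads containing part of the insertion
Candidate TE insertions are confirmed by PCR.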
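For insertions absent from the reference, the read-pair signal can be sketched as counting pairs in which one read maps uniquely and its mate maps to a TE. This is a minimal illustration, assuming pre-annotated pairs, fixed genomic windows and a hypothetical minimum support of 3 pairs; fixed windows can split a real cluster at a boundary, which actual callers avoid by clustering.

```python
from collections import defaultdict


def call_te_insertions(pairs, window=500, min_support=3):
    """Candidate non-reference TE insertions from discordant read pairs.

    pairs: iterable of (chrom, anchor_pos, te_family) tuples, where anchor_pos
    is where the uniquely mapped read lies on the reference and te_family is
    the TE its mate maps to (None if the mate also maps uniquely).
    Returns (chrom, window_start, te_family) for windows with enough support.
    """
    support = defaultdict(int)
    for chrom, pos, te_family in pairs:
        if te_family is not None:
            support[(chrom, pos // window, te_family)] += 1
    return sorted(
        (chrom, b * window, te_family)
        for (chrom, b, te_family), n in support.items()
        if n >= min_support
    )


pairs = [("chr1", 10_100, "Alu"), ("chr1", 10_250, "Alu"), ("chr1", 10_400, "Alu"),
         ("chr1", 52_000, "L1"), ("chr2", 700, None)]   # hypothetical reads
print(call_te_insertions(pairs))
```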
Polymorphic TE insertions in humans (Stewart et al. (2011) PLoS Genetics 7: e1002236)
- Data from the 1000 Genomes Project (185 individuals from 3 populations)
- 7,380 polymorphic insertions detected
- Active elements are Alu, L1 and SVA
- Polymorphic insertions between two individuals: 600 (two Europeans), 1,400 (two Africans), 2,000 (one European and one African)
- Polymorphic TE insertions within genes (figure)
- De novo insertion frequency: 1 insertion per 20 births
Figure 1. Li et al. (2011) Nature Biotechnology 29: 723-730: size distribution of structural variants <1 kb in two sequenced genomes, with a peak corresponding to Alu insertions.
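The de novo rate translates into simple expectations. A worked example, assuming a hypothetical cohort of 1,000 births and, for the probability, that new insertions arise independently (a Poisson model, which is an assumption rather than a result from the study):

```python
import math

rate_per_birth = 1 / 20   # de novo TE insertions per birth (from the slide)
births = 1_000            # hypothetical cohort size

expected = rate_per_birth * births
# Poisson model: P(at least one insertion) = 1 - exp(-rate)
p_at_least_one = 1 - math.exp(-rate_per_birth)

print(f"Expected de novo insertions in {births} births: {expected:.0f}")
print(f"P(a newborn carries >= 1 de novo insertion): {p_at_least_one:.3f}")
```

So roughly one newborn in twenty carries a new TE insertion.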
Inversions
Change of orientation of a segment of DNA.
Figure: standard (STD) and inverted (INV) arrangements (e.g. 2st and 2j), showing the inverted region relative to the centromere (Cen.) and the proximal and distal regions; types of inversions.
- Inversions have been associated with phenotypic traits
- The mechanisms by which inversions affect the phenotype remain unknown in most cases
- As balanced events (no net gain or loss of DNA), they are difficult to study
- They can present repeats in opposite orientation at their breakpoints
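At the DNA level, an inversion reverse-complements the segment between the breakpoints, and repeats "in opposite orientation" means that one breakpoint region is the reverse complement of the other. A minimal sketch with hypothetical sequences:

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def invert_segment(seq, start, end):
    """Inversion: the segment [start, end) is reverse-complemented in place."""
    return seq[:start] + reverse_complement(seq[start:end]) + seq[end:]


def flanked_by_inverted_repeats(left_flank, right_flank):
    """True if the two breakpoint regions are repeats in opposite orientation."""
    return right_flank == reverse_complement(left_flank)


print(invert_segment("AACGTT", 1, 3))                 # AGTGTT
print(flanked_by_inverted_repeats("AAGGC", "GCCTT"))  # True
```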
Effects and consequences of inversions: suppression of recombination
Within the inverted sequence, recombination is suppressed in STD/INV heterozygotes, so alleles found together within an inversion tend to be inherited together.
Figure source: http://mhanswers-auth.mhhe.com/biology/genetics/mcgraw-hill-answers-changes-chromosome-structure-and-number
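Why alleles stay together can be quantified with the standard two-locus result for random mating: linkage disequilibrium decays as D_t = D_0 (1 - r)^t, where r is the recombination fraction between the loci. A sketch comparing a hypothetical r = 0.1 with r ≈ 0, an idealization of the inverted region in heterozygotes (in reality rare double crossovers and gene conversion allow some exchange):

```python
def ld_after(d0, r, generations):
    """Linkage disequilibrium left after t generations: D_t = D_0 * (1 - r)**t."""
    return d0 * (1 - r) ** generations


d0 = 0.25  # maximal D for two loci with allele frequencies 0.5 (hypothetical start)
for label, r in [("outside the inversion (hypothetical r = 0.1)", 0.1),
                 ("inside the inversion in STD/INV heterozygotes (r ~ 0)", 0.0)]:
    print(label, round(ld_after(d0, r, 50), 4))
```

After 50 generations almost no association remains at r = 0.1, whereas with suppressed recombination the allele combination is preserved intact.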
Effects and consequences of inversions: position effects
Altered expression of adjacent genes caused by the mutational effects of inversion breakpoints (BP):
- BP between genes: change of gene positions; normal expression
- BP within a gene: disrupted gene; no expression
- BP between regulatory elements and genes: disrupted regulatory elements; altered expression patterns
Figure source: http://mhanswers-auth.mhhe.com/biology/genetics/mcgraw-hill-answers-changes-chromosome-structure-and-number
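The three cases can be sketched as a lookup of each breakpoint against gene and regulatory-element coordinates. The `Gene` record, its single `enhancer` position and all coordinates below are hypothetical simplifications; an inversion has two breakpoints, so each is evaluated separately.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int       # gene body [start, end)
    end: int
    enhancer: int    # position of a regulatory element controlling this gene


def breakpoint_effect(bp, genes):
    """Classify one inversion breakpoint according to the three cases above."""
    for g in genes:
        if g.start <= bp < g.end:
            return f"within {g.name}: disrupted gene (no expression)"
    for g in genes:
        # Interval separating the gene from its regulatory element.
        lo, hi = (g.enhancer, g.start) if g.enhancer < g.start else (g.end, g.enhancer)
        if lo < bp < hi:
            return f"between {g.name} and its regulatory element: altered expression pattern"
    return "between genes: change of position (normal expression)"


genes = [Gene("A", 1_000, 5_000, 500), Gene("B", 20_000, 26_000, 30_000)]
for bp in (3_000, 700, 28_000, 10_000):
    print(bp, breakpoint_effect(bp, genes))
```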
Mechanisms of generation of SVs
SVs are typically generated during DNA break-induced repair, recombination or replication, by several possible mechanisms:
- Non-Allelic Homologous Recombination (NAHR): duplications, deletions, inversions, translocations
- Non-Homologous End Joining (NHEJ): deletions, inversions
- Transposition of transposable elements: insertions, deletions
- Fork Stalling and Template Switching (FoSTeS): duplications, deletions, inversions, translocations
Non-Allelic Homologous Recombination (NAHR)
Intra- or interchromosomal recombination between copies of a sequence located in different genomic positions, generating duplications and deletions, translocations, or inversions.
Figure 4. Bailey and Eichler (2006) Nature Reviews Genetics 7: 552-564
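The outcome of NAHR depends on where the two paralogous copies lie and on their relative orientation. A sketch of that rule, plus a toy unequal crossover between misaligned direct repeats on two homologs (segment labels as in the earlier figure; `R` stands for a hypothetical repeat such as an SD):

```python
def nahr_outcome(same_chromosome, same_orientation):
    """Rearrangement expected from NAHR between two paralogous repeat copies."""
    if not same_chromosome:
        return "translocation"          # copies on different chromosomes
    if same_orientation:
        return "deletion/duplication"   # direct repeats
    return "inversion"                  # inverted repeats


def unequal_crossover(chrom, i, j):
    """Crossover between repeat copy i on one homolog and the paralogous
    copy j on the other (misaligned pairing). Returns both recombinants."""
    if chrom[i] != chrom[j]:
        raise ValueError("NAHR needs two copies of the same repeat")
    return chrom[:i] + chrom[j:], chrom[:j] + chrom[i:]


chrom = ["a", "R", "b", "c", "R", "d"]   # R = direct repeat
longer, shorter = unequal_crossover(chrom, 4, 1)
print(" ".join(longer))    # a R b c R b c R d  (duplication)
print(" ".join(shorter))   # a R d              (deletion)
```

The two recombinant products are reciprocal: one homolog gains exactly the segment the other loses.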
Repeated sequences in the human genome
Segmental duplications (SDs)
- Intra- or interchromosomal duplicated sequences with length ≥1 kb and identity ≥90%
- Represent 5.3% of the human genome
- Figure 4 (SDs and gaps). International Human Genome Sequencing Consortium (2004) Nature 431: 931-945
Transposable elements
- Almost 50% of the human genome consists of transposable elements
- High number of copies of each TE class: 850,000 LINEs, 1.5 million SINEs, 450,000 LTR elements and 300,000 DNA transposons
- Figure 1. Cordaux and Batzer (2009) Nature Reviews Genetics 10: 691-703
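The SD definition is a pair of thresholds on duplicated alignments. A minimal filter, assuming each duplication is summarized by a hypothetical `Alignment` record with its length and fractional identity:

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    length_bp: int
    identity: float   # fraction of identical aligned bases


def is_segmental_duplication(aln, min_length=1_000, min_identity=0.90):
    """SD criteria from the slide: length >= 1 kb and identity >= 90%."""
    return aln.length_bp >= min_length and aln.identity >= min_identity


alignments = [Alignment(12_000, 0.94), Alignment(800, 0.98), Alignment(5_000, 0.85)]
print([is_segmental_duplication(a) for a in alignments])  # [True, False, False]
```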
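Non-Homologous End Joining (NHEJ)
Double-strand breaks in the original DNA molecules are repaired by directly joining the broken ends, without requiring sequence homology. If the ends are rejoined in a new arrangement, the repaired DNA molecules carry an inversion (two breaks in the same chromosome) or a translocation (breaks in two different chromosomes).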
FoSTeS
Fork Stalling and Template Switching (FoSTeS)
- Replication-based mechanism
- Template switching can involve microhomology
- Typically generates very complex rearrangements
Figure 5. Gu et al. (2008) PathoGenetics 1: 4
Functional consequences of SVs
- Altered gene dosage and expression (CNVs)
- Disruption of genes or regulatory elements (insertions, deletions, inversions)
- Gene fusion (deletions, inversions)
- Change in the exon-intron structure (insertions, deletions, CNVs, inversions)
- Modification of gene regulatory regions (insertions, deletions, CNVs, inversions)
- Indirect effects through increased susceptibility to genomic rearrangements (CNVs, inversions)
SVs and disease
Tuzun et al. (2005) Nature Genetics 37: 727-732
CNVs and complex diseases
Summary of common disorders for which associations with CNVs have been reported
Table 3. Estivill and Armengol (2007) PLoS Genetics 3: e190
CNV example: the amylase gene (AMY1)
- Amylase protein levels in saliva are proportional to the number of AMY1 gene copies
- Individuals from populations with high-starch diets have on average more AMY1 copies than those from populations with traditionally low-starch diets
Examples in the figures: Japanese individual, high-starch diet (14 copies); African individual, low-starch diet (6 copies); chimpanzee, low-starch diet (2 copies).
Figures 1, 2 and 3. Perry et al. (2007) Nature Genetics 39: 1256-1260
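If salivary amylase is strictly proportional to AMY1 copy number, the two human examples predict a fixed ratio of protein levels. A worked sketch under that proportionality assumption (measured levels vary around this line):

```python
def relative_amylase(copies, reference_copies):
    """Predicted salivary amylase level relative to a reference individual,
    assuming protein level is strictly proportional to AMY1 copy number."""
    return copies / reference_copies


# Human examples from the slide: high-starch diet (14 copies), low-starch diet (6 copies)
high_starch, low_starch = 14, 6
ratio = relative_amylase(high_starch, low_starch)
print(f"Expected amylase ratio: {ratio:.2f}x")
```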
González et al. (2005) Science 307: 1434-1440
CNV example: CCL3L1
Individuals with a low copy number of the chemokine gene CCL3L1, relative to the typical copy number of their ethnic background, show markedly enhanced susceptibility to HIV-1 infection and AIDS.
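The key point is that "low" is relative to the population, not absolute. A sketch using the population median as the reference point, with made-up copy numbers for two hypothetical populations (not data from González et al.):

```python
from statistics import median


def below_population_median(individual_copies, population_copies):
    """True if an individual carries fewer copies than its population median."""
    return individual_copies < median(population_copies)


# Hypothetical CCL3L1 copy numbers for two populations
pop_a = [1, 2, 2, 2, 3, 3, 4]   # median 2
pop_b = [3, 4, 5, 6, 6, 7, 8]   # median 6
print(below_population_median(3, pop_a))  # False
print(below_population_median(3, pop_b))  # True
```

The same absolute copy number (3) counts as low in one population but not in the other.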
Feschotte (2008) Nature Reviews Genetics 9: 397-405 Effects of TEs on genes
Stefansson et al. (2005) Nature Genetics 37: 129-137
Chromosome 17 inversion in humans
- 900-kb polymorphic inversion originated by NAHR between 200-500 kb segmental duplications
- Found mainly in European populations, where it has a frequency of about 20%
- This inversion may be positively selected, because it could be associated with increased fertility in female carriers
Inversion in the plant Mimulus guttatus
- Mimulus guttatus is a North American plant with coastal perennial and inland annual ecotypes
- A polymorphic inversion causes the differences between the annual and perennial forms, which are adapted to different environments
- The inversion affects flowering time, causing reproductive isolation
Figures 1 and 2. Lowry and Willis (2010) PLoS Biology 8: e1000500
The 1000 Genomes Project (http://www.1000genomes.org)
Objective: identify all genetic variants with a frequency higher than 1% in the studied populations.
Experiments: next-generation whole-genome sequencing of 2,500 individuals from 25 worldwide populations at 4x coverage.
Pilot phase (The 1000 Genomes Project Consortium (2010) Nature 467: 1061-1073)
- 179 individuals from 4 populations
- 15 million SNPs, 1 million short insertions and deletions and 20,000 structural variants
- >95% of variants with frequencies >5% detected
Phase I (The 1000 Genomes Project Consortium (2012) Nature 491: 56-65)
- 1,092 individuals from 14 populations
- 38 million SNPs, 1.4 million short indels and 14,000 larger deletions
- 98% of SNPs with frequencies >1% detected
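What 4x coverage means per individual can be estimated with an idealized Poisson model of read depth (an assumption; real coverage is more uneven). A worked sketch:

```python
import math


def fraction_with_at_least(k, coverage):
    """P(a base is covered by >= k reads) under a Poisson model of read depth."""
    p_less = sum(math.exp(-coverage) * coverage ** i / math.factorial(i)
                 for i in range(k))
    return 1 - p_less


c = 4.0
print(f"uncovered: {1 - fraction_with_at_least(1, c):.4f}")   # e^-4
print(f">= 2 reads: {fraction_with_at_least(2, c):.3f}")
```

At 4x, about 9% of positions get fewer than two reads in a given individual, so a single low-coverage genome misses many heterozygous sites. A variant carried by many individuals is still seen many times when reads are pooled across samples, which is why this design captures common variants well.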