Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content

Size: px
Start display at page:

Download "Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content"

Transcription

1 Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content Nathan M. Springer 1., Kai Ying 2,3., Yan Fu 4,5, Tieming Ji 6, Cheng-Ting Yeh 4,7, Yi Jia 8, Wei Wu 4,7, Todd Richmond 9, Jacob Kitzman 9, Heidi Rosenbaum 9, A. Leonardo Iniguez 9, W. Brad Barbazuk 10, Jeffrey A. Jeddeloh 9, Daniel Nettleton 6, Patrick S. Schnable 2,3,4,5,7,8 * 1 Department of Plant Biology, University of Minnesota, Saint Paul, Minnesota, United States of America, 2 Interdepartmental Genetics Graduate Program, Iowa State University, Ames, Iowa, United States of America, 3 Department of Genetics, Development, and Cell Biology, Iowa State University, Ames, Iowa, United States of America, 4 Department of Agronomy, Iowa State University, Ames, Iowa, United States of America, 5 Center for Carbon Capturing Crops, Iowa State University, Ames, Iowa, United States of America, 6 Department of Statistics, Iowa State University, Ames, Iowa, United States of America, 7 Center for Plant Genomics, Iowa State University, Ames, Iowa, United States of America, 8 Interdepartment Plant Biology, Iowa State University, Ames, Iowa, United States of America, 9 Roche NimbleGen, Madison, Wisconsin, United States of America, 10 University of Florida, Gainesville, Florida, United States of America Abstract Following the domestication of maize over the past,10,000 years, breeders have exploited the extensive genetic diversity of this species to mold its phenotype to meet human needs. The extent of structural variation, including copy number variation (CNV) and presence/absence variation (PAV), which are thought to contribute to the extraordinary phenotypic diversity and plasticity of this important crop, have not been elucidated. Whole-genome, array-based, comparative genomic hybridization (CGH) revealed a level of structural diversity between the inbred lines B73 and Mo17 that is unprecedented among higher eukaryotes. A detailed analysis of altered segments of DNA conservatively estimates that there are several hundred CNV sequences among the two genotypes, as well as several thousand PAV sequences that are present in B73 but not Mo17. Haplotype-specific PAVs contain hundreds of single-copy, expressed genes that may contribute to heterosis and to the extraordinary phenotypic diversity of this important crop. Citation: Springer NM, Ying K, Fu Y, Ji T, Yeh C-T, et al. (2009) Maize Inbreds Exhibit High Levels of Copy Number Variation (CNV) and Presence/Absence Variation (PAV) in Genome Content. PLoS Genet 5(11): e doi: /journal.pgen Editor: Joseph R. Ecker, The Salk Institute for Biological Studies, United States of America Received July 1, 2009; Accepted October 19, 2009; Published November 20, 2009 Copyright: ß 2009 Springer et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Funding: This research was supported in part by grants from the National Science Foundation to PSS and others (DBI and DBI ). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Competing Interests: Todd Richmond, Jacob Kitzman, Heidi Rosenbaum, A. Leonardo Iniguez, and Jeffrey A. Jeddeloh are employees of Roche NimbleGen. * schnable@iastate.edu. These authors contributed equally to this work. Introduction Although many analyses of genetic variation have focused on single nucleotide polymorphisms (SNPs), there is a growing appreciation for the roles of structural variation as a cause for phenotypic variation [1 7]. Indeed, structural variation can have major phenotypic consequences [6]. The term copy number variation has been used to describe duplications, deletions and insertions among individuals of a species [5]. Herein the term copy number variation (CNV) is reserved to describe sequences that are present in both genomes being compared, albeit in different copy number. The term presence-absence variation (PAV) is used to describe sequences that are present in one genome but entirely missing in the other genome. Maize is phenotypically diverse [8 9] and this phenotypic diversity is reflected by substantial variation in phenotypic and transcript levels among maize lines [8,10 11]. In addition, the maize genome exhibits extraordinarily high levels of genetic diversity as assayed at the level of SNPs, InDel Polymorphisms (IDPs), and structural variation [9,12]. The frequency of SNPs among maize inbreds is higher than the frequency of SNPs between humans and chimpanzees [9]. The inbred lines B73 and Mo17 are important models for the structural and functional genomics of maize. On average, B73 and Mo17 contain an IDP every,300 bp and SNPs every,80 bp [13 14] and within transcripts SNPs are found between the inbred lines B73 and Mo17 on average every 300 bp [15]. These levels of diversity are not limited to comparisons between B73 and Mo17. When comparing any two randomly chosen maize inbred lines, there is, on average, one polymorphism every 100 bp [16 17]. Collectively, these studies indicate that maize has relatively high levels of SNPs and IDPs as compared to many other species [9]. There is also cytogenetic evidence for structural variation in the genomes of maize inbreds. Structural genomic variation involves alterations in DNA sequence beyond SNPs or small IDPs, and includes large-scale differences in chromosomal structure, altered locations of genes or repetitive elements, copy number variation (CNV) and presence/absence differences among haplotypes. Large-scale differences in chromosomal structure between maize inbred lines were first identified through cytogenetic studies. Barbara McClintock and others analyzed heterochromatic knob (highly condensed, tandem repeat regions) content and size to PLoS Genetics 1 November 2009 Volume 5 Issue 11 e

2 Author Summary There is a growing appreciation for the role of genome structural variation in creating phenotypic variation within a species. Comparative genomic hybridization was used to compare the genome structures of two maize inbred lines, B73 and Mo17. The data reinforce the view that maize is a highly polymorphic species, but also show that there are often large genomic regions that have little or no variation. We identify several hundred sequences that, while present in both B73 and Mo17, have copy number differences in the two genomes. In addition, there are several thousand sequences, including at least 180 sequences annotated as single-copy genes, that are present in one genome but entirely missing in the other genome. This genome content variation leads to differences in transcript content between inbred lines and likely contributes to phenotypic diversity and heterosis in maize. characterize genome variation [18 20]. Recent studies have documented differences in the content of several classes of repetitive DNA between maize inbreds at the chromosomal level [21]. Flow cytometry studies have also documented significant variation in overall genome sizes among inbred lines [22]. Sequence-based methodologies have documented structural diversity at a higher resolution (reviewed by [9,12]). Sequencing of BACs containing the bz1 gene from eight different inbred lines revealed two significant findings [23 24]. First, there is variation for the presence of several genic fragments such that these genes are found at this locus in some inbreds but not in others [23]. These genes were subsequently found to be gene fragments that had been mobilized by Helitron transposons [25 26]. These are not PAVs because although a genome may lack a copy in the vicinity of the bz1 locus, such a genome typically contained one or more copies of these genes (or gene fragments) elsewhere. Second, comparison of multiple haplotypes revealed major differences in the amount and types of repetitive elements between genes. The same gene can be flanked by very different repetitive elements in different inbred lines [23]. At the same time, similar kinds of repeat diversity between haplotypes were reported in the a1-sh2 interval [27]. Both of these findings have been supported by analyses of other genomic regions in B73 and Mo17 [28]. A study of the presence and location for many genic fragments in B73 and Mo17 BAC libraries suggested that many sequences can vary in location or even presence between B73 and Mo17 [28]. There is also evidence for variation in the presence of nearly identical paralogs (NIPS) in different maize inbred lines [29]. Understanding the intraspecific variation of maize has important implications for crop improvement and plant breeding. Longterm selection experiments have demonstrated a surprising wealth of potential; even when starting with relatively little genetic diversity it has been possible to continue to make phenotypic gains for traits such as oil content for over a century [30]. In addition, the combination of variation from different maize inbred lines in hybrids results in heterosis [31]. The availability of genomic resources for maize, particularly the B73 maize genome sequence [32] has provided an opportunity to conduct genome-wide analyses of structural variation. We have used high-density oligonucleotide microarrays to identify patterns of structural variation across the maize genome. We find evidence for a high rate of CNVs. In addition, we identify several thousand DNA segments, often including genic sequences, that are present in the B73 genome but absent from the Mo17 genome (i.e., PAVs). By assessing genome-wide structural variation in maize we have gained a better understanding of the nature of variation among different maize inbred lines. Results Development and annotation of a CGH microarray for maize Genomic variation within a species can be assessed using comparative genomic hybridization (CGH). A high-density (2.1 million feature) oligonucleotide microarray was designed using the sequences of B73 BACs. Probes range in sizes from bp were selected using slightly relaxed criteria (due to the overlap of adjacent BAC sequences and lack of assembly at the time of design) relative to those traditionally used for CGH probe design. The 2.12M probes were aligned to the B73 RefGen_v1 [32] released by the maize genome sequencing project (MGSP). It was possible to identify perfect matches (100% ID and 100% coverage) for 93% (1.98M/2.10M) of the probes. Approximately,1.78 million of the probes had only a single perfect match and were therefore deemed to be single copy,,120k probes had two perfect matches and,34k probes had three perfect matches (Figure S1). All of these perfectly matched probes were classified based on their repetitiveness and locations relative to predicted genes (see Methods for details and Table 1 for numbers). Approximately 30% of the probes exhibited evidence of containing repetitive sequences (Methods; Table 1). Probes were also mapped relative to genes and other types of annotation produced by the MGSP. The distributions of probes relative to these types of annotation were assessed by visualizing the locations of probes that aligned to several genomic regions for which high-quality assembled sequence and manual annotation were available for both B73 and Mo17 (Figure 1 and Figure S2; [23,28]). There are 1,604 probes within the,1 Mb of B73 sequence from these four regions (selected portions are shown in Figure 1 and Figure S2). Probe density is generally high for those regions in which the B73 and Mo17 haplotypes align well. These regions tend to be genic or low copy and have 3 4 probes per kb. Consistent with the NimbleGen probe design strategy, fewer probes are located in regions consisting of a high percentage of repetitive element sequences. Analyses of these regions were used to evaluate the quality of our genome-wide probe annotation. Several tracks in Figure 1A provide information about the repetitive and genic classifications of probes. Our genome-wide annotations generally agree with the detailed annotation information available for these four regions. The probes that were classified as repetitive in the genome-wide analyses were often found within sequences that were annotated as repetitive or retrotransposon based on the four regions that had been subjected to manual annotation. All probes were designed based on the B73 haplotype. To determine whether probe sequences were conserved in Mo17, probe sequences were aligned to a collection of 42,206,644 Mo17 whole-genome shotgun (WGS) reads generated by the DOE s Joint Genome Institute (JGI) and provided to us prior to publication by the Rohksar group. Based on these alignments each probe was classified as being a perfect match (100% identity and coverage), highly conserved (.97% identity and coverage), conserved (.90% identity and coverage), poorly conserved (.75% identity and.70% coverage) or as having no significant match in the JGI Mo17 data set. Over 80% of the probes were at least 90% identical to Mo17 sequences with over 90% of probe sequence coverage (Table 1). The analysis of the four regions that have complete coverage of the Mo17 haplotype permitted us to compare the results of our PLoS Genetics 2 November 2009 Volume 5 Issue 11 e

3 Table 1. Probe classifications and enrichment in specific categories. Classifications Categories # of Probes Significant probes (q,0.0001) B.M significant probes M.B significant probes All probes 2,110, , ,963 33,850 Probe classification by repetitiveness a Non-repeat 1,461,771 (69%) 278,390 (85%) 249,188 (85%) 29,202 (86%) Total repetitive 630,586 (30%) 47,423 (15%) 42,775 (15%) 4,648 (14%) Cereal repeat 226,706 (11%) 12,349 (4%) 11,586 (4%) 763 (2%) Crosshyb 585,105 (28%) 40,907 (13%) 36,616 (13%) 4,291 (13%) Multi-copy 54,791 (3%) 6,276 (2%) 5,497 (2%) 779 (2%) Probe classification by Mo17 Perfect match (100%) 871,664 (41%) 44,062 (14%) 22,586 (8%) 21,476 (63%) conservation annotation b Highly Conserved (.97%) 331,590 (16%) 42,992 (13%) 36,659 (13%) 6,333 (19%) Conserved (.90%) 505,519 (24%) 106,996 (33%) 103,817 (35%) 3,179 (9%) Poorly conserved (.75%) 182,570 (9%) 55,410 (17%) 53,929 (18%) 1,481 (4%) No match in Mo17 (,75%) 219,325 (10%) 76,353 (23%) 74,972 (26%) 1,381 (4%) Genic annotation Exon 98,886 (4.5%) 11,885 (3.6%) 9,803 (3.4%) 2,082 (6.2%) Exon-intron 62,448 (2.8%) 8,447 (2.6%) 7,134 (2.4%) 1,313 (3.9%) Intron 145,483 (6.6%) 24,047 (7.4%) 21,159 (7.2%) 2,888 (8.5%) bp 121,133 (5.5%) 26,682 (8.2%) 24,569 (8.4%) 2,113 (6.2%) bp 134,408 (6.1%) 29,284 (9%) 26,618 (9.1%) 2,666 (7.9%) Intergene 1,650,875 (74.6%) 225,468 (69.2%) 202,680 (69.4%) 22,788 (67.3%) a The repetitive nature of each probe was determined by comparing to the B73 reference genome. All probes that satisfy criteria (see Methods for details) for cereal repeat, crosshyb or multi-copy were designated as repetitive. b The probes were each classified based upon the most significant similarity to the Mo17 WGS sequence from JGI. The full definition for each category can be found in the Methods section. doi: /journal.pgen t001 genome-wide classifications with actual alignments of complete B73 and Mo17 sequences. Because the JGI collection of Mo17 WGS reads provides approximately 46 coverage of the genome we expect some probes to be mis-classified as poorly conserved or as having no match in Mo17 simply due to incomplete sampling of the Mo17 genome. Overall, there was strong agreement between our classification of probes based on alignments to the Mo17 WGS reads and the genomic alignments shown in Figure 1. As expected, there were few cases of probes within highly conserved regions that had erroneously been classified as poorly conserved or no match. Even so, most probes within regions of the B73 haplotype for which there was no significant similarity in allelic regions of the Mo17 haplotype did not match the WGS Mo17 sequences. Some of the probes that matched regions of the B73 haplotype for which there was no significant similarity in allelic regions of the Mo17 haplotype (i.e., positions 257, ,000 in Figure 1A) did have similarity to WGS Mo17 sequences. This suggests that regions of the B73 haplotype that can not be aligned to allelic positions of the Mo17 haplotype are of two types. In some cases the non-aligning sequences are B73-specific (PAVs), while in other cases Mo17 contains these sequences but at non-allelic positions similar to those reported by Fu and Dooner [23]. Structural variation detected by CGH B73 and Mo17 genomic DNA samples were hybridized to the microarray using dye swaps as well as technical replication (Methods). Analysis of the CGH data reveals a bias towards stronger hybridization signals from B73 genomic DNA than from Mo17 genomic DNA (Figures S3, S4). This bias is likely due to the fact that the array design was based upon the B73 genomic sequence and that polymorphisms between B73 probes and the labeled Mo17 genomic DNA may reduce signal strength. This imbalance in signals between the genotypes violates an assumption required to perform typical global normalization. Consequently, we implemented a normalization procedure that utilized a subset of probes for array normalization. This strategy employed the raw signals from those 840,289 probes whose sequences are absolutely conserved between B73 and Mo17 (based on our analysis of the Mo17 WGS data) to normalize the remaining data (,60% of the probes). A linear model was used to estimate the signal from each genotype and to determine q-values to control false discovery rates. To understand the biological causes of differences in hybridization signals between B73 and Mo17 we initially focused on the four regions shown in Figure 1 and Figure S2 for which highquality B73 and Mo17 sequence were available. We found significant (q,0.0001) differences in hybridization signal in B73 relative to Mo17 for 234 of the 1,604 probes within these regions (Table 2). As expected it was much more common to observe higher hybridization signal in B73 (210 probes) than the reverse (24 probes). There are at least three biological reasons why a probe exhibits significant differences in signal after being hybridized to genomic DNA from two inbred lines. First, the probe sequence may have polymorphisms in the two genotypes (SNPs and IDPs). Second, the copy number of the probe in the genomic DNA might be different in the two genotypes being compared (CNV). Third, the probe sequence may be present in the genomic DNA of the reference genotype but not the other (PAV). It is important to remember that while all three reasons could explain why a probe would have a higher signal in B73 than in Mo17, only the second reason is likely to cause probes to have higher signals in Mo17 than in B73 because all probes were designed based on the B73 sequence. The impact of sequence polymorphisms on hybridization can be observed by comparing the average log 2 (Mo17/B73) in probes PLoS Genetics 3 November 2009 Volume 5 Issue 11 e

4 Figure 1. Significant hybridization differences are due to structural variation. (A) The B73 and Mo17 sequences for a portion of the 9,009 locus (sequenced by [28]) were aligned using Vista [71] which displays the percent identity as a sliding window of 100 bp (y-axis is 50% to 100% identity). The location of genes annotated by Brunner et al. [28] (indicated by light blue sequences in the alignment) and repeat elements (the colorcoded track right above the alignments; pink indicates retrotransposons and orange indicates transposons) are shown above the VISTA alignment. The log 2 (Mo17 signal/b73 signal) is shown for each probe in this region. The red probes exhibit significantly different (q,0.0001) signal in B73 and Mo17. The blue line indicates a segment with altered hybridization that was identified using DNAcopy. There are also data tracks that display the repeat annotation and B73/Mo17 similarity for each probe. Note that these annotations are based on the genome-wide analysis, not detailed analyses of these regions. In (B) we present the annotation, alignment and CGH data for a portion of the 9008 loci (sequence and annotated by Brunner et al., [28]). doi: /journal.pgen g001 with different levels of polymorphism between B73 and Mo17 (Table 2). For probes with no polymorphisms the average log 2 (Mo17/B73) is zero. As the number of polymorphisms between B73 and Mo17 increases, the log 2 (Mo17/B73) value decreases and the percentage of probes that exhibit statistically significant differences in signal strength (q,0.0001) increases. Most of the probes with significant variation (68%) have 5 or more SNPs (note that often these probes can not be aligned to the Mo17 Table 2. Influence of polymorphisms on hybridization variation. # SNPs # Probes B.M significant probes M.B significant probes Average log2(m/b) (1%) (7%) (19%) (25%) (26%) or more (22%) Total (13%) 24 * Significant probes indicate a q value, from the linear model. doi: /journal.pgen t002 WGS reads at all and many have multiple IDPs or may even be absent altogether from Mo17). Overall, this finding indicates that the majority of the significant differences in hybridization signals are due to the presence of multiple polymorphisms within the,70 bp probe sequence or due to sequences that encompass or overlap the probe sequence that are present in B73 but absent from the Mo17 genome. Further support for the concept that many of the probes that exhibit significant differences in hybridization signals are reporting structural variation was provided by visualization of the distribution of log 2 (Mo17/B73) signals relative to the four B73/Mo17 haplotype alignments (Figure 1 and Figure S2). For example, each of the four probes in Figure S2A that have significantly lower signals in Mo17 than in B73 are in regions in which the two haplotypes differ substantially. Similarly, in Figure S2B the six probes with significantly lower signal in Mo17 than in B73 all fall near regions of structural variation. Many of the probes with significant signal differences between B73 and Mo17 occurred in the regions surrounding non-shared repetitive elements. It was surprising that some of the probes with 5 or more SNPs (in alignments between these two regions only) did not exhibit significant differences in hybridization signals. However, we noted that although some of these probes (several examples shown in Figure S2) do not have a similar sequence at an allelic position in Mo17, they do have one present elsewhere in the Mo17 genome based on alignments to the Mo17 WGS sequences. PLoS Genetics 4 November 2009 Volume 5 Issue 11 e

5 Genome-wide analysis of probes with variable signal in B73 and Mo17 After analyzing in detail probes that aligned to the regions presented in Figure 1 and Figure S2, we assessed the characteristics of all probes that exhibit significant variation in B73 relative to Mo17. At a cut-off of q, there are 325,813 probes with significant differences in hybridization signals between B73 and Mo17 (15% of all probes, Table 1). The majority (90%) of these probes exhibit higher signals from B73 than from Mo17 (B.M; Table 1), as can be readily observed in volcano and MA plots (Figure S5). In general, and as expected, repeat probes tend to have higher signals than non-repetitive probes (as seen in MA plots in Figure S6). Both B73.Mo17 and Mo17.B73 probes are enriched for non-repetitive probes and consequently depleted for repetitive probes (Table 1 and Figures S7, S8A). This is not surprising because the signals associated with repetitive probes reflect cross-hybridization from multiple genomic sites and therefore a change at a single site will have less impact on signal strength. There are striking differences in the B73.Mo17 and Mo17.B73 probes when comparing annotation based on alignments of probe sequences to the Mo17 WGS sequences (Table 1, Figures S8B, S9, S10). Consistent with our analysis of probes from Figure 1, genome-wide B73.Mo17 probes are enriched for sequences with no match or poor conservation in the Mo17 WGS sequence and are correspondingly depleted for highly conserved or identical probes. Those B73.Mo17 probes that do have an identical or highly conserved sequence among the Mo17 WGS sequences are likely to be examples of CNV and can be used to estimate the rate of CNV. The Mo17.B73 probes follow the opposite pattern with enrichment for probes that have a highly conserved or identical sequence in both B73 and Mo17. This indicates that many of the probes with no match in the Mo17 WGS sequence reflect actual sequence differences, not simply a lack of coverage in the Mo17 WGS sequence. Probes were compared to the full working set of genes predicted by the MGSP ( This permissive gene set (n = 129,891) includes low-copy transposons as well as pseudogenes. The B73.Mo17 probes exhibit a distribution of genic and intergenic matches that is very similar to all probes. Interestingly, the Mo17.B73 probes are slightly depleted for intergenic probes and show an enrichment for probes near or within genes (Table 1; Figure S8C). A very similar distribution is observed using the filtered set of high-quality gene annotations from the MGSP (data not shown). Distribution of structural variation throughout the maize genome The probes were aligned to the B73 RefGen_v1 to visualize the patterns of structural variation along the B73 and Mo17 chromosomes (Figure 2). It should be noted that while the B73 reference genome generally place segments of DNA in the proper order at the level of a single BAC, the local orientation and order of sequence contigs within a BAC has not always been determined. Therefore, our genomic localization of the probes is likely only accurate within the average size of a BAC (,170 kb). The log 2 (Mo17/B73) signals for each probe were plotted relative to the genomic localization of the probes. As noted above, the majority of probes with significant B73.Mo17 hybridization detect structural variation. The genomic view provided in Figure 2 reveals that Figure 2. Genomic distribution of log 2 (Mo17/B73) signals. The log 2 (Mo17/B73) hybridization intensities are plotted for each chromosome. Data points below the line indicate higher hybridization in B73 than in Mo17. The positions of the centromeres [72] are indicated by black boxes. Note that there are chromosomal regions with high rates of variation (example near MB on chromosome 6) and regions with low rates of variation (example from MB on chromosome 8). doi: /journal.pgen g002 PLoS Genetics 5 November 2009 Volume 5 Issue 11 e

6 structural variation between these two inbreds is not evenly distributed throughout the maize genome. The large number of data points plotted on this graph (,.2 million) can make it difficult to visualize the relative rates of variation across the genome. Therefore, we implemented a sliding window analysis to observe the frequency of probes with significant B73.Mo17 variation in regions that are the approximately the size of 10 average BACs (Figure 3). There are a number of highly conserved genomic regions that have very little or no structural variation between B73 and Mo17 (Figure 3). For example, there is an,19 Mb region on chromosome 8 (Figure S11A; positions 140,904, ,897,190) and a 17 Mb region on chromosome 1 (positions 121,420, ,984,608) with no evidence for structural variation. The sliding window analysis identified 104 regions that exhibit little to no structural variation (fewer than 4% of the probes exhibit significant variation). Seven of these low diversity regions (on chromosomes 1, 3, 4, 5, 8 and 9) are over 10 Mb. We performed further characterization of the large regions on chromosomes 1 and 8 with low rates of structural variation. The majority of probes within these regions (83%) are 100% conserved in B73 and Mo17 suggesting that these are low diversity regions. None of 388 primer pairs designed to amplify sequences within these regions revealed sequence variation between B73 and Mo17 that could be detected via agarose gel electrophoresis. In comparison, 13% of all primer sets designed for random genomics sites detect variation. We then used Temperature Gradient Capillary Electrophoresis (TGCE) to test whether B73 and Mo17 amplification products from 156 of the 388 primer pairs from the conserved regions contain SNPs or small IDPs. TGCE is sensitive enough to detect a single SNP in amplicons of over 800 bp and 1 bp IDPs in amplicons of,500 bp [33], which is the typical size of these amplification products. Of these 123/156 (79%) exhibited no evidence of even a single SNP or IDP between B73 and Mo17, indicating the high level of sequence conservation within these two intervals. In contrast, only 39% of randomly selected sites are not polymorphic using TCGE assays. There is a tendency for these large low diversity regions to be located near the central portions of the chromosomes and the centromere to be located near one side of a low diversity region for all chromosomes except 9. However, there are many low diversity regions that are not centromeric (for example, the large region on chromosome 8). These low diversity regions are likely to represent regions in which B73 and Mo17 are identical by descent or regions with no structural variation in the maize species. These low diversity regions also exhibit very low levels of differential gene expression. Only three of the 196 genes from the MGSP filtered gene set that are located in the conserved chromosome 1 or chromosome 8 regions and queried by the Affymetrix 17K microarray exhibit evidence for differential expression in seedling, embryo or endosperm tissue from B73 and Mo17 [11]. The few cases of differential expression within sequence-conserved regions may reflect the action of trans-acting factors that are polymorphic between B73 and Mo17. Mega-base sized B73-specific sequence One visually striking feature in Figure 2 and Figure S11B is the region on chromosome 6 (positions 42,211,131 44,706,565) that contains a cluster of B73.Mo17 probes. Closer inspection of this region indicates that the region of elevated structural variation is,2.6 Mb (Figure 4A). The majority of probes in this region are either poorly conserved or not present among the Mo WGS sequences. This finding suggests that this 2.6 Mb sequence is present in the B73 genome but entirely absent from the Mo17 genome. Primer pairs designed based on the B73 sequence of this region were used to conduct PCR on B73 and Mo17 (Table S1). All 38 primer pairs amplified B73 but not Mo17. These primer pairs were also used to query for the presence of this 2.6 Mb segment in 22 other maize inbred lines. The data suggest that 16 of the inbreds contain this segment while the other 6 did not (Figure 4B). These inbreds seemed to contain (or lack) the entire segment as a haplotype block. It should be noted that both the CGH and PCR analyses suggest that all 2.6 Mb of sequence is missing in its entirety from the Mo17 genome and from the other six inbreds; neither it, nor components of it, are located at nonallelic positions. Based on the filtered gene set from the MGSP there are 31 genes within the,2.6 Mb B73-specific interval. RNA-seq experiments provide evidence for expression of 14/31 genes located within this interval in B73 shoot apical meristem tissue (Yi Jia, Kaz Ohtsu and Patrick S. Schnable, unpublished data) suggesting that many of the genes in this interval may be functional in B73. In addition, three of the genes within this interval are detected by the Affymetrix 17K microarray. The expression of all three of these genes are detected in B73 but not in Mo17 [11]. Notably, intermediate levels of expression of these genes are also detected in B73xMo17 and Mo17xB73 hybrids. The sequence of the B73-specific region does not exhibit similarity to the chloroplast or mitochondrial genomes. The genes present within this region do not shown synteny to any specific region of the rice genome but are found scattered across different rice chromosomes. Maize chromosome 6 is syntenic to rice chromosomes 5 and 6 [32]. However, there is a region near the centromere that does not show synteny with any rice chromosome and the 2.6 Mb segment is located within this region. Fine-scale analysis of the synteny in this region indicates that the distal sequence shows synteny to rice chromosome 5 while the sequence proximal to the B73-specific sequence is syntenous to rice chromosome 6. Hence, the B73-specific region is right at the point where the syntenic regions of maize chromosome 6 appear to have fused relative to rice chromosomes 5 and 6. The facts that many of these genes are expressed in maize and that many of the genes within this region are conserved in rice implies that the B73- region was likely selected in maize and have been deleted in the Mo17 haplotype. Identification of copy number variants and genome content differences In addition to this large region of genome variation on chromosome 6, we expected to identify numerous smaller copynumber variants (CNVs). As seen in the analysis of several wellannotated BACs, there are probes every,400 bp in low-copy genomic DNA (Figure 1). CNVs can be discovered by assessing the behavior of adjacent probes to identify segments of DNA that give consistently altered signal from two genomes. The DNAcopy algorithm [34 35] was used to identify segments within the CGH dataset with a minimum length of 5 probes (Methods). This resulted in the identification of 53,589 segments that are within a single intra-bac DNA sequence contig. The distribution of the average log 2 (Mo17/B73) values for each segment was well approximated by a normal mixture model with four components, each corresponding to a different class of segments (Figure 5). Because the component distributions overlap, there is uncertainty about the class membership of each segment. However, it is possible to calculate the probability that any particular segment belongs to a specific class based on the segment s average log 2 (Mo17/B73) value (Methods). Using such probabilities, each segment was classified into its most likely class. This is a relatively PLoS Genetics 6 November 2009 Volume 5 Issue 11 e

7 Figure 3. Identification of regions of low structural diversity. The proportion of probes that exhibit significantly higher hybridization to B73 genomic DNA than Mo17 (q,0.0001) was determined for a sliding window of 1 Mb probes with increments of 0.33 Mb. The approximate position of each centromere (from Wolfgruber et al., [72]) is indicated by a red circle on each chromosome. The locations of the tb1 [41] and y1 [40] genes, which are known to have undergone selective sweeps, are indicated. The gene density (based on the filtered gene set from the MGSP) is shown below each chromosome. The gene density was determined based on the number of genes per Mb. The dark color indicates low gene density while the yellow color indicates higher gene density. doi: /journal.pgen g003 PLoS Genetics 7 November 2009 Volume 5 Issue 11 e

8 Figure 4. Characterization of 2 Mb region on chromosome 6 that is present in B73 but missing in the Mo17 genome. (A) A 10 Mb region on chromosome 6 is shown. The color-coding for each probe indicates the level of conservation of the probe sequence to the Mo17 WGS sequence. The coordinates on the x-axis refer to base pair position within chromosome 6 of the B73 Refgen_v1. The 2.6 Mb region from 42.2 to 44.8 is enriched for probes that are poorly conserved or have no match in the Mo17 sequence and the majority of these probes exhibit much higher signal in B73 than in Mo17. (B) The data from 38 primer pairs are shown. Blue indicates successful amplification for a particular inbred by primer combination while red indicates no amplification. The full set of 38 primer pairs (see Table S1 for details) amplify products in B73 but not in Mo17. doi: /journal.pgen g004 permissive approach towards identifying segments. We proceeded to further restrict the results to generate a subset of stringent segments that are at least 2,000 bp in length, include at least 10 probes, and, for B73.Mo17 and Mo17.B73 classes, exhibit at least a two-fold difference between average B73 and Mo17 signals (Table 3). The DNA segments from the different classes exhibit different distributions for segment length, probe number/segment and repetitive DNA content (Table 3; Figure S12). The B73.Mo17 and Mo17.B73 segments represent DNA sequences that are variable between B73 and Mo17. B73.Mo17 DNA segments could be the result of CNV or differences in genomic content (PAV) between the two lines. In an attempt to distinguish between these two possibilities we determined the proportion of each segment that was non-repetitive and that could be aligned to Mo17 WGS sequence reads. If a large proportion of the segment was found in Mo17 then it is likely that the segment is a CNV, while segments that are missing from the Mo17 WGS likely represent PAVs. The distribution of Mo17 coverage was very different for B73.Mo17 segments compared to the other categories of segments (Figure S13). Over 50% of the B73.Mo17 DNA segments have less than 20% sequence coverage by Mo17 sequences. In the other classes, a majority of segments have.60% coverage by Mo17 WGS. We decided to split the B73.Mo17 segments into three subgroups. B73.Mo17_PAV (present-absent variation) segments exhibit less than 20% coverage by Mo17 WGS reads and are therefore likely present in the B73 genome and absent from the Mo17 genome. B73.Mo17_CNV segments exhibit at least 80% coverage in the Mo17 WGS sequences and are likely examples of CNV. The remaining B73.Mo17 sequences (20% 80% coverage) are denoted as B73.Mo17_Int. (intermediate). As expected, the B73.Mo17_PAV segments have a greater signal difference between B73 and Mo17 than do the B73.Mo17_CNV segments (Table 3). The segments from the middle two distributions in Figure 5 represent DNA sequences that are present at the same copy number in B73 and Mo17. The segments in the distribution with a peak at log2(mo17/b73) = were classified as B73<Mo17_SNP while the segments in the distribution with a peak at log2(mo17/b73) = 0 were simply classified as B73<Mo17. An additional class, B73<Mo17_Int. (intermediate), includes DNA sequences that couldn t be definitively classified in either one of these two distributions but had a cumulative estimated probability of membership in these two classes that was greater than 0.8 for these two classes. In general, the B73<Mo17_SNP, B73<Mo17_Int. and B73<Mo17 segments have similar characteristics (Table 3). The B73<Mo17_SNP and B73<Mo17_Int. DNA segments have slightly higher levels of signal in B73 than in Mo17 and are likely the result of the inclusion within the segment PLoS Genetics 8 November 2009 Volume 5 Issue 11 e

9 Figure 5. Distribution of average log2(m/b) for DNA segments. The distribution of the average log 2 (M/B) across segments was modeled using a four-component normal mixture model [68]. The EM algorithm [69] was used to estimate the mixing proportion, the mean, and the variance associated with each of the four normal component densities, corresponding to four segment classes (labeled with arrows). Class membership probabilities for each segment were computed using the EM estimates. doi: /journal.pgen g005 of several polymorphic probes. This is supported by the slightly higher rates of polymorphic probes or molecular markers within B73<Mo17_SNP segments than B73<Mo17 segments (Figure S12B; Table 4). Characterization of CNVs and PAVs The segment analysis identified a large number of DNA segments with variation in B73 and Mo17. There are 60 stringent Mo17.B73_CNV segments that are predicted to occur in more copies in Mo17 than in B73. There are 3,681 stringent B73.Mo17 segments including 356 segments that are CNVs and another 1,783 PAV segments that are putative examples of genome content variation. Several different approaches were used to validate the structural variants identified in this study. The 1,783 stringent B73. Mo17_PAV segments are predicted to be present in the B73 genome but absent from the Mo17 genome. Over 20,000 primer pairs were designed (usually from B73 sequences) and used to perform amplification from B73 and Mo17 genomic DNA. The numbers of primer pairs within each class of segment were determined (Table 4). The proportion of primers that were polymorphic between B73 and Mo17 is much higher for B73.Mo17 segments. The fact that the majority of the B73.Mo17_PAV polymorphic primer pairs only amplify a band in B73 and not in Mo17 confirms that many of these segments are present in the B73 genome and missing in the Mo17 genome. The 356 B73.Mo17_CNV segments are predicted to occur in more copies in the B73 genome than in the Mo17 genome. BLAST searches of 100 stringent B73.Mo17_CNV sequences against the B73 genome find that 92% are present in at least two copies. In comparison, only 7% of the B73<Mo17 segments have multiple matches within the B73 RefGen_v1. A large proportion (55%) of the B73.Mo17_CNV segments include tandem duplications. This suggests that there are a number of haplotype-specific tandem duplications. The 60 Mo17.B73 segments are predicted to occur in more copies in Mo17 than B73. qpcr was used to assess the copy number for 12 of the 60 Mo17.B73 segments in B73 and Mo17 genomic DNA (Table S2). The increase in copy number in Mo17 relative to B73 was validated for 11 of the 12 segments tested. In three of the cases tested qpcr provides evidence for greater copy number differences than the CGH data, suggesting that the CGH copy number estimates may be conservative. In combination, these approaches provide validation for the three major classes of CNV segments. The CGH analysis identified hundreds of candidate CNVs and thousands of PAVs. These sequences are spread throughout all ten of the maize chromosomes (Figure 6). The filtered set of 32,540 high quality gene annotations from the MGSP were compared to the stringent DNA segments (Table 5). Using fairly strict criteria (80% of gene sequence is contained within segment sequence) we find approximately 80% of the genes are located within the stringent segments. Almost 600 of these genes are located in the B73.Mo17 or Mo17.B73 segments, including 180 gene models located within B73.Mo17_PAV segments and another 50 gene models located within CNV segments. These genes within the PAV and CNV type segments include many different annotations and are not enriched for putative uncharacterized proteins. Interestingly, the proportion of genes with a paralog (defined as.85% identity and coverage) is higher for the B73.Mo17 segments (Table 5). A portion of the genes within these segments are queried by the existing 17K maize Affymetrix microarray. The proportion of genes that are differentially expressed (in B73 and Mo17 seedling tissue; data from [11]) is much higher for B73.Mo17 and Mo17.B73 segments than for B73<Mo17 classes (Table 5). As expected, the B73.Mo17 segments are enriched for genes with higher expression in B73 than in Mo17 and the Mo17.B73 segments are enriched for genes with higher expression in Mo17. Discussion There is wide-spread appreciation for the high level of diversity within the maize species [8,12,31]. This diversity is critical for breeders to select for novel agronomic traits and is important for heterosis. The availability of a reference genome sequence for one inbred (B73; Schnable et al., in press) coupled to CGH technology, has provided the opportunity to study the structural variation present between two inbred lines, B73 and Mo17. The extensive structural variation between B73 and Mo17 includes copy number variation (CNV) and present-absent variation (PAV). However, despite the high level of variation genome-wide, there are many regions of the genome that have little or no variation. We will discuss the types of variation observed throughout the maize genome as well as the implications of this variation for phenotypic diversity and heterosis. Low diversity regions in a highly polymorphic species It is tempting to assume that all genomic regions are different in these two lines. However, by assessing the levels of variation along the B73 RefGen_v1 it quickly becomes obvious that this variation is not randomly distributed. We identified a number of large regions (.1 Mb) that have little or no variation. The fact that these regions co-localized with chromosomal regions that lack genetic markers that exhibit polymorphisms in the Intermated B73xMo17 (IBM) mapping population [14,36] demonstrates that PLoS Genetics 9 November 2009 Volume 5 Issue 11 e

10 Table 3. Characteristics of each category of DNA segment. All Segments Type n Total # probes Avg. log2(m/b) Avg. length (bp) Avg. probe # Avg. % repetitive DNA Avg. # genes Avg. gene density (per kb) Avg. % Mo17 coverage B73.Mo17_PAV 4,222 52, , % % B73.Mo17_Int. 3,224 49, , % % B73.Mo17_CNV , , % % B73<Mo17_SNP 15, , , % % B73<Mo17_Int. 9, , , % % B73<Mo17 15, , , % % Mo17.B73_CNV , , % % Unclassified 3, , , % % Stringent segments a Type n Total # probes Avg. log2(m/b) Avg. length (bp) Avg. probe # Avg. % repetitive DNA Avg. # genes Avg. gene density (per kb) Avg. % Mo17 coverage B73.Mo17_PAV 1,783 33, , % % B73.Mo17_Int. 1,542 33, , % % B73.Mo17_CNV 356 6, , % % B73<Mo17_SNP 13, , , % % B73<Mo17_Int. 7, , , % % B73<Mo17 12, , , % % Mo17.B73_CNV 60 1, , % % Unclassified 2, , , % % a minimum of 10 probes, 2000 bp and 2 fold change between B73 and Mo17 for B73.Mo17 and Mo17.B73 classes. doi: /journal.pgen t003 PLoS Genetics 10 November 2009 Volume 5 Issue 11 e

11 Table 4. # Polymorphic markers within segments. # Primers % polymorphic % PA a B73.Mo17_PAV % 83% B73.Mo17_Int % 65% B73.Mo17_CNV 36 53% 66% B73<Mo17_SNP 7,946 28% 35% B73<Mo17_Int. 4,484 17% 29% B73<Mo17 5,756 6% 27% Mo17.B73_CNV 9 0% 0% Unclassified % 46% Total 19,613 20% 37% a The proportion of polymorphic primers that amplify a product in B73 but not in Mo17. doi: /journal.pgen t004 these marker-depleted regions are simply due to low/no diversity between the two parents of the mapping population. In general, almost all of the large low diversity regions occur within regions of low recombination frequency. This could contribute to the inheritance of large chromosomal regions that are identical-bydescent. The centromeres of most chromosomes are located within or at one end of low-diversity regions. Several groups have assessed molecular diversity in maize populations in studies designed to identify the targets of domestication and/or selection in maize [37 39]. We noticed that two of the genes known to have been targets of selection or domestication, y1 [40] and tb1 [41] are located within large low diversity regions. In addition, a 1 Mb region on chromosome 10 with evidence for a selective sweep [42] also occurs in a region with low levels of structural variation. The chromosomal positions of 42 genes with evidence for selective sweeps [37 38] were compared with the level of structural variation. Nearly half of these genes (20/42) were located within large blocks of low diversity identified in this study. This finding is consistent with the hypothesis that many putative selection genes identified by virtue of their limited sequence diversity were not actual targets of selection but simply happen to be located within large blocks of reduced variation, some of which may have arisen via selective sweeps. Frequency of structural variation in maize genome We have identified thousands of examples of structural variation between the B73 and Mo17 genomes. The term structural variant is used to describe both sequences that are present in both individuals but have different copy numbers (copy number Figure 6. Distribution of CNV and PAV throughout the maize genome. The position and average log 2 (M/B) for each Mo17.B73_CNV, B73.Mo17_CNV, B73.Mo17_I and B73.Mo17_PA segment is plotted for all 10 maize chromosomes. The color-coding indicates the type of segment. The positions of the centromeres [72] are indicated by the black boxes. doi: /journal.pgen g006 PLoS Genetics 11 November 2009 Volume 5 Issue 11 e

12 Table 5. Genes in stringent segments. Type n Avg. length (bp) # FGS genes a Genes per segment % of genes with paralog # Affymetrix genes b %DE genes c %DE with B.M d B73.Mo17_PAV 1,783 10, % 36 69% 92% B73.Mo17_Int. 1,542 12, % 68 56% 92% B73.Mo17_CNV 356 9, % 7 71% 100% B73<Mo17_SNP 13,183 41, % 3,347 24% 49% B73<Mo17_Int. 7,526 42, % 1,806 18% 48% B73<Mo17 12,720 49, % 2,306 12% 33% Mo17.B73_CNV 60 6, % 7 100% 0% Unclassified 2,661 32, % % 66% a The FGS refers to the filtered gene set of high-quality annotations produced by the MGSP. b Number of genes on Affy platform that are expressed in B73 or Mo17 seedling tissue. c Percent of genes that are differentially expressed (q,0.05). d Percent of differentially expressed genes that are expressed at higher levels in B73 than in Mo17. doi: /journal.pgen t005 variants; CNV) and sequences that are present in one individual but absent in another (presence-absence variant; PAV). The unknown order and orientation of intra-bac DNA sequence contigs will potentially lead to an over-estimation of the number of structural variation events by splitting some events into two different segments. However, it will also lead to an underestimation of the number of events due to the fact that some smaller structural variant events which will not have enough sequence or probes on either side of a contig border to be called. We found that the 3,789 stringent B73.Mo17 or Mo17. B73 structural variants represent a minimum of 2,056 unique events (must be separated from nearest structural variant by. 200,000 bp). Differing probe densities, algorithms and statistical criteria complicate comparisons of rates of structural variation among organisms [5]. However, it is quite clear that the maize genome has a high rate of structural variation compared with other species. In the human, rat, dog, mouse, macaque and chimpanzee genomes the average number of CNVs between two individuals is between 15 and 75 [43 48]. A high resolution study of eight human genomes [49] revealed only several hundred insertions and deletions, including CNV and PAV sequences, in the comparison of any two human genomes. In contrast, even after very stringent filtering we identified.3,700 CNV or PAV sequences that represent at least 2,000 events between these two maize genomes. This likely represents a very conservative estimate of the true number of CNV and PAV events in the maize genome. Previous analyses of BAC libraries have also found significant differences in genome content [25]. This high level of structural variation with frequent changes in genome content is reminiscent of the high level of variation observed in the E. coli genome [50 51], but is without precedent among higher eukaryotes. As the levels of structural variation are assessed for other species it will be interesting to determine whether maize has an unusually high level of variation relative to other plants and animals. Mechanisms and impact of CNV This study identified.400 putative CNVs between B73 and Mo17. A combination of genome homology searches and qpcr suggests that many of these sequences represent actual CNVs. There is evidence that these CNV can be the result of tandem duplications or duplications dispersed throughout the genome. There are a large number of tandem duplications in the maize genome [32]; some of these are NIPs [29] and 5% of these exhibit CNV in our data (Y. Kai, P. Schnable, unpublished data). This suggests that some of the differences in copy number between B73 and Mo17 are due to haplotype-specific tandem duplication events. Alternatively, some of the CNVs may be caused by duplication to non-allelic positions. Differences in genome content at the bz1 locus [23] actually represent a CNV event in which both genotypes have copies of a sequence at a shared position and one of the genotypes has one or more additional copies at a non-shared location [26]. Many of these CNV are likely the result of Helitronmediated movement of gene fragments [12,25 26]. There is also evidence that tandemly duplicated gene families such as zeins [52] or disease resistance genes [53] exhibit differences in copy number for different haplotypes, possibly as the result of recombinationbased mechanisms as have been analyzed in detailed by Yandeau- Nelson et al. [54]. Previous studies have suggested a high rate of near identical paralogs (NIPs) in the maize genome [29]. It is likely that the formation (or removal) of NIPs may have been haplotype specific. There are 50 genes from the MGSP s filtered gene set within our stringently called CNVs. By relaxing our criteria only slightly, we identify CNV segments that contain 558 genes (Table S3). As most gene fragments were successfully removed from the MGSP s filtered gene set [32], these genes within CNV are likely to include functional genes. Because 12 of the 14 CNV genes that were assayed exhibit variable gene expression levels in B73 and Mo17 seedlings, these genic CNV may contribute to phenotypic diversity. Many maize alleles that are known to be epigenetically regulated exhibit allelic variation for tandem repeats. There is allelic variation in the tandem duplication of coding regions at the p1, c2, and r1 loci that exhibit epigenetic regulation [54 56]. In addition, there is evidence for allelic variation in the copy number of a non-genic sequence,100 kb upstream of the b1 gene that controls expression and paramutation [57]. It is possible that the high rates of CNV in both genes and other low-copy sequences contribute to high rates of expression variation and epigenetic regulation in maize. Widespread genome content differences In addition to the hundreds of CNV detected between B73 and Mo17 we also noted thousands of sequences that account for over 20 Mb of DNA that are present in the B73 genome and absent in PLoS Genetics 12 November 2009 Volume 5 Issue 11 e

Heritable Epigenetic Variation among Maize Inbreds

Heritable Epigenetic Variation among Maize Inbreds Heritable Epigenetic Variation among Maize Inbreds Steve R. Eichten 1., Ruth A. Swanson-Wagner 1., James C. Schnable 2, Amanda J. Waters 1, Peter J. Hermanson 1, Sanzhen Liu 3, Cheng-Ting Yeh 3, Yi Jia

More information

MAIZE demonstrates a large amount of intraspecific

MAIZE demonstrates a large amount of intraspecific Copyright Ó 2006 by the Genetics Society of America DOI: 10.1534/genetics.106.060699 Cis-transcriptional Variation in Maize Inbred Lines B73 and Mo17 Leads to Additive Expression Patterns in the F 1 Hybrid

More information

Structural variation. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona

Structural variation. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona Structural variation Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona Genetic variation How much genetic variation is there between individuals? What type of variants

More information

Finishing Fosmid DMAC-27a of the Drosophila mojavensis third chromosome

Finishing Fosmid DMAC-27a of the Drosophila mojavensis third chromosome Finishing Fosmid DMAC-27a of the Drosophila mojavensis third chromosome Ruth Howe Bio 434W 27 February 2010 Abstract The fourth or dot chromosome of Drosophila species is composed primarily of highly condensed,

More information

Genomes summary. Bacterial genome sizes

Genomes summary. Bacterial genome sizes Genomes summary 1. >930 bacterial genomes sequenced. 2. Circular. Genes densely packed. 3. 2-10 Mbases, 470-7,000 genes 4. Genomes of >200 eukaryotes (45 higher ) sequenced. 5. Linear chromosomes 6. On

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Number and length distributions of the inferred fosmids.

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Number and length distributions of the inferred fosmids. Supplementary Figure 1 Number and length distributions of the inferred fosmids. Fosmid were inferred by mapping each pool s sequence reads to hg19. We retained only those reads that mapped to within a

More information

Mutations during meiosis and germ line division lead to genetic variation between individuals

Mutations during meiosis and germ line division lead to genetic variation between individuals Mutations during meiosis and germ line division lead to genetic variation between individuals Types of mutations: point mutations indels (insertion/deletion) copy number variation structural rearrangements

More information

Why can GBS be complicated? Tools for filtering, error correction and imputation.

Why can GBS be complicated? Tools for filtering, error correction and imputation. Why can GBS be complicated? Tools for filtering, error correction and imputation. Edward Buckler USDA-ARS Cornell University http://www.maizegenetics.net Many Organisms Are Diverse Humans are at the lower

More information

CHAPTER 21 LECTURE SLIDES

CHAPTER 21 LECTURE SLIDES CHAPTER 21 LECTURE SLIDES Prepared by Brenda Leady University of Toledo To run the animations you must be in Slideshow View. Use the buttons on the animation to play, pause, and turn audio/text on or off.

More information

SNPs - GWAS - eqtls. Sebastian Schmeier

SNPs - GWAS - eqtls. Sebastian Schmeier SNPs - GWAS - eqtls s.schmeier@gmail.com http://sschmeier.github.io/bioinf-workshop/ 17.08.2015 Overview Single nucleotide polymorphism (refresh) SNPs effect on genes (refresh) Genome-wide association

More information

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1

Human SNP haplotypes. Statistics 246, Spring 2002 Week 15, Lecture 1 Human SNP haplotypes Statistics 246, Spring 2002 Week 15, Lecture 1 Human single nucleotide polymorphisms The majority of human sequence variation is due to substitutions that have occurred once in the

More information

Mate-pair library data improves genome assembly

Mate-pair library data improves genome assembly De Novo Sequencing on the Ion Torrent PGM APPLICATION NOTE Mate-pair library data improves genome assembly Highly accurate PGM data allows for de Novo Sequencing and Assembly For a draft assembly, generate

More information

High-Resolution Oligonucleotide- Based acgh Analysis of Single Cells in Under 24 Hours

High-Resolution Oligonucleotide- Based acgh Analysis of Single Cells in Under 24 Hours High-Resolution Oligonucleotide- Based acgh Analysis of Single Cells in Under 24 Hours Application Note Authors Paula Costa and Anniek De Witte Agilent Technologies, Inc. Santa Clara, CA USA Abstract As

More information

3I03 - Eukaryotic Genetics Repetitive DNA

3I03 - Eukaryotic Genetics Repetitive DNA Repetitive DNA Satellite DNA Minisatellite DNA Microsatellite DNA Transposable elements LINES, SINES and other retrosequences High copy number genes (e.g. ribosomal genes, histone genes) Multifamily member

More information

BENG 183 Trey Ideker. Genome Assembly and Physical Mapping

BENG 183 Trey Ideker. Genome Assembly and Physical Mapping BENG 183 Trey Ideker Genome Assembly and Physical Mapping Reasons for sequencing Complete genome sequencing!!! Resequencing (Confirmatory) E.g., short regions containing single nucleotide polymorphisms

More information

Repeat subtraction-mediated sequence capture from a complex genome

Repeat subtraction-mediated sequence capture from a complex genome The Plant Journal (2010) 62, 898 909 doi: 10.1111/j.1365-313X.2010.04196.x TECHNICAL ADVANCE Repeat subtraction-mediated sequence capture from a complex genome Yan Fu 1,2, Nathan M. Springer 3, Daniel

More information

Supplementary Text. eqtl mapping in the Bay x Sha recombinant population.

Supplementary Text. eqtl mapping in the Bay x Sha recombinant population. Supplementary Text eqtl mapping in the Bay x Sha recombinant population. Expression levels for 24,576 traits (Gene-specific Sequence Tags: GSTs, CATMA array version 2) was measured in RNA extracted from

More information

Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition

Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition RESEARCH ARTICLE Targeted Sequencing Reveals Large-Scale Sequence Polymorphism in Maize Candidate Genes for Biomass Production and Composition Moses M. Muraya 1,2, Thomas Schmutzer 1 *, Chris Ulpinnis

More information

MICROSATELLITE MARKER AND ITS UTILITY

MICROSATELLITE MARKER AND ITS UTILITY Your full article ( between 500 to 5000 words) - - Do check for grammatical errors or spelling mistakes MICROSATELLITE MARKER AND ITS UTILITY 1 Prasenjit, D., 2 Anirudha, S. K. and 3 Mallar, N.K. 1,2 M.Sc.(Agri.),

More information

Book chapter appears in:

Book chapter appears in: Mass casualty identification through DNA analysis: overview, problems and pitfalls Mark W. Perlin, PhD, MD, PhD Cybergenetics, Pittsburgh, PA 29 August 2007 2007 Cybergenetics Book chapter appears in:

More information

Lecture #1. Introduction to microarray technology

Lecture #1. Introduction to microarray technology Lecture #1 Introduction to microarray technology Outline General purpose Microarray assay concept Basic microarray experimental process cdna/two channel arrays Oligonucleotide arrays Exon arrays Comparing

More information

Applications and Uses. (adapted from Roche RealTime PCR Application Manual)

Applications and Uses. (adapted from Roche RealTime PCR Application Manual) What Can You Do With qpcr? Applications and Uses (adapted from Roche RealTime PCR Application Manual) What is qpcr? Real time PCR also known as quantitative PCR (qpcr) measures PCR amplification as it

More information

Quality Measures for CytoChip Microarrays

Quality Measures for CytoChip Microarrays Quality Measures for CytoChip Microarrays How to evaluate CytoChip Oligo data quality in BlueFuse Multi software. Data quality is one of the most important aspects of any microarray experiment. This technical

More information

3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome

3. human genomics clone genes associated with genetic disorders. 4. many projects generate ordered clones that cover genome Lectures 30 and 31 Genome analysis I. Genome analysis A. two general areas 1. structural 2. functional B. genome projects a status report 1. 1 st sequenced: several viral genomes 2. mitochondria and chloroplasts

More information

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016

CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 CS273B: Deep Learning in Genomics and Biomedicine. Recitation 1 30/9/2016 Topics Genetic variation Population structure Linkage disequilibrium Natural disease variants Genome Wide Association Studies Gene

More information

REVIEWS. Structural variation in the human genome

REVIEWS. Structural variation in the human genome REVIEWS Structural variation in the human genome Lars Feuk, Andrew R. Carson and Stephen W. Scherer Abstract The first wave of information from the analysis of the human genome revealed SNPs to be the

More information

4.1. Genetics as a Tool in Anthropology

4.1. Genetics as a Tool in Anthropology 4.1. Genetics as a Tool in Anthropology Each biological system and every human being is defined by its genetic material. The genetic material is stored in the cells of the body, mainly in the nucleus of

More information

Quality Control Assessment in Genotyping Console

Quality Control Assessment in Genotyping Console Quality Control Assessment in Genotyping Console Introduction Prior to the release of Genotyping Console (GTC) 2.1, quality control (QC) assessment of the SNP Array 6.0 assay was performed using the Dynamic

More information

Genome research in eukaryotes

Genome research in eukaryotes Functional Genomics Genome and EST sequencing can tell us how many POTENTIAL genes are present in the genome Proteomics can tell us about proteins and their interactions The goal of functional genomics

More information

Runs of Homozygosity Analysis Tutorial

Runs of Homozygosity Analysis Tutorial Runs of Homozygosity Analysis Tutorial Release 8.7.0 Golden Helix, Inc. March 22, 2017 Contents 1. Overview of the Project 2 2. Identify Runs of Homozygosity 6 Illustrative Example...............................................

More information

Gene Expression Technology

Gene Expression Technology Gene Expression Technology Bing Zhang Department of Biomedical Informatics Vanderbilt University bing.zhang@vanderbilt.edu Gene expression Gene expression is the process by which information from a gene

More information

Genome Sequencing-- Strategies

Genome Sequencing-- Strategies Genome Sequencing-- Strategies Bio 4342 Spring 04 What is a genome? A genome can be defined as the entire DNA content of each nucleated cell in an organism Each organism has one or more chromosomes that

More information

Contact us for more information and a quotation

Contact us for more information and a quotation GenePool Information Sheet #1 Installed Sequencing Technologies in the GenePool The GenePool offers sequencing service on three platforms: Sanger (dideoxy) sequencing on ABI 3730 instruments Illumina SOLEXA

More information

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010

Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Association Mapping in Plants PLSC 731 Plant Molecular Genetics Phil McClean April, 2010 Traditional QTL approach Uses standard bi-parental mapping populations o F2 or RI These have a limited number of

More information

Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Supplementary Material

Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Supplementary Material Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions Joshua N. Burton 1, Andrew Adey 1, Rupali P. Patwardhan 1, Ruolan Qiu 1, Jacob O. Kitzman 1, Jay Shendure 1 1 Department

More information

Introduction to Bioinformatics and Gene Expression Technology

Introduction to Bioinformatics and Gene Expression Technology Vocabulary Introduction to Bioinformatics and Gene Expression Technology Utah State University Spring 2014 STAT 5570: Statistical Bioinformatics Notes 1.1 Gene: Genetics: Genome: Genomics: hereditary DNA

More information

Genomic analysis of natural variation for seed and plant size in maize Dr. Shawn Kaepler University of Wisconsin

Genomic analysis of natural variation for seed and plant size in maize Dr. Shawn Kaepler University of Wisconsin Genomic analysis of natural variation for seed and plant size in maize Dr. Shawn Kaepler University of Wisconsin Abstract: Crop productivity is a function of basic component traits. Grain yield in maize

More information

Annotating Fosmid 14p24 of D. Virilis chromosome 4

Annotating Fosmid 14p24 of D. Virilis chromosome 4 Lo 1 Annotating Fosmid 14p24 of D. Virilis chromosome 4 Lo, Louis April 20, 2006 Annotation Report Introduction In the first half of Research Explorations in Genomics I finished a 38kb fragment of chromosome

More information

Lecture 2: Biology Basics Continued

Lecture 2: Biology Basics Continued Lecture 2: Biology Basics Continued Central Dogma DNA: The Code of Life The structure and the four genomic letters code for all living organisms Adenine, Guanine, Thymine, and Cytosine which pair A-T and

More information

Multiple choice questions (numbers in brackets indicate the number of correct answers)

Multiple choice questions (numbers in brackets indicate the number of correct answers) 1 Multiple choice questions (numbers in brackets indicate the number of correct answers) February 1, 2013 1. Ribose is found in Nucleic acids Proteins Lipids RNA DNA (2) 2. Most RNA in cells is transfer

More information

Molecular Biology: DNA sequencing

Molecular Biology: DNA sequencing Molecular Biology: DNA sequencing Author: Prof Marinda Oosthuizen Licensed under a Creative Commons Attribution license. SEQUENCING OF LARGE TEMPLATES As we have seen, we can obtain up to 800 nucleotides

More information

Chapter 4 Gene Linkage and Genetic Mapping

Chapter 4 Gene Linkage and Genetic Mapping Chapter 4 Gene Linkage and Genetic Mapping 1 Important Definitions Locus = physical location of a gene on a chromosome Homologous pairs of chromosomes often contain alternative forms of a given gene =

More information

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014 Single Nucleotide Variant Analysis H3ABioNet May 14, 2014 Outline What are SNPs and SNVs? How do we identify them? How do we call them? SAMTools GATK VCF File Format Let s call variants! Single Nucleotide

More information

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs

By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs (3) QTL and GWAS methods By the end of this lecture you should be able to explain: Some of the principles underlying the statistical analysis of QTLs Under what conditions particular methods are suitable

More information

Gene Expression Profiling and Validation Using Agilent SurePrint G3 Gene Expression Arrays

Gene Expression Profiling and Validation Using Agilent SurePrint G3 Gene Expression Arrays Gene Expression Profiling and Validation Using Agilent SurePrint G3 Gene Expression Arrays Application Note Authors Bahram Arezi, Nilanjan Guha and Anne Bergstrom Lucas Agilent Technologies Inc. Santa

More information

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score

Midterm 1 Results. Midterm 1 Akey/ Fields Median Number of Students. Exam Score Midterm 1 Results 10 Midterm 1 Akey/ Fields Median - 69 8 Number of Students 6 4 2 0 21 26 31 36 41 46 51 56 61 66 71 76 81 86 91 96 101 Exam Score Quick review of where we left off Parental type: the

More information

Genetic Identity. Steve Harris SPASH - Biotechnology

Genetic Identity. Steve Harris SPASH - Biotechnology Genetic Identity Steve Harris SPASH - Biotechnology Comparison of Organisms ORGANISM GENES BASE PAIRS Lambda Phage 40 50,000 E.coli 400 5,000,000 Yeast 13,000 15,000,000 Human 20,000 3,000,000,000 (3 billion)

More information

Sept 2. Structure and Organization of Genomes. Today: Genetic and Physical Mapping. Sept 9. Forward and Reverse Genetics. Genetic and Physical Mapping

Sept 2. Structure and Organization of Genomes. Today: Genetic and Physical Mapping. Sept 9. Forward and Reverse Genetics. Genetic and Physical Mapping Sept 2. Structure and Organization of Genomes Today: Genetic and Physical Mapping Assignments: Gibson & Muse, pp.4-10 Brown, pp. 126-160 Olson et al., Science 245: 1434 New homework:due, before class,

More information

GENE MAPPING. Genetica per Scienze Naturali a.a prof S. Presciuttini

GENE MAPPING. Genetica per Scienze Naturali a.a prof S. Presciuttini GENE MAPPING Questo documento è pubblicato sotto licenza Creative Commons Attribuzione Non commerciale Condividi allo stesso modo http://creativecommons.org/licenses/by-nc-sa/2.5/deed.it Genetic mapping

More information

Axiom mydesign Custom Array design guide for human genotyping applications

Axiom mydesign Custom Array design guide for human genotyping applications TECHNICAL NOTE Axiom mydesign Custom Genotyping Arrays Axiom mydesign Custom Array design guide for human genotyping applications Overview In the past, custom genotyping arrays were expensive, required

More information

LightScanner Hi-Res Melting Comparison of Six Master Mixes for Scanning and Small Amplicon and LunaProbes Genotyping

LightScanner Hi-Res Melting Comparison of Six Master Mixes for Scanning and Small Amplicon and LunaProbes Genotyping LightScanner Hi-Res Melting Comparison of Six Master Mixes for Scanning and Small Amplicon and LunaProbes Genotyping Introduction Commercial master mixes are convenient and cost-effective solutions for

More information

SNP calling and VCF format

SNP calling and VCF format SNP calling and VCF format Laurent Falquet, Oct 12 SNP? What is this? A type of genetic variation, among others: Family of Single Nucleotide Aberrations Single Nucleotide Polymorphisms (SNPs) Single Nucleotide

More information

Bioinformatics Advice on Experimental Design

Bioinformatics Advice on Experimental Design Bioinformatics Advice on Experimental Design Where do I start? Please refer to the following guide to better plan your experiments for good statistical analysis, best suited for your research needs. Statistics

More information

Human genetic variation

Human genetic variation Human genetic variation CHEW Fook Tim Human Genetic Variation Variants contribute to rare and common diseases Variants can be used to trace human origins Human Genetic Variation What types of variants

More information

RADSeq Data Analysis. Through STACKS on Galaxy. Yvan Le Bras Anthony Bretaudeau Cyril Monjeaud Gildas Le Corguillé

RADSeq Data Analysis. Through STACKS on Galaxy. Yvan Le Bras Anthony Bretaudeau Cyril Monjeaud Gildas Le Corguillé RADSeq Data Analysis Through STACKS on Galaxy Yvan Le Bras Anthony Bretaudeau Cyril Monjeaud Gildas Le Corguillé RAD sequencing: next-generation tools for an old problem INTRODUCTION source: Karim Gharbi

More information

Association mapping of Sclerotinia stalk rot resistance in domesticated sunflower plant introductions

Association mapping of Sclerotinia stalk rot resistance in domesticated sunflower plant introductions Association mapping of Sclerotinia stalk rot resistance in domesticated sunflower plant introductions Zahirul Talukder 1, Brent Hulke 2, Lili Qi 2, and Thomas Gulya 2 1 Department of Plant Sciences, NDSU

More information

Alternative Cleavage and Polyadenylation of RNA

Alternative Cleavage and Polyadenylation of RNA Developmental Cell 18 Supplemental Information The Spen Family Protein FPA Controls Alternative Cleavage and Polyadenylation of RNA Csaba Hornyik, Lionel C. Terzi, and Gordon G. Simpson Figure S1, related

More information

Sequence Variations. Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms. NCBI SNP Primer:

Sequence Variations. Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms. NCBI SNP Primer: Sequence Variations Baxevanis and Ouellette, Chapter 7 - Sequence Polymorphisms NCBI SNP Primer: http://www.ncbi.nlm.nih.gov/about/primer/snps.html Overview Mutation and Alleles Linkage Genetic variation

More information

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013

Introduction to RNA-Seq. David Wood Winter School in Mathematics and Computational Biology July 1, 2013 Introduction to RNA-Seq David Wood Winter School in Mathematics and Computational Biology July 1, 2013 Abundance RNA is... Diverse Dynamic Central DNA rrna Epigenetics trna RNA mrna Time Protein Abundance

More information

Roche Molecular Biochemicals Technical Note No. LC 10/2000

Roche Molecular Biochemicals Technical Note No. LC 10/2000 Roche Molecular Biochemicals Technical Note No. LC 10/2000 LightCycler Overview of LightCycler Quantification Methods 1. General Introduction Introduction Content Definitions This Technical Note will introduce

More information

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow Technical Overview Import VCF Introduction Next-generation sequencing (NGS) studies have created unanticipated challenges with

More information

BIOLOGY Dr.Locke Lecture# 27 An Introduction to Polymerase Chain Reaction (PCR)

BIOLOGY Dr.Locke Lecture# 27 An Introduction to Polymerase Chain Reaction (PCR) BIOLOGY 207 - Dr.Locke Lecture# 27 An Introduction to Polymerase Chain Reaction (PCR) Required readings and problems: Reading: Open Genetics, Chapter 8.1 Problems: Chapter 8 Optional Griffiths (2008) 9

More information

Genetic Engineering & Recombinant DNA

Genetic Engineering & Recombinant DNA Genetic Engineering & Recombinant DNA Chapter 10 Copyright The McGraw-Hill Companies, Inc) Permission required for reproduction or display. Applications of Genetic Engineering Basic science vs. Applied

More information

7 Gene Isolation and Analysis of Multiple

7 Gene Isolation and Analysis of Multiple Genetic Techniques for Biological Research Corinne A. Michels Copyright q 2002 John Wiley & Sons, Ltd ISBNs: 0-471-89921-6 (Hardback); 0-470-84662-3 (Electronic) 7 Gene Isolation and Analysis of Multiple

More information

Mapping and Mapping Populations

Mapping and Mapping Populations Mapping and Mapping Populations Types of mapping populations F 2 o Two F 1 individuals are intermated Backcross o Cross of a recurrent parent to a F 1 Recombinant Inbred Lines (RILs; F 2 -derived lines)

More information

Validation Study of FUJIFILM QuickGene System for Affymetrix GeneChip

Validation Study of FUJIFILM QuickGene System for Affymetrix GeneChip Validation Study of FUJIFILM QuickGene System for Affymetrix GeneChip Reproducibility of Extraction of Genomic DNA from Whole Blood samples in EDTA using FUJIFILM membrane technology on the QuickGene-810

More information

Complementary Technologies for Precision Genetic Analysis

Complementary Technologies for Precision Genetic Analysis Complementary NGS, CGH and Workflow Featured Publication Zhu, J. et al. Duplication of C7orf58, WNT16 and FAM3C in an obese female with a t(7;22)(q32.1;q11.2) chromosomal translocation and clinical features

More information

C. Incorrect! Second Law: Law of Independent Assortment - Genes for different traits sort independently of one another in the formation of gametes.

C. Incorrect! Second Law: Law of Independent Assortment - Genes for different traits sort independently of one another in the formation of gametes. OAT Biology - Problem Drill 20: Chromosomes and Genetic Technology Question No. 1 of 10 Instructions: (1) Read the problem and answer choices carefully, (2) Work the problems on paper as needed, (3) Pick

More information

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques)

Recent technology allow production of microarrays composed of 70-mers (essentially a hybrid of the two techniques) Microarrays and Transcript Profiling Gene expression patterns are traditionally studied using Northern blots (DNA-RNA hybridization assays). This approach involves separation of total or polya + RNA on

More information

Using mutants to clone genes

Using mutants to clone genes Using mutants to clone genes Objectives 1. What is positional cloning? 2. What is insertional tagging? 3. How can one confirm that the gene cloned is the same one that is mutated to give the phenotype

More information

TaqPath ProAmp Master Mixes

TaqPath ProAmp Master Mixes PRODUCT BULLETIN es es Applied Biosystems TaqPath ProAmp Master Mixes are versatile master mixes developed for high-throughput genotyping and copy number variation (CNV) analysis protocols that require

More information

SUPPLEMENTAL MATERIALS

SUPPLEMENTAL MATERIALS SUPPLEMENL MERILS Eh-seq: RISPR epitope tagging hip-seq of DN-binding proteins Daniel Savic, E. hristopher Partridge, Kimberly M. Newberry, Sophia. Smith, Sarah K. Meadows, rian S. Roberts, Mark Mackiewicz,

More information

High Cross-Platform Genotyping Concordance of Axiom High-Density Microarrays and Eureka Low-Density Targeted NGS Assays

High Cross-Platform Genotyping Concordance of Axiom High-Density Microarrays and Eureka Low-Density Targeted NGS Assays High Cross-Platform Genotyping Concordance of Axiom High-Density Microarrays and Eureka Low-Density Targeted NGS Assays Ali Pirani and Mohini A Patil ISAG July 2017 The world leader in serving science

More information

Linking Genetic Variation to Important Phenotypes

Linking Genetic Variation to Important Phenotypes Linking Genetic Variation to Important Phenotypes BMI/CS 776 www.biostat.wisc.edu/bmi776/ Spring 2018 Anthony Gitter gitter@biostat.wisc.edu These slides, excluding third-party material, are licensed under

More information

Genome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013)

Genome annotation. Erwin Datema (2011) Sandra Smit (2012, 2013) Genome annotation Erwin Datema (2011) Sandra Smit (2012, 2013) Genome annotation AGACAAAGATCCGCTAAATTAAATCTGGACTTCACATATTGAAGTGATATCACACGTTTCTCTAAT AATCTCCTCACAATATTATGTTTGGGATGAACTTGTCGTGATTTGCCATTGTAGCAATCACTTGAA

More information

Nature Methods: doi: /nmeth.4396

Nature Methods: doi: /nmeth.4396 Supplementary Figure 1 Comparison of technical replicate consistency between and across the standard ATAC-seq method, DNase-seq, and Omni-ATAC. (a) Heatmap-based representation of ATAC-seq quality control

More information

This is a closed book, closed note exam. No calculators, phones or any electronic device are allowed.

This is a closed book, closed note exam. No calculators, phones or any electronic device are allowed. MCB 104 MIDTERM #2 October 23, 2013 ***IMPORTANT REMINDERS*** Print your name and ID# on every page of the exam. You will lose 0.5 point/page if you forget to do this. Name KEY If you need more space than

More information

RNA-Sequencing analysis

RNA-Sequencing analysis RNA-Sequencing analysis Markus Kreuz 25. 04. 2012 Institut für Medizinische Informatik, Statistik und Epidemiologie Content: Biological background Overview transcriptomics RNA-Seq RNA-Seq technology Challenges

More information

Agilent Genomic Workbench 7.0

Agilent Genomic Workbench 7.0 Agilent Genomic Workbench 7.0 Product Overview Guide Agilent Technologies Notices Agilent Technologies, Inc. 2012, 2015 No part of this manual may be reproduced in any form or by any means (including electronic

More information

Chapter 15 Gene Technologies and Human Applications

Chapter 15 Gene Technologies and Human Applications Chapter Outline Chapter 15 Gene Technologies and Human Applications Section 1: The Human Genome KEY IDEAS > Why is the Human Genome Project so important? > How do genomics and gene technologies affect

More information

Biology 201 (Genetics) Exam #3 120 points 20 November Read the question carefully before answering. Think before you write.

Biology 201 (Genetics) Exam #3 120 points 20 November Read the question carefully before answering. Think before you write. Name KEY Section Biology 201 (Genetics) Exam #3 120 points 20 November 2006 Read the question carefully before answering. Think before you write. You will have up to 50 minutes to take this exam. After

More information

Biotechnology. Chapter 20. Biology Eighth Edition Neil Campbell and Jane Reece. PowerPoint Lecture Presentations for

Biotechnology. Chapter 20. Biology Eighth Edition Neil Campbell and Jane Reece. PowerPoint Lecture Presentations for Chapter 20 Biotechnology PowerPoint Lecture Presentations for Biology Eighth Edition Neil Campbell and Jane Reece Lectures by Chris Romero, updated by Erin Barley with contributions from Joan Sharp Copyright

More information

Bioinformatics Support of Genome Sequencing Projects. Seminar in biology

Bioinformatics Support of Genome Sequencing Projects. Seminar in biology Bioinformatics Support of Genome Sequencing Projects Seminar in biology Introduction The Big Picture Biology reminder Enzyme for DNA manipulation DNA cloning DNA mapping Sequencing genomes Alignment of

More information

Lecture 8: Sequencing and SNP. Sept 15, 2006

Lecture 8: Sequencing and SNP. Sept 15, 2006 Lecture 8: Sequencing and SNP Sept 15, 2006 Announcements Random questioning during literature discussion sessions starts next week for real! Schedule changes Moved QTL lecture up Removed landscape genetics

More information

The Polymerase Chain Reaction. Chapter 6: Background

The Polymerase Chain Reaction. Chapter 6: Background The Polymerase Chain Reaction Chapter 6: Background Invention of PCR Kary Mullis Mile marker 46.58 in April of 1983 Pulled off the road and outlined a way to conduct DNA replication in a tube Worked for

More information

Cancer Genetics Solutions

Cancer Genetics Solutions Cancer Genetics Solutions Cancer Genetics Solutions Pushing the Boundaries in Cancer Genetics Cancer is a formidable foe that presents significant challenges. The complexity of this disease can be daunting

More information

Introduction to Bioinformatics and Gene Expression Technologies

Introduction to Bioinformatics and Gene Expression Technologies Introduction to Bioinformatics and Gene Expression Technologies Utah State University Fall 2017 Statistical Bioinformatics (Biomedical Big Data) Notes 1 1 Vocabulary Gene: hereditary DNA sequence at a

More information

Mutation entries in SMA databases Guidelines for national curators

Mutation entries in SMA databases Guidelines for national curators 1 Mutation entries in SMA databases Guidelines for national curators GENERAL CONSIDERATIONS Role of the curator(s) of a national database Molecular data can be collected by many different ways. There are

More information

UC Davis UC Davis Previously Published Works

UC Davis UC Davis Previously Published Works UC Davis UC Davis Previously Published Works Title Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome

More information

earray 5.0 Create your own Custom Microarray Design

earray 5.0 Create your own Custom Microarray Design earray 5.0 Create your own Custom Microarray Design http://earray.chem.agilent.com earray 5.x Overview Session Summary Session Summary Agilent Genomics Microarray Solution earray Functional Overview Gene

More information

Comparative Genomic Hybridization

Comparative Genomic Hybridization Comparative Genomic Hybridization Srikesh G. Arunajadai Division of Biostatistics University of California Berkeley PH 296 Presentation Fall 2002 December 9 th 2002 OUTLINE CGH Introduction Methodology,

More information

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H

Introduction to ChIP Seq data analyses. Acknowledgement: slides taken from Dr. H Introduction to ChIP Seq data analyses Acknowledgement: slides taken from Dr. H Wu @Emory ChIP seq: Chromatin ImmunoPrecipitation it ti + sequencing Same biological motivation as ChIP chip: measure specific

More information

Sequence Annotation & Designing Gene-specific qpcr Primers (computational)

Sequence Annotation & Designing Gene-specific qpcr Primers (computational) James Madison University From the SelectedWorks of Ray Enke Ph.D. Fall October 31, 2016 Sequence Annotation & Designing Gene-specific qpcr Primers (computational) Raymond A Enke This work is licensed under

More information

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es

Sequence assembly. Jose Blanca COMAV institute bioinf.comav.upv.es Sequence assembly Jose Blanca COMAV institute bioinf.comav.upv.es Sequencing project Unknown sequence { experimental evidence result read 1 read 4 read 2 read 5 read 3 read 6 read 7 Computational requirements

More information

Technical Review. Real time PCR

Technical Review. Real time PCR Technical Review Real time PCR Normal PCR: Analyze with agarose gel Normal PCR vs Real time PCR Real-time PCR, also known as quantitative PCR (qpcr) or kinetic PCR Key feature: Used to amplify and simultaneously

More information

Figure S1. Schematic representation of the winter VRN-H1 allele from cv. Strider (AY750993) with positions of markers genotyped in this study

Figure S1. Schematic representation of the winter VRN-H1 allele from cv. Strider (AY750993) with positions of markers genotyped in this study Figure S1. Schematic representation of the winter VRN-H1 allele from cv. Strider (AY750993) with positions of markers genotyped in this study indicated. Exons are denoted by black boxes. SNP1 (T-1,948/C),

More information

Functional Genomics in Plants

Functional Genomics in Plants Functional Genomics in Plants Jeffrey L Bennetzen, Purdue University, West Lafayette, Indiana, USA Functional genomics refers to a suite of genetic technologies that will contribute to a comprehensive

More information

Overview of Human Genetics

Overview of Human Genetics Overview of Human Genetics 1 Structure and function of nucleic acids. 2 Structure and composition of the human genome. 3 Mendelian genetics. Lander et al. (Nature, 2001) MAT 394 (ASU) Human Genetics Spring

More information