AMAP: A pipeline for whole-genome mutation detection in Arabidopsis thaliana
Genes Genet. Syst. (2016) 91, pp. 229–233

Kotaro Ishii 1, Yusuke Kazama 1, Tomonari Hirano 1,2, Michiaki Hamada 3, Yukiteru Ono 4, Mieko Yamada 1 and Tomoko Abe 1 *
1 RIKEN Nishina Center, 2-1, Hirosawa, Wako, Saitama, Japan
2 Faculty of Agriculture, University of Miyazaki, 1-1, Gakuenkibanadai-Nishi, Miyazaki, Japan
3 Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo, Japan
4 IMSBIO Co., Ltd., Higashi-Ikebukuro, Toshima-ku, Tokyo, Japan
(Received 3 December 2015, accepted 27 March 2016; J-STAGE Advance published date: 25 July 2016)

Detection of mutations at the whole-genome level is now possible with high-throughput sequencing. However, determining mutations is time-consuming because mutation-detecting programs report many false positives. AMAP (automated mutation analysis pipeline) was developed to overcome this issue. AMAP integrates a set of well-validated programs for mapping (BWA), removal of potential PCR duplicates (Picard), realignment (GATK) and detection of mutations (SAMtools, GATK, Pindel, BreakDancer and CNVnator). Thus, all types of mutations, such as base substitutions, deletions, insertions, translocations and chromosomal rearrangements, can be detected by AMAP. In addition, AMAP automatically distinguishes false positives by comparing lists of candidate mutations in sequenced mutants. We tested AMAP by inputting previously analyzed read data derived from three individual Arabidopsis thaliana mutants and confirmed that all true mutations were included in the list of candidate mutations. The number of false positives was reduced to 12% of that obtained in a previous analysis that lacked a false-positive reduction step. Thus, AMAP will accelerate not only the analysis of mutation induction by individual mutagens but also forward genetics.
Key words: Arabidopsis thaliana, heavy-ion beam, mutation detection, pipeline, whole-genome re-sequencing

Edited by Koji Murai
* Corresponding author. tomoabe@riken.jp

Whole-genome re-sequencing using high-throughput sequencing (HTS) technologies can now be performed in organisms whose genome sequences have already been determined. Whole-genome re-sequencing enables the rapid identification of genes responsible for mutant traits in many model organisms, including yeast (Edwards and Gifford, 2012), zebrafish (Bowen et al., 2012; Obholzer et al., 2012), Caenorhabditis elegans (Minevich et al., 2012), Arabidopsis thaliana (Schneeberger et al., 2009; Ashelford et al., 2011; Austin et al., 2011; Uchida et al., 2011) and rice (Abe et al., 2012; Fekih et al., 2013). These tools have accelerated forward genetic studies. In forward genetic studies, appropriate mutagens must be selected to obtain the mutants of interest. Chemical mutagens such as ethyl methanesulfonate (EMS) are widely used for inducing mutations, and the above-described methods are suitable for detecting EMS-induced mutations, such as base substitutions. On the other hand, various kinds of ionizing radiation, including fast-neutron and heavy-ion beam radiation, can be used as effective mutagens and have traditionally been believed to induce diverse mutations, including base substitutions, deletions, insertions and chromosomal rearrangements. These mutations can now be identified at the whole-genome level using HTS. In A. thaliana, fast-neutron-induced mutations were revealed to be mainly base substitutions and small deletions (Belfield et al., 2012). We have previously identified the mutation spectrum of the heavy-ion beam in A. thaliana as comprising base substitutions, deletions, insertions and chromosomal rearrangements (Hirano et al., 2015).
The size of deletions increases with increasing linear energy transfer (LET) of heavy-ion beams (Kazama et al., 2011, 2013; Hirano et al., 2012). Whole-genome identification using HTS confirmed that base substitutions, deletions, insertions and chromosomal rearrangements
were induced at the whole-genome level (Hirano et al., 2015). Thus, it is now possible to determine any type of mutation using HTS. However, detecting whole-genome mutations requires a different program for each type of target mutation. Base substitutions and small insertions/deletions are detected by SAMtools (Li et al., 2009) and GATK (McKenna et al., 2010), whereas large deletions and chromosomal rearrangements are detected by Pindel (Ye et al., 2009), BreakDancer (Chen et al., 2009) and CNVnator (Abyzov et al., 2011). An additional problem in mutation detection using HTS is that the lists of candidate mutations generated by these programs contain many false positives. Possible causes of false positives include mismapping of sequencing reads and SNPs between the original accession and the accession used in mutation induction. Belfield et al. (2012) and Hirano et al. (2015) confirmed all candidate mutations using a genome browser, but this is a very time-consuming process. In this study, we have developed a novel pipeline, AMAP (automated mutation analysis pipeline), that conducts an integrated set of analyses (mapping, removal of potential PCR duplicates and detection of mutations) using several programs. In AMAP, false positives are determined automatically by searching for repetitive sequences near candidate mutations and by comparing lists of candidate mutations among sequenced mutants. We tested AMAP using HTS data that were previously analyzed by Hirano et al. (2015). HTS analysis using AMAP will accelerate forward genetics and gene function analysis.

AMAP consists of Perl scripts and requires the following software: BWA (Li and Durbin, 2009), Picard, RepeatMasker (ver. open-4.0.5), SAMtools, GATK, Pindel, BreakDancer, CNVnator (ver. 0.3) and SnpEff (ver. 3.6; Cingolani et al., 2012). The A.
thaliana reference genome sequence (FASTA format), gene set data (GTF format) and variation data (VCF format) are also required; these are available from EnsemblPlants. The workflow of AMAP is shown in Fig. 1. AMAP is designed to accept paired-end read data generated by the HiSeq sequencing system (Illumina, Cambridge, UK). When sequencing reads in FASTQ format from multiple mutants are input into AMAP, mapping by BWA and removal of PCR duplicates by Picard are performed automatically, as in Hirano et al. (2015). In addition, AMAP realigns reads using GATK to refine the mapping. Mutation detection is then conducted automatically using SAMtools, Pindel and BreakDancer. AMAP also detects short indels and SNPs with GATK, and copy number variants with CNVnator. Mutations in the mitochondrion or plastid are excluded. For the GATK and SAMtools results, known SNP information is added by SnpEff. The output files of all mutants generated by each program are merged into a single file.

Automatically removing false-positive mutations risks also removing true mutations (false negatives). Therefore, AMAP does not delete the estimated false positives but only adds flags to them, so that they can easily be distinguished in the output files. In the SAMtools and GATK outputs, candidate mutations at positions covered by fewer than five or more than 1,000 reads are marked as false positives. In the GATK output, SNPs matching the filter expression QUAL < 30.0 || QD < 5.0 and indels matching the filter expression QUAL < 10.0 || (MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)) are also marked as false positives. In the CNVnator output, candidate mutations with a depth (average number of reads covering both ends of the mutation region) of less than five or more than 1,000 are marked as false positives.
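The per-candidate rules in the paragraph above (a read-depth window plus the GATK quality filters, with candidates flagged rather than deleted) can be illustrated with a minimal Python sketch. It is not AMAP's code, which is written in Perl: the `Candidate` record, its field names and the flag labels are hypothetical, and treating the two SNP conditions as alternatives ("or") is an assumption based on the GATK filter syntax.

```python
from dataclasses import dataclass, field
from typing import List

MIN_DEPTH, MAX_DEPTH = 5, 1000  # read-depth window from the text


@dataclass
class Candidate:
    """Hypothetical record for one candidate mutation (not AMAP's data model)."""
    caller: str                 # e.g. "SAMtools", "GATK", "CNVnator"
    kind: str                   # "SNP" or "INDEL"; only used by the GATK rules
    depth: float                # DP, or mean depth at both ends for CNVnator
    qual: float = float("inf")  # defaults mean "no evidence of failure"
    qd: float = float("inf")
    mq0: int = 0
    flags: List[str] = field(default_factory=list)


def flag_candidate(c: Candidate) -> Candidate:
    """Attach false-positive flags; the candidate itself is never removed."""
    if c.depth < MIN_DEPTH or c.depth > MAX_DEPTH:
        c.flags.append("DEPTH")
    if c.caller == "GATK":
        if c.kind == "SNP" and (c.qual < 30.0 or c.qd < 5.0):  # "or" is assumed
            c.flags.append("GATK_SNP")
        elif c.kind == "INDEL":
            # Mirrors QUAL < 10.0 || (MQ0 >= 4 && MQ0 / (1.0 * DP) > 0.1);
            # the depth > 0 check only guards against division by zero.
            high_mq0 = c.mq0 >= 4 and c.depth > 0 and c.mq0 / (1.0 * c.depth) > 0.1
            if c.qual < 10.0 or high_mq0:
                c.flags.append("GATK_INDEL")
    return c


if __name__ == "__main__":
    calls = [
        Candidate("SAMtools", "SNP", depth=3),
        Candidate("GATK", "SNP", depth=40, qual=25.0, qd=10.0),
        Candidate("GATK", "INDEL", depth=20, qual=50.0, mq0=5),
        Candidate("GATK", "SNP", depth=40, qual=100.0, qd=20.0),
    ]
    for c in map(flag_candidate, calls):
        print(c.caller, c.kind, c.depth, c.flags or "PASS")
```

Keeping flags as a list on the record, instead of filtering the list of candidates, reflects the design choice stated in the text: a user inspecting the merged file can still see, and overrule, every flagged call.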
In the Pindel and BreakDancer outputs, candidate mutations with a depth of less than five or more than 1,000, or for which the ratio of the number of reads supporting the mutation to the depth is less than 0.1, are marked as false positives. Candidate mutations detected in common in at least two mutants are marked by AMAP as false positives stemming from preexisting polymorphisms; AMAP nevertheless works properly when only a single mutant is input, in which case this comparison is not performed. AMAP also marks as false positives any candidate mutations located in or around (±10 bp) repetitive sequences detected by RepeatMasker. AMAP is available on GitHub (github.com/ion-beam-breeding/).

To test AMAP, the sequencing reads obtained from three mutants isolated after Ar-ion irradiation (50 Gy, LET = 290 keV μm⁻¹; Hirano et al., 2015) were input. To avoid differences in results caused by program versions, the test used the same versions of BWA, Picard (ver. 1.55) and SAMtools (ver. 0.1.16) as Hirano et al. (2015). Candidate mutations from GATK and CNVnator were checked with the Integrative Genomics Viewer (IGV; ver. 2.3, Robinson et al., 2011).

Read files from the three Ar-ion-induced mutants previously analyzed by Hirano et al. (2015) were reanalyzed using AMAP. The resulting numbers of candidate mutations generated by AMAP are shown in Tables 1 and 2. In the previous study, an average of 16,521, 8,927 and 3,626 candidate mutations per mutant were detected by SAMtools, Pindel and BreakDancer, respectively. However, only 149, 17 and 35 of the mutations detected by SAMtools, Pindel and BreakDancer, respectively, across the three mutants were confirmed by IGV, meaning that 99.8% of the candidate mutations were false positives (Hirano et al., 2015). By contrast, AMAP generated an average of 3,493, 23 and 60 candidate mutations per mutant from SAMtools, Pindel and BreakDancer, respectively, corresponding to 12% of the total number of candidate mutations
obtained in the previous study (Table 1). In addition, all the true mutations confirmed by Hirano et al. (2015) were detected by AMAP except for two (possibly missed because of the version upgrade of Pindel). Detection of SNPs and short indels with GATK was not performed in the previous study (Hirano et al., 2015); we therefore checked all candidate mutations detected by GATK using IGV. In the current analysis, 156 mutations were detected by GATK (Supplementary Tables S1 and S2), and 125 of these were identical to mutations identified by SAMtools (Supplementary Table S1). The other 31 mutations were identified exclusively by GATK (Supplementary Table S2). Conversely, SAMtools identified 24 mutations that were not detected by GATK in this study (Supplementary Table S3). These differences in the detected mutations may be due to differences in the algorithms of the two programs. Users tuning the parameters of each program to fit their own sequence data may minimize the detection of program-specific mutations, although this may also increase the number of false positives.

CNVnator detected copy number variations in the Ar-443-as1 mutant. Inspection in IGV revealed that all the copy number variations obtained were identical to those detected by array CGH in Hirano et al. (2015). In addition, a heterozygous deletion in the region 10,285,036–10,307,586 on chromosome 5 was identified that could not be detected by either Pindel or BreakDancer. Thus, incorporating CNVnator into AMAP improves the efficiency of mutation detection. We checked the candidate mutations detected by GATK and CNVnator using IGV and confirmed that 14% and 7% of them, respectively, were true positives (Table 2).

Fig. 1. Flowchart of AMAP. Reads from multiple mutants were input in FASTQ format. Mutation analyses by GATK, SAMtools, Pindel, BreakDancer and CNVnator were performed on each mutant. The results of each mutation analysis for all mutants were merged into a single TSV file.
[Flowchart: Read (FASTQ) → Mapping → Realignment → Mapping result → GATK, SAMtools, Pindel, BreakDancer, CNVnator; known SNPs (VCF) added with SnpEff; repetitive sequences identified by RepeatMasker (RepBase) added to the lists of mutations.]

Table 1. Numbers of candidate mutations detected by SAMtools, Pindel and BreakDancer, and the numbers after filtration by AMAP
Mutant line      SAMtools*     Pindel     BreakDancer
Ar-57-al1        16,181
Ar-365-as1       16,939
Ar-443-as1       16,442
*Calculated from Hirano et al. (2015). **Percentage of candidate mutations remaining after filtration by AMAP relative to those detected by each program.

Table 2. Numbers of candidate mutations from GATK and CNVnator, and the accuracy rates of mutation detection
                 GATK                                    CNVnator
Mutant line      BS        DEL        INS        AR*     DEL      DUP      AR*
Ar-57-al1        82 (66)   180 (11)   110 (3)            (0)      4 (0)    0
Ar-365-as1       41 (19)   182 (15)   134 (2)            (0)      5 (0)    0
Ar-443-as1       84 (24)   187 (12)   110 (5)            (2)      7 (4)    33
Numbers in parentheses indicate candidate mutations confirmed by IGV. BS: base substitution; DEL: deletion; INS: insertion; DUP: duplication; AR: accuracy rate.
*Accuracy rate: the percentage of candidate mutations reported by AMAP that were visually confirmed using IGV.

In this study, we developed AMAP, a new pipeline for mutation detection at the whole-genome level that integrates a set of well-validated open-access programs. AMAP reduces false-positive candidate mutations to 12% of those reported in a previous study (Hirano et al., 2015).
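Much of this reduction rests on the two cross-sample rules described in the methods: candidates reported in at least two mutants are treated as preexisting polymorphisms, and candidates in or within ±10 bp of a RepeatMasker repeat are treated as likely mismapping. The Python sketch below illustrates those two rules only; it is not AMAP's Perl code. Representing each candidate as a hypothetical (chromosome, position, ref, alt) key and repeats as 1-based inclusive intervals are assumptions, and structural variants spanning an interval would need an interval-overlap test instead of the point test used here.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Hypothetical candidate key: (chromosome, position, ref allele, alt allele).
Key = Tuple[str, int, str, str]
REPEAT_MARGIN = 10  # bp window around repeats, from the text


def shared_keys(calls: Dict[str, Set[Key]], min_mutants: int = 2) -> Set[Key]:
    """Keys reported in at least `min_mutants` mutants (treated as old polymorphisms)."""
    counts = Counter(k for keys in calls.values() for k in keys)
    return {k for k, n in counts.items() if n >= min_mutants}


def near_repeat(chrom: str, pos: int, repeats: Dict[str, List[Tuple[int, int]]],
                margin: int = REPEAT_MARGIN) -> bool:
    """True if pos lies inside, or within `margin` bp of, a repeat (1-based, inclusive)."""
    return any(start - margin <= pos <= end + margin
               for start, end in repeats.get(chrom, []))


def flag_cross_sample(calls: Dict[str, Set[Key]],
                      repeats: Dict[str, List[Tuple[int, int]]]
                      ) -> Dict[Tuple[str, Key], List[str]]:
    """Return flags per (mutant, candidate); nothing is deleted.

    With a single mutant no key can be shared, so only the repeat rule applies.
    """
    shared = shared_keys(calls)
    result: Dict[Tuple[str, Key], List[str]] = {}
    for mutant, keys in calls.items():
        for key in keys:
            flags = []
            if key in shared:
                flags.append("SHARED")
            if near_repeat(key[0], key[1], repeats):
                flags.append("REPEAT")
            result[(mutant, key)] = flags
    return result


if __name__ == "__main__":
    calls = {
        "mutant1": {("Chr1", 100, "A", "G"), ("Chr1", 5000, "C", "T")},
        "mutant2": {("Chr1", 100, "A", "G"), ("Chr2", 300, "G", "A")},
    }
    repeats = {"Chr2": [(305, 400)]}
    for (mutant, key), flags in sorted(flag_cross_sample(calls, repeats).items()):
        print(mutant, key, flags or "PASS")
```

The linear scan over repeat intervals keeps the sketch short; for a whole genome with many RepeatMasker hits, sorting intervals per chromosome and using binary search (or an interval tree) would be the usual choice. The sketch also makes the caveat about false negatives concrete: a true mutation induced independently in two mutants would receive the SHARED flag.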
The reduction in false positives achieved by AMAP enables high-throughput detection of whole-genome mutations. AMAP can also analyze sequencing reads derived from back-crossed populations to carry out both mutation identification and mapping of the genes responsible for a mutant phenotype, as described previously (Ashelford et al., 2011; Uchida et al., 2011). Moreover, AMAP can be applied to other model organisms if their reference genome sequences are available. These techniques will accelerate forward genetic studies. Finally, it should be noted that the candidate mutations output by AMAP still include false positives; confirming the mutations using a genome browser is therefore still required. The possibility of false negatives should also be considered, because the same mutation may conceivably be induced independently in different mutants, in which case AMAP would mark a true mutation as a shared polymorphism.

This research was supported by the Council for Science, Technology and Innovation (CSTI), Cross-ministerial Strategic Innovation Promotion Program (SIP), "Technologies for creating next-generation agriculture, forestry and fisheries" (funding agency: Bio-oriented Technology Research Advancement Institution, NARO); by the Japan Society for the Promotion of Science (JSPS) through the Funding Program for Next Generation World-Leading Researchers (NEXT Program) to T. A. (GR096) and through a Grant-in-Aid for Scientific Research (B) (Y. K.); by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) through KAKENHI (T. A., No. 221S0002); and by the RIKEN Biomass Engineering Program.

REFERENCES
Abe, A., Kosugi, S., Yoshida, K., Natsume, S., Takagi, H., Kanzaki, H., Matsumura, H., Yoshida, K., Mitsuoka, C., Tamiru, M., et al. (2012) Genome sequencing reveals agronomically important loci in rice using MutMap. Nat. Biotechnol. 30.
Abyzov, A., Urban, A. E., Snyder, M., and Gerstein, M. (2011) CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21.
Ashelford, K., Eriksson, M. E., Allen, C. M., D'Amore, R., Johansson, M., Gould, P., Kay, S., Millar, A. J., Hall, N., and Hall, A. (2011) Full genome re-sequencing reveals a novel circadian clock mutation in Arabidopsis. Genome Biol. 12, R28.
Austin, R. S., Vidaurre, D., Stamatiou, G., Breit, R., Provart, N. J., Bonetta, D., Zhang, J. F., Fung, P., Gong, Y. C., Wang, P. W., et al. (2011) Next-generation mapping of Arabidopsis genes. Plant J. 67.
Belfield, E. J., Gan, X. C., Mithani, A., Brown, C., Jiang, C. F., Franklin, K., Alvey, E., Wibowo, A., Jung, M., Bailey, K., et al. (2012) Genome-wide analysis of mutations in mutant lineages selected following fast-neutron irradiation mutagenesis of Arabidopsis thaliana. Genome Res. 22.
Bowen, M. E., Henke, K., Siegfried, K. R., Warman, M. L., and Harris, M. P. (2012) Efficient mapping and cloning of mutations in zebrafish by low-coverage whole-genome sequencing. Genetics 190.
Chen, K., Wallis, J. W., McLellan, M. D., Larson, D. E., Kalicki, J. M., Pohl, C. S., McGrath, S. D., Wendl, M. C., Zhang, Q. Y., Locke, D. P., et al. (2009) BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat. Methods 6.
Cingolani, P., Platts, A., Wang, L. L., Coon, M., Nguyen, T., Wang, L., Land, S. J., Lu, X. Y., and Ruden, D. M. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly 6.
Edwards, M. D., and Gifford, D. K. (2012) High-resolution genetic mapping with pooled sequencing. BMC Bioinformatics 13 (Suppl. 6), S8.
Fekih, R., Takagi, H., Tamiru, M., Abe, A., Natsume, S., Yaegashi, H., Sharma, S., Sharma, S., Kanzaki, H., Matsumura, H., et al. (2013) MutMap+: Genetic mapping and mutant identification without crossing in rice. PLoS One 8.
Hirano, T., Kazama, Y., Ohbu, S., Shirakawa, Y., Liu, Y., Kambara, T., Fukunishi, N., and Abe, T. (2012) Molecular nature of mutations induced by high-LET irradiation with argon and carbon ions in Arabidopsis thaliana. Mutat. Res.-Fund. Mol. M. 735.
Hirano, T., Kazama, Y., Ishii, K., Ohbu, S., Shirakawa, Y., and Abe, T. (2015) Comprehensive identification of mutations induced by heavy-ion beam irradiation in Arabidopsis thaliana. Plant J. 82.
Kazama, Y., Hirano, T., Saito, H., Liu, Y., Ohbu, S., Hayashi, Y., and Abe, T. (2011) Characterization of highly efficient heavy-ion mutagenesis in Arabidopsis thaliana. BMC Plant Biol. 11, 161.
Kazama, Y., Hirano, T., Nishihara, K., Ohbu, S., Shirakawa, Y., and Abe, T. (2013) Effect of high-LET Fe-ion beam irradiation on mutation induction in Arabidopsis thaliana. Genes Genet. Syst. 88.
Li, H., and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25.
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al. (2010) The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20.
Minevich, G., Park, D. S., Blankenberg, D., Poole, R. J., and Hobert, O. (2012) CloudMap: A cloud-based pipeline for analysis of mutant genome sequences. Genetics 192.
Obholzer, N., Swinburne, I. A., Schwab, E., Nechiporuk, A. V., Nicolson, T., and Megason, S. G. (2012) Rapid positional cloning of zebrafish mutations by linkage and homozygosity mapping using whole-genome sequencing. Development 139.
Robinson, J. T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E. S., Getz, G., and Mesirov, J. P. (2011) Integrative genomics viewer. Nat. Biotechnol. 29.
Schneeberger, K., Ossowski, S., Lanz, C., Juul, T., Petersen, A. H., Nielsen, K. L., Jorgensen, J. E., Weigel, D., and Andersen, S. U. (2009) SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nat. Methods 6.
Uchida, N., Sakamoto, T., Kurata, T., and Tasaka, M. (2011) Identification of EMS-induced causal mutations in a non-reference Arabidopsis thaliana accession by whole genome sequencing. Plant Cell Physiol. 52.
Ye, K., Schulz, M. H., Long, Q., Apweiler, R., and Ning, Z. M. (2009) Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25.
HLA and Next Generation Sequencing it s all about the Data John Ord, NHSBT Colindale and University of Cambridge BSHI Annual Conference Manchester September 2014 Introduction In 2003 the first full public
More informationCREST maps somatic structural variation in cancer genomes with base-pair resolution
Nature Methods CREST maps somatic structural variation in cancer genomes with base-pair resolution Jianmin Wang, Charles G Mullighan, John Easton, Stefan Roberts, Jing Ma, Michael C Rusch, Ken Chen, Christopher
More informationThe Sentieon Genomics Tools A fast and accurate solution to variant calling from next-generation sequence data
The Sentieon Genomics Tools A fast and accurate solution to variant calling from next-generation sequence data Donald Freed 1*, Rafael Aldana 1, Jessica A. Weber 2, Jeremy S. Edwards 3,4,5 1 Sentieon Inc,
More informationFast and Accurate Variant Calling in Strand NGS
S T R A ND LIF E SCIENCE S WH ITE PAPE R Fast and Accurate Variant Calling in Strand NGS A benchmarking study Radhakrishna Bettadapura, Shanmukh Katragadda, Vamsi Veeramachaneni, Atanu Pal, Mahesh Nagarajan
More informationAnalytics Behind Genomic Testing
A Quick Guide to the Analytics Behind Genomic Testing Elaine Gee, PhD Director, Bioinformatics ARUP Laboratories 1 Learning Objectives Catalogue various types of bioinformatics analyses that support clinical
More informationIntegrated, accurate and multienvironment. discovery from whole genome sequencing data with NGSEP
Integrated, accurate and multienvironment structural variation discovery from whole genome sequencing data with NGSEP Juan Fernando de la Hoz, Jorge Duitama Agrobiodiversity research area International
More informationIntroducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager
Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service Dr. Ruth Burton Product Manager Today s agenda Introduction CytoSure arrays and analysis
More informationWhat is Bioinformatics?
What is Bioinformatics? Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. - NCBI The ultimate goal of the field is
More informationGap Filling for a Human MHC Haplotype Sequence
American Journal of Life Sciences 2016; 4(6): 146-151 http://www.sciencepublishinggroup.com/j/ajls doi: 10.11648/j.ajls.20160406.12 ISSN: 2328-5702 (Print); ISSN: 2328-5737 (Online) Gap Filling for a Human
More informationDNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer.
DNA Preparation and QC Extraction DNA was extracted from whole blood or flash frozen post-mortem tissue using a DNA mini kit (QIAmp #51104 and QIAmp#51404, respectively) following the manufacturer s recommendations.
More informationReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data
Cabanski et al. BMC Bioinformatics 2012, 13:221 SOFTWARE Open Access ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data Christopher R Cabanski 1, Keary
More informationDiscretized Gaussian Mixture for Genotyping of microsatellite loci containing homopolymer runs
Bioinformatics Advance Access published October 17, 2013 Sequence analysis Discretized Gaussian Mixture for Genotyping of microsatellite loci containing homopolymer runs Hongseok Tae 1, Dong-Yun Kim 2,
More informationData processing and analysis of genetic variation using next-generation DNA sequencing!
Data processing and analysis of genetic variation using next-generation DNA sequencing! Mark DePristo, Ph.D.! Genome Sequencing and Analysis Group! Medical and Population Genetics Program! Broad Institute
More information14 March, 2016: Introduction to Genomics
14 March, 2016: Introduction to Genomics Genome Genome within Ensembl browser http://www.ensembl.org/homo_sapiens/location/view?db=core;g=ensg00000139618;r=13:3231547432400266 Genome within Ensembl browser
More informationsnp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing
Al-Shahib and Underwood BMC Bioinformatics 2013, 14:326 SOFTWARE Open Access snp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing Ali Al-Shahib * and Anthony
More informationDirect estimation of the spontaneous mutation rate by short-term mutation accumulation lines in Chironomus riparius
Direct estimation of the spontaneous mutation rate by short-term mutation accumulation lines in Chironomus riparius Ann-Marie Oppold 1,2 and Markus Pfenninger 1,2* 1 Senckenberg Biodiversity and Climate
More informationSupplementary Data 1.
Supplementary Data 1. Evaluation of the effects of number of F2 progeny to be bulked (n) and average sequencing coverage (depth) of the genome (G) on the levels of false positive SNPs (SNP index = 1).
More informationAlignment methods. Martijn Vermaat Department of Human Genetics Center for Human and Clinical Genetics
Alignment methods Martijn Vermaat Department of Human Genetics Center for Human and Clinical Genetics Alignment methods Sequence alignment Assembly vs alignment Alignment methods Common issues Platform
More informationGenetics: Published Articles Ahead of Print, published on December 14, 2011 as /genetics
Genetics: Published Articles Ahead of Print, published on December 14, 2011 as 10.1534/genetics.111.136069 1 Efficient mapping and cloning of mutations in zebrafish by low coverage whole genome sequencing
More informationMPG NGS workshop I: SNP calling
MPG NGS workshop I: SNP calling Mark DePristo Manager, Medical and Popula
More informationProceedings of the World Congress on Genetics Applied to Livestock Production,
Which is the best variant caller for large whole-genome sequencing datasets? C.J. Vander Jagt 1, A.J. Chamberlain 1, R.D. Schnabel 2, B.J. Hayes 1,3 & H.D. Daetwyler 1,4 1 Agriculture Victoria, AgriBio,
More informationThe Human Genome and its upcoming Dynamics
The Human Genome and its upcoming Dynamics Matthias Platzer Genome Analysis Leibniz Institute for Age Research - Fritz-Lipmann Institute (FLI) Sequencing of the Human Genome Publications 2004 2001 2001
More informationSUPPLEMENTARY INFORMATION
SUPPLEMENTARY INFORMATION doi:10.1038/nature24473 1. Supplementary Information Computational identification of neoantigens Neoantigens from the three datasets were inferred using a consistent pipeline
More informationWhy can GBS be complicated? Tools for filtering, error correction and imputation.
Why can GBS be complicated? Tools for filtering, error correction and imputation. Edward Buckler USDA-ARS Cornell University http://www.maizegenetics.net Many Organisms Are Diverse Humans are at the lower
More informationNature Biotechnology: doi: /nbt Supplementary Figure 1. Read Complexity
Supplementary Figure 1 Read Complexity A) Density plot showing the percentage of read length masked by the dust program, which identifies low-complexity sequence (simple repeats). Scrappie outputs a significantly
More informationNUCLEOTIDE RESOLUTION STRUCTURAL VARIATION DETECTION USING NEXT- GENERATION WHOLE GENOME RESEQUENCING
NUCLEOTIDE RESOLUTION STRUCTURAL VARIATION DETECTION USING NEXT- GENERATION WHOLE GENOME RESEQUENCING Ken Chen, Ph.D. kchen@genome.wustl.edu The Genome Center, Washington University in St. Louis The path
More informationUsing RNAseq data to improve genomic selection in dairy cattle
Using RNAseq data to improve genomic selection in dairy cattle T. Lopdell 1,2 K. Tiplady 1 & M. Littlejohn 1 1 R&D, Livestock Improvement Corporation, Ruakura Rd, Newstead, Hamilton, New Zealand 2 School
More informationStructural variation analysis using NGS sequencing
Structural variation analysis using NGS sequencing Victor Guryev NBIC NGS taskforce meeting April 15th, 2011 Scale of genomic variants Scale 1 bp 10 bp 100 bp 1 kb 10 kb 100 kb 1 Mb Variants SNPs Short
More informationTranscriptomics analysis with RNA seq: an overview Frederik Coppens
Transcriptomics analysis with RNA seq: an overview Frederik Coppens Platforms Applications Analysis Quantification RNA content Platforms Platforms Short (few hundred bases) Long reads (multiple kilobases)
More informationStructural variation. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona
Structural variation Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona Genetic variation How much genetic variation is there between individuals? What type of variants
More informationDeep sequencing strategies for mapping and identifying mutations from genetic screens
REVIEW Worm 2:3, e25081; July/August/September 2013; 2013 Landes Bioscience REVIEW Deep sequencing strategies for mapping and identifying mutations from genetic screens Steven Zuryn and Sophie Jarriault*
More informationAN ALGORITHM FOR STRUCTURAL VARIANT DETECTION WITH THIRD GENERATION SEQUENCING HUI-JOU CHOU. A thesis submitted to the. Graduate School Camden
AN ALGORITHM FOR STRUCTURAL VARIANT DETECTION WITH THIRD GENERATION SEQUENCING BY HUI-JOU CHOU A thesis submitted to the Graduate School Camden Rutgers, The State University of New Jersey in partial fulfillment
More informationSV-BET: Structure Variation Benchmarking and Evaluation Tool with Comparative Analysis of Split Read-Based Approaches
International Journal of Pharma Medicine and Biological Sciences Vol. 5, No. 4, October 2016 SV-BET: Structure Variation Benchmarking and Evaluation Tool with Comparative Analysis of Split Read-Based Approaches
More informationIncorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits
Incorporating Molecular ID Technology Accel-NGS 2S MID Indexing Kits Molecular Identifiers (MIDs) MIDs are indices used to label unique library molecules MIDs can assess duplicate molecules in sequencing
More informationSanger vs Next-Gen Sequencing
Tools and Algorithms in Bioinformatics GCBA815/MCGB815/BMI815, Fall 2017 Week-8: Next-Gen Sequencing RNA-seq Data Analysis Babu Guda, Ph.D. Professor, Genetics, Cell Biology & Anatomy Director, Bioinformatics
More informationGene discovery using mutagen-induced polymorphisms and deep sequencing: application
Genetics: Published Articles Ahead of Print, published on June 19, 2012 as 10.1534/genetics.112.141986 Gene discovery using mutagen-induced polymorphisms and deep sequencing: application to plant disease
More informationMapping Next Generation Sequence Reads. Bingbing Yuan Dec. 2, 2010
Mapping Next Generation Sequence Reads Bingbing Yuan Dec. 2, 2010 1 What happen if reads are not mapped properly? Some data won t be used, thus fewer reads would be aligned. Reads are mapped to the wrong
More informationDevelopment and characterization of a high throughput targeted genotypingby-sequencing solution for agricultural genetic applications
Development and characterization of a high throughput targeted genotypingby-sequencing solution for agricultural genetic applications Michelle Swimley 1, Angela Burrell 1, Prasad Siddavatam 1, Chris Willis
More informationSetting Standards and Raising Quality for Clinical Bioinformatics. Joo Wook Ahn, Guy s & St Thomas 04/07/ ACGS summer scientific meeting
Setting Standards and Raising Quality for Clinical Bioinformatics Joo Wook Ahn, Guy s & St Thomas 04/07/2016 - ACGS summer scientific meeting 1. Best Practice Guidelines Draft guidelines circulated to
More informationVariant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4
WHITE PAPER Oncomine Comprehensive Assay Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4 Contents Scope and purpose of document...2 Content...2 How Torrent
More informationNext-Generation Sequencing. Technologies
Next-Generation Next-Generation Sequencing Technologies Sequencing Technologies Nicholas E. Navin, Ph.D. MD Anderson Cancer Center Dept. Genetics Dept. Bioinformatics Introduction to Bioinformatics GS011062
More informationAnalysis Datasheet Exosome RNA-seq Analysis
Analysis Datasheet Exosome RNA-seq Analysis Overview RNA-seq is a high-throughput sequencing technology that provides a genome-wide assessment of the RNA content of an organism, tissue, or cell. Small
More informationGenomic resources. for non-model systems
Genomic resources for non-model systems 1 Genomic resources Whole genome sequencing reference genome sequence comparisons across species identify signatures of natural selection population-level resequencing
More informationDiscovery and genotyping of genome structural polymorphism by sequencing on a population scale
Discovery and genotyping of genome structural polymorphism by sequencing on a population scale The Harvard community has made this article openly available. Please share how this access benefits you. Your
More informationMutMap+: Genetic Mapping and Mutant Identification without Crossing in Rice
MutMap+: Genetic Mapping and Mutant Identification without Crossing in Rice Rym Fekih 1., Hiroki Takagi 1,2., Muluneh Tamiru 1., Akira Abe 1, Satoshi Natsume 1,2, Hiroki Yaegashi 1, Shailendra Sharma 1,
More informationSupplemental Data. Who's Who? Detecting and Resolving. Sample Anomalies in Human DNA. Sequencing Studies with Peddy
The American Journal of Human Genetics, Volume 100 Supplemental Data Who's Who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy Brent S. Pedersen and Aaron R. Quinlan
More informationFunctional genomics to improve wheat disease resistance. Dina Raats Postdoctoral Scientist, Krasileva Group
Functional genomics to improve wheat disease resistance Dina Raats Postdoctoral Scientist, Krasileva Group Talk plan Goal: to contribute to the crop improvement by isolating YR resistance genes from cultivated
More informationQIAseq Targeted Panel Analysis Plugin USER MANUAL
QIAseq Targeted Panel Analysis Plugin USER MANUAL User manual for QIAseq Targeted Panel Analysis 1.1 Windows, macos and Linux June 18, 2018 This software is for research purposes only. QIAGEN Aarhus Silkeborgvej
More informationCloudMap: A Cloud-based Pipeline for Analysis of Mutant Genome Sequences
enetics: dvance Online Publication, published on October 11, 212 as 1.134/genetics.112.14424 CloudMap: Cloud-based Pipeline for nalysis of Mutant enome Sequences regory Minevich 1,, Danny S. Park 1, Daniel
More informationAssignment 9: Genetic Variation
Assignment 9: Genetic Variation Due Date: Friday, March 30 th, 2018, 10 am In this assignment, you will profile genome variation information and attempt to answer biologically relevant questions. The variant
More information