AMAP: A pipeline for whole-genome mutation detection in Arabidopsis thaliana


Genes Genet. Syst. (2016) 91, p. 229

Kotaro Ishii 1, Yusuke Kazama 1, Tomonari Hirano 1,2, Michiaki Hamada 3, Yukiteru Ono 4, Mieko Yamada 1 and Tomoko Abe 1*

1 RIKEN Nishina Center, 2-1, Hirosawa, Wako, Saitama, Japan
2 Faculty of Agriculture, University of Miyazaki, 1-1, Gakuenkibanadai-Nishi, Miyazaki, Japan
3 Faculty of Science and Engineering, Waseda University, 55N-06-10, 3-4-1, Okubo, Shinjuku-ku, Tokyo, Japan
4 IMSBIO Co., Ltd., Higashi-Ikebukuro, Toshima-ku, Tokyo, Japan

(Received 3 December 2015, accepted 27 March 2016; J-STAGE Advance published date: 25 July 2016)

Detection of mutations at the whole-genome level is now possible using high-throughput sequencing. However, determining mutations is time-consuming because mutation-detecting programs report many false positives. AMAP (automated mutation analysis pipeline) was developed to overcome this issue. AMAP integrates a set of well-validated programs for mapping (BWA), removal of potential PCR duplicates (Picard), realignment (GATK) and detection of mutations (SAMtools, GATK, Pindel, BreakDancer and CNVnator). Thus, all types of mutations, such as base substitution, deletion, insertion, translocation and chromosomal rearrangement, can be detected by AMAP. In addition, AMAP automatically distinguishes false positives by comparing lists of candidate mutations in sequenced mutants. We tested AMAP by inputting already analyzed read data derived from three individual Arabidopsis thaliana mutants and confirmed that all true mutations were included in the list of candidate mutations. The number of false positives was reduced to 12% of that obtained in a previous analysis that lacked a process for reducing false positives. Thus, AMAP will accelerate not only the analysis of mutation induction by individual mutagens but also the process of forward genetics.
Key words: Arabidopsis thaliana, heavy-ion beam, mutation detection, pipeline, whole-genome re-sequencing

Edited by Koji Murai
* Corresponding author. E-mail: tomoabe@riken.jp

Whole-genome re-sequencing using high-throughput sequencing (HTS) technologies can now be performed in organisms whose genomes have already been sequenced. Whole-genome re-sequencing enables the rapid identification of genes responsible for mutant traits in many model organisms, including yeast (Edwards and Gifford, 2012), zebrafish (Bowen et al., 2012; Obholzer et al., 2012), Caenorhabditis elegans (Minevich et al., 2012), Arabidopsis thaliana (Schneeberger et al., 2009; Ashelford et al., 2011; Austin et al., 2011; Uchida et al., 2011) and rice (Abe et al., 2012; Fekih et al., 2013). These tools have accelerated forward genetic studies to date.

In forward genetic studies, appropriate mutagens need to be selected to obtain the mutants of interest. Chemical mutagens such as ethyl methanesulfonate (EMS) are widely used for inducing mutations, and the above-described methods are suitable for detecting EMS-induced mutations, such as base substitutions. On the other hand, various kinds of ionizing radiation, including fast-neutron and heavy-ion beam radiation, can be used as effective mutagens, and have traditionally been believed to induce diverse mutations, including base substitution, deletion, insertion and chromosomal rearrangement. These mutations can now be identified at the whole-genome level using HTS. In A. thaliana, fast-neutron-induced mutations were revealed to be mainly base substitutions and small deletions (Belfield et al., 2012). We have previously identified the mutation spectrum of the heavy-ion beam in A. thaliana as comprising base substitutions, deletions, insertions and chromosomal rearrangements (Hirano et al., 2015).
Deletion size increases with the linear energy transfer (LET) of the heavy-ion beam (Kazama et al., 2011, 2013; Hirano et al., 2012). Whole-genome identification using HTS confirmed that base substitutions, deletions, insertions and chromosomal rearrangements were induced at the whole-genome level (Hirano et al., 2015). Thus, it is now possible to determine any type of mutation using HTS.

However, detecting whole-genome mutations requires a different program for each type of target mutation. Base substitutions and small insertions/deletions are detected by SAMtools (Li et al., 2009) and GATK (McKenna et al., 2010), while large deletions and chromosomal rearrangements are detected by Pindel (Ye et al., 2009), BreakDancer (Chen et al., 2009) and CNVnator (Abyzov et al., 2011). An additional problem in mutation detection using HTS is that the lists of candidate mutations generated by these programs contain many false positives. Possible causes of false positives include mismapping of sequencing reads and SNPs between the original accession and the accession used in mutation induction. Belfield et al. (2012) and Hirano et al. (2015) confirmed all candidate mutations using a genome browser, but this is a very time-consuming process.

In this study, we developed a novel pipeline, AMAP (automated mutation analysis pipeline), for conducting an integrated set of mutation analyses (mapping, removal of potential PCR duplicates and detection of mutations) using several programs. In AMAP, false positives are automatically identified by searching for repetitive sequences near candidate mutations and by comparing lists of candidate mutations among sequenced mutants. We tested AMAP using HTS data that were previously analyzed by Hirano et al. (2015). HTS analysis using AMAP will accelerate forward genetics and gene function analysis.

AMAP consists of Perl scripts and requires the following software: BWA (Li and Durbin, 2009), Picard, RepeatMasker (ver. open-4.0.5), SAMtools, GATK, Pindel, BreakDancer, CNVnator (ver. 0.3) and SnpEff (ver. 3.6, Cingolani et al., 2012). The A. thaliana reference genome sequence (in FASTA format), gene sets data (in GTF format) and variation data (in VCF format) are also required and are available from EnsemblPlants.

The workflow of AMAP is shown in Fig. 1. AMAP is designed to accept paired-end read data generated by the HiSeq sequencing system (Illumina, Cambridge, UK). When sequencing reads in the FASTQ format obtained from multiple mutants are input into AMAP, mapping by BWA and removal of PCR duplicates by Picard are automatically performed, as in Hirano et al. (2015). In addition, AMAP realigns reads using GATK to refine the mapping. Mutation detection is then automatically conducted using SAMtools, Pindel and BreakDancer. AMAP also detects short indels and SNPs with GATK, and copy number variants with CNVnator. Mutation candidates in the mitochondrial or plastid genome are excluded. For the GATK and SAMtools results, known information about SNPs is added by SnpEff. The output files of all mutants generated by each program are merged into a single file.

Automatic removal of false-positive mutations may also remove true mutations, producing false negatives. Thus, AMAP does not delete the estimated false positives but only adds flags to them so that they can easily be distinguished in the output files. In the SAMtools and GATK outputs, mutation candidates whose positions are covered by fewer than five or more than 1,000 reads are marked as false positives. In the GATK output, SNP candidates matching the filter expression QUAL < 30.0 or QD < 5.0, and indel candidates matching QUAL < 10.0 or MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1), are also marked as false positives. In the CNVnator output, mutation candidates with a depth (average number of covering reads at both ends of the mutation region) of less than five or more than 1,000 are marked as false positives.
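The depth and quality rules for GATK candidates can be illustrated with a short Python sketch. This is not AMAP's Perl code: the `Candidate` class, its field names and the flag labels are hypothetical, the two SNP thresholds are assumed to be joined by "or", and the MQ0 term is assumed to read MQ0 >= 4, as in the commonly used GATK hard-filter expression. The point it shows is the design choice described above: candidates are flagged, never deleted.

```python
from dataclasses import dataclass, field
from typing import List

MIN_DEPTH = 5      # fewer covering reads than this -> flagged
MAX_DEPTH = 1000   # more covering reads than this -> flagged


@dataclass
class Candidate:
    """One GATK call; field names are hypothetical, not AMAP's own."""
    kind: str                 # "SNP" or "INDEL"
    depth: int                # DP: reads covering the position
    qual: float               # QUAL
    qd: float = 0.0           # QD, only used for SNPs here
    mq0: int = 0              # MQ0: reads with mapping quality zero
    flags: List[str] = field(default_factory=list)


def flag_candidate(c: Candidate) -> Candidate:
    """Append flags instead of deleting, so no candidate is silently lost."""
    if c.depth < MIN_DEPTH or c.depth > MAX_DEPTH:
        c.flags.append("DEPTH")
    # Assumption: the two SNP thresholds are alternatives ("or").
    if c.kind == "SNP" and (c.qual < 30.0 or c.qd < 5.0):
        c.flags.append("SNP_FILTER")
    if c.kind == "INDEL":
        # Assumption: MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)
        many_mq0 = c.mq0 >= 4 and c.depth > 0 and c.mq0 / (1.0 * c.depth) > 0.1
        if c.qual < 10.0 or many_mq0:
            c.flags.append("INDEL_FILTER")
    return c
```

For example, `flag_candidate(Candidate("SNP", depth=3, qual=99.0, qd=20.0)).flags` contains only the depth flag, while a well-covered, high-quality SNP keeps an empty flag list and therefore remains an unflagged candidate for inspection in a genome browser.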
In the Pindel and BreakDancer outputs, mutation candidates with a depth of less than five or more than 1,000, or those in which the ratio of the number of reads supporting the mutation to the depth is less than 0.1, are marked as false positives. Candidates detected in common in at least two mutants are judged by AMAP to be false positives stemming from pre-existing polymorphisms, although AMAP works properly with a single mutant as input, without this function. AMAP also treats mutation candidates in or around (±10 bp) repetitive sequences detected by RepeatMasker as false positives. AMAP is available on GitHub (github.com/ion-beam-breeding/).

To test AMAP, the sequencing reads obtained from three mutants (Hirano et al., 2015) isolated after Ar-ion irradiation (50 Gy, LET = 290 keV/μm) were input. In the test, to avoid differences in results due to differences in program versions, the same versions of BWA, Picard (ver. 1.55) and SAMtools (ver. 0.1.16) as used by Hirano et al. (2015) were applied. The GATK and CNVnator outputs were confirmed using Integrative Genomics Viewer (IGV; ver. 2.3, Robinson et al., 2011).

Read files from the three Ar-ion-induced mutants that were previously analyzed by Hirano et al. (2015) were reanalyzed using AMAP. The resulting outputs generated by AMAP are shown in Tables 1 and 2. In the previous study, 16,521, 8,927 and 3,626 mutation candidates were detected by SAMtools, Pindel and BreakDancer, respectively, on average across the three mutants. However, only 149, 17 and 35 mutations detected by SAMtools, Pindel and BreakDancer, respectively, in total across the three mutants were confirmed by IGV, and 99.8% of the mutation candidates were false positives (Hirano et al., 2015). By contrast, AMAP generated lists of 3,493, 23 and 60 mutation candidates on average across the three mutants from SAMtools, Pindel and BreakDancer, respectively, leading to 12% of the total mutation candidates

obtained in the previous study (Table 1), that is, 3,576 versus 29,074 candidates per mutant on average. In addition, all the true mutations confirmed by Hirano et al. (2015) were detected by AMAP except for two mutations (possibly because of the version upgrade of Pindel).

Fig. 1. Flowchart of AMAP. Reads from multiple mutants were input in the FASTQ format. Mutation analyses by GATK, SAMtools, Pindel, BreakDancer and CNVnator were performed on each mutant. The results of each mutation analysis for all mutants were merged into a single TSV file.

Table 1. Numbers of mutation candidates by SAMtools, Pindel and BreakDancer, and the numbers after filtration by AMAP
Mutant line   SAMtools   Pindel   BreakDancer
Ar-57-al1     16,181 3, , ,
Ar-365-as1    16,939 3, , ,
Ar-443-as1    16,442 3, , ,
*Calculated from Hirano et al. (2015).
**Percentage of mutation candidates after filtration by AMAP in those by each software.

Table 2. Numbers of mutation candidates from GATK and CNVnator, and the accuracy rates of mutation detection
              GATK                                    CNVnator
Mutant line   BS        DEL        INS       AR*      DEL     DUP     AR*
Ar-57-al1     82 (66)   180 (11)   110 (3)            (0)     4 (0)   0
Ar-365-as1    41 (19)   182 (15)   134 (2)            (0)     5 (0)   0
Ar-443-as1    84 (24)   187 (12)   110 (5)            (2)     7 (4)   33
Numbers in parentheses indicate mutation candidates confirmed by IGV. BS: base substitution; DEL: deletion; INS: insertion; DUP: duplication.
*Accuracy rate: the percentage of mutation candidates reported by AMAP that were visually confirmed using IGV.

Detection of SNPs and short indels with GATK was not performed in the previous study (Hirano et al., 2015). Thus, we confirmed all mutation candidates detected by GATK using IGV. In the current analysis, 156 mutations were detected by GATK (Supplementary Tables S1 and S2), and 125 of these were identical to the mutations identified by SAMtools (Supplementary Table S1). The other 31 mutations were identified exclusively by GATK (Supplementary Table S2). On the other hand, SAMtools identified 24 mutations that were not detected by GATK in this study (Supplementary Table S3). The differences in the detected mutations may be due to differences in the algorithms of the two programs. Parameter tuning in each program by users to fit their own sequence data may minimize the detection of program-specific mutation candidates, although it may also increase the number of false positives.

CNVnator detected copy number variations in the Ar-443-as1 mutant. Confirmation of the copy number variations by IGV revealed that all the copy number variations obtained were identical to those detected by array-CGH in Hirano et al. (2015). In addition, a heterozygous deletion in the region 10,285,036–10,307,586 on chromosome 5 was identified, which could not be detected by either Pindel or BreakDancer. Thus, incorporation of CNVnator into AMAP improves the efficiency of mutation detection. We checked the mutation candidates detected by GATK and CNVnator using IGV and confirmed that 14% and 7% of them, respectively, were true positives (Table 2).

In this study, we developed a new pipeline, AMAP, for mutation detection at the whole-genome level, which integrates a set of well-validated open-access programs. AMAP reduces false-positive mutation candidates to 12% of those reported in a previous study (Hirano et al., 2015).
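Much of this reduction rests on the two comparison-based flags described in the methods: candidates shared by at least two mutants, and candidates in or within ±10 bp of a repeat. A minimal Python sketch of both follows. It is an illustration under stated assumptions, not AMAP's Perl implementation: the function names are hypothetical, candidates are matched across mutants by an exact (chromosome, position, ref, alt) key, and repeat intervals are taken as 1-based inclusive coordinates already parsed from RepeatMasker output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Key = Tuple[str, int, str, str]      # (chromosome, position, ref, alt)
Interval = Tuple[int, int]           # 1-based inclusive (start, end)

REPEAT_MARGIN = 10                   # +/- 10 bp around each repeat


def shared_candidates(by_mutant: Dict[str, Iterable[Key]]) -> Set[Key]:
    """Return candidates seen in at least two mutants.

    Such candidates are treated as likely pre-existing polymorphisms.
    Exact-key matching is an assumption made for this sketch.
    """
    seen: Dict[Key, Set[str]] = defaultdict(set)
    for mutant, keys in by_mutant.items():
        for key in keys:
            seen[key].add(mutant)
    return {key for key, mutants in seen.items() if len(mutants) >= 2}


def near_repeat(chrom: str, start: int, end: int,
                repeats: Dict[str, List[Interval]],
                margin: int = REPEAT_MARGIN) -> bool:
    """True if [start, end] overlaps any repeat widened by `margin` bp."""
    return any(start <= r_end + margin and end >= r_start - margin
               for r_start, r_end in repeats.get(chrom, []))
```

With a single mutant as input, `shared_candidates` returns an empty set, which matches the remark that the pipeline still works without the cross-mutant comparison. The linear scan in `near_repeat` is kept for clarity; a sorted interval list with binary search would be a natural substitute for genome-scale repeat annotations.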
The reduction in false positives makes high-throughput detection of whole-genome mutations practical. AMAP can analyze the sequencing reads derived from back-crossed populations to carry out mutation induction and mapping of genes responsible for the mutant phenotype as described earlier (Ashelford et al., 2011; Uchida et al., 2011). Moreover, AMAP can be applied to other model organisms if their reference sequences are available. These techniques will accelerate forward genetic studies.

Finally, it should be mentioned that the candidate mutations reported by AMAP still include false positives. Therefore, confirming the mutation candidates using a genome browser is still required. Also, the possibility of false negatives should be considered because the same mutations may conceivably be induced independently in different mutants.

This research was supported by the Council for Science, Technology and Innovation (CSTI), Cross-ministerial Strategic Innovation Promotion Program (SIP), Technologies for creating next-generation agriculture, forestry and fisheries (funding agency: Bio-oriented Technology Research Advancement Institution, NARO); by the Japan Society for the Promotion of Science (JSPS) through the Funding Program for Next Generation World-Leading Researchers (NEXT Program) to T. A. (GR096) and through a Grant-in-Aid for Scientific Research (B) (Y. K.); by the Ministry of Education, Culture, Sports, Science and Technology (MEXT) through KAKENHI (T. A., No. 221S0002); and by the RIKEN Biomass Engineering Program.

REFERENCES

Abe, A., Kosugi, S., Yoshida, K., Natsume, S., Takagi, H., Kanzaki, H., Matsumura, H., Yoshida, K., Mitsuoka, C., Tamiru, M., et al. (2012) Genome sequencing reveals agronomically important loci in rice using MutMap. Nat. Biotechnol. 30.
Abyzov, A., Urban, A. E., Snyder, M., and Gerstein, M. (2011) CNVnator: An approach to discover, genotype, and characterize typical and atypical CNVs from family and population genome sequencing. Genome Res. 21.
Ashelford, K., Eriksson, M. E., Allen, C. M., D'Amore, R., Johansson, M., Gould, P., Kay, S., Millar, A. J., Hall, N., and Hall, A. (2011) Full genome re-sequencing reveals a novel circadian clock mutation in Arabidopsis. Genome Biol. 12, R28.
Austin, R. S., Vidaurre, D., Stamatiou, G., Breit, R., Provart, N. J., Bonetta, D., Zhang, J. F., Fung, P., Gong, Y. C., Wang, P. W., et al. (2011) Next-generation mapping of Arabidopsis genes. Plant J. 67.
Belfield, E. J., Gan, X. C., Mithani, A., Brown, C., Jiang, C. F., Franklin, K., Alvey, E., Wibowo, A., Jung, M., Bailey, K., et al. (2012) Genome-wide analysis of mutations in mutant lineages selected following fast-neutron irradiation mutagenesis of Arabidopsis thaliana. Genome Res. 22.
Bowen, M. E., Henke, K., Siegfried, K. R., Warman, M. L., and Harris, M. P. (2012) Efficient mapping and cloning of mutations in zebrafish by low-coverage whole-genome sequencing. Genetics 190.
Chen, K., Wallis, J. W., McLellan, M. D., Larson, D. E., Kalicki, J. M., Pohl, C. S., McGrath, S. D., Wendl, M. C., Zhang, Q. Y., Locke, D. P., et al. (2009) BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat. Methods 6.
Cingolani, P., Platts, A., Wang, L. L., Coon, M., Nguyen, T., Wang, L., Land, S. J., Lu, X. Y., and Ruden, D. M. (2012) A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly 6.
Edwards, M. D., and Gifford, D. K. (2012) High-resolution genetic mapping with pooled sequencing. BMC Bioinformatics 13 (suppl 6), S8.
Fekih, R., Takagi, H., Tamiru, M., Abe, A., Natsume, S., Yaegashi, H., Sharma, S., Sharma, S., Kanzaki, H., Matsumura, H., et al. (2013) MutMap+: Genetic mapping and mutant identification without crossing in rice. PLoS One 8.
Hirano, T., Kazama, Y., Ohbu, S., Shirakawa, Y., Liu, Y., Kambara, T., Fukunishi, N., and Abe, T. (2012) Molecular nature of mutations induced by high-LET irradiation with argon and carbon ions in Arabidopsis thaliana. Mutat. Res.-Fund. Mol. M. 735.
Hirano, T., Kazama, Y., Ishii, K., Ohbu, S., Shirakawa, Y., and Abe, T. (2015) Comprehensive identification of mutations induced by heavy-ion beam irradiation in Arabidopsis thaliana. Plant J. 82.
Kazama, Y., Hirano, T., Saito, H., Liu, Y., Ohbu, S., Hayashi, Y., and Abe, T. (2011) Characterization of highly efficient heavy-ion mutagenesis in Arabidopsis thaliana. BMC Plant Biology 11, 161.
Kazama, Y., Hirano, T., Nishihara, K., Ohbu, S., Shirakawa, Y., and Abe, T. (2013) Effect of high-LET Fe-ion beam irradiation on mutation induction in Arabidopsis thaliana. Genes Genet. Syst. 88.
Li, H., and Durbin, R. (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and 1000 Genome Project Data Processing Subgroup (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25.
McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al. (2010) The Genome Analysis Toolkit: A MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20.
Minevich, G., Park, D. S., Blankenberg, D., Poole, R. J., and Hobert, O. (2012) CloudMap: A cloud-based pipeline for analysis of mutant genome sequences. Genetics 192.
Obholzer, N., Swinburne, I. A., Schwab, E., Nechiporuk, A. V., Nicolson, T., and Megason, S. G. (2012) Rapid positional cloning of zebrafish mutations by linkage and homozygosity mapping using whole-genome sequencing. Development 139.
Robinson, J. T., Thorvaldsdottir, H., Winckler, W., Guttman, M., Lander, E. S., Getz, G., and Mesirov, J. P. (2011) Integrative genomics viewer. Nat. Biotechnol. 29.
Schneeberger, K., Ossowski, S., Lanz, C., Juul, T., Petersen, A. H., Nielsen, K. L., Jorgensen, J. E., Weigel, D., and Andersen, S. U. (2009) SHOREmap: simultaneous mapping and mutation identification by deep sequencing. Nat. Methods 6.
Uchida, N., Sakamoto, T., Kurata, T., and Tasaka, M. (2011) Identification of EMS-induced causal mutations in a non-reference Arabidopsis thaliana accession by whole genome sequencing. Plant Cell Physiol. 52.
Ye, K., Schulz, M. H., Long, Q., Apweiler, R., and Ning, Z. M. (2009) Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25.


More information

Bioinformatics in next generation sequencing projects

Bioinformatics in next generation sequencing projects Bioinformatics in next generation sequencing projects Rickard Sandberg Assistant Professor Department of Cell and Molecular Biology Karolinska Institutet May 2013 Standard sequence library generation Illumina

More information

Estimation of the Spontaneous Mutation Rate per Nucleotide Site in a Drosophila melanogaster Full-Sib Family

Estimation of the Spontaneous Mutation Rate per Nucleotide Site in a Drosophila melanogaster Full-Sib Family INVESTIGATION HIGHLIGHTED ARTICLE Estimation of the Spontaneous Mutation Rate per Nucleotide Site in a Drosophila melanogaster Full-Sib Family Peter D. Keightley, 1 Rob W. Ness, Daniel L. Halligan, and

More information

mindel: a high-throughput and efficient pipeline for genome-wide InDel marker development

mindel: a high-throughput and efficient pipeline for genome-wide InDel marker development Lv et al. BMC Genomics (2016) 17:290 DOI 10.1186/s12864-016-2614-5 SOFTWARE mindel: a high-throughput and efficient pipeline for genome-wide InDel marker development Yuanda Lv 1, Yuhe Liu 2 and Han Zhao

More information

Genome research in eukaryotes

Genome research in eukaryotes Functional Genomics Genome and EST sequencing can tell us how many POTENTIAL genes are present in the genome Proteomics can tell us about proteins and their interactions The goal of functional genomics

More information

Bioinformatics small variants Data Analysis. Guidelines. genomescan.nl

Bioinformatics small variants Data Analysis. Guidelines. genomescan.nl Next Generation Sequencing Bioinformatics small variants Data Analysis Guidelines genomescan.nl GenomeScan s Guidelines for Small Variant Analysis on NGS Data Using our own proprietary data analysis pipelines

More information

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014

Single Nucleotide Variant Analysis. H3ABioNet May 14, 2014 Single Nucleotide Variant Analysis H3ABioNet May 14, 2014 Outline What are SNPs and SNVs? How do we identify them? How do we call them? SAMTools GATK VCF File Format Let s call variants! Single Nucleotide

More information

SNP calling and Genome Wide Association Study (GWAS) Trushar Shah

SNP calling and Genome Wide Association Study (GWAS) Trushar Shah SNP calling and Genome Wide Association Study (GWAS) Trushar Shah Types of Genetic Variation Single Nucleotide Aberrations Single Nucleotide Polymorphisms (SNPs) Single Nucleotide Variations (SNVs) Short

More information

Variant calling in NGS experiments

Variant calling in NGS experiments Variant calling in NGS experiments Jorge Jiménez jjimeneza@cipf.es BIER CIBERER Genomics Department Centro de Investigacion Principe Felipe (CIPF) (Valencia, Spain) 1 Index 1. NGS workflow 2. Variant calling

More information

A DE NOVO NONSENSE MUTATION IN MAGEL2 IN A PATIENT INITIALLY DIAGNOSED AS OPITZ-C: SIMILARITIES BETWEEN SCHAAF-YANG AND

A DE NOVO NONSENSE MUTATION IN MAGEL2 IN A PATIENT INITIALLY DIAGNOSED AS OPITZ-C: SIMILARITIES BETWEEN SCHAAF-YANG AND A DE NOVO NONSENSE MUTATION IN MAGEL2 IN A PATIENT INITIALLY DIAGNOSED AS OPITZ-C: SIMILARITIES BETWEEN SCHAAF-YANG AND OPITZ-C SYNDROMES Roser Urreizti, PhD, Anna Maria Cueto-Gonzalez, MD, Héctor Franco-Valls,

More information

Data Analysis Report: Variant Analysis v1.2

Data Analysis Report: Variant Analysis v1.2 GATC Biotech AG, Jakob-Stadler-Platz 7, 78467 Konstanz Data Analysis Report: Variant Analysis v1.2 Project / Study: GATC-Demo Date: February 28, 2018 Table of Contents 1 Analysis workflow 1 2 Samples Analysed

More information

HLA and Next Generation Sequencing it s all about the Data

HLA and Next Generation Sequencing it s all about the Data HLA and Next Generation Sequencing it s all about the Data John Ord, NHSBT Colindale and University of Cambridge BSHI Annual Conference Manchester September 2014 Introduction In 2003 the first full public

More information

CREST maps somatic structural variation in cancer genomes with base-pair resolution

CREST maps somatic structural variation in cancer genomes with base-pair resolution Nature Methods CREST maps somatic structural variation in cancer genomes with base-pair resolution Jianmin Wang, Charles G Mullighan, John Easton, Stefan Roberts, Jing Ma, Michael C Rusch, Ken Chen, Christopher

More information

The Sentieon Genomics Tools A fast and accurate solution to variant calling from next-generation sequence data

The Sentieon Genomics Tools A fast and accurate solution to variant calling from next-generation sequence data The Sentieon Genomics Tools A fast and accurate solution to variant calling from next-generation sequence data Donald Freed 1*, Rafael Aldana 1, Jessica A. Weber 2, Jeremy S. Edwards 3,4,5 1 Sentieon Inc,

More information

Fast and Accurate Variant Calling in Strand NGS

Fast and Accurate Variant Calling in Strand NGS S T R A ND LIF E SCIENCE S WH ITE PAPE R Fast and Accurate Variant Calling in Strand NGS A benchmarking study Radhakrishna Bettadapura, Shanmukh Katragadda, Vamsi Veeramachaneni, Atanu Pal, Mahesh Nagarajan

More information

Analytics Behind Genomic Testing

Analytics Behind Genomic Testing A Quick Guide to the Analytics Behind Genomic Testing Elaine Gee, PhD Director, Bioinformatics ARUP Laboratories 1 Learning Objectives Catalogue various types of bioinformatics analyses that support clinical

More information

Integrated, accurate and multienvironment. discovery from whole genome sequencing data with NGSEP

Integrated, accurate and multienvironment. discovery from whole genome sequencing data with NGSEP Integrated, accurate and multienvironment structural variation discovery from whole genome sequencing data with NGSEP Juan Fernando de la Hoz, Jorge Duitama Agrobiodiversity research area International

More information

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service Dr. Ruth Burton Product Manager Today s agenda Introduction CytoSure arrays and analysis

More information

What is Bioinformatics?

What is Bioinformatics? What is Bioinformatics? Bioinformatics is the field of science in which biology, computer science, and information technology merge to form a single discipline. - NCBI The ultimate goal of the field is

More information

Gap Filling for a Human MHC Haplotype Sequence

Gap Filling for a Human MHC Haplotype Sequence American Journal of Life Sciences 2016; 4(6): 146-151 http://www.sciencepublishinggroup.com/j/ajls doi: 10.11648/j.ajls.20160406.12 ISSN: 2328-5702 (Print); ISSN: 2328-5737 (Online) Gap Filling for a Human

More information

DNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer.

DNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer. DNA Preparation and QC Extraction DNA was extracted from whole blood or flash frozen post-mortem tissue using a DNA mini kit (QIAmp #51104 and QIAmp#51404, respectively) following the manufacturer s recommendations.

More information

ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data

ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data Cabanski et al. BMC Bioinformatics 2012, 13:221 SOFTWARE Open Access ReQON: a Bioconductor package for recalibrating quality scores from next-generation sequencing data Christopher R Cabanski 1, Keary

More information

Discretized Gaussian Mixture for Genotyping of microsatellite loci containing homopolymer runs

Discretized Gaussian Mixture for Genotyping of microsatellite loci containing homopolymer runs Bioinformatics Advance Access published October 17, 2013 Sequence analysis Discretized Gaussian Mixture for Genotyping of microsatellite loci containing homopolymer runs Hongseok Tae 1, Dong-Yun Kim 2,

More information

Data processing and analysis of genetic variation using next-generation DNA sequencing!

Data processing and analysis of genetic variation using next-generation DNA sequencing! Data processing and analysis of genetic variation using next-generation DNA sequencing! Mark DePristo, Ph.D.! Genome Sequencing and Analysis Group! Medical and Population Genetics Program! Broad Institute

More information

14 March, 2016: Introduction to Genomics

14 March, 2016: Introduction to Genomics 14 March, 2016: Introduction to Genomics Genome Genome within Ensembl browser http://www.ensembl.org/homo_sapiens/location/view?db=core;g=ensg00000139618;r=13:3231547432400266 Genome within Ensembl browser

More information

snp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing

snp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing Al-Shahib and Underwood BMC Bioinformatics 2013, 14:326 SOFTWARE Open Access snp-search: simple processing, manipulation and searching of SNPs from high-throughput sequencing Ali Al-Shahib * and Anthony

More information

Direct estimation of the spontaneous mutation rate by short-term mutation accumulation lines in Chironomus riparius

Direct estimation of the spontaneous mutation rate by short-term mutation accumulation lines in Chironomus riparius Direct estimation of the spontaneous mutation rate by short-term mutation accumulation lines in Chironomus riparius Ann-Marie Oppold 1,2 and Markus Pfenninger 1,2* 1 Senckenberg Biodiversity and Climate

More information

Supplementary Data 1.

Supplementary Data 1. Supplementary Data 1. Evaluation of the effects of number of F2 progeny to be bulked (n) and average sequencing coverage (depth) of the genome (G) on the levels of false positive SNPs (SNP index = 1).

More information

Alignment methods. Martijn Vermaat Department of Human Genetics Center for Human and Clinical Genetics

Alignment methods. Martijn Vermaat Department of Human Genetics Center for Human and Clinical Genetics Alignment methods Martijn Vermaat Department of Human Genetics Center for Human and Clinical Genetics Alignment methods Sequence alignment Assembly vs alignment Alignment methods Common issues Platform

More information

Genetics: Published Articles Ahead of Print, published on December 14, 2011 as /genetics

Genetics: Published Articles Ahead of Print, published on December 14, 2011 as /genetics Genetics: Published Articles Ahead of Print, published on December 14, 2011 as 10.1534/genetics.111.136069 1 Efficient mapping and cloning of mutations in zebrafish by low coverage whole genome sequencing

More information

MPG NGS workshop I: SNP calling

MPG NGS workshop I: SNP calling MPG NGS workshop I: SNP calling Mark DePristo Manager, Medical and Popula

More information

Proceedings of the World Congress on Genetics Applied to Livestock Production,

Proceedings of the World Congress on Genetics Applied to Livestock Production, Which is the best variant caller for large whole-genome sequencing datasets? C.J. Vander Jagt 1, A.J. Chamberlain 1, R.D. Schnabel 2, B.J. Hayes 1,3 & H.D. Daetwyler 1,4 1 Agriculture Victoria, AgriBio,

More information

The Human Genome and its upcoming Dynamics

The Human Genome and its upcoming Dynamics The Human Genome and its upcoming Dynamics Matthias Platzer Genome Analysis Leibniz Institute for Age Research - Fritz-Lipmann Institute (FLI) Sequencing of the Human Genome Publications 2004 2001 2001

More information

SUPPLEMENTARY INFORMATION

SUPPLEMENTARY INFORMATION SUPPLEMENTARY INFORMATION doi:10.1038/nature24473 1. Supplementary Information Computational identification of neoantigens Neoantigens from the three datasets were inferred using a consistent pipeline

More information

Why can GBS be complicated? Tools for filtering, error correction and imputation.

Why can GBS be complicated? Tools for filtering, error correction and imputation. Why can GBS be complicated? Tools for filtering, error correction and imputation. Edward Buckler USDA-ARS Cornell University http://www.maizegenetics.net Many Organisms Are Diverse Humans are at the lower

More information

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Read Complexity

Nature Biotechnology: doi: /nbt Supplementary Figure 1. Read Complexity Supplementary Figure 1 Read Complexity A) Density plot showing the percentage of read length masked by the dust program, which identifies low-complexity sequence (simple repeats). Scrappie outputs a significantly

More information

NUCLEOTIDE RESOLUTION STRUCTURAL VARIATION DETECTION USING NEXT- GENERATION WHOLE GENOME RESEQUENCING

NUCLEOTIDE RESOLUTION STRUCTURAL VARIATION DETECTION USING NEXT- GENERATION WHOLE GENOME RESEQUENCING NUCLEOTIDE RESOLUTION STRUCTURAL VARIATION DETECTION USING NEXT- GENERATION WHOLE GENOME RESEQUENCING Ken Chen, Ph.D. kchen@genome.wustl.edu The Genome Center, Washington University in St. Louis The path

More information

Using RNAseq data to improve genomic selection in dairy cattle

Using RNAseq data to improve genomic selection in dairy cattle Using RNAseq data to improve genomic selection in dairy cattle T. Lopdell 1,2 K. Tiplady 1 & M. Littlejohn 1 1 R&D, Livestock Improvement Corporation, Ruakura Rd, Newstead, Hamilton, New Zealand 2 School

More information

Structural variation analysis using NGS sequencing

Structural variation analysis using NGS sequencing Structural variation analysis using NGS sequencing Victor Guryev NBIC NGS taskforce meeting April 15th, 2011 Scale of genomic variants Scale 1 bp 10 bp 100 bp 1 kb 10 kb 100 kb 1 Mb Variants SNPs Short

More information

Transcriptomics analysis with RNA seq: an overview Frederik Coppens

Transcriptomics analysis with RNA seq: an overview Frederik Coppens Transcriptomics analysis with RNA seq: an overview Frederik Coppens Platforms Applications Analysis Quantification RNA content Platforms Platforms Short (few hundred bases) Long reads (multiple kilobases)

More information

Structural variation. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona

Structural variation. Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona Structural variation Marta Puig Institut de Biotecnologia i Biomedicina Universitat Autònoma de Barcelona Genetic variation How much genetic variation is there between individuals? What type of variants

More information

Deep sequencing strategies for mapping and identifying mutations from genetic screens

Deep sequencing strategies for mapping and identifying mutations from genetic screens REVIEW Worm 2:3, e25081; July/August/September 2013; 2013 Landes Bioscience REVIEW Deep sequencing strategies for mapping and identifying mutations from genetic screens Steven Zuryn and Sophie Jarriault*

More information

AN ALGORITHM FOR STRUCTURAL VARIANT DETECTION WITH THIRD GENERATION SEQUENCING HUI-JOU CHOU. A thesis submitted to the. Graduate School Camden

AN ALGORITHM FOR STRUCTURAL VARIANT DETECTION WITH THIRD GENERATION SEQUENCING HUI-JOU CHOU. A thesis submitted to the. Graduate School Camden AN ALGORITHM FOR STRUCTURAL VARIANT DETECTION WITH THIRD GENERATION SEQUENCING BY HUI-JOU CHOU A thesis submitted to the Graduate School Camden Rutgers, The State University of New Jersey in partial fulfillment

More information

SV-BET: Structure Variation Benchmarking and Evaluation Tool with Comparative Analysis of Split Read-Based Approaches

SV-BET: Structure Variation Benchmarking and Evaluation Tool with Comparative Analysis of Split Read-Based Approaches International Journal of Pharma Medicine and Biological Sciences Vol. 5, No. 4, October 2016 SV-BET: Structure Variation Benchmarking and Evaluation Tool with Comparative Analysis of Split Read-Based Approaches

More information

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits

Incorporating Molecular ID Technology. Accel-NGS 2S MID Indexing Kits Incorporating Molecular ID Technology Accel-NGS 2S MID Indexing Kits Molecular Identifiers (MIDs) MIDs are indices used to label unique library molecules MIDs can assess duplicate molecules in sequencing

More information

Sanger vs Next-Gen Sequencing

Sanger vs Next-Gen Sequencing Tools and Algorithms in Bioinformatics GCBA815/MCGB815/BMI815, Fall 2017 Week-8: Next-Gen Sequencing RNA-seq Data Analysis Babu Guda, Ph.D. Professor, Genetics, Cell Biology & Anatomy Director, Bioinformatics

More information

Gene discovery using mutagen-induced polymorphisms and deep sequencing: application

Gene discovery using mutagen-induced polymorphisms and deep sequencing: application Genetics: Published Articles Ahead of Print, published on June 19, 2012 as 10.1534/genetics.112.141986 Gene discovery using mutagen-induced polymorphisms and deep sequencing: application to plant disease

More information

Mapping Next Generation Sequence Reads. Bingbing Yuan Dec. 2, 2010

Mapping Next Generation Sequence Reads. Bingbing Yuan Dec. 2, 2010 Mapping Next Generation Sequence Reads Bingbing Yuan Dec. 2, 2010 1 What happen if reads are not mapped properly? Some data won t be used, thus fewer reads would be aligned. Reads are mapped to the wrong

More information

Development and characterization of a high throughput targeted genotypingby-sequencing solution for agricultural genetic applications

Development and characterization of a high throughput targeted genotypingby-sequencing solution for agricultural genetic applications Development and characterization of a high throughput targeted genotypingby-sequencing solution for agricultural genetic applications Michelle Swimley 1, Angela Burrell 1, Prasad Siddavatam 1, Chris Willis

More information

Setting Standards and Raising Quality for Clinical Bioinformatics. Joo Wook Ahn, Guy s & St Thomas 04/07/ ACGS summer scientific meeting

Setting Standards and Raising Quality for Clinical Bioinformatics. Joo Wook Ahn, Guy s & St Thomas 04/07/ ACGS summer scientific meeting Setting Standards and Raising Quality for Clinical Bioinformatics Joo Wook Ahn, Guy s & St Thomas 04/07/2016 - ACGS summer scientific meeting 1. Best Practice Guidelines Draft guidelines circulated to

More information

Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4

Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4 WHITE PAPER Oncomine Comprehensive Assay Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4 Contents Scope and purpose of document...2 Content...2 How Torrent

More information

Next-Generation Sequencing. Technologies

Next-Generation Sequencing. Technologies Next-Generation Next-Generation Sequencing Technologies Sequencing Technologies Nicholas E. Navin, Ph.D. MD Anderson Cancer Center Dept. Genetics Dept. Bioinformatics Introduction to Bioinformatics GS011062

More information

Analysis Datasheet Exosome RNA-seq Analysis

Analysis Datasheet Exosome RNA-seq Analysis Analysis Datasheet Exosome RNA-seq Analysis Overview RNA-seq is a high-throughput sequencing technology that provides a genome-wide assessment of the RNA content of an organism, tissue, or cell. Small

More information

Genomic resources. for non-model systems

Genomic resources. for non-model systems Genomic resources for non-model systems 1 Genomic resources Whole genome sequencing reference genome sequence comparisons across species identify signatures of natural selection population-level resequencing

More information

Discovery and genotyping of genome structural polymorphism by sequencing on a population scale

Discovery and genotyping of genome structural polymorphism by sequencing on a population scale Discovery and genotyping of genome structural polymorphism by sequencing on a population scale The Harvard community has made this article openly available. Please share how this access benefits you. Your

More information

MutMap+: Genetic Mapping and Mutant Identification without Crossing in Rice

MutMap+: Genetic Mapping and Mutant Identification without Crossing in Rice MutMap+: Genetic Mapping and Mutant Identification without Crossing in Rice Rym Fekih 1., Hiroki Takagi 1,2., Muluneh Tamiru 1., Akira Abe 1, Satoshi Natsume 1,2, Hiroki Yaegashi 1, Shailendra Sharma 1,

More information

Supplemental Data. Who's Who? Detecting and Resolving. Sample Anomalies in Human DNA. Sequencing Studies with Peddy

Supplemental Data. Who's Who? Detecting and Resolving. Sample Anomalies in Human DNA. Sequencing Studies with Peddy The American Journal of Human Genetics, Volume 100 Supplemental Data Who's Who? Detecting and Resolving Sample Anomalies in Human DNA Sequencing Studies with Peddy Brent S. Pedersen and Aaron R. Quinlan

More information

Functional genomics to improve wheat disease resistance. Dina Raats Postdoctoral Scientist, Krasileva Group

Functional genomics to improve wheat disease resistance. Dina Raats Postdoctoral Scientist, Krasileva Group Functional genomics to improve wheat disease resistance Dina Raats Postdoctoral Scientist, Krasileva Group Talk plan Goal: to contribute to the crop improvement by isolating YR resistance genes from cultivated

More information

QIAseq Targeted Panel Analysis Plugin USER MANUAL

QIAseq Targeted Panel Analysis Plugin USER MANUAL QIAseq Targeted Panel Analysis Plugin USER MANUAL User manual for QIAseq Targeted Panel Analysis 1.1 Windows, macos and Linux June 18, 2018 This software is for research purposes only. QIAGEN Aarhus Silkeborgvej

More information

CloudMap: A Cloud-based Pipeline for Analysis of Mutant Genome Sequences

CloudMap: A Cloud-based Pipeline for Analysis of Mutant Genome Sequences enetics: dvance Online Publication, published on October 11, 212 as 1.134/genetics.112.14424 CloudMap: Cloud-based Pipeline for nalysis of Mutant enome Sequences regory Minevich 1,, Danny S. Park 1, Daniel

More information

Assignment 9: Genetic Variation

Assignment 9: Genetic Variation Assignment 9: Genetic Variation Due Date: Friday, March 30 th, 2018, 10 am In this assignment, you will profile genome variation information and attempt to answer biologically relevant questions. The variant

More information