Identifying recessive gene candidates with GEMINI


1 Identifying recessive gene candidates with GEMINI
Aaron Quinlan, University of Utah, quinlanlab.org
Please refer to the following GitHub Gist to find each command for this session. Commands should be copied and pasted from this Gist.

2 Compound heterozygote detective work with GEMINI

3 Compound het refresher

4 Example compound heterozygote (figure: alleles carried by Dad, Mom, and Kid)

5 Phasing genotypes (figure credit: Jessica Chong)

6 The result of phasing by transmission (figure credit: Jessica Chong)

7 Phasing a VCF file by transmission with GATK (figure: worked example with a G/G genotype; credit: Jessica Chong)

12 Phasing a VCF file by transmission with GEMINI (figure: worked example with a G/G genotype)

17 Phasing by transmission
Convention: in a phased genotype, the maternal allele is listed first.
1. Both sites phasable, deleterious alleles on different chromosomes: high confidence.
2. Both sites phasable, deleterious alleles on the same chromosome: exclude.
3. Only one site is phasable: lower confidence, but cannot necessarily exclude.
4. Neither site is phasable: lower confidence, but cannot necessarily exclude (recombination?).
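The four cases can be sketched in Python. This is only an illustration of the reasoning on the slide, not GEMINI's implementation: the function names and the returned labels are hypothetical, genotypes are passed as unordered allele pairs, and the example genotypes assume that the first parent listed is the mother.

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]


def phase_by_transmission(kid: Genotype, mom: Genotype, dad: Genotype) -> Optional[Genotype]:
    """Return the kid's genotype as (maternal_allele, paternal_allele).

    Returns None when the parental genotypes do not determine a unique
    ordering (for example, when all three samples are heterozygous).
    """
    a, b = kid
    if a == b:
        return (a, b)  # homozygous: the order is irrelevant
    # Keep every ordering that is consistent with Mendelian transmission.
    orderings = [(m, p) for m, p in ((a, b), (b, a)) if m in mom and p in dad]
    return orderings[0] if len(orderings) == 1 else None


def classify_pair(site1: Optional[Genotype], site2: Optional[Genotype],
                  alt1: str, alt2: str) -> str:
    """Classify a pair of heterozygous sites; alt1/alt2 are the deleterious alleles."""
    if site1 is None and site2 is None:
        return "neither site phasable: lower confidence"
    if site1 is None or site2 is None:
        return "one site phasable: lower confidence"
    # Index 0 is the maternal chromosome, index 1 the paternal chromosome.
    if site1.index(alt1) != site2.index(alt2):
        return "different chromosomes: high confidence"
    return "same chromosome: exclude"


# Hypothetical trio genotypes at two sites in one gene.
site1 = phase_by_transmission(kid=("C", "A"), mom=("C", "A"), dad=("C", "C"))
site2 = phase_by_transmission(kid=("G", "A"), mom=("G", "G"), dad=("G", "A"))
unphased = phase_by_transmission(kid=("G", "C"), mom=("G", "C"), dad=("G", "C"))
print(site1, site2, classify_pair(site1, site2, "A", "A"))
print(unphased, classify_pair(site1, unphased, "A", "C"))
```

A site where both parents and the child are heterozygous cannot be phased by transmission, which is why such pairs fall into the lower-confidence cases.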

25 Compound het test case (figure credit: Jessica Chong)

26 The comp_hets tool in GEMINI
Requires a PED file with one row per family member (only the first field of each row survives in this transcription):
#family_id sample_id paternal_id maternal_id sex phenotype
family
family
family
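As a rough illustration of what the PED file encodes, the sketch below parses the six columns named in the header and lists the parent-child trios. The example rows are hypothetical: the sample IDs are borrowed from the tool output later in the deck, while the parent assignments, the sex codes (1 = male, 2 = female), the phenotype codes (1 = unaffected, 2 = affected), and the use of 0 for an unknown parent follow common PED conventions and are assumptions here.

```python
from typing import List, NamedTuple, Tuple


class PedRow(NamedTuple):
    family_id: str
    sample_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str


def parse_ped(text: str) -> List[PedRow]:
    """Parse whitespace-delimited PED text, skipping blank and '#' header lines."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        rows.append(PedRow(*line.split()[:6]))
    return rows


def trios(rows: List[PedRow]) -> List[Tuple[str, str, str]]:
    """Return (kid, dad, mom) for each sample whose parents are both in the file."""
    ids = {r.sample_id for r in rows}
    return [(r.sample_id, r.paternal_id, r.maternal_id)
            for r in rows
            if r.paternal_id in ids and r.maternal_id in ids]


# Hypothetical PED content; the real recessive.ped may differ.
EXAMPLE_PED = """#family_id sample_id paternal_id maternal_id sex phenotype
family1 1805 0 0 2 1
family1 1847 0 0 1 1
family1 4805 1847 1805 1 2
"""

ped_rows = parse_ped(EXAMPLE_PED)
print(trios(ped_rows))
```

Phasing by transmission needs exactly this information: which samples are the parents of the affected child.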

27 Create a GEMINI database from a VCF
Notes:
1. The VCF has been normalized and decomposed with VT.
2. The VCF has been annotated with VEP.
$ curl tutorials/trio.trim.vep.vcf.gz > trio.trim.vep.vcf.gz
$ curl tutorials/recessive.ped > recessive.ped
$ gemini load --cores 4 \
    -v trio.trim.vep.vcf.gz \
    -t VEP \
    --skip-gene-tables \
    -p recessive.ped \
    trio.trim.vep.recessive.db
Note: copy and paste the full command from the GitHub Gist to avoid errors.

28 Running the comp_hets tool
$ gemini comp_hets trio.trim.vep.recessive.db

29 Again, we can limit the attributes returned with the --columns option.
$ gemini comp_hets \
    --columns "chrom, start, end, gene, impact, cadd_raw" \
    trio.trim.vep.recessive.db
Output (most numeric values did not survive transcription):
chrom start end gene impact cadd_raw variant_id family_id family_members family_genotypes samples family_count comp_het_id priority
chr AAK1 UTR_3_prime family1 1805;unaffected,1847;unaffected,4805;affected G/C,G/C,G/C _1638_
chr AAK1 UTR_5_prime family1 1805;unaffected,1847;unaffected,4805;affected G/G,G/A,G|A _1638_
chr AAK1 UTR_3_prime family1 1805;unaffected,1847;unaffected,4805;affected A/C,A/C,A/C _1637_
chr AAK1 UTR_3_prime family1 1805;unaffected,1847;unaffected,4805;affected G/C,G/C,G/C _1637_
chr AAK1 UTR_3_prime family1 1805;unaffected,1847;unaffected,4805;affected T/T,T/T,T/C _1636_
chr AAK1 UTR_5_prime family1 1805;unaffected,1847;unaffected,4805;affected G/G,G/A,G|A _1636_
chr AAK1 UTR_3_prime family1 1805;unaffected,1847;unaffected,4805;affected G/C,G/C,G/C _1638_
chr AAK1 UTR_5_prime None 1645 family1 1805;unaffected,1847;unaffected,4805;affected AT/A,AT/A,AT/A _1638_
chr AAK1 UTR_3_prime family1 1805;unaffected,1847;unaffected,4805;affected T/T,T/T,T/C _1636_

30 Start with highest priority compound heterozygote candidates
Both sites phasable: high confidence, as the deleterious alleles are on different chromosomes.

32 Restrict to highest priority (i.e., priority==1) candidates
$ gemini comp_hets \
    --columns "chrom, start, end, gene, impact, cadd_raw" \
    trio.trim.vep.recessive.db \
    | awk '$14==1' \
    | head
Output (most numeric values did not survive transcription):
chrom start end gene impact cadd_raw variant_id family_id family_members family_genotypes samples family_count comp_het_id priority
chr ACAN non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected C/A,C/C,A|C _9519_
chr ACAN non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected ,,A|G _9519_
chr ACAN non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected C/A,C/C,A|C _9519_
chr ACAN splice_region family1 1805;unaffected,1847;unaffected,4805;affected G/G,G/A,G|A _9519_
chr ACOXL non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected C/C,,C|T _3247_
chr ACOXL non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected ,C/C,T|C _3247_
chr AKAP1 non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected ,C/C,T|C _13305_
chr AKAP1 synonymous_coding family1 1805;unaffected,1847;unaffected,4805;affected G/G,G/A,G|A _13305_
chr ALK non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected T/T,T/G,T|G _839_841 1
chr ALK non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected A/T,,T|A _839_841 1

$ gemini comp_hets \
    --columns "chrom, start, end, gene, impact, cadd_raw" \
    trio.trim.vep.recessive.db \
    | awk '$14==1' \
    | wc -l
612 lines
Each compound heterozygote is a set of two lines, so we have 306 (612 / 2) compound heterozygote candidates.
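The awk and wc steps amount to counting the rows whose 14th column (priority) equals 1 and halving the result. A minimal Python sketch of the same bookkeeping follows; it assumes tab-delimited output with a header line, and the demo rows are placeholders rather than real GEMINI output.

```python
from typing import Iterable, Tuple

COLUMNS = ("chrom start end gene impact cadd_raw variant_id family_id "
           "family_members family_genotypes samples family_count "
           "comp_het_id priority").split()


def priority_one_pairs(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (priority-1 line count, candidate count) for comp_hets output.

    Each compound heterozygote is reported as two lines, so the number
    of candidates is the line count divided by two.
    """
    it = iter(lines)
    header = next(it).rstrip("\n").split("\t")
    col = header.index("priority")  # the 14th column, as in awk '$14==1'
    n_lines = sum(1 for line in it
                  if line.rstrip("\n").split("\t")[col] == "1")
    return n_lines, n_lines // 2


def demo_line(priority: str) -> str:
    """Build a placeholder row: every field is '.' except the priority."""
    return "\t".join(["."] * (len(COLUMNS) - 1) + [priority])


demo = ["\t".join(COLUMNS)] + [demo_line(p) for p in ("1", "1", "2", "3")]
print(priority_one_pairs(demo))
```

Applied to the slide's numbers, 612 priority-1 lines give 612 // 2 = 306 candidates.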

33 So many candidates. Time to start --filtering!
$ gemini comp_hets \
    --columns "chrom, start, end, gene, impact, cadd_raw" \
    --filter "impact_severity != 'LOW'" \
    trio.trim.vep.recessive.db \
    | awk '$14==1' \
    | wc -l
260 lines (130 comp_hets)

34 Use ESP and ExAC to focus on rare variants
$ gemini comp_hets \
    --columns "chrom, start, end, gene, impact, cadd_raw" \
    --filter "impact_severity != 'LOW' \
        and ((aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) \
        and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL))" \
    trio.trim.vep.recessive.db \
    | awk '$14==1' \
    | wc -l
8 lines (4 comp_hets)
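The "or ... is NULL" clauses matter: a variant that is absent from ESP or ExAC has no recorded frequency, and dropping it would discard exactly the novel variants of interest. The Python sketch below mirrors the logic of the --filter expression, with None standing in for SQL NULL. The function names are hypothetical; the 0.01 threshold is taken from the slide.

```python
from typing import Optional


def is_rare(aaf: Optional[float], max_aaf: float = 0.01) -> bool:
    """True when the allele frequency is unknown (None) or at most max_aaf."""
    return aaf is None or aaf <= max_aaf


def passes_filter(impact_severity: str,
                  aaf_esp_ea: Optional[float],
                  aaf_exac_all: Optional[float]) -> bool:
    """Mirror of the slide's --filter: not LOW impact and rare in both resources."""
    return (impact_severity != "LOW"
            and is_rare(aaf_esp_ea)
            and is_rare(aaf_exac_all))


# Hypothetical variants: (impact_severity, aaf_esp_ea, aaf_exac_all)
variants = [
    ("HIGH", None, 0.005),  # unseen in ESP, rare in ExAC
    ("MED", 0.2, None),     # common in ESP
    ("LOW", None, None),    # low impact
]
for v in variants:
    print(v, passes_filter(*v))
```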

35 Use ESP and ExAC to focus on rare variants
$ gemini comp_hets \
    --columns "chrom, start, end, gene, impact, cadd_raw" \
    --filter "impact_severity != 'LOW' \
        and ((aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) \
        and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL))" \
    trio.trim.vep.recessive.db \
    | awk '$14==1'
Output (most numeric values did not survive transcription):
chr GAA non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected T/T,T/C,T|C _14401_
chr GAA non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected G/A,G/G,A|G _14401_
chr HS6ST1 non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected C/A,C/C,A|C _3657_
chr HS6ST1 non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected G/G,G/C,G|C _3657_
chr PRR5-ARHGAP8 non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected G/G,G/T,G|T _16838_
chr PRR5-ARHGAP8 non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected ,C/C,T|C _16838_
chr THSD4 non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected C/C,,C|T _8777_
chr THSD4 non_syn_coding family1 1805;unaffected,1847;unaffected,4805;affected G/A,G/G,A|G _8777_

36 Load the following files into IGV (Load from URL) and inspect your candidates
BAM alignment files:
tutorials/1805.workshop.bam
tutorials/1847.workshop.bam
tutorials/4805.workshop.bam
VCF variant file:
tutorials/trio.trim.vep.vcf.gz

37 Finding recessive genes with GEMINI assuming consanguinity

38 The autosomal_recessive tool

39 The autosomal_recessive tool. Default behavior:

40 The autosomal_recessive tool. The --min-kindreds option: This specifies the number of families required to have a variant in the same gene in order for it to be reported. For example, we may only be interested in candidates where at least 2 families have a variant in that gene.
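The idea behind --min-kindreds can be sketched as a group-and-count over (gene, family) pairs. This illustrates the rule described on the slide rather than GEMINI's code; the function name, gene names, and family IDs are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def genes_meeting_min_kindreds(candidates: Iterable[Tuple[str, str]],
                               min_kindreds: int = 2) -> List[str]:
    """Return the genes in which at least min_kindreds distinct families
    carry a candidate variant. candidates holds (gene, family_id) pairs."""
    families_by_gene = defaultdict(set)
    for gene, family_id in candidates:
        families_by_gene[gene].add(family_id)
    return sorted(gene for gene, fams in families_by_gene.items()
                  if len(fams) >= min_kindreds)


# Hypothetical candidate variants from three families.
hits = [
    ("GENE_A", "family1"),
    ("GENE_A", "family2"),
    ("GENE_B", "family1"),
    ("GENE_B", "family1"),  # two variants, but in the same family
    ("GENE_C", "family3"),
]
print(genes_meeting_min_kindreds(hits, min_kindreds=2))
```

Counting distinct families, not variants, is the point: two variants in one family do not add independent support for a gene.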

41 The autosomal_recessive tool. --filter for variants with potential functional consequence:

42 The autosomal_recessive tool: other options
The --gt-pl-max option: to eliminate less confident genotypes, it is possible to enforce a maximum PL value for each sample. On this scale, lower values indicate more confidence that the called genotype is correct. 10 is a reasonable value.
What is the PL? What is a Phred-scaled genotype likelihood?

43 The autosomal_recessive tool: other options
What is a Phred-scaled genotype likelihood? Example calculation based on the GATK HaplotypeCaller.
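The slide's worked example is not preserved, so the following is a stand-in sketch of the usual Phred scaling: each genotype likelihood L becomes -10 * log10(L), and the values are then shifted so that the most likely genotype has a PL of 0. The input likelihoods are hypothetical and must be positive.

```python
import math
from typing import List, Sequence


def phred_scaled_likelihoods(likelihoods: Sequence[float]) -> List[int]:
    """Convert genotype likelihoods (e.g. for 0/0, 0/1, 1/1) to PL values.

    Each likelihood is Phred-scaled, then the smallest value is subtracted
    so the best-supported genotype gets PL = 0.
    """
    raw = [-10.0 * math.log10(p) for p in likelihoods]
    best = min(raw)
    return [round(x - best) for x in raw]


# Hypothetical likelihoods for the genotypes 0/0, 0/1, and 1/1.
pls = phred_scaled_likelihoods([0.001, 0.9, 0.0001])
print(pls)
```

On this scale a PL of 10 marks a genotype that is 10 times less likely than the best one, 20 means 100 times less likely, and so on, which is why lower values for the called genotype indicate more confidence.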

44 Runs of homozygosity
Method 1: intersecting with previously known regions

45 Intersect with observed homozygosity region(s) (example commands)
1. Compress and tabix-index a BED file with the observed homozygosity regions.
$ bgzip homoz_region.bed
$ tabix -p bed homoz_region.bed.gz
2. Use the annotate tool to flag variants that overlap these regions.
$ gemini annotate -f homoz_region.bed.gz \
    -c homoz_region \
    -a boolean \
    AR.db
3. Filter variants for those that overlap these regions.
$ gemini autosomal_recessive AR.db \
    --columns "chrom, start, end, ref, alt, filter, qual, gene, impact, aaf_esp_ea, aaf_1kg_eur" \
    --filter "filter is NULL and aaf_esp_ea < 0.1 and (impact_severity = 'HIGH' or impact_severity = 'MED') and homoz_region = 1"
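Conceptually, the boolean annotation in step 2 records whether each variant overlaps any interval in the BED file. The sketch below shows that test in plain Python. It assumes 0-based, half-open coordinates for both variants and regions, and a linear scan stands in for the indexed lookup a real tool would use; the function name and coordinates are hypothetical.

```python
from typing import Iterable, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def overlaps_any(chrom: str, start: int, end: int,
                 regions: Iterable[Region]) -> bool:
    """True if [start, end) on chrom overlaps at least one region."""
    return any(r_chrom == chrom and start < r_end and r_start < end
               for r_chrom, r_start, r_end in regions)


# Hypothetical homozygosity region and variants.
homoz_regions = [("chr1", 1000, 5000)]
print(overlaps_any("chr1", 4999, 5000, homoz_regions))  # last base inside the region
print(overlaps_any("chr1", 5000, 5001, homoz_regions))  # first base after the region
print(overlaps_any("chr2", 1200, 1201, homoz_regions))  # different chromosome
```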

46 Runs of homozygosity
Method 2: search for runs of homozygosity

47 Intersect with observed homozygosity region(s)
Run the roh tool to search for candidate runs of homozygosity.
$ gemini roh AR.db
Sort the runs by length (column 7), longest first:
$ gemini roh AR.db | sort -k7nr
Output columns (the row values did not survive transcription):
chrom start end sample num_of_snps density_per_kb run_length_in_bp

49 Caveats when screening for runs of homozygosity
1. Difficult with exome data (density of markers).
2. Shorter runs of homozygosity often happen by chance.
3. Density of homozygotes is important.
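To make the caveats concrete, here is a deliberately simple run detector. It is not the algorithm used by the roh tool: it reports stretches of consecutive homozygous calls on one chromosome, along with the number of SNPs, their density per kb, and the run length. The min_snps threshold, the density definition, and the example positions are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Run = Tuple[int, int, int, float, int]  # start, end, num_of_snps, density_per_kb, run_length_in_bp


def runs_of_homozygosity(calls: Sequence[Tuple[int, bool]],
                         min_snps: int = 3) -> List[Run]:
    """Find runs of consecutive homozygous calls.

    calls holds (position, is_homozygous) pairs sorted by position on a
    single chromosome. Runs with fewer than min_snps sites are dropped.
    The result is sorted by run length, longest first.
    """
    runs: List[Run] = []
    current: List[int] = []

    def flush() -> None:
        if len(current) >= min_snps:
            start, end = current[0], current[-1]
            length = end - start
            density = len(current) / (length / 1000.0) if length else 0.0
            runs.append((start, end, len(current), round(density, 3), length))

    for pos, is_hom in calls:
        if is_hom:
            current.append(pos)
        else:
            flush()          # a heterozygous call ends the current run
            current.clear()
    flush()
    return sorted(runs, key=lambda r: r[4], reverse=True)


# Hypothetical calls: three homozygous sites, one het, then two homozygous sites.
calls = [(1000, True), (2000, True), (3000, True),
         (4000, False), (5000, True), (6000, True)]
print(runs_of_homozygosity(calls, min_snps=3))
```

With sparse markers, as in exome data, a long run may rest on very few sites, so the density column deserves as much attention as the length.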


More information

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 Alignment J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG

More information

Variant Calling CHRIS FIELDS MAYO-ILLINOIS COMPUTATIONAL GENOMICS WORKSHOP, JUNE 19, 2017

Variant Calling CHRIS FIELDS MAYO-ILLINOIS COMPUTATIONAL GENOMICS WORKSHOP, JUNE 19, 2017 Variant Calling CHRIS FIELDS MAYO-ILLINOIS COMPUTATIONAL GENOMICS WORKSHOP, JUNE 19, 2017 Up-front acknowledgments Many figures/slides come from: GATK Workshop slides: http://www.broadinstitute.org/gatk/guide/events?id=2038

More information

Introduc)on to Genomics

Introduc)on to Genomics Introduc)on to Genomics Libor Mořkovský, Václav Janoušek, Anastassiya Zidkova, Anna Přistoupilová, Filip Sedlák h1p://ngs-course.readthedocs.org/en/praha-january-2017/ Genome The genome is the gene,c material

More information

Mapping errors require re- alignment

Mapping errors require re- alignment RE- ALIGNMENT Mapping errors require re- alignment Source: Heng Li, presenta8on at GSA workshop 2011 Alignment Key component of alignment algorithm is the scoring nega8ve contribu8on to score opening a

More information

Association Mapping. Mendelian versus Complex Phenotypes. How to Perform an Association Study. Why Association Studies (Can) Work

Association Mapping. Mendelian versus Complex Phenotypes. How to Perform an Association Study. Why Association Studies (Can) Work Genome 371, 1 March 2010, Lecture 13 Association Mapping Mendelian versus Complex Phenotypes How to Perform an Association Study Why Association Studies (Can) Work Introduction to LOD score analysis Common

More information

BIOLOGY - CLUTCH CH.14 - MENDELIAN GENETICS.

BIOLOGY - CLUTCH CH.14 - MENDELIAN GENETICS. !! www.clutchprep.com CONCEPT: MENDEL S EXPERIMENT Gregor Mendel designed an experiment to study inheritance in pea plants. Character a feature that can be inherited, and shows variation between individuals

More information

Selecting TILLING mutants

Selecting TILLING mutants Selecting TILLING mutants The following document will explain how to select TILLING mutants for your gene(s) of interest. To begin, you will need the IWGSC gene model identifier for your gene(s), the IWGSC

More information

Trimethylaminuria (TMAU) Yiran Guo, Ph.D. Center for Applied Genomics Children's Hospital of Philadelphia

Trimethylaminuria (TMAU) Yiran Guo, Ph.D. Center for Applied Genomics Children's Hospital of Philadelphia Trimethylaminuria (TMAU) Yiran Guo, Ph.D. Center for Applied Genomics Children's Hospital of Philadelphia TMAU Genetics Background in Human Genetics Human genome variants and methods to detect them Rare

More information

Human linkage analysis. fundamental concepts

Human linkage analysis. fundamental concepts Human linkage analysis fundamental concepts Genes and chromosomes Alelles of genes located on different chromosomes show independent assortment (Mendel s 2nd law) For 2 genes: 4 gamete classes with equal

More information

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager

Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service. Dr. Ruth Burton Product Manager Introducing combined CGH and SNP arrays for cancer characterisation and a unique next-generation sequencing service Dr. Ruth Burton Product Manager Today s agenda Introduction CytoSure arrays and analysis

More information

Genetics II: Linkage and the Chromosomal Theory

Genetics II: Linkage and the Chromosomal Theory Genetics II: Linkage and the Chromosomal Theory An individual has two copies of each particle of inheritance (gene). These two copies separate during the formation of gametes and come together when the

More information

Papers for 11 September

Papers for 11 September Papers for 11 September v Kreitman M (1983) Nucleotide polymorphism at the alcohol-dehydrogenase locus of Drosophila melanogaster. Nature 304, 412-417. v Hishimoto et al. (2010) Alcohol and aldehyde dehydrogenase

More information

Germline variant calling and joint genotyping

Germline variant calling and joint genotyping talks Germline variant calling and joint genotyping Applying the joint discovery workflow with HaplotypeCaller + GenotypeGVCFs You are here in the GATK Best PracDces workflow for germline variant discovery

More information

Basic Concepts of Human Genetics

Basic Concepts of Human Genetics Basic Concepts of Human Genetics The genetic information of an individual is contained in 23 pairs of chromosomes. Every human cell contains the 23 pair of chromosomes. One pair is called sex chromosomes

More information

QTL Mapping, MAS, and Genomic Selection

QTL Mapping, MAS, and Genomic Selection QTL Mapping, MAS, and Genomic Selection Dr. Ben Hayes Department of Primary Industries Victoria, Australia A short-course organized by Animal Breeding & Genetics Department of Animal Science Iowa State

More information

How to use Variant Effects Report

How to use Variant Effects Report How to use Variant Effects Report A. Introduction to Ensembl Variant Effect Predictor B. Using RefSeq_v1 C. Using TGACv1 A. Introduction The Ensembl Variant Effect Predictor is a toolset for the analysis,

More information

Linkage & Genetic Mapping in Eukaryotes. Ch. 6

Linkage & Genetic Mapping in Eukaryotes. Ch. 6 Linkage & Genetic Mapping in Eukaryotes Ch. 6 1 LINKAGE AND CROSSING OVER! In eukaryotic species, each linear chromosome contains a long piece of DNA A typical chromosome contains many hundred or even

More information

7.03 Problem Set 1 Solutions

7.03 Problem Set 1 Solutions 7.03 Problem Set 1 Solutions 1. a. Crossing each yeast haploid mutant to wild-type will tell you whether the mutation is recessive or dominant to wild-type. If the diploid is wild-type phenotype, then

More information

CITATION FILE CONTENT / FORMAT

CITATION FILE CONTENT / FORMAT CITATION 1) For any resultant publications using single samples please cite: Matthew A. Field, Vicky Cho, T. Daniel Andrews, and Chris C. Goodnow (2015). "Reliably detecting clinically important variants

More information

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR

Human Genetic Variation. Ricardo Lebrón Dpto. Genética UGR Human Genetic Variation Ricardo Lebrón rlebron@ugr.es Dpto. Genética UGR What is Genetic Variation? Origins of Genetic Variation Genetic Variation is the difference in DNA sequences between individuals.

More information

RP1: AN EXAMPLE OF REVERSE GENETICS APPROACH TO DESCRIBE COMMON RECESSIVE DEFECTS

RP1: AN EXAMPLE OF REVERSE GENETICS APPROACH TO DESCRIBE COMMON RECESSIVE DEFECTS RP1: AN EXAMPLE OF REVERSE GENETICS APPROACH TO DESCRIBE COMMON RECESSIVE DEFECTS C. Grohs GABI, INRA, AgroParisTech, Université Paris Saclay 28 / 08 / 2017 Outbreaks of recessive defects as a consequence

More information

ANNOVAR Variant Annotation and Interpretation

ANNOVAR Variant Annotation and Interpretation 1 ANNOVAR Variant Annotation and Interpretation Copyrighted 2018 Isabelle Schrauwen and Suzanne M. Leal This exercise touches on several functionalities of the program ANNOVAR to annotate and interpret

More information

Lesson Overview. What would happen when genetics answered questions about how heredity works?

Lesson Overview. What would happen when genetics answered questions about how heredity works? 17.1 Darwin developed his theory of evolution without knowing how heritable traits passed from one generation to the next or where heritable variation came from. What would happen when genetics answered

More information

The Evolution of Populations

The Evolution of Populations The Evolution of Populations What you need to know How and reproduction each produce genetic. The conditions for equilibrium. How to use the Hardy-Weinberg equation to calculate allelic and to test whether

More information

Supplementary information ATLAS

Supplementary information ATLAS Supplementary information ATLAS Vivian Link, Athanasios Kousathanas, Krishna Veeramah, Christian Sell, Amelie Scheu and Daniel Wegmann Section 1: Complete list of functionalities Sequence data processing

More information

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow

From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow From Variants to Pathways: Agilent GeneSpring GX s Variant Analysis Workflow Technical Overview Import VCF Introduction Next-generation sequencing (NGS) studies have created unanticipated challenges with

More information

Mendel & Inheritance. SC.912.L.16.1 Use Mendel s laws of segregation and independent assortment to analyze patterns of inheritance.

Mendel & Inheritance. SC.912.L.16.1 Use Mendel s laws of segregation and independent assortment to analyze patterns of inheritance. Mendel & Inheritance SC.912.L.16.1 Use Mendel s laws of segregation and independent assortment Mendel s Law of Segregation: gene pairs separate when gametes (sex cells) are formed; each gamete as only

More information

Briefly, this exercise can be summarised by the follow flowchart:

Briefly, this exercise can be summarised by the follow flowchart: Workshop exercise Data integration and analysis In this exercise, we would like to work out which GWAS (genome-wide association study) SNP associated with schizophrenia is most likely to be functional.

More information

Population and Community Dynamics. The Hardy-Weinberg Principle

Population and Community Dynamics. The Hardy-Weinberg Principle Population and Community Dynamics The Hardy-Weinberg Principle Key Terms Population: same species, same place, same time Gene: unit of heredity. Controls the expression of a trait. Can be passed to offspring.

More information

Whole Genome Sequencing. Biostatistics 666

Whole Genome Sequencing. Biostatistics 666 Whole Genome Sequencing Biostatistics 666 Genomewide Association Studies Survey 500,000 SNPs in a large sample An effective way to skim the genome and find common variants associated with a trait of interest

More information

Normal-Tumor Comparison using Next-Generation Sequencing Data

Normal-Tumor Comparison using Next-Generation Sequencing Data Normal-Tumor Comparison using Next-Generation Sequencing Data Chun Li Vanderbilt University Taichung, March 16, 2011 Next-Generation Sequencing First-generation (Sanger sequencing): 115 kb per day per

More information

1. A dihybrid YyZz is test crossed. The following phenotypic classes are observed:

1. A dihybrid YyZz is test crossed. The following phenotypic classes are observed: Problem Set 4 Genetics 371 Winter 2010 1. A dihybrid YyZz is test crossed. The following phenotypic classes are observed: 442 Yz 458 yz 46 YZ 54 yz (a) What is the parental type of the heterozygous parent?

More information

Runs of Homozygosity Analysis Tutorial

Runs of Homozygosity Analysis Tutorial Runs of Homozygosity Analysis Tutorial Release 8.7.0 Golden Helix, Inc. March 22, 2017 Contents 1. Overview of the Project 2 2. Identify Runs of Homozygosity 6 Illustrative Example...............................................

More information

SNPassoc: an R package to perform whole genome association studies

SNPassoc: an R package to perform whole genome association studies SNPassoc: an R package to perform whole genome association studies Juan R González, Lluís Armengol, Xavier Solé, Elisabet Guinó, Josep M Mercader, Xavier Estivill, Víctor Moreno November 16, 2006 Contents

More information

Release Notes for Genomes Processed Using Complete Genomics Software

Release Notes for Genomes Processed Using Complete Genomics Software Release Notes for Genomes Processed Using Complete Genomics Software Software Version 2.4 Related Documents... 1 Changes to Version 2.4... 2 Changes to Version 2.2... 4 Changes to Version 2.0... 5 Changes

More information

7.012 Problem Set 2. c) If an HhAa unicorn mates with an hhaa unicorn, what fraction of the progeny will be short and brown?

7.012 Problem Set 2. c) If an HhAa unicorn mates with an hhaa unicorn, what fraction of the progeny will be short and brown? Name 7.012 Problem Set 2 Section Question 1 In unicorns, coat color (brown or white) is controlled by a single gene with two alleles, A and a. The brown phenotype is dominant over the white phenotype.

More information