Identifying dominant gene candidates with GEMINI

1 Identifying dominant gene candidates with GEMINI

Aaron Quinlan, University of Utah, quinlanlab.org

Please refer to the following GitHub Gist to find each command for this session. Commands should be copy/pasted from this Gist.

2 Autosomal dominant pedigrees

Jessica Chong

3 Create a PED file (done for you)

Jessica Chong
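As a rough illustration of what a PED file encodes, the sketch below parses a minimal pedigree table and lists the affected samples. The sample IDs and the affected/unaffected assignments follow the later slides (1805 and 4805 affected, 1847 unaffected, with phenotype 2 meaning affected and 1 meaning unaffected). The parental IDs and sex values are placeholders (0 = unknown), and `PED_TEXT`, `parse_ped`, and `affected_ids` are hypothetical names; this is not the actual content of dominant.ped.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    family_id: str
    sample_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str  # "2" = affected, "1" = unaffected


# Illustrative pedigree only: parental IDs and sex are placeholders (0 = unknown),
# not the real relationships recorded in dominant.ped.
PED_TEXT = """\
#family_id sample_id paternal_id maternal_id sex phenotype
family1 1805 0 0 0 2
family1 1847 0 0 0 1
family1 4805 0 0 0 2
"""


def parse_ped(text: str) -> List[Sample]:
    """Parse whitespace-delimited PED rows, skipping blank and '#' header lines."""
    samples = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        samples.append(Sample(*fields[:6]))
    return samples


def affected_ids(samples: List[Sample]) -> List[str]:
    """Return the IDs of samples whose phenotype column is 2 (affected)."""
    return [s.sample_id for s in samples if s.phenotype == "2"]


samples = parse_ped(PED_TEXT)
print(affected_ids(samples))
```

The phenotype column is what the inheritance tools and the phenotype-based genotype filters later in the session rely on, so it is worth checking before loading.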

4 Create a GEMINI database from a VCF

Notes:
1. The VCF has been normalized and decomposed with vt.
2. The VCF has been annotated with VEP.

$ curl tutorials/trio.trim.vep.vcf.gz > trio.trim.vep.vcf.gz
$ curl tutorials/dominant.ped > dominant.ped
$ gemini load --cores 4 \
    -v trio.trim.vep.vcf.gz \
    -t VEP \
    --skip-gene-tables \
    -p dominant.ped \
    trio.trim.vep.dominant.db

Slide annotation: "to avoid errors"

5 Running the autosomal_dominant tool

$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    trio.trim.vep.dominant.db \
    | head \
    | column -t

Output columns: chrom, start, end, ref, alt, gene, impact, cadd_raw, variant_id, family_id, family_members, family_genotypes, samples, family_count.

Fields recoverable from the slide (family_members are 1805,1847,4805 and family_genotypes are listed in that order):

ref  alt  gene      impact             family_id  family_genotypes
T    C    AC        mature_mirna       family1    T/C,T/T,T/C
G    C    AC        downstream         family1    G/C,G/G,G/C
G    A    AC        upstream           family1    G/A,G/G,G/A
C    T    AC        upstream           family1    C/T,C/C,C/T
C    T    AC        upstream           family1    C/T,C/C,C/T
C    G    AC        mature_mirna       family1    C/G,C/C,C/G
C    A    ACAN      non_syn_coding     family1    C/A,C/C,C/A
G    A    ADAMTS17  synonymous_coding  family1    G/A,G/G,G/A
A    G    ADAMTS17  non_syn_coding     family1    A/G,A/A,A/G

6 Time to start filtering

$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    trio.trim.vep.dominant.db \
    | wc -l

1515 candidates

7 Again, use the --filter option

Exclude variants that failed GATK filters.

$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    --filter "filter is NULL" \
    trio.trim.vep.dominant.db \
    | wc -l

1288 candidates

8 Further restrict variants with functional consequence

$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    --filter "filter is NULL and impact_severity != 'LOW'" \
    trio.trim.vep.dominant.db \
    | wc -l

347 candidates

9 Use ESP and ExAC to focus on rare variants

$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    --filter "filter is NULL and impact_severity != 'LOW' \
              and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) \
              and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" \
    trio.trim.vep.dominant.db \
    | wc -l

40 candidates
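The "or ... is NULL" clauses in the allele-frequency filter matter because a variant absent from ESP or ExAC has no frequency at all, and in SQL a comparison against NULL is never true. The sketch below shows this with Python's built-in sqlite3 module on a tiny made-up table; the table contents and the helper `names` are illustrative assumptions, and only the column name aaf_esp_ea is taken from the slide.

```python
import sqlite3

# Toy table standing in for the variants table; the three rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (name TEXT, aaf_esp_ea REAL)")
conn.executemany(
    "INSERT INTO variants VALUES (?, ?)",
    [("common", 0.25), ("rare", 0.001), ("unobserved", None)],
)


def names(where: str) -> list:
    """Return the variant names that satisfy the given WHERE clause."""
    rows = conn.execute(f"SELECT name FROM variants WHERE {where} ORDER BY name")
    return [row[0] for row in rows]


# The frequency test alone silently drops variants with no ESP record.
strict = names("aaf_esp_ea <= 0.01")

# Adding the IS NULL branch keeps variants never observed in ESP.
lenient = names("(aaf_esp_ea <= 0.01 OR aaf_esp_ea IS NULL)")

print(strict)
print(lenient)
```

For rare-disease work the unobserved variants are often the most interesting ones, which is why the filter keeps them explicitly.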

10 Let's be more strict with the functional consequence

$ gemini autosomal_dominant \
    --columns "chrom, start, end, ref, alt, gene, impact, cadd_raw" \
    --filter "filter is NULL and impact_severity == 'HIGH' \
              and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) \
              and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" \
    trio.trim.vep.dominant.db \
    | wc -l

The three remaining candidates (fields recoverable from the slide; family_genotypes are in the order 1805,1847,4805):

ref  alt  gene     impact           family_id  family_genotypes
T    C    CPD      stop_loss        family1    T/C,T/T,T/C
G    A    APOB     stop_gain        family1    G/A,G/G,G/A
T    C    TUBA3FP  splice_acceptor  family1    T/C,T/T,T/C

11 Are any of these variants known to underlie a clinical phenotype?

How could you extend the query from the previous slide to answer this?
Hint: use the GEMINI documentation.

ref  alt  gene     impact           family_id  family_genotypes
T    C    CPD      stop_loss        family1    T/C,T/T,T/C
G    A    APOB     stop_gain        family1    G/A,G/G,G/A
T    C    TUBA3FP  splice_acceptor  family1    T/C,T/T,T/C

12 Custom analyses: Use the query module to identify autosomal dominant variants

13 Using the --gt-filter

We need to use the query module to enforce an autosomal dominant inheritance pattern.

14 Using the --gt-filter

$ gemini query \
    -q "SELECT chrom, start, end, ref, alt, gene, impact, (gts).(*) \
        FROM variants" \
    --header \
    --gt-filter "gt_types.4805 == HET \
                 and gt_types.1805 == HET \
                 and gt_types.1847 == HOM_REF" \
    trio.trim.vep.dominant.db \
    | head \
    | column -t

Output columns: chrom, start, end, ref, alt, gene, impact, gts.1805, gts.1847, gts.4805.

Fields recoverable from the slide:

ref  alt  gene     impact       gts.1805  gts.1847  gts.4805
A    G    SH3YL1   intron       A/G       A/A       A/G
G    A    ACP1     downstream   G/A       G/G       G/A
G    T    TMEM18   intron       G/T       G/G       G/T
A    T    AC       upstream     A/T       A/A       A/T
C    T    PXDN     UTR_3_prime  C/T       C/C       C/T
T    C    PXDN     intron       T/C       T/T       T/C
C    A    COLEC11  intron       C/A       C/C       C/A
G    A    COLEC11  intron       G/A       G/G       G/A
G    A    COLEC11  intron       G/A       G/G       G/A
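The per-sample genotype filter can be read as a simple predicate on each variant row. The sketch below re-implements that predicate in plain Python on genotype strings, purely to make the logic explicit; it is not how GEMINI evaluates the filter internally. The helpers `gt_type` and `is_dominant_candidate` are hypothetical names, the first row is the SH3YL1 row from the slide, and the second row is invented to show a rejection.

```python
def gt_type(gt: str, ref: str) -> str:
    """Classify a diploid genotype string such as 'A/G' relative to the ref allele."""
    alleles = gt.replace("|", "/").split("/")
    if len(alleles) != 2 or "." in alleles:
        return "UNKNOWN"
    if alleles[0] != alleles[1]:
        return "HET"
    return "HOM_REF" if alleles[0] == ref else "HOM_ALT"


def is_dominant_candidate(ref: str, gts: dict) -> bool:
    """Mirror the slide's filter: 4805 and 1805 are HET, 1847 is HOM_REF."""
    return (
        gt_type(gts["4805"], ref) == "HET"
        and gt_type(gts["1805"], ref) == "HET"
        and gt_type(gts["1847"], ref) == "HOM_REF"
    )


# Row taken from the slide output (SH3YL1 intron variant, ref A, alt G).
slide_row = {"1805": "A/G", "1847": "A/A", "4805": "A/G"}

# Invented row: the unaffected sample 1847 also carries the alternate allele.
rejected_row = {"1805": "A/G", "1847": "A/G", "4805": "A/G"}

print(is_dominant_candidate("A", slide_row))
print(is_dominant_candidate("A", rejected_row))
```

Writing the sample IDs into the filter by hand works for a trio but does not scale, which motivates the wildcard syntax on the following slides.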

15 What about large pedigrees or multiple families?

--gt-filter wildcards

16 Using the --gt-filter wildcards

$ gemini query \
    -q "SELECT chrom, start, end, ref, alt, gene, impact, (gts).(*) \
        FROM variants" \
    --header \
    --gt-filter "(gt_types).(phenotype==2).(==het).(all) \
                 and (gt_types).(phenotype==1).(==hom_ref).(all)" \
    trio.trim.vep.dominant.db \
    | head \
    | column -t

Affected individuals must be HET.
Unaffected individuals must be HOM_REF.

Fields recoverable from the slide:

ref  alt  gene     impact       gts.1805  gts.1847  gts.4805
A    G    SH3YL1   intron       A/G       A/A       A/G
G    A    ACP1     downstream   G/A       G/G       G/A
G    T    TMEM18   intron       G/T       G/G       G/T
A    T    AC       upstream     A/T       A/A       A/T
C    T    PXDN     UTR_3_prime  C/T       C/C       C/T
T    C    PXDN     intron       T/C       T/T       T/C
C    A    COLEC11  intron       C/A       C/C       C/A
G    A    COLEC11  intron       G/A       G/G       G/A
G    A    COLEC11  intron       G/A       G/G       G/A

17 Can apply multiple --gt-filter wildcards

$ gemini query \
    -q "SELECT chrom, start, end, ref, alt, gene, impact, \
        (gts).(*), (gt_depths).(*) \
        FROM variants" \
    --header \
    --gt-filter "(gt_types).(phenotype==2).(==het).(all) \
                 and (gt_types).(phenotype==1).(==hom_ref).(all) \
                 and (gt_depths).(*).(>=20).(all)" \
    trio.trim.vep.dominant.db \
    | head \
    | column -t

Affected individuals must be HET.
Unaffected individuals must be HOM_REF.
Everyone must have sequence depth >=20.

Output columns: chrom, start, end, ref, alt, gene, impact, gts.1805, gts.1847, gts.4805, gt_depths.1805, gt_depths.1847, gt_depths.4805.

Fields recoverable from the slide:

ref  alt  gene       impact       gts.1805  gts.1847  gts.4805
A    G    SH3YL1     intron       A/G       A/A       A/G
G    A    ACP1       downstream   G/A       G/G       G/A
A    T    AC         upstream     A/T       A/A       A/T
C    T    PXDN       UTR_3_prime  C/T       C/C       C/T
T    C    PXDN       intron       T/C       T/T       T/C
C    A    COLEC11    intron       C/A       C/C       C/A
G    A    COLEC11    intron       G/A       G/G       G/A
G    A    COLEC11    intron       G/A       G/G       G/A
T    A    LINC00487  intron       T/A       T/T       T/A
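The three wildcard clauses amount to "for every sample in a group, a condition holds", which is why the same filter works for any pedigree size. The sketch below expresses that reading in plain Python as an illustration of the logic, not of GEMINI's implementation. The function `passes_wildcards` is a hypothetical name, the genotype types follow the slide (1805 and 4805 HET, 1847 HOM_REF), and the depth values are invented because they are not legible in the slide output.

```python
def passes_wildcards(phenotype: dict, gt_types: dict, gt_depths: dict,
                     min_depth: int = 20) -> bool:
    """Check all affected are HET, all unaffected are HOM_REF, all depths >= min_depth."""
    affected = [s for s, p in phenotype.items() if p == 2]
    unaffected = [s for s, p in phenotype.items() if p == 1]
    # Note: Python's all() returns True for an empty group.
    return (
        all(gt_types[s] == "HET" for s in affected)
        and all(gt_types[s] == "HOM_REF" for s in unaffected)
        and all(depth >= min_depth for depth in gt_depths.values())
    )


# Phenotypes as used on the slides: 2 = affected, 1 = unaffected.
phenotype = {"1805": 2, "1847": 1, "4805": 2}
gt_types = {"1805": "HET", "1847": "HOM_REF", "4805": "HET"}

# Depth values are invented for illustration.
deep = {"1805": 35, "1847": 28, "4805": 41}
shallow = {"1805": 35, "1847": 12, "4805": 41}

print(passes_wildcards(phenotype, gt_types, deep))
print(passes_wildcards(phenotype, gt_types, shallow))
```

Because the groups are derived from the phenotype column rather than from sample IDs, adding a family member only requires updating the PED file, not the filter.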

18 We have the inheritance model, now apply the annotation filters

$ gemini query \
    -q "SELECT chrom, start, end, ref, alt, gene, impact, \
        (gts).(*), (gt_depths).(*) \
        FROM variants \
        WHERE filter is NULL \
        and impact_severity == 'HIGH' \
        and (aaf_esp_ea <= 0.01 or aaf_esp_ea is NULL) \
        and (aaf_exac_all <= 0.01 or aaf_exac_all is NULL)" \
    --header \
    --gt-filter "(gt_types).(phenotype==2).(==het).(all) \
                 and (gt_types).(phenotype==1).(==hom_ref).(all) \
                 and (gt_depths).(*).(>=20).(all)" \
    trio.trim.vep.dominant.db \
    | column -t

Output columns: chrom, start, end, ref, alt, gene, impact, gts.1805, gts.1847, gts.4805, gt_depths.1805, gt_depths.1847, gt_depths.4805.

Fields recoverable from the slide:

ref  alt  gene     impact           gts.1805  gts.1847  gts.4805
G    A    APOB     stop_gain        G/A       G/G       G/A
T    C    CPD      stop_loss        T/C       T/T       T/C
T    C    TUBA3FP  splice_acceptor  T/C       T/T       T/C

Note that (pleasingly) these are the same three candidates as detected with the autosomal_dominant tool.

19 Load the following files into IGV (Load from URL) and inspect your candidates

BAM alignment files:
tutorials/1805.workshop.bam
tutorials/1847.workshop.bam
tutorials/4805.workshop.bam

VCF variant file:
tutorials/trio.trim.vep.vcf.gz


DNA sequence and chromatin structure. Mapping nucleosome positioning using high-throughput sequencing DNA sequence and chromatin structure Mapping nucleosome positioning using high-throughput sequencing DNA sequence and chromatin structure Higher-order 30 nm fibre Mapping nucleosome positioning using high-throughput

More information

Analytics Behind Genomic Testing

Analytics Behind Genomic Testing A Quick Guide to the Analytics Behind Genomic Testing Elaine Gee, PhD Director, Bioinformatics ARUP Laboratories 1 Learning Objectives Catalogue various types of bioinformatics analyses that support clinical

More information

Supplementary table 1: List of sequences of primers used in sequenom assay

Supplementary table 1: List of sequences of primers used in sequenom assay Supplementary table 1: List of sequences of primers used in sequenom assay SNP_ID 2nd-PCRP Sequence 1st-PCRP Sequence Allele specific (iplex) iplex primer primer Direction ROCK2 1 rs978906 ACGTTGGATGATAAAGCTCTCTCGGCAGTC

More information

How to use Variant Effects Report

How to use Variant Effects Report How to use Variant Effects Report A. Introduction to Ensembl Variant Effect Predictor B. Using RefSeq_v1 C. Using TGACv1 A. Introduction The Ensembl Variant Effect Predictor is a toolset for the analysis,

More information

Using VarSeq to Improve Variant Analysis Research

Using VarSeq to Improve Variant Analysis Research Using VarSeq to Improve Variant Analysis Research June 10, 2015 G Bryce Christensen Director of Services Questions during the presentation Use the Questions pane in your GoToWebinar window Agenda 1 Variant

More information

Genetics 101: What does it mean to have an LBSL-specific mutation

Genetics 101: What does it mean to have an LBSL-specific mutation Genetics 101: What does it mean to have an LBSL-specific mutation Rebecca McClellan, MGC, CGC Johns Hopkins Center for Inherited Heart Disease Kennedy Krieger Metabolism Clinic Genetics 101 A gene is the

More information

Informatics I: Data Standards. Jessie Tenenbaum, PhD,

Informatics I: Data Standards. Jessie Tenenbaum, PhD, Informatics I: Data Standards Jessie Tenenbaum, PhD, FACMI jessie.tenenbaum@duke.edu @jessiet1023 JDT Intro Division of Translational Biomedical Informatics in B&B Dept. Previous life as Program Manager

More information

Briefly, this exercise can be summarised by the follow flowchart:

Briefly, this exercise can be summarised by the follow flowchart: Workshop exercise Data integration and analysis In this exercise, we would like to work out which GWAS (genome-wide association study) SNP associated with schizophrenia is most likely to be functional.

More information

Informatics I: Data Standards. Jessie Tenenbaum,

Informatics I: Data Standards. Jessie Tenenbaum, Informatics I: Data Standards Jessie Tenenbaum, PhD @jessiet1023 JDT Intro Division of Translational Biomedical Informatics in B&B Dept. Previous life as Program Manager at Microsoft PhD 2007 in Biomedical

More information

Introduc)on to Genomics

Introduc)on to Genomics Introduc)on to Genomics Libor Mořkovský, Václav Janoušek, Anastassiya Zidkova, Anna Přistoupilová, Filip Sedlák h1p://ngs-course.readthedocs.org/en/praha-january-2017/ Genome The genome is the gene,c material

More information

Axiom mydesign Custom Array design guide for human genotyping applications

Axiom mydesign Custom Array design guide for human genotyping applications TECHNICAL NOTE Axiom mydesign Custom Genotyping Arrays Axiom mydesign Custom Array design guide for human genotyping applications Overview In the past, custom genotyping arrays were expensive, required

More information

Alignment & Variant Discovery. J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014

Alignment & Variant Discovery. J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014 Alignment & Variant Discovery J Fass UCD Genome Center Bioinformatics Core Tuesday June 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG

More information

THE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE

THE HEALTH AND RETIREMENT STUDY: GENETIC DATA UPDATE : GENETIC DATA UPDATE April 30, 2014 Biomarker Network Meeting PAA Jessica Faul, Ph.D., M.P.H. Health and Retirement Study Survey Research Center Institute for Social Research University of Michigan HRS

More information

Design and Validation of a 2 nd Tier Next Generation Sequencing (NGS) Panel for Newborn Screening for Severe Combined Immunodeficiency Disease (SCID)

Design and Validation of a 2 nd Tier Next Generation Sequencing (NGS) Panel for Newborn Screening for Severe Combined Immunodeficiency Disease (SCID) Design and Validation of a 2 nd Tier Next Generation Sequencing (NGS) Panel for Newborn Screening for Severe Combined Immunodeficiency Disease (SCID) September 13, 2017 Colleen Stevens, Ph.D. Research

More information

Imputation. Genetics of Human Complex Traits

Imputation. Genetics of Human Complex Traits Genetics of Human Complex Traits GWAS results Manhattan plot x-axis: chromosomal position y-axis: -log 10 (p-value), so p = 1 x 10-8 is plotted at y = 8 p = 5 x 10-8 is plotted at y = 7.3 Advanced Genetics,

More information

Supplementary Figure 1 Exome sequencing in five members of the PFBC family and identification of four candidate variants.

Supplementary Figure 1 Exome sequencing in five members of the PFBC family and identification of four candidate variants. a b Supplementary Figure 1 Exome sequencing in five members of the PFBC family and identification of four candidate variants. (a) Pedigree of the PFBC family. Shown are unaffected (open), affected (filled)

More information

Variant Detection in Next Generation Sequencing Data. John Osborne Sept 14, 2012

Variant Detection in Next Generation Sequencing Data. John Osborne Sept 14, 2012 + Variant Detection in Next Generation Sequencing Data John Osborne Sept 14, 2012 + Overview My Bias Talk slanted towards analyzing whole genomes using Illumina paired end reads with open source tools

More information

Supplementary Information Targeting fidelity of adenine and cytosine base editors in mouse embryos

Supplementary Information Targeting fidelity of adenine and cytosine base editors in mouse embryos Supplementary Information ing fidelity of adenine and cytosine base s in mouse embryos Lee et al. a P = 1.012e-14 b Frequency (%) 100% 80% 60% 40% 20% 0% CB AB On-target Bystander Proximal Indels Frequency

More information

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28.

The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28. Data mining in Ensembl with BioMart Worked Example The human gene encoding Glucose-6-phosphate dehydrogenase (G6PD) is located on chromosome X in cytogenetic band q28. Which other genes related to human

More information

Supplementary Figures

Supplementary Figures Supplementary Figures A B Supplementary Figure 1. Examples of discrepancies in predicted and validated breakpoint coordinates. A) Most frequently, predicted breakpoints were shifted relative to those derived

More information

Homework 4. Due in class, Wednesday, November 10, 2004

Homework 4. Due in class, Wednesday, November 10, 2004 1 GCB 535 / CIS 535 Fall 2004 Homework 4 Due in class, Wednesday, November 10, 2004 Comparative genomics 1. (6 pts) In Loots s paper (http://www.seas.upenn.edu/~cis535/lab/sciences-loots.pdf), the authors

More information

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014

Alignment. J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 Alignment J Fass UCD Genome Center Bioinformatics Core Wednesday December 17, 2014 From reads to molecules Why align? Individual A Individual B ATGATAGCATCGTCGGGTGTCTGCTCAATAATAGTGCCGTATCATGCTGGTGTTATAATCGCCGCATGACATGATCAATGG

More information

Structural Variation CYP2D6

Structural Variation CYP2D6 reference gene locus The gene locus contains three genes,, CYP and CYP, of which only is functional. CYP and CYP are considered pseudogenes. All three genes are composed of nine exons and share a high

More information

NIH Public Access Author Manuscript Curr Protoc Bioinformatics. Author manuscript; available in PMC 2015 September 08.

NIH Public Access Author Manuscript Curr Protoc Bioinformatics. Author manuscript; available in PMC 2015 September 08. NIH Public Access Author Manuscript Published in final edited form as: Curr Protoc Bioinformatics. ; 47: 11.12.1 11.12.34. doi:10.1002/0471250953.bi1112s47. BEDTools: the Swiss-army tool for genome feature

More information