Mining GWAS Catalog & 1000 Genomes Dataset. Segun Fatumo
1 Mining GWAS Catalog & 1000 Genomes Dataset Segun Fatumo
2 What is the GWAS Catalog? NHGRI GWA Catalog
3 Citation
How to cite the NHGRI GWAS Catalog: Hindorff LA, MacArthur J (European Bioinformatics Institute), Morales J (European Bioinformatics Institute), Junkins HA, Hall PN, Klemm AK, and Manolio TA. A Catalog of Published Genome-Wide Association Studies. Available at: Accessed [date of access].
How to cite the NHGRI GWAS Catalog paper: Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, Klemm A, Flicek P, Manolio T, Hindorff L, and Parkinson H. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Research, 2014, Vol. 42 (Database issue): D1001-D1006.
4 Class Exercise
Download the NHGRI GWAS Catalog in tab-delimited format to your Linux system.
What is the number of columns in your downloaded GWAS Catalog file?
What is the header of column 28?
Using a spreadsheet (MS Excel)
Extract all SNPs that have been found to be associated with:
  HDL cholesterol
  BMI
  Your trait of interest
Print only columns 2, 8, 30, 24, 22 and 28, in this order.
Extract only rows where the p-value is 5 × 10^-8.
Count the number of SNPs found for each trait. Report only the unique SNPs.
Hint: Use your knowledge of SQL, Bash scripting or Python. If you can't find your way around this, we will solve it together.
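One possible way to approach the filtering and counting steps in Python is sketched below. It is only an illustration: the four-column table is invented stand-in data (the real download has many more columns), the function name and its parameters are hypothetical, and the threshold is read here as "p-value at most 5 × 10^-8", which is an interpretation of the exercise rather than something the slide states.

```python
import csv
import io
from collections import defaultdict

# Invented stand-in for the downloaded catalog (tab-delimited, header first).
SAMPLE_TSV = (
    "PUBMEDID\tDISEASE/TRAIT\tSNPS\tP-VALUE\n"
    "111\tHDL cholesterol\trs1\t3E-10\n"
    "111\tHDL cholesterol\trs1\t1E-12\n"
    "222\tHDL cholesterol\trs2\t2E-6\n"
    "333\tBody mass index\trs3\t5E-8\n"
)


def unique_snps_per_trait(tsv_text, trait_col, snp_col, p_col, traits, p_max=5e-8):
    """Return {trait: set of SNP ids} for rows with p-value <= p_max.

    Column arguments are 1-based, like the column numbers in the exercise.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t",
                        quoting=csv.QUOTE_NONE)
    next(reader)  # skip the header row
    wanted = {t.lower() for t in traits}
    hits = defaultdict(set)
    for row in reader:
        trait = row[trait_col - 1]
        if trait.lower() not in wanted:
            continue
        try:
            p_value = float(row[p_col - 1])
        except ValueError:
            continue  # skip rows whose p-value is not a plain number
        if p_value <= p_max:
            hits[trait].add(row[snp_col - 1])  # a set keeps SNPs unique
    return dict(hits)


result = unique_snps_per_trait(
    SAMPLE_TSV, trait_col=2, snp_col=3, p_col=4,
    traits=["HDL cholesterol", "Body mass index"],
)
for trait, snps in sorted(result.items()):
    print(trait, len(snps), sorted(snps))
```

With the real file, the same function would be called with the column numbers found in the first part of the exercise.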
5 1000 Genomes
6 Glossary
Pilot: The 1000 Genomes project ran a pilot study between 2008 and 2010
Phase 1: The initial round of exome and low coverage sequencing of 1000 individuals
Phase 2: Expanded sequencing of 1700 individuals and method improvement
Phase 3: Sequencing of 2500 individuals and a new variation catalogue
SAM/BAM: Sequence Alignment/Map Format, an alignment format
VCF: Variant Call Format, a variant format
7 The 1000 Genomes Project: Overview
International project to construct a foundational data set for human genetics
  Discover virtually all common human variations by investigating many genomes at the base pair level
  Consortium with multiple centers, platforms, funders
Aims
  Discover population level human genetic variations of all types (95% of variation > 1% frequency)
  Define haplotype structure in the human genome
  Develop sequence analysis methods, tools, and other reagents that can be transferred to other sequencing projects
8 Phase 1 populations
Map regions: Europe, Americas, East Asia, Africa. Map legend: HapMap 3, New 1000 Genomes. Columns A to D are labelled as on the slide.

Code  Location                              A    B            C        D
CEU   Utah, USA                             85   All LCL      45m/40f  78t/3d/4s
IBS   Spain                                 14   All LCL      7m/7f    14t
GBR   Great Britain                         89   All LCL      41m/48f  3d/86s
FIN   Finland                               93   All LCL      35m/58f  93s
TSI   Italy                                 98   All LCL      50m/48f  98s
MXL   Los Angeles, USA                      66   All LCL      31m/35f  59t/3d/4s
PUR   Puerto Rico                           55   35bld/20LCL  28m/27f  47t/8d
CLM   Medellín, Colombia                    60   All LCL      29m/31f  55t/5d
ASW   Southwest, USA                        61   All LCL      24m/37f  28t/22d/11s
YRI   Ibadan, Nigeria                       88   All LCL      43m/45f  65t/21d/2s
LWK   Webuye, Kenya                         97   All LCL      48m/49f  4d/93s
CHB   Beijing, China                        97   All LCL      44m/53f  97s
CHS   Hu Nan and Fu Jian Provinces, China   100  All LCL      50m/50f  100t
JPT   Tokyo, Japan                          89   All LCL      50m/39f  89s

Figure S: 1000 Genomes Project Phase I populations. Populations collected as part of the HapMap project (blue) and the 1000 Genomes Project (green) include: Europe (IBS (Iberian populations in Spain), GBR (British from England and Scotland), CEU
9 Phase 2/3 populations: Barbados, Ghana, Pakistan, Peru, Nigeria, India, Bangladesh, Sierra Leone, Sri Lanka, USA, Vietnam
10 HapMap, The Pilot Project and The Main Project
HapMap
  Starting in 2002
  Last release contained ~3 million SNPs
  1400 individuals
  11 populations
  High-throughput genotyping chips
1000 Genomes Pilot project
  Started in 2008
  Paper release contained ~14 million SNPs
  179 individuals
  4 populations
  Low coverage next generation sequencing
1000 Genomes Phase 1
  Started in 2009
  Phase 1 release has 36.6 million SNPs, 3.8 million indels and 14K deletions
  1094 individuals
  14 populations
  Low coverage and exome next generation sequencing
1000 Genomes Phase 2
  Started in individuals
  19 populations
  Low coverage and exome next generation sequencing
11 Timeline
September 2007: 1000 Genomes project formally proposed, Cambridge, UK
April 2008: First submission of data to the Short Read Archive
May 2008: First public data release
October 2008: SAM/BAM format defined
December 2008: First high coverage variants released
December 2008: First 1000 Genomes browser released
May 2009: First indel calls released
July 2009: VCF format defined
August 2009: First large scale deletions released
December 2009: First main project sequence data released
March 2010: Low coverage pilot variant release made
July 2010: Phased genotypes for 159 individuals released
October 2010: "A map of human genome variation from population scale sequencing" is published in Nature
January 2011: Final Phase 1 low coverage alignments are released
May appears on Twitter
May 2011: First variant release made on more than 1000 individuals
October 2011: Phase 1 integrated variant release made
March 2012: Phase 2 alignment release
November 2012: "An integrated map of genetic variation from 1,092 human genomes" is published in Nature
12 Fraction of variant sites present in an individual that are NOT already represented in dbSNP
Date / Fraction not in dbSNP: February, %; February, %; April, %; February, %; Now, <1%
Ryan Poplin, David Altshuler
13 Data Availability
FTP site: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/
  Raw Data Files
Web site:
  Release Announcements
  Documentation
Ensembl Style Browser:
  Browse 1000 Genomes variants in Genomic Context
  Variant Effect Predictor
  Data Slicer
  Other Tools
14 The 1000 Genomes Project data
Data are available through:
The 1000 Genomes website:
NCBI: ftp://ftp-trace.ncbi.nlm.nih.gov/1000genomes
EBI: ftp://ftp.1000genomes.ebi.ac.uk
Amazon:
15 Command Line Tools
Samtools
VCFTools
Tabix (Please note it is best to use the trunk svn code for this as the release has a bug)
svn co
16 Fastq files
Sequence
HS18_6628:8:1108:8213:186084#2/1
GGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGG
+
DCDHKHKKIJGNNHIJIIKLLMCLKMAILIJH3K>HL1I=>MK.D
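To make the four-line record layout concrete, the sketch below reads FASTQ text and converts each quality character to a Phred score. It assumes the common Phred+33 encoding and strictly four lines per record; the function name `parse_fastq` and the example read are made up for illustration and are not taken from the 1000 Genomes data.

```python
def parse_fastq(text):
    """Yield (read_name, sequence, phred_scores) from FASTQ text.

    Assumes four lines per record and Phred+33 quality encoding.
    """
    lines = text.strip().splitlines()
    if len(lines) % 4:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError("sequence and quality lengths differ")
        # Phred+33: quality = ASCII code of the character minus 33
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]


# Invented example record (not real project data).
EXAMPLE = "@read1/1\nGGTTAGGG\n+\nIIII##5?\n"
records = list(parse_fastq(EXAMPLE))
for name, seq, quals in records:
    print(name, seq, quals)
```

A Phred score Q corresponds to an estimated base-call error probability of 10^(-Q/10), so a score of 20 means roughly a 1 in 100 chance that the base is wrong.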
17 BAM files
Alignment Data
ERR M =

NAME   DESCRIPTION
QNAME  Query NAME of the read or read pair
FLAG   Bitwise FLAG (pairing, strand, mate strand, etc.)
RNAME  Reference Sequence NAME
POS    1-Based leftmost POSition of clipped alignment
MAPQ   MAPping Quality (Phred-scaled)
CIGAR  Extended CIGAR string (operations: MIDNSHP)
MRNM   Mate Reference NaMe ( = if same as RNAME)
MPOS   1-Based leftmost Mate POSition
ISIZE  Inferred Insert SIZE
SEQ    Query SEQuence on the same strand as the reference
QUAL   Query QUALity (ASCII-33=Phred base quality)
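As a rough illustration of how these eleven mandatory fields can be read from the text (SAM) form of an alignment, the sketch below splits one tab-delimited line and decodes the bitwise FLAG. The field names follow the slide (newer versions of the specification call the last three mate fields RNEXT, PNEXT and TLEN); the example line and the helper names are invented.

```python
from collections import namedtuple

SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar mrnm mpos isize seq qual",
)

# Bit values defined by the SAM specification.
FLAG_BITS = [
    (0x1, "paired"), (0x2, "proper_pair"), (0x4, "unmapped"),
    (0x8, "mate_unmapped"), (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"), (0x40, "first_in_pair"),
    (0x80, "second_in_pair"), (0x100, "secondary"),
    (0x200, "qc_fail"), (0x400, "duplicate"), (0x800, "supplementary"),
]


def parse_sam_line(line):
    """Split one SAM alignment line into its 11 mandatory fields."""
    f = line.rstrip("\n").split("\t")
    if len(f) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(f[0], int(f[1]), f[2], int(f[3]), int(f[4]),
                     f[5], f[6], int(f[7]), int(f[8]), f[9], f[10])


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# Invented example alignment (optional tag fields omitted).
LINE = "read1\t99\t6\t1000\t60\t8M\t=\t1200\t208\tGGTTAGGG\tIIIIIIII"
rec = parse_sam_line(LINE)
print(rec.rname, rec.pos, decode_flag(rec.flag))
```

BAM itself is the compressed binary form of the same records, so in practice a tool such as samtools is used to convert it to text before line-based parsing like this.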
18 More Information About BAM Files
19 VCF Files
Variant Call Data
TAB Delimited Text Format

NAME    DESCRIPTION
CHROM   Chromosome name
POS     Position in chromosome
ID      Unique Identifier of variant
REF     Reference Allele
ALT     Alternative Allele
QUAL    Phred scaled quality value
FILTER  Site filter information
INFO    User extensible annotation
FORMAT  Describes the format of the subsequent fields, must always contain Genotype
Individual Genotype Fields  These columns contain the individual genotype data for each individual in the file
20 Variant Call Data
Headers
##fileformat=VCFv4.1
##INFO=<ID=RSQ,Number=1,Type=Float,Description="Genotype imputation quality from MaCH/Thunder">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Alternate Allele Count">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total Allele Count">
##INFO=<ID=AA,Number=1,Type=String,Description="Ancestral Allele, ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/ancestral_alignments/readme">
##INFO=<ID=AF,Number=1,Type=Float,Description="Global Allele Frequency based on AC/AN">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=DS,Number=1,Type=Float,Description="Genotype dosage from MaCH/Thunder">
##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype Likelihoods">
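The header lines on this slide follow a regular pattern, so a short sketch can turn them into a lookup table of field definitions. The regular expression below only covers the simple `ID, Number, Type, Description` layout shown here; the function name is hypothetical, and real headers may carry extra keys that this sketch would not match.

```python
import re

META_LINE = re.compile(
    r'^##(INFO|FORMAT)=<ID=([^,]+),Number=([^,]+),Type=([^,]+),'
    r'Description="(.*)">$'
)


def parse_vcf_meta(header_lines):
    """Return {(kind, id): {"number", "type", "description"}}."""
    fields = {}
    for line in header_lines:
        match = META_LINE.match(line.strip())
        if match:
            kind, field_id, number, value_type, description = match.groups()
            fields[(kind, field_id)] = {
                "number": number,
                "type": value_type,
                "description": description,
            }
    return fields


# Header lines copied from the slide.
HEADER = [
    "##fileformat=VCFv4.1",
    '##INFO=<ID=AC,Number=.,Type=Integer,Description="Alternate Allele Count">',
    '##INFO=<ID=AN,Number=1,Type=Integer,Description="Total Allele Count">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    '##FORMAT=<ID=GL,Number=.,Type=Float,Description="Genotype Likelihoods">',
]
meta = parse_vcf_meta(HEADER)
print(meta[("INFO", "AC")])
```

`Number=.` means the number of values is not fixed, which is why fields such as `GL` hold a comma-separated list.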
21 Variant Call Data
Example 1000 Genomes Data
CHROM     4
POS
ID        rs
REF       T
ALT       C
QUAL      100
FILTER    PASS
INFO      AA=T;AN=2184;AC=1;RSQ=0.8138;AF=0.0005;
FORMAT    GT:DS:GL
GENOTYPE  0 0:0.000:-0.03,-1.19,
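The INFO and genotype columns in this example can be unpacked with a few lines of Python, sketched below. The INFO string is the one from the slide; the genotype string is a hypothetical complete value (the slide's own is truncated), and both function names are made up for illustration.

```python
def parse_info(info):
    """Turn 'AA=T;AN=2184;AC=1' into a dict; bare flags map to True."""
    out = {}
    for item in info.split(";"):
        if not item:
            continue  # tolerate a trailing semicolon
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def parse_sample(format_field, sample_field):
    """Pair FORMAT keys (e.g. GT:DS:GL) with one sample's values."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


info = parse_info("AA=T;AN=2184;AC=1;RSQ=0.8138;AF=0.0005;")
# AF is described in the header as based on AC/AN, so the two should agree.
allele_frequency = int(info["AC"]) / int(info["AN"])
print(round(allele_frequency, 4), info["AF"])

# Hypothetical sample column: phased homozygous-reference genotype.
sample = parse_sample("GT:DS:GL", "0|0:0.000:-0.03,-1.19,-5.00")
alleles = sample["GT"].replace("|", "/").split("/")
print(alleles, float(sample["DS"]))
```

In the GT value, `0` refers to the REF allele and `1` to the first ALT allele; `|` marks a phased genotype and `/` an unphased one.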
22 More Information About VCF Files VCF variant files All indexed for fast retrieval
23 Class Exercise
Download from 1000 Genomes the VCF data for a gene of interest. If you don't have a gene of interest, look in the GWAS Catalog for a gene that is associated with a lipid trait, e.g. PCSK9.
Extract the genotypes of the same gene for the European population only.
Convert your VCF genotype data to PLINK format (ped & map).
Hint: You will need to know the chromosome and location of the gene before you can download it.
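For orientation, the last two steps of this exercise can be sketched in plain Python as below. This is not the project's own converter: it assumes a panel file with tab-separated sample, population and super-population columns, diploid genotypes, and biallelic SNPs only, and all sample names, coordinates and function names are invented. A PED row is written as family ID, individual ID, father, mother, sex, phenotype, then two alleles per variant; a MAP row is chromosome, variant ID, genetic distance, base-pair position.

```python
def samples_in_group(panel_text, group):
    """Return sample ids whose third panel column equals `group`."""
    keep = []
    for line in panel_text.strip().splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] == group:
            keep.append(fields[0])
    return keep


def vcf_to_ped_map(vcf_text, keep_samples):
    """Convert biallelic SNP records to PED and MAP rows (lists of str)."""
    keep = set(keep_samples)
    columns, alleles, map_rows = {}, {}, []
    for line in vcf_text.splitlines():
        if line.startswith("##") or not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample columns start at index 9 of the header line.
            columns = {s: i for i, s in enumerate(fields[9:]) if s in keep}
            alleles = {s: [] for s in columns}
            continue
        chrom, pos, var_id, ref, alt = fields[:5]
        if len(ref) != 1 or len(alt) != 1:
            continue  # skip indels and multi-allelic sites in this sketch
        map_rows.append(f"{chrom}\t{var_id}\t0\t{pos}")
        code = {"0": ref, "1": alt}
        for sample, i in columns.items():
            gt = fields[9 + i].split(":")[0].replace("|", "/").split("/")
            # Unknown or missing alleles are written as 0, PLINK's missing code.
            alleles[sample].extend(code.get(a, "0") for a in gt)
    ped_rows = ["\t".join([s, s, "0", "0", "0", "-9"] + alleles[s])
                for s in columns]
    return ped_rows, map_rows


# Invented panel and VCF content.
PANEL = "S1\tGBR\tEUR\nS2\tYRI\tAFR\nS3\tFIN\tEUR\n"
VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "\t".join(["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER",
               "INFO", "FORMAT", "S1", "S2", "S3"]),
    "\t".join(["1", "100", "rs1", "A", "G", ".", "PASS", ".", "GT",
               "0|1", "1|1", "0|0"]),
    "\t".join(["1", "200", "rs2", "C", "CT", ".", "PASS", ".", "GT",
               "0|1", "0|0", "0|0"]),
])
europeans = samples_in_group(PANEL, "EUR")
ped_rows, map_rows = vcf_to_ped_map(VCF, europeans)
print("\n".join(ped_rows))
print("\n".join(map_rows))
```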
24 Data Slicing
All alignment and variant files are indexed so subsections can be downloaded remotely
Use samtools to get subsections of bam files
  samtools view ment/hg01375.mapped.illumina.bwa.clm.low_coverage bam 6:
Use tabix to get subsections of vcf files
  tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/working/ _omni_genotypes_and_intensities/Omni25_genotypes_ 2141_samples.b37.vcf.gz 6:
You can also use the web Data Slicer interface to do this
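Both commands take a region written as chromosome, colon, start, dash, end. To show what such a region selects, the sketch below parses a region string and keeps the VCF records that fall inside it. It scans every line, so it only illustrates the selection logic and does not reproduce the index-based lookup that makes tabix and samtools fast; the coordinates and function names are invented.

```python
def parse_region(region):
    """Parse 'chrom', 'chrom:start' or 'chrom:start-end' (1-based, inclusive)."""
    chrom, _, span = region.partition(":")
    if not span:
        return chrom, None, None
    start, _, end = span.replace(",", "").partition("-")
    return chrom, int(start), int(end) if end else None


def records_in_region(vcf_lines, region):
    """Yield VCF data lines whose CHROM and POS fall inside the region."""
    chrom, start, end = parse_region(region)
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        rec_chrom, rec_pos = line.split("\t", 2)[:2]
        pos = int(rec_pos)
        if rec_chrom != chrom:
            continue
        if start is not None and pos < start:
            continue
        if end is not None and pos > end:
            continue
        yield line


# Invented records: chromosome, position, then the remaining columns.
LINES = [
    "#CHROM\tPOS\tID",
    "6\t900\trsA",
    "6\t1500\trsB",
    "6\t2500\trsC",
    "7\t1500\trsD",
]
selected = list(records_in_region(LINES, "6:1000-2000"))
print(selected)
```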
25 Data Slicing
VCFtools provides some useful additional functionality on the command line including:
vcf-compare, comparison and stats about two or more vcf files
vcf-isec, creates an intersection of two or more vcf files
vcf-subset, will subset a vcf file only retaining the specified individual columns
vcf-validator, will validate a particular
26 Data Slicing ata/selectslice
27 Variant Effect Predictor
Predicts Functional Consequences of Variants
Both Web Front end and API script
Can provide sift/polyphen/condel consequences
Refseq gene names
HGVS output
Can run from a cache as well as Database
Convert from one input format to another
Script available for download from: ftp://ftp.ensembl.org/pub/misc-scripts/variant_effect_predictor/
Variations
28
29 Variant Effect Predictor
perl variant_effect_predictor.pl -input 6_ _ vcf -sift p -polyphen p -check_existing
less variant_effect_output.txt
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cdna_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
rs : A ENSG ENST Transcript DOWNSTREAM rs
rs : A ENSG ENST Transcript INTRONIC rs
_ _c/t 6: T ENSG ENST Transcript NON_SYNONYMOUS_CODING R/H cgc/cac - PolyPhen=possibly_damaging;SIFT=deleterious
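Once the predictor has written its tab-delimited output, picking out the variants of interest is a small filtering job. The sketch below uses the column names from the header shown on this slide and keeps rows whose Extra column reports a deleterious SIFT call; the two example rows and the function names are invented, and the exact wording of SIFT values may differ between versions of the tool.

```python
VEP_COLUMNS = [
    "Uploaded_variation", "Location", "Allele", "Gene", "Feature",
    "Feature_type", "Consequence", "cdna_position", "CDS_position",
    "Protein_position", "Amino_acids", "Codons", "Existing_variation",
    "Extra",
]


def parse_vep_line(line):
    """Map one output row to a dict; Extra becomes a nested dict."""
    row = dict(zip(VEP_COLUMNS, line.rstrip("\n").split("\t")))
    extra = {}
    for item in row.get("Extra", "").split(";"):
        key, sep, value = item.partition("=")
        if sep:
            extra[key] = value
    row["Extra"] = extra
    return row


def deleterious_rows(lines):
    """Yield parsed rows whose SIFT prediction starts with 'deleterious'."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        row = parse_vep_line(line)
        if row["Extra"].get("SIFT", "").startswith("deleterious"):
            yield row


# Invented output rows.
OUTPUT = [
    "#" + "\t".join(VEP_COLUMNS),
    "\t".join(["var1", "6:100", "A", "GENE1", "TX1", "Transcript",
               "INTRONIC", "-", "-", "-", "-", "-", "-", ""]),
    "\t".join(["var2", "6:200", "T", "GENE1", "TX1", "Transcript",
               "NON_SYNONYMOUS_CODING", "10", "10", "4", "R/H", "cgc/cac",
               "-", "PolyPhen=possibly_damaging;SIFT=deleterious"]),
]
hits = list(deleterious_rows(OUTPUT))
for row in hits:
    print(row["Uploaded_variation"], row["Consequence"], row["Extra"])
```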
30 VCF to PED
LD visualization tools like Haploview require PED files
VCF to PED converts VCF to PED
Will divide a file by individual or population
ens/userdata/haploview
31 VCF to PED
32 VCF to PED
perl vcf_to_ped_convert.pl -vcf ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/ /all.chr6.phase1_integrated_calls snps_indels_svs.genotypes.vcf.gz -sample_panel_file ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/ /phase1_integrated_calls all.panel -region 6: -population CEU
Output should be two files
6_ info
6_ ped
33 Access to backend Ensembl databases
Public MySQL database at mysql-db.1000genomes.org port 4272
Full programmatic access with Ensembl API
The 1000 Genomes Pilot uses Ensembl v60 databases and the NCBI36 assembly (this is frozen)
The 1000 Genomes main project currently uses Ensembl v63 databases
n/index.html
ml
34 Announcements
genomes-annoucement-mailing-list
ts/rss.xml
More informationGenomics: Genome Browsing & Annota3on
Genomics: Genome Browsing & Annota3on Lecture 4 of 4 Introduc/on to BioMart Dr Colleen J. Saunders, PhD South African National Bioinformatics Institute/MRC Unit for Bioinformatics Capacity Development,
More informationUsing VarSeq to Improve Variant Analysis Research
Using VarSeq to Improve Variant Analysis Research June 10, 2015 G Bryce Christensen Director of Services Questions during the presentation Use the Questions pane in your GoToWebinar window Agenda 1 Variant
More informationTranscriptomics analysis with RNA seq: an overview Frederik Coppens
Transcriptomics analysis with RNA seq: an overview Frederik Coppens Platforms Applications Analysis Quantification RNA content Platforms Platforms Short (few hundred bases) Long reads (multiple kilobases)
More informationWhole Genome Sequencing. Biostatistics 666
Whole Genome Sequencing Biostatistics 666 Genomewide Association Studies Survey 500,000 SNPs in a large sample An effective way to skim the genome and find common variants associated with a trait of interest
More informationRoadmap: genotyping studies in the post-1kgp era. Alex Helm Product Manager Genotyping Applications
Illumina s GWAS Roadmap: next-generation genotyping studies in the post-1kgp era Alex Helm Product Manager Genotyping Applications 2010 Illumina, Inc. All rights reserved. Illumina, illuminadx, Solexa,
More informationDerrek Paul Hibar
Derrek Paul Hibar derrek.hibar@ini.usc.edu Obtain the ADNI Genetic Data Quality Control Procedures Missingness Testing for relatedness Minor allele frequency (MAF) Hardy-Weinberg Equilibrium (HWE) Testing
More informationSNPTransformer: A Lightweight Toolkit for Genome-Wide Association Studies
GENOMICS PROTEOMICS & BIOINFORMATICS www.sciencedirect.com/science/journal/16720229 Application Note SNPTransformer: A Lightweight Toolkit for Genome-Wide Association Studies Changzheng Dong * School of
More informationSupplementary Figure 1. Study design of a multi-stage GWAS of gout.
Supplementary Figure 1. Study design of a multi-stage GWAS of gout. Supplementary Figure 2. Plot of the first two principal components from the analysis of the genome-wide study (after QC) combined with
More informationVariant prioritization in NGS studies: Annotation and Filtering "
Variant prioritization in NGS studies: Annotation and Filtering Colleen J. Saunders (PhD) DST/NRF Innovation Postdoctoral Research Fellow, South African National Bioinformatics Institute/MRC Unit for Bioinformatics
More informationPopulation differentiation analysis of 54,734 European Americans reveals independent evolution of ADH1B gene in Europe and East Asia
Population differentiation analysis of 54,734 European Americans reveals independent evolution of ADH1B gene in Europe and East Asia Kevin Galinsky Harvard T. H. Chan School of Public Health American Society
More informationReads to Discovery. Visualize Annotate Discover. Small DNA-Seq ChIP-Seq Methyl-Seq. MeDIP-Seq. RNA-Seq. RNA-Seq.
Reads to Discovery RNA-Seq Small DNA-Seq ChIP-Seq Methyl-Seq RNA-Seq MeDIP-Seq www.strand-ngs.com Analyze Visualize Annotate Discover Data Import Alignment Vendor Platforms: Illumina Ion Torrent Roche
More informationVCGDB: A Virtual and Dynamic Genome Database of the Chinese Population
VCGDB: A Virtual and Dynamic Genome Database of the Chinese Population Jiayan Wu Associate Professor Director of Science and Technology Department Director of Core Facility Beijing Institute of Genomics,
More informationAnnotation of Genetic Variants
Annotation of Genetic Variants Valerie Obenchain Fred Hutchinson Cancer Research Center 27-28 February 2012 Read VCF Files Structural location of variants Amino acid coding changes Extras Outline Read
More informationAmapofhumangenomevariationfrom population-scale sequencing
doi:.38/nature9534 Amapofhumangenomevariationfrom population-scale sequencing The Genomes Project Consortium* The Genomes Project aims to provide a deep characterization of human genome sequence variation
More informationVariant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4
WHITE PAPER Oncomine Comprehensive Assay Variant calling workflow for the Oncomine Comprehensive Assay using Ion Reporter Software v4.4 Contents Scope and purpose of document...2 Content...2 How Torrent
More informationSingle Nucleotide Polymorphisms (SNPs)
Single Nucleotide Polymorphisms (SNPs) Sequence variations Single nucleotide polymorphisms Insertions/deletions Copy number variations (large: >1kb) Variable (short) number tandem repeats Single Nucleotide
More informationQuantitative Genomics and Genetics BTRY 4830/6830; PBSB
Quantitative Genomics and Genetics BTRY 4830/6830; PBSB.5201.01 Lecture20: Haplotype testing and Minimum GWAS analysis steps Jason Mezey jgm45@cornell.edu April 17, 2017 (T) 8:40-9:55 Announcements Project
More informationIntroduction to NGS analyses
Introduction to NGS analyses Giorgio L Papadopoulos Institute of Molecular Biology and Biotechnology Bioinformatics Support Group 04/12/2015 Papadopoulos GL (IMBB, FORTH) IMBB NGS Seminar 04/12/2015 1
More information