HLA and Next Generation Sequencing: it's all about the Data

HLA and Next Generation Sequencing: it's all about the Data
John Ord, NHSBT Colindale and University of Cambridge
BSHI Annual Conference, Manchester, September 2014

Introduction
In 2003 the first full public version of the human genome sequence was announced.
It took 13 years to complete, at a cost of $3 billion.
The sequencing was all done using Sanger methodology.
Today we can do the same job overnight using Next Generation Sequencing (NGS) technology, at a cost of about $3000.
NGS can achieve this because it sequences hundreds of thousands of separate DNA templates in parallel and at high speed.
However, one NGS run generates vast amounts of data, which require a range of powerful bioinformatics tools to analyse.

Next Generation Sequencing for HLA Genes
Brief overview of NGS
How NGS can improve HLA typing
Data analysis for NGS HLA sequences

Next Generation Sequencing by Clonal Amplification
[Diagram labels: genomic DNA; PCR; amplicons; single strand binding to substrate; clonal amplification; strong sequencing signal]

Paired End Sequencing
Extending the data from fixed length sequencing runs
Sequence 1 (from Seq Primer 1): 250 bp
Sequence 2 (from Seq Primer 2): 250 bp
DNA fragment: 300 bp
Combined paired sequence: 300 bp
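The figures on this slide imply that the two 250 bp reads overlap by 250 + 250 - 300 = 200 bp. The sketch below illustrates how such a pair might be merged into one combined sequence. It is a simplified illustration, not the method used by any particular package: it assumes error-free reads and exact matching in the overlap, and the function names and the `min_overlap` parameter are hypothetical.

```python
from typing import Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def expected_overlap(read_length: int, fragment_length: int) -> int:
    """Overlap between two reads of equal length sequenced from both ends of a fragment."""
    return max(0, 2 * read_length - fragment_length)


def merge_pair(read1: str, read2: str, min_overlap: int = 10) -> Optional[str]:
    """Merge a read pair by exact overlap (assumption: no sequencing errors).

    read2 is reported from the opposite strand, so it is reverse-complemented
    first. The longest overlap is tried first; None is returned if the reads
    do not overlap by at least min_overlap bases.
    """
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        if read1[-overlap:] == mate[:overlap]:
            return read1 + mate[overlap:]
    return None


# Scaled-down example: a 30 bp fragment read with 25 bp reads from each end.
fragment = "ATGCGGGTCATGGCGCCCCGAACCCTCATC"
read1 = fragment[:25]
read2 = reverse_complement(fragment[5:])
merged = merge_pair(read1, read2)
```

Real read mergers also tolerate mismatches in the overlap and use the quality scores to decide between conflicting bases.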

HLA typing standards
The gold standard for HLA is typing to a single allele level, and this is recommended for unrelated haematopoietic stem cell transplantation.
Typing Stem Cell Registry donors to allele level avoids the need for extended re-typing and can speed up the matching and donor selection process.
Sanger sequencing (SBT) can achieve allelic level typing in some circumstances, but often requires further steps using Group Specific Primers because of the problem of assigning heterozygous base combinations (phase ambiguity, or the cis/trans problem).
SBT using targeted exons of HLA genes can give rise to ambiguities where differences between alleles lie outside the regions sequenced.
Can HLA typing based on Next Generation Sequencing give us accurate and reliable one-step allele level HLA typing?

Phasing Ambiguity (the Cis/Trans Problem)
[Figure: heterozygous base positions as seen by Sanger sequencing and by NGS systems]
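The cis/trans problem can be made concrete with a small enumeration. In the sketch below, an unphased genotype is a list of base pairs observed at heterozygous positions, which is the information a mixed Sanger trace provides. With n heterozygous positions there are 2^(n-1) possible haplotype pairs, and a single read that spans the positions is enough to pick one. The function names and the example bases are illustrative assumptions, not data from the talk.

```python
from itertools import product


def candidate_haplotype_pairs(genotypes):
    """Enumerate every haplotype pair consistent with unphased heterozygous calls.

    genotypes: list of (base_a, base_b) tuples, one per heterozygous position.
    Returns a set of frozensets, each holding the two haplotype strings.
    """
    pairs = set()
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        pairs.add(frozenset((hap1, hap2)))
    return pairs


def resolve_with_read(pairs, read_alleles):
    """Keep only the pairs containing the haplotype observed on one spanning read."""
    return {pair for pair in pairs if read_alleles in pair}


# Two heterozygous positions: A/G at the first, C/T at the second.
unphased = [("A", "G"), ("C", "T")]
pairs = candidate_haplotype_pairs(unphased)   # AC+GT or AT+GC
resolved = resolve_with_read(pairs, "AT")     # a read carrying A and T together
```

Because each NGS read comes from a single template molecule, the bases it carries at the two positions are known to be in cis, which is what removes the ambiguity.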

Allele Ambiguity
Where differences between alleles lie outside the region sequenced, you will get allele ambiguity.
For example, the difference between C*03:20N and C*03:03:01 is a single base change (C>T) in exon 1. This generates a premature stop codon, which results in non-expression of the protein (null allele).
gDNA                20         30         40         50
C*03:03:01  TGGCGCCCCG AACCCTCATC CTGCTGCTCT CGGGAGCCCT
C*03:20N    --------T- ---------- ---------- ----------
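In the alignment above, a dash means "same base as the reference allele". The following sketch expands that notation and lists the positions where the two alleles differ, using the two sequences exactly as printed on the slide. Offsets are counted from the start of the displayed segment rather than as gDNA coordinates, and the function names are hypothetical.

```python
def expand_alignment(reference: str, aligned: str) -> str:
    """Replace '-' (identical to reference) with the reference base."""
    ref = reference.replace(" ", "")
    aln = aligned.replace(" ", "")
    if len(ref) != len(aln):
        raise ValueError("reference and aligned sequence differ in length")
    return "".join(r if a == "-" else a for r, a in zip(ref, aln))


def differences(reference: str, aligned: str):
    """Return (offset, reference_base, allele_base) for every mismatch."""
    ref = reference.replace(" ", "")
    full = expand_alignment(reference, aligned)
    return [(i, r, a) for i, (r, a) in enumerate(zip(ref, full)) if r != a]


C0303 = "TGGCGCCCCG AACCCTCATC CTGCTGCTCT CGGGAGCCCT"
C0320N = "--------T- ---------- ---------- ----------"

diffs = differences(C0303, C0320N)       # one C>T change at offset 8
null_allele = expand_alignment(C0303, C0320N)
triplet = null_allele[8:11]              # the substituted base starts the triplet TGA
```

The substituted base turns CGA into TGA; that this triplet is read in frame is taken from the slide's statement that the change creates a premature stop codon.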

NGS Target Preparation
Exon based: 5' UTR, Ex 1, Ex 2, Ex 3, Ex 4, Ex 5, 3' UTR
Fragmentation based: 5' UTR, Ex 1, Ex 2, Ex 3, Ex 4, Ex 5, 3' UTR

NGS Data Example
The data from the sequencer can come in several formats; we use FASTQ.
This is a text file containing data from one sample. Each record consists of:
A sequence identifier
The DNA sequence itself (with the indexes and linker sequences stripped off)
A separator character
The quality score for each base (Phred score), for example FF10F = Q37, Q37, Q16, Q15, Q37
A typical FASTQ file for HLA-A, -B, -C and -DRB1 contains approximately 250,000 to 300,000 sequences and is 20 to 40 MB in size when compressed.
Each paired end sequencing run generates 2 FASTQ files per sample: up to 80 MB of data.
Each of the half million resulting sequences needs to be assessed, aligned to the genome and then to the IMGT/HLA allele database to work out the HLA type.
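The four-line FASTQ record described above can be read with a few lines of code. This is a minimal sketch: it assumes uncompressed input with exactly four lines per record and the Phred+33 quality encoding, which is consistent with the slide's example (F = Q37, 1 = Q16, 0 = Q15). The read name and sequence in the example are made up.

```python
import io


def phred_scores(quality: str, offset: int = 33):
    """Convert a FASTQ quality string to a list of Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def read_fastq(handle):
    """Yield (identifier, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            break  # end of file
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield header[1:], sequence, quality


# Hypothetical single-record file held in memory.
example = io.StringIO("@read_1\nACGTA\n+\nFF10F\n")
records = list(read_fastq(example))
scores = phred_scores(records[0][2])
```

A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q37 is roughly one error in 5000 bases and Q15 roughly one in 32.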

NGS HLA Data Analysis
NGS produces lots of data very quickly, but it consists of hundreds of thousands of relatively short sequences (typically 150 to 400 bp), and this presents a major challenge for the bioinformatician.
The target preparation method can affect the approach to analysis.
Once the sequencing has been completed there are several questions to answer for each piece of data:
Is the sequence long enough and of high enough quality?
Which gene or part of the gene does the sequence come from? (alignment)
Which haplotype does the sequence come from? (phasing)
How much of the gene has been sequenced? (breadth of coverage)
How many times has each base been sequenced? (depth of coverage)
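The first of these questions, whether a read is long enough and of high enough quality, is usually answered with a simple filter before alignment. The sketch below shows one possible form. The thresholds (`min_length=100`, `min_mean_q=20.0`) are illustrative assumptions and are not values given in the talk, and the mean Phred score is only a crude summary of read quality.

```python
def mean_quality(quality: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(sequence: str, quality: str,
                  min_length: int = 100, min_mean_q: float = 20.0) -> bool:
    """Return True if the read meets the (hypothetical) length and quality thresholds."""
    if not sequence or len(sequence) < min_length:
        return False
    return mean_quality(quality) >= min_mean_q


good = passes_filter("A" * 150, "F" * 150)     # long, Q37 throughout
too_short = passes_filter("A" * 50, "F" * 50)  # fails on length
low_q = passes_filter("A" * 150, "#" * 150)    # '#' is Q2, fails on quality
```

In practice a pipeline would often trim low-quality bases from the read ends before deciding whether to discard the read.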

Colindale NGS HLA Project
Long range PCR to amplify whole genes for HLA-A, -B, -C [2] and -DQB1 [1]
Two amplicons for DRB1: 5' Exon 1 + Intron 1; Exon 2 to 3' end [2]
Illumina Nextera enzyme-based tagmentation kit to fragment and index the long range PCR products
Illumina MiSeq platform for sequencing a pool of 95 samples for 5 loci
Commercial software for alignment and allele assignment
[1] Hosomichi et al, BMC Genomics 2013 14:355
[2] Shiina et al, Tissue Antigens 2012 80:305-316

An approach to analysis of NGS HLA data
1. Paired end FASTQ sequence file
2. Align to HLA gene in hg19 reference
3. Detect SNPs, insertions and deletions
4. Phase reads using heterozygous SNPs
5. Generate 2 consensus sequences
6. Align each consensus to IMGT/HLA data
7. HLA genotype
Analysis pipeline tools: Genome Analysis Toolkit, BWA, Picard, Samtools, in-house Perl scripts, Integrative Genomics Viewer
Based on Hosomichi et al, BMC Genomics 2013 14:355
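The phasing and consensus steps of this workflow can be illustrated with a toy example. The sketch below is not the Hosomichi et al pipeline or the in-house scripts: it assumes reads are already aligned without insertions or deletions, represents each read as a (start, sequence) pair, and splits reads on a single heterozygous SNP, whereas a real pipeline links many heterozygous sites across overlapping reads. All names and sequences are hypothetical.

```python
from collections import Counter


def split_by_snp(reads, snp_pos):
    """Group aligned reads by the base they carry at one heterozygous position.

    reads: list of (start, sequence) tuples in reference coordinates.
    Reads that do not cover snp_pos are left out (unphased).
    """
    groups = {}
    for start, seq in reads:
        offset = snp_pos - start
        if 0 <= offset < len(seq):
            groups.setdefault(seq[offset], []).append((start, seq))
    return groups


def consensus(reads, length):
    """Majority-vote consensus over aligned reads; 'N' where nothing covers a base."""
    columns = [Counter() for _ in range(length)]
    for start, seq in reads:
        for i, base in enumerate(seq):
            if 0 <= start + i < length:
                columns[start + i][base] += 1
    return "".join(col.most_common(1)[0][0] if col else "N" for col in columns)


# Two made-up haplotypes differing at positions 4 (A/T) and 10 (G/C).
reads = [
    (0, "ACGTACGTACGT"), (2, "GTACGTACGT"),   # from haplotype 1
    (0, "ACGTTCGTACCT"), (3, "TTCGTACCT"),    # from haplotype 2
]
groups = split_by_snp(reads, snp_pos=4)
haplotypes = {base: consensus(group, 12) for base, group in groups.items()}
```

Each consensus would then be compared with the IMGT/HLA allele sequences to assign the genotype.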

Determine HLA Genotype
IMGT/HLA reference fully sequenced: phased sequence including non-coding regions
IMGT/HLA reference data missing from non-coding regions: phased sequence with just the coding regions

[Figure: sequencing reads of each fragment aligned to a reference sequence, with breadth of coverage marked and example read depths of 12X and 29X]
Depth of coverage or read depth (X):
Number of reads covering a single base
Average number of reads covering the target
Breadth of coverage or sensitivity (%):
Proportion of genomic target covered to a pre-determined depth
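Both measures follow directly from the positions of the aligned reads. The sketch below computes per-base depth, average depth and breadth of coverage for a target region, with each read reduced to a (start, length) pair in target coordinates. The function names, the toy reads and the depth threshold are illustrative assumptions.

```python
def depth_per_base(reads, target_length):
    """Count the reads covering each base of the target.

    reads: list of (start, length) tuples, 0-based, in target coordinates.
    """
    depth = [0] * target_length
    for start, length in reads:
        for pos in range(max(start, 0), min(start + length, target_length)):
            depth[pos] += 1
    return depth


def mean_depth(depth):
    """Average number of reads covering the target."""
    return sum(depth) / len(depth)


def breadth_of_coverage(depth, min_depth):
    """Percentage of target bases covered to at least min_depth."""
    covered = sum(1 for d in depth if d >= min_depth)
    return 100.0 * covered / len(depth)


# Three 6-base reads tiled across a 10-base target.
depth = depth_per_base([(0, 6), (2, 6), (4, 6)], target_length=10)
average = mean_depth(depth)
breadth = breadth_of_coverage(depth, min_depth=2)
```

For the toy reads the per-base depth is 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, so the average depth is 1.8X and 60% of the target is covered to at least 2X.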

Integrated Analysis Packages
Omixon Target
Ion Torrent HLA Plug In
GenDx NGSengine
Conexio
Wellcome Trust Centre for Human Genetics
NextGENe HLA module

Omixon Target Result Screen

HLA-A Alignment GENDX

HLA-C Alignment GENDX

Choice of Analysis Software
There are a growing number of high-quality commercial software packages available.
Choice of package should be based on suitability for your platform and requirements, and on in-house expertise.
There are also some non-commercial packages which may do the job but are less likely to have a unified user interface.
Choice of package should also be based on careful validation with a panel of well-characterised samples.

Data storage
MiSeq run: 66 GB (short term storage)
95 donors, 5 loci, 190 FASTQ files: 6 GB
Sanger data, 95 donors: 250 MB
10,000 donors: 631 GB, stored for 30 years
Microsoft Azure cloud: 125 TB
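The storage figures on this slide can be checked with simple scaling. The sketch below assumes that the 6 GB figure refers to the 190 FASTQ files from one batch of 95 donors, which is an interpretation of the flattened slide rather than something it states outright. Under that assumption, scaling to 10,000 donors gives roughly the 631 GB quoted. The function name is hypothetical.

```python
def scale_storage_gb(batch_gb: float, batch_donors: int, target_donors: int) -> float:
    """Scale the storage used by one batch linearly to a larger number of donors."""
    return batch_gb * target_donors / batch_donors


# Assumption: 6 GB of FASTQ data per batch of 95 donors.
fastq_10k = scale_storage_gb(6, 95, 10_000)      # about 631.6 GB

# Sanger data for the same batch size is quoted as 250 MB (0.25 GB).
sanger_10k = scale_storage_gb(0.25, 95, 10_000)  # about 26.3 GB

# FASTQ files need roughly 24 times the space of Sanger data per donor.
ratio = 6 / 0.25
```

The 66 GB produced by the MiSeq run itself is about eleven times larger than the FASTQ files derived from it, which is why the slide marks it as short term storage only.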

Summing Up
Next Generation Sequencing will deliver reliable and accurate HLA types.
Choice of analysis software is of prime importance.
Not yet cost-effective for routine low throughput or urgent typing.
Could be useful for research projects where samples can be batched.
Need to consider carefully what data to keep and how to store it.

NGS: feel the fear and do it anyway

Dramatis Personae
NHSBT: Cristina Navarrete, Lisa Creary, John Girdlestone, Sue Davey, Colin Brown, Zareen Goburdhun, Monica Kyriacou, John Ord
University of Cambridge: John Todd, Howard Martin, Kim Brugger, Sam Haldenby, Anthony Rogers (Willem Ouwehand)

NGS Selected Reading
Hosomichi et al: Phase-defined complete sequencing of the HLA genes by next-generation sequencing. BMC Genomics 2013 14:355
Lange et al: Cost-efficient high-throughput HLA typing by MiSeq amplicon sequencing. BMC Genomics 2014 15:63
Shiina et al: Super high resolution for single molecule-sequence-based typing of classical HLA loci at the 8-digit level using next generation sequencers. Tissue Antigens 2012 80:305-316
H. Erlich: HLA DNA typing: past, present, and future. Tissue Antigens 2012 80:1-11
Wang et al: High-throughput, high-fidelity HLA genotyping with deep sequencing. PNAS 2012 109:8676-8681
Michael L. Metzker: Sequencing technologies - the next generation. Nature Reviews Genetics 2010 11:31-46
NGS Analysis Software Sources
http://www.omixon.com/hla/
http://www.gendx.com/products/ngsengine
http://www.conexio-genomics.com/
http://www.softgenetics.com/nextgene.html
http://www.lifetechnologies.com/