Previously on... Andreas Laner MGZ München


1 Previously on... Andreas Laner MGZ München


3 Overview
Typical human genome: 4-5 million variants; 99.9% are SNVs and short indels. Also: a few hundred thousand private variants.
HGMD: variants linked to phenotype.
GENCODE: about 100 genes whose loss has no apparent effect, because they are observed in a completely deleted state in healthy people.
Variants are usually benign (silent/synonymous) or potentially deleterious (truncating, consensus splice donor/acceptor, missense depending on context, location, conservation, etc.).
Repeat expansion diseases: common, but NGS (panels/WES) is not suitable for detecting these variants (wait for WGS).
Unusual variants:
  Alpha-globin: loss of intergenic regulatory elements kb distant abolishes expression.
  Gene silencing by deletion of a nearby gene via an antisense RNA! (like EPCAM-MSH2)
  Alpha-thalassemia (Malaysia): an intragenic SNV activates an upstream regulatory element (de novo creation of a GATA-1 site; the region becomes a promoter).
Unusual mechanisms:
  Trans genomic contextual variant: podocin dimers/multimers, missense p.R229Q with relatively high frequency. Homozygous: no phenotype, but pathogenic when heterozygous together with another variant.
  Compensated pathogenic deviation (CPD): pathogenic / not pathogenic in orthologous species! Cis mutations in the same protein can compensate for an otherwise pathogenic missense variant.
Message: Be careful when interpreting pathogenicity; sometimes it is more complicated.

4 Overview: standard NGS workflow and sequencing platforms
Platforms: Illumina (bridge amplification), PacBio (long reads), Nanopore (single molecule), Roche + Helicos (not dead yet).
Illumina won the race; the new 4000 machines do 8 genomes at 50x in one run (X10 even more).
Ion Torrent (Proton, PGM): essentially a pH meter; mononucleotide runs are difficult to analyse correctly.
Polonies: de-phasing issue.
General NGS problems: biases (primers may fail, DNA quality, polymerases are not perfect, contamination, GC content, variable amplification).
Rationale for single-molecule sequencing: avoid the PCR amplification that causes many of these problems.
PacBio: very long read length (14.000/ > ), highest consensus accuracy, greatest uniformity, and native DNA can be sequenced. The error profile is still not suitable for routine SNV diagnostics. Possible solution: circular consensus sequencing.
Epigenetics application: methylated A's take a little longer to be incorporated, and this delay can be measured.
New technologies, nanopores: not single bases are read but "words"; high potential, not yet routine. MinION: benchtop sequencer, short reads, high error rate.
Message: Each currently available technology has its advantages and downsides.
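The idea behind circular consensus sequencing can be illustrated with a toy example: the same molecule is read several times, and random errors are outvoted per position. This is only a minimal sketch, assuming the passes are already aligned and of equal length; the function name `consensus` and the example reads are made up for illustration, and real CCS software uses far more elaborate alignment and error models.

```python
from collections import Counter


def consensus(passes):
    """Column-wise majority vote over pre-aligned, equal-length passes.

    Assumption: every pass covers the same positions (no indels),
    which real long-read data does not guarantee.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be aligned to the same length")
    # zip(*passes) yields one tuple of bases per position.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*passes))


# Five hypothetical passes over the same molecule, each with at most one error.
passes = ["ACGTTAGC", "ACGATAGC", "ACGTTAGC", "TCGTTAGC", "ACGTTAGG"]
ccs_read = consensus(passes)
print(ccs_read)
```

Because the errors fall at different positions in different passes, each one is outvoted by the four correct passes at that position.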

5 Ask bioinformaticians the exact question, because they will always give an answer
Variant calling tools start by calling EVERY potential variant they observe: true positives and false positives (library prep artefacts, PCR artefacts, sequencing errors, mapping issues, algorithm issues).
Various biases to be considered:
  Strand bias: variant call on one strand only (or a strong bias towards one strand); a maximum strand bias can be set.
  Tail distance bias: significant positional bias within reads for variant calls across all samples.
Indel identification is problematic. Local realignment helps in identifying indels in certain instances.
CNAG / NIST: actually very good for SNVs, but concordance for indels is not so good.
GRCh37 to GRCh38.
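One common way to quantify strand bias is a Fisher exact test on the 2x2 table of reference/alternate reads by forward/reverse strand. The sketch below is an illustration under that assumption, not the method of any particular variant caller; the function names and read counts are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at most as likely as the observed one.
    total = sum(p for p in (prob(x) for x in range(lo, hi + 1))
                if p <= observed * (1 + 1e-9))
    return min(1.0, total)


def strand_bias_p(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Small p-value: alt reads are distributed over strands unlike ref reads."""
    return fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev)


balanced = strand_bias_p(50, 50, 10, 10)   # alt reads on both strands
skewed = strand_bias_p(50, 50, 20, 0)      # alt reads on forward strand only
print(balanced, skewed)
```

A call like the second one, where every supporting read comes from one strand, gets a very small p-value and would be a candidate false positive to inspect.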

6 UCSC browser is so important that the university changed its name for it!
UCSC now accepts HGVS nomenclature terms for search.
Database consistency: of the variants with a known phenotypic association (most of them in LOVD), only a subset is listed in all databases.
CAVE: due to copying of datasets, a private variant could pop up in many DBs, implying that it was found more than once.
ExAC; gnomAD is the new database, larger than ExAC.
Don't use dbSNP as a filter (polymorphism / clinically significant etc.); use the common SNPs track instead.
hg19 is still better annotated.
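Why presence in dbSNP is a poor filter can be shown with a toy example: a variant can have a dbSNP entry and still be rare. The sketch below filters on population allele frequency instead. The field names (`in_dbsnp`, `gnomad_af`), the variant IDs, and the 1% threshold are assumptions chosen for illustration only.

```python
def is_rare(variant, max_af=0.01):
    """Keep a variant if it has no population frequency or is below max_af.

    'gnomad_af' is a hypothetical field name; None means "not observed".
    """
    af = variant.get("gnomad_af")
    return af is None or af < max_af


# Hypothetical annotated variants.
variants = [
    {"id": "var1", "in_dbsnp": True, "gnomad_af": 0.23},     # common polymorphism
    {"id": "var2", "in_dbsnp": True, "gnomad_af": 0.0001},   # in dbSNP, but rare
    {"id": "var3", "in_dbsnp": False, "gnomad_af": None},    # private variant
]

kept_by_frequency = [v["id"] for v in variants if is_rare(v)]
kept_by_dbsnp_absence = [v["id"] for v in variants if not v["in_dbsnp"]]
print(kept_by_frequency, kept_by_dbsnp_absence)
```

Filtering on dbSNP membership would silently drop `var2`, which could well be the causative variant; the frequency filter keeps it.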


9 No sharing = no DNA Dx, that's simple
DMD example: linking clinical information to genetic information is key to Dx.
Differences in quality and content of databases: one inch deep and a mile wide, and vice versa.
Also: there are too many DBs. A merge of DBs is desirable.
Problems of funding and sustainability.
Provide as much info as possible; this avoids requests.
We would all profit enormously if we shared our data.
Convince your colleagues (and your boss) and bring in the data.

10 Phenotype: a phenotypic abnormality that can be observed
Starting point: inconsistent use of phenotypic descriptions, spelling mistakes, etc. This makes computer-based analysis very difficult or impossible.
Annotation of diseases and definition of the HPO terms.
HPO: controlled vocabulary for phenotypic abnormalities in humans. Freely available. For the community, from the community.
Novel approaches towards differential diagnosis (Phenomizer). Phenomizer: like a BLAST search on phenotypic terms.
Standard patient descriptions.
Cross-species phenotypes.
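The "BLAST search on phenotypic terms" idea can be sketched as ranking diseases by how well their HPO annotations match a patient's terms. Note the swap: Phenomizer itself uses ontology-based semantic similarity, whereas this toy version uses plain Jaccard overlap of term sets, which ignores the ontology structure entirely. The disease names and their annotations are hypothetical.

```python
def jaccard(a, b):
    """Overlap of two term sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical disease annotations using HPO term IDs:
# HP:0001250 seizure, HP:0001263 global developmental delay,
# HP:0000252 microcephaly, HP:0000365 hearing impairment.
disease_profiles = {
    "disease_A": {"HP:0001250", "HP:0001263", "HP:0000252"},
    "disease_B": {"HP:0001250", "HP:0000365"},
    "disease_C": {"HP:0000365", "HP:0001263"},
}

patient_terms = {"HP:0001250", "HP:0000252"}

ranking = sorted(
    disease_profiles,
    key=lambda name: jaccard(patient_terms, disease_profiles[name]),
    reverse=True,
)
print(ranking)
```

This only works because both sides use the same controlled vocabulary; with free-text descriptions and spelling variants, the set intersection would be empty.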

11 VEP: Variant Effect Predictor
Why do we need genome browsers? To make the huge amount of information accessible.
The reference genome is actually a mix of different individuals.
Golden transcripts (Ensembl and Havana agree): different methods, same result.
If there is a CCDS ID, the transcript probably exists.
Gene ontologies: controlled vocabulary for genes and descriptions.
Reference alleles: the reference could carry a rare susceptibility allele (e.g. allele T is pathogenic but is still in the reference sequence).
Ensembl browser: variants are always displayed on the forward strand; dbSNP reports on either strand.
EBI training page with tutorials, and webinars on YouTube.
The Ensembl team comes for free.
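Because one resource always shows the forward strand while another may report either strand, alleles sometimes have to be converted before they can be compared. A minimal sketch, assuming alleles are plain A/C/G/T strings and the strand is given as "+" or "-"; the helper names are made up for illustration.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def to_forward(ref, alt, strand):
    """Express a ref/alt allele pair on the forward strand."""
    if strand == "-":
        return reverse_complement(ref), reverse_complement(alt)
    return ref, alt


# A G>A change reported on the reverse strand is C>T on the forward strand.
print(to_forward("G", "A", "-"))
print(reverse_complement("ATGC"))
```

Without this step, a G>A on the reverse strand and a C>T on the forward strand look like two different variants although they describe the same change.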

12 WES very successful; WES will be replaced/supported by WGS
Discovery rate: single 25%; trio 35%.
Still many limitations; here the emphasis is on annotation and filtering.
Different gene-level information, like expression profiles (GTEx, Expression Atlas, RNA-seq).
Gene ontology and biological pathway information (KEGG, etc.).
Different phenotype-level information, like LSDBs and animal models.
Automatic systems: ExomeWalker, Exomiser, eXtasy, Mirtrios, OMIM Explorer, OVA, wKGGSeq, to screen for disease-causing genes based on pedigree and phenotypic data.
Semi-automatic systems: ANNOVAR, BiERapp, FILTUS, FMFFilter, GEMINI, Vanno, VarAFT, etc.
The different systems/methods all have special advantages and downsides; there is no best way.
Davies et al.: choice of transcript and software can have large effects on the variant annotation!
Variant filtration / prioritization: a good HPO description is mandatory, and the MOI should be known.
UMD-Predictor: precomputed all possible substitutions for all nucleotides of all human transcripts; combines multiple features into a single score (0-100).
There is no ideal annotation/filtration system. You should know what you may miss with each pipeline.
WES to WGS needs adaptations.
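Why the mode of inheritance (MOI) should be known before filtering can be seen in a toy trio filter: the same genotype table yields different candidates under different models. This is a sketch under simplifying assumptions (both parents unaffected, genotypes coded as the number of alternate alleles, no compound heterozygosity or X-linked cases); the variant IDs and model names are hypothetical.

```python
def fits_model(child, mother, father, model):
    """Check a trio genotype (0, 1 or 2 alt alleles each) against an MOI."""
    if model == "de_novo":
        # Present in the child, absent in both parents.
        return child >= 1 and mother == 0 and father == 0
    if model == "recessive_homozygous":
        # Child homozygous, both parents heterozygous carriers.
        return child == 2 and mother == 1 and father == 1
    raise ValueError(f"unknown model: {model}")


# Hypothetical trio genotypes: (variant id, child, mother, father).
trio_calls = [
    ("var1", 1, 0, 0),
    ("var2", 2, 1, 1),
    ("var3", 1, 1, 0),   # inherited heterozygous variant
]


def candidates(calls, model):
    return [vid for vid, c, m, f in calls if fits_model(c, m, f, model)]


print(candidates(trio_calls, "de_novo"))
print(candidates(trio_calls, "recessive_homozygous"))
```

With only the single patient sequenced, none of these three variants could be excluded, which is one intuition for the higher discovery rate in trios.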

13 Summary of the summary.
"If I have seen further, it is by standing on the shoulders of giants."
"The dwarf sees farther than the giant, when he has the giant's shoulder to mount on."
