In silico variant analysis: Challenges and Pitfalls
1 In silico variant analysis: Challenges and Pitfalls Fiona Cunningham Variation annotation coordinator EMBL-EBI
2 Sequencing -> Variants -> Interpretation. Variant types: structural variants, SNPs, indels. What is known about these variants? What can you say about unknown variants?
3 Deciphering gene-disease relationships: 3 billion bases -> 4 million variants -> 21,000 coding variants -> missense and protein-truncating variants -> ??? -> possible disease-causing variants
5 New assembly, GRCh38 = new variants: 3.6 Mb of novel sequence; 153 genes that are only on alts
7 Discrepant variant calling: RefSeq vs GENCODE. BRCA2 transcript, SNP rs c.7397. RefSeq transcript NM_: C>T (ancestral allele, non-reference). GENCODE / Ensembl transcript ENST: T>C (non-ancestral, but in GRCh38).
8 Discrepant variant calling: RefSeq vs GENCODE. BRCA2 transcript, SNP rs c.7397. ENST: SNP called in 91% of AFR. NM_: SNP called in 9% of AFR.
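The 91% versus 9% figures on this slide are complementary, which is what one would expect if the two transcripts carry opposite alleles as their reference at a biallelic site. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the biallelic assumption is ours rather than something stated on the slide.

```python
def frequency_against_other_reference(allele_frequency: float) -> float:
    """Return the non-reference frequency after swapping which allele is the reference.

    Assumes a biallelic site (an assumption of this sketch): whatever is not
    allele A is allele B, so the two frequencies sum to 1.
    """
    if not 0.0 <= allele_frequency <= 1.0:
        raise ValueError("allele_frequency must be between 0 and 1")
    return 1.0 - allele_frequency


# Called against the Ensembl transcript allele, the SNP is seen in 91% of AFR;
# against the RefSeq transcript allele, the complementary fraction is seen.
afr_vs_ensembl = 0.91
afr_vs_refseq = frequency_against_other_reference(afr_vs_ensembl)
print(f"{afr_vs_refseq:.2f}")
```

The same site is therefore reported as common or rare depending only on which transcript set supplies the reference allele.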
10 Ensembl Variant Effect Predictor (VEP): in silico analysis of variants. VEP predicts consequences of all variants: SNPs, indels, and structural variants (insertion, deletion, duplication, tandem duplication). For effects on coding and non-coding regions. In any species. Flexible and extensible. Commitment to user support. McLaren et al (Bioinformatics), McCarthy et al (Genome Medicine)
11 [Diagram] Ensembl dbs: Core, Variants, Regulation, Compara. External data: ESP, OMIM
12 VEP is built on Ensembl
14 VEP web Input form
15 VEP web - output
16 Information. For known variants: natural variation data (allele frequencies, ethnicity, MAF from 1000 Genomes, ESP populations, ExAC); clinical significance data (ClinVar); LOVD data. For all variants: gene and transcript identifiers, exon and intron numbers; consequence (SO terms), SIFT, PolyPhen; genomic, cDNA, CDS and protein coordinates; amino acid and codon change; TFBS: position within motif, whether it is a high-information position, motif score change; HGVS nomenclature
17 VEP: instant, web, script, REST. Interfaces: Instant VEP, web interface, Perl script, REST API, XML. Maximum speed: up to 3,000,000 variants an hour. Perl script: most extensible and flexible, off-line for private data. REST is optimal for integration into other systems. 15,000 variants per second. End points for variants and SVs. McLaren et al (Bioinformatics), McCarthy et al (Genome Medicine)
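As an illustration of the REST route mentioned on this slide, here is a minimal sketch of querying the public Ensembl REST service for a known variant identifier. The helper names `vep_id_url` and `fetch_vep` are hypothetical, `rs12345` is only the placeholder ID used elsewhere in these slides, and the `most_severe_consequence` field is what the service is understood to return at the time of writing, so check the current REST documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

ENSEMBL_REST = "https://rest.ensembl.org"


def vep_id_url(variant_id: str, species: str = "human") -> str:
    """Build the URL of the VEP 'id' endpoint for one known variant identifier."""
    return (
        f"{ENSEMBL_REST}/vep/{urllib.parse.quote(species)}"
        f"/id/{urllib.parse.quote(variant_id)}"
    )


def fetch_vep(url: str, timeout: float = 30.0):
    """Fetch a VEP result as parsed JSON (needs network access)."""
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example usage (not run here because it needs network access):
# for record in fetch_vep(vep_id_url("rs12345")):
#     print(record.get("most_severe_consequence"))
```

Building the URL separately from fetching it keeps the network call easy to replace when VEP is wired into another system.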
18 VEP plugins
19 Deciphering gene-disease relationships: 3 billion bases -> 4 million variants -> 21,000 coding variants -> missense, loss-of-function and protein-truncating variants -> G2P ??? -> possible disease-causing variants
20 Integrating curated data: Gene2Phenotype. Collaboration with David FitzPatrick and Helen Firth
21 Gene2phenotype: search
22 Gene2phenotype: data
23 G2P: Gene to Phenotype Database. Panels: DD G2P, Cardiac G2P, G2P, Ear G2P, Eye G2P, Skin G2P
24 Acknowledgements Funding European Commission Framework Programme 7
25 Acknowledgements G2P Anja Thormann David FitzPatrick Helen Firth
27 Future. Regulatory regions and eQTLs: nearest gene plugin, using LD to infer eQTLs, GTEx project data. Splicing for VEP: dbscSNV plugin. Indels by SIFT (e.g. PROVEAN). Protein structure, pathways (Reactome, PDBe). Integration with gene lists, e.g. DDG2P
28 Deciphering gene-disease relationships: 3 billion bases -> 4 million variants -> 21,000 coding variants -> 10,000 non-synonymous variants -> loss-of-function variants
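The last steps of this funnel, from coding variants to non-synonymous and then loss-of-function variants, can be sketched as a filter over Sequence Ontology consequence terms. Which terms count as loss of function is a common convention assumed here, not something defined in the slides, and the function name `triage` is hypothetical.

```python
# Assumed groupings of SO consequence terms (a convention, not from the slides).
LOSS_OF_FUNCTION = {
    "stop_gained",
    "frameshift_variant",
    "splice_acceptor_variant",
    "splice_donor_variant",
}
NON_SYNONYMOUS = LOSS_OF_FUNCTION | {
    "missense_variant",
    "inframe_insertion",
    "inframe_deletion",
    "stop_lost",
    "start_lost",
}


def triage(consequences):
    """Place one variant in the narrowest funnel tier that its consequences reach."""
    terms = set(consequences)
    if terms & LOSS_OF_FUNCTION:
        return "loss_of_function"
    if terms & NON_SYNONYMOUS:
        return "non_synonymous"
    return "other"


print(triage(["missense_variant"]))
print(triage(["intron_variant", "stop_gained"]))
print(triage(["synonymous_variant"]))
```

A variant often has several consequences across transcripts, so the sketch checks the most damaging tier first.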
29 How VEP works. Input, annotated using the Ensembl dbs (Core, Regulatory, Variants). Location: 5' UTR, intronic, regulatory. Known variant: ID: rs12345, MAF: 0.05, PubMed: , . Coding change: Ref TTG AAC CAT (Leu Asn His), Alt TTG AAA CAT (Leu Lys His): missense
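The coding example on this slide (AAC to AAA, Asn to Lys, missense) can be reproduced with the standard genetic code. The following is a minimal sketch of that single step and not VEP's actual implementation; `classify_codon_change` is a hypothetical name, and start codon changes, indels and splice effects are deliberately ignored.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ('*' marks a stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Return an SO-style consequence term for a single codon substitution."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous_variant"
    if alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*":
        return "stop_lost"
    return "missense_variant"


# Slide example: the middle codon changes from AAC (Asn) to AAA (Lys).
print(CODON_TABLE["AAC"], CODON_TABLE["AAA"], classify_codon_change("AAC", "AAA"))
```

The flanking codons TTG (Leu) and CAT (His) are unchanged between the reference and alternative sequences, so only the middle codon determines the consequence.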
30 VEP script - advantages: faster; off-line access; any species; your data is secure; additional datasets; extend functionality
31 Link to variants in Ensembl with disease
32 VEP Regulatory data
33 VEP Cell types Regulatory build: 17 cell types: segmentation analysis