Applied Bioinformatics

Applied Bioinformatics: In silico and In clinico characterization of genetic variations
Assistant Professor, Department of Biomedical Informatics, Center for Human Genetics Research

Single Nucleotide Polymorphisms (SNPs)
ATCAAAATTATGGAAGAA
ATCAAAATCATGGAAGAA
About every 300th nucleotide pair is polymorphic.
Used as markers for genetic studies.
14,653,228 validated SNPs in dbSNP.
William S. Bush 2014
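As an illustration of what the two sequences on this slide show, the following minimal sketch compares two equal-length alleles position by position and reports where they differ. The function name `find_substitutions` is made up for this example; it is not part of any tool mentioned in the talk.

```python
from typing import List, Tuple


def find_substitutions(reference: str, alternate: str) -> List[Tuple[int, str, str]]:
    """Return (1-based position, reference base, alternate base) for each mismatch."""
    if len(reference) != len(alternate):
        raise ValueError("sequences must have the same length for a position-wise comparison")
    return [
        (position, ref_base, alt_base)
        for position, (ref_base, alt_base) in enumerate(zip(reference, alternate), start=1)
        if ref_base != alt_base
    ]


# The two sequences shown on the slide.
allele_1 = "ATCAAAATTATGGAAGAA"
allele_2 = "ATCAAAATCATGGAAGAA"

print(find_substitutions(allele_1, allele_2))  # [(9, 'T', 'C')]
```

The two alleles differ only at position 9 (T versus C), which is the single-nucleotide polymorphism.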

dbSNP
http://www.ncbi.nlm.nih.gov/projects/snp/
Contains submitted data and computed content.
SNPs are identified by RefSNP numbers (e.g. rs1234).
Provides information on position, frequency, and relevant populations.
http://www.ncbi.nlm.nih.gov/books/NBK21101
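RefSNP identifiers have a simple shape: the prefix "rs" followed by digits. A small sketch of checking that shape before looking an identifier up is given below. The helper name `parse_rs_id` is hypothetical, and the check only validates the format; it does not confirm that the record exists in dbSNP.

```python
import re
from typing import Optional

# "rs" followed by one or more digits, e.g. rs1234.
RS_ID_PATTERN = re.compile(r"rs(\d+)")


def parse_rs_id(identifier: str) -> Optional[int]:
    """Return the numeric part of a RefSNP identifier, or None if the format is wrong."""
    match = RS_ID_PATTERN.fullmatch(identifier.strip())
    return int(match.group(1)) if match else None


print(parse_rs_id("rs1234"))  # 1234
print(parse_rs_id("1234"))    # None
```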

What does this SNP/Allele do?
Ensembl Consequence Type: http://ensembl.org
Consequence types shown in the figure: Intergenic, Downstream, Within non-coding gene, Upstream, Intronic, Splice site, Non-synonymous coding, Synonymous coding, 5' UTR, 3' UTR, Frameshift coding, Upstream regulatory region.
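To make the synonymous versus non-synonymous distinction concrete, here is a rough sketch that translates the affected codon before and after a substitution using the standard genetic code. It assumes the input is coding sequence whose reading frame starts at the first base; that is an assumption made for illustration and is not stated on the slides. The function name is hypothetical, and real annotation (as done by Ensembl) also handles transcripts, strands, splice sites, and stop codons separately.

```python
from itertools import product
from typing import List, Tuple

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_substitutions(ref_cds: str, alt_cds: str) -> List[Tuple[int, str, str, str, str, str]]:
    """Compare two in-frame coding sequences codon by codon.

    Returns (codon number, ref codon, alt codon, ref amino acid,
    alt amino acid, label) for every codon that differs.
    """
    if len(ref_cds) != len(alt_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    results = []
    for start in range(0, len(ref_cds), 3):
        ref_codon = ref_cds[start:start + 3]
        alt_codon = alt_cds[start:start + 3]
        if ref_codon == alt_codon:
            continue
        ref_aa = CODON_TABLE[ref_codon]
        alt_aa = CODON_TABLE[alt_codon]
        label = "synonymous" if ref_aa == alt_aa else "non-synonymous"
        results.append((start // 3 + 1, ref_codon, alt_codon, ref_aa, alt_aa, label))
    return results


# The example sequences from the SNP slide, read in frame from the first base (assumed).
print(classify_coding_substitutions("ATCAAAATTATGGAAGAA", "ATCAAAATCATGGAAGAA"))
# [(3, 'ATT', 'ATC', 'I', 'I', 'synonymous')]
```

Under that frame assumption, the T-to-C change turns ATT into ATC, and both codons encode isoleucine, so the substitution would be classed as synonymous coding.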

SIFT and PolyPhen
SIFT: http://sift.jcvi.org/
PolyPhen-2: http://genetics.bwh.harvard.edu/pph2/
Use known deleterious mutations to predict the impact of similar mutations in similar proteins.

Expression QTLs
~3.3 million SNPs + ~14,925 expression probes
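The scale of these numbers matters for multiple testing. The back-of-the-envelope sketch below assumes every SNP is tested against every probe and that a Bonferroni correction at 0.05 is applied. Both are assumptions for illustration; the slide does not say how the study corrected for multiple tests, and cis-only analyses test far fewer pairs.

```python
# Approximate counts from the slide.
n_snps = 3_300_000
n_probes = 14_925

# Assumption: an all-pairs scan, one test per SNP-probe pair.
n_tests = n_snps * n_probes

# Assumption: Bonferroni correction at a family-wise alpha of 0.05.
alpha = 0.05
bonferroni_threshold = alpha / n_tests

print(f"{n_tests:,} SNP-probe tests")
print(f"per-test p-value threshold: {bonferroni_threshold:.2e}")
```

That is roughly 49 billion tests, which puts the per-test threshold near 1e-12.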

U Chicago eQTL Browser
http://eqtl.uchicago.edu/cgi-bin/gbrowse/eqtl/
Searchable by SNP, gene, region, etc.
Mostly cis-eQTLs (within the regulatory region).
Multiple studies and tissue types.

Linkage Disequilibrium
When a mutation occurs near an existing SNP, the two become linked on the chromosome.
Two SNPs that travel together through the population over successive generations are said to be in LD.
Assuming recombination occurs at random points throughout the genome, the LD between two SNPs eventually fades.
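The idea that LD fades can be put in numbers with the standard population-genetics definitions, which are not spelled out on the slide: D = p_AB - p_A * p_B, r^2 = D^2 / (p_A (1 - p_A) p_B (1 - p_B)), and, under random mating, D after t generations is D_0 (1 - c)^t, where c is the recombination fraction between the two SNPs. The sketch below uses made-up frequencies and hypothetical function names.

```python
from typing import Tuple


def ld_statistics(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D, r^2) from the AB haplotype frequency and the two allele frequencies."""
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("both SNPs must be polymorphic (allele frequencies strictly between 0 and 1)")
    d = p_ab - p_a * p_b
    return d, d * d / denominator


def decayed_d(d0: float, recombination_fraction: float, generations: int) -> float:
    """Expected D after a number of generations of random mating."""
    return d0 * (1 - recombination_fraction) ** generations


# Hypothetical example: alleles A and B always occur on the same haplotype.
d, r_squared = ld_statistics(p_ab=0.5, p_a=0.5, p_b=0.5)
print(d, r_squared)  # 0.25 1.0

# With a 1% recombination fraction, D shrinks to about 37% of its starting value in 100 generations.
print(decayed_d(d, recombination_fraction=0.01, generations=100))
```

The closer two SNPs are, the smaller c is and the longer their LD persists, which is why nearby markers can stand in for untyped variants.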

The International HapMap Project
http://www.hapmap.org
Catalog of SNPs in 11 human subpopulations.
Contains roughly 4 million SNPs.
Allows calculation of LD, facilitating GWAS.

Genome-wide Association Studies
Figure labels: Linkage Disequilibrium; Compare; Cases; Controls
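A case-control comparison at a single SNP can be sketched as a 2x2 allelic chi-square test. The allele counts below are invented for illustration, and the function name is hypothetical. The slides do not say which test statistic the author has in mind, so treat this as one common choice rather than the method used in any particular study.

```python
import math
from typing import Tuple


def allelic_association(case_alt: int, case_ref: int, control_alt: int, control_ref: int) -> Tuple[float, float, float]:
    """Return (chi-square, odds ratio, p-value) for a 2x2 table of allele counts.

    Uses the Pearson chi-square statistic with 1 degree of freedom,
    without continuity correction.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0 or b * c == 0:
        raise ValueError("table has an empty margin or an undefined odds ratio")
    chi_square = n * (a * d - b * c) ** 2 / margins
    odds_ratio = (a * d) / (b * c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2))
    return chi_square, odds_ratio, p_value


# Hypothetical allele counts: 60 vs 40 in cases, 40 vs 60 in controls.
chi_square, odds_ratio, p_value = allelic_association(60, 40, 40, 60)
print(f"chi-square = {chi_square:.2f}, OR = {odds_ratio:.2f}, p = {p_value:.4f}")
```

In a genome-wide scan this test is repeated for every genotyped SNP, so the per-SNP significance threshold has to be far stricter than 0.05.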

GWAS Associations
Thousands of confirmed associations for hundreds of human phenotypes.
http://www.genome.gov/gwastudies/

Genetic Association Database
http://geneticassociationdb.nih.gov/
A catalog of mostly non-GWAS studies.
Contains both positive and negative results.
A variety of traits and disease classes are represented.

OMIM: Online Mendelian Inheritance in Man
http://www.ncbi.nlm.nih.gov/omim
Contains narratives and extensive references for the genetic basis of nearly every disease.
Manually curated by the community (think Wikipedia).
Also contains narratives by gene.

Human Gene Mutation Database
http://www.hgmd.cf.ac.uk/ac/index.php
From the makers of Transfac, another subscription-based database.
Public and private versions.
Has extensive lists of references for known mutations and their categories.

In Clinico Characterization
Left-over blood samples from clinical care.
DNA linked to deidentified electronic medical records.

Phenome-Wide Association Studies
Associate a single variant with many traits.
Popular in electronic medical records.
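A phenome-wide scan reverses the usual GWAS layout: one variant, many phenotypes. The sketch below runs a 2x2 chi-square test of carrier status against each phenotype and applies a Bonferroni threshold across phenotypes. The phenotype names, counts, function names, and the choice of test and correction are all assumptions made for illustration.

```python
import math
from typing import Dict, Tuple

# (carriers with phenotype, carriers without, non-carriers with, non-carriers without)
Table = Tuple[int, int, int, int]


def chi_square_p_value(table: Table) -> float:
    """P-value of the Pearson chi-square test (1 degree of freedom) on a 2x2 table."""
    a, b, c, d = table
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("table has an empty row or column")
    chi_square = (a + b + c + d) * (a * d - b * c) ** 2 / margins
    return math.erfc(math.sqrt(chi_square / 2))


def phewas(tables: Dict[str, Table], alpha: float = 0.05) -> Dict[str, Tuple[float, bool]]:
    """Test one variant against every phenotype; flag Bonferroni-significant results."""
    threshold = alpha / len(tables)
    results = {}
    for phenotype, table in tables.items():
        p_value = chi_square_p_value(table)
        results[phenotype] = (p_value, p_value < threshold)
    return results


# Hypothetical counts for three phenotypes derived from medical records.
example_tables = {
    "phenotype_A": (60, 40, 40, 60),
    "phenotype_B": (50, 50, 50, 50),
    "phenotype_C": (55, 45, 45, 55),
}

for phenotype, (p_value, significant) in phewas(example_tables).items():
    print(f"{phenotype}: p = {p_value:.4f}, significant = {significant}")
```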

Genetic PREDICTion
Uses specific genetic variants to predict a person's drug response.
Clopidogrel response
Warfarin dosing

Direct to Consumer Genetics

Sequencing
GWAS is quickly becoming boring.
Sequencing is the new fun thing to do.
Whole-exome designs
Whole-genome sequencing
RNA-seq

1000 Genomes Project
http://www.1000genomes.org/
Goal is to sequence the whole genomes of ~2500 people.
Currently have sequence for 180 samples.
Exons will be sequenced with better quality.
Data are being loaded into a special Ensembl-style browser.