Oral Cleft Targeted Sequencing Project
- Esmond West
- 6 years ago
Oral Cleft Group
January, 2013

Contents
I Quality Control
  1 Summary of Multi-Family vcf File, Jan. 11, 2013
  2 Analysis Group Quality Control (Proposed Protocol)
    2.1 vcftools
  3 Targeted Region Capture & Read Generation
  4 Sequence Alignment and Processing
  5 Sample QC
  6 Relationship QC
  7 De novo and inherited variant calling
  8 Generation of Multi-family SNP genotypes
II Descriptive Statistics
  9 SNPs and SNVs
  10 MAFs and Heterozygosity
III Gen. Epi.
  11 Polymorphisms (maf ≥ 0.01)
    11.1 LD & Ethnicity
    11.2 Linkage & Association
      snpstats
      trio
  12 Rare Variants (maf < 0.01)
    12.1 Scan Statistic for Rare Variants in Trios
    12.2 de novo mutations
List of Tables
  1 Ethnicities
  2 Genotype flags
List of Figures
  1 Missingness
  2 GQ histogram
Part I Quality Control

1 Summary of Multi-Family vcf File, Jan. 11, 2013

In total, the vcf file contains 4,495 individuals and 175,189 markers. The target regions span 6.7 Mb, for a marker density of 1 marker per 38 bp. Of the 4,495 subjects in the vcf file, only 4,139 are contained in the pedigree file. The breakdown of subjects in the vcf and pedigree files by ethnicity is given in Table 1. In Figure 1 we display the missingness per subject and per site, and in Figure 2 the genotype quality (GQ) averaged across subjects.

2 Analysis Group Quality Control (Proposed Protocol)

We begin by defining the characteristics on which to filter and the criterion for exclusion. We divided the filters into three types (as does vcftools): genotype, subject and site filters. The following is an outline of our initial protocol.

1. Genotype Filters
   Remove all non-PASS flagged genotypes (see Table 2): vcftools --remove-filtered-geno FLAGNAME
   Filter on genotype GQ ≥ 40 and depth ≥ 10: vcftools --mingq 40 --mindp 10
2. Subject Filters
   Remove subjects by missingness: vcftools --mind
   Remove subjects with average coverage below 20: vcftools --min-indv-meandp 20
3. Site Filters
   Remove markers with missingness above 0.05 (call rate below 0.95): vcftools --geno 0.95
   Remove markers with mean depth below 20: vcftools --min-meandp 20
   Biallelic variants only: vcftools --vcf file1.vcf --min-alleles 2 --max-alleles 2

The QUAL, FILTER and INFO fields are not available in our vcf file. Should we remove markers with mean GQ below 90? Perhaps we should use the median? We have not implemented a site-wide filter on GQ as of Jan. 11. We should investigate Mendelian inconsistencies and Hardy-Weinberg equilibrium.
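For the Hardy-Weinberg question raised above, a per-site check could start from the genotype counts at each biallelic marker. The sketch below computes the Pearson chi-square statistic (1 df) and its p-value from the counts of homozygous-reference, heterozygous and homozygous-alternate subjects. It is only an illustration under stated assumptions: the function names are hypothetical, counts are assumed to come from one ethnic group at a time (pooling groups with different allele frequencies will itself produce apparent departures), and it is not the test implemented by vcftools or plink.

```python
import math


def hwe_chisq(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Pearson chi-square statistic for departure from HWE at a biallelic site."""
    n = n_hom_ref + n_het + n_hom_alt
    p = (2 * n_hom_ref + n_het) / (2 * n)  # reference allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    # Skip cells with zero expectation (a monomorphic site gives a statistic of 0).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


def hwe_pvalue(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Upper-tail p-value of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(hwe_chisq(n_hom_ref, n_het, n_hom_alt) / 2))


# Usage: counts in exact HWE proportions vs. a site with no heterozygotes.
print(hwe_pvalue(25, 50, 25))  # 1.0
print(hwe_pvalue(50, 0, 50))   # very small: strong heterozygote deficit
```

The chi-square approximation is poor when expected counts are small, which is the usual situation for rare variants in a targeted panel; an exact HWE test would be the safer choice there.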
2.1 vcftools

We implement the above filters with a single run of vcftools:

    vcftools --gzvcf $vcf \
        --remove-filtered-geno NRC \
        --remove-filtered-geno SB1 \
        --remove-filtered-geno IRC \
        --remove-filtered-geno MMQSD50 \
        --remove-filtered-geno PB10 \
        --remove-filtered-geno MQD30 \
        --remove-filtered-geno DETP20 \
        --remove-filtered-geno MVC4 \
        --remove-filtered-geno HPMR5 \
        --remove-filtered-geno MVF5 \
        --remove-filtered-geno RLD25 \
        --mingq 40 \
        --mindp 10 \
        --mind \
        --min-indv-meandp 20 \
        --geno 0.95 \
        --min-meandp 20 \
        --min-alleles 2 \
        --max-alleles 2 \
        --recode \
        --out $out \
        1> $logfile 2> $errfile
    bgzip $out.recode.vcf
    tabix $out.recode.vcf.gz

The run took just over six hours of wallclock time (06:18:23) but very little memory. The job report:

    Job (qc) Complete
    User = syounkin
    Queue = gwas.q@compute-0-43.local
    Host = compute-0-43.local
    Start Time = 01/10/ :59:15
    End Time = 01/11/ :17:38
    User Time = 06:14:17
    System Time = 00:00:55
    Wallclock Time = 06:18:23
    CPU = 06:15:12
    Max vmem = M
    Exit Status = 0

I do not know the order in which the filters were processed, and this could make a difference: for example, setting low-GQ genotypes to missing before computing per-site missingness can push a site over the 0.05 missingness threshold, whereas computing missingness first would keep it. Running the output through the filters a second time could alleviate some of these concerns. Although the results could still be order-dependent, we expect the differences to be small.
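The order-dependence concern can be made concrete with a toy site. The sketch below is hypothetical throughout (the data, function names and thresholds mirror the protocol but are not taken from vcftools); it does not describe how vcftools actually orders its filters, which remains unknown. It applies the GQ ≥ 40 genotype filter and the 5% site-missingness filter in both orders to a single site with ten subjects.

```python
# Hypothetical toy site: 10 subjects, each call is (genotype, GQ); genotype None = missing.
CALLS = [("0/1", 99)] * 9 + [("0/0", 30)]


def mask_low_gq(site_calls, min_gq=40):
    """Genotype filter: set calls with GQ below min_gq to missing."""
    return [(gt if gq >= min_gq else None, gq) for gt, gq in site_calls]


def missing_rate(site_calls):
    """Fraction of calls at the site that are missing."""
    return sum(gt is None for gt, _ in site_calls) / len(site_calls)


def keep_site(site_calls, max_missing_rate=0.05):
    """Site filter: keep the site if at most 5% of calls are missing."""
    return missing_rate(site_calls) <= max_missing_rate


# Order 1: site filter evaluated before genotypes are masked.
site_first = keep_site(CALLS)                    # 0/10 missing -> kept
# Order 2: genotypes masked first, then the site filter.
genotype_first = keep_site(mask_low_gq(CALLS))   # 1/10 missing -> removed

print(site_first, genotype_first)  # True False
```

Under order 1, a second pass over the already-masked output would drop the site, which is why re-running the filters reduces (but need not eliminate) the dependence on order.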
Ethnicity              Count
European                 968
Chinese                1,371
Filipino               1,776
Guatemalan                24
In VCF and Pedigree    4,139
In VCF                 4,495
In Pedigree            4,998

Table 1: Ethnicities

Flag      Description
NRC       Unable to grab readcounts for variant allele
SB1       Reads supporting the variant have less than 0.01 fraction of the reads on one strand, but reference supporting reads are not similarly biased
IRC       Unable to grab any sort of readcount for either the reference or the variant allele
MMQSD50   Difference in average mismatch quality sum between variant and reference supporting reads is greater than 50
PB10      Average position on read less than 0.10 or greater than 0.90 fraction of the read length
MQD30     Difference in average mapping quality sum between variant and reference supporting reads is greater than 30
DETP20    Average distance of the variant base to the effective 3′ end is less than 0.20
MVC4      Fewer than 4 high-quality reads support the variant
HPMR5     Variant is flanked by a homopolymer of the same base and of length greater than or equal to 5
MVF5      Variant allele frequency is less than 0.05
RLD25     Difference in average clipped read length between variant and reference supporting reads is greater than 25

Table 2: Flags for genotypes found in vcf file. All genotypes with any of these flags were removed with vcftools. (Presumably, to remove a genotype the call is set to missing.)
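To illustrate what removing a flagged genotype could look like at the level of a single VCF record, here is a minimal sketch. It assumes, without confirmation from our file, that the Table 2 flags are stored in the per-sample FT FORMAT field (semicolon-separated) and, following the presumption in the Table 2 caption, that removal means setting GT to the diploid missing call "./.". The function name is hypothetical, and this is not how vcftools implements --remove-filtered-geno.

```python
REMOVED_FLAGS = {"NRC", "SB1", "IRC", "MMQSD50", "PB10", "MQD30",
                 "DETP20", "MVC4", "HPMR5", "MVF5", "RLD25"}


def mask_flagged_genotypes(vcf_line: str, flags=REMOVED_FLAGS) -> str:
    """Set GT to './.' for every sample whose FT field contains one of `flags`."""
    fields = vcf_line.rstrip("\n").split("\t")
    keys = fields[8].split(":")  # FORMAT column, e.g. "GT:GQ:FT"
    if "GT" not in keys or "FT" not in keys:
        return "\t".join(fields)
    gt_i, ft_i = keys.index("GT"), keys.index("FT")
    for col in range(9, len(fields)):
        values = fields[col].split(":")
        # Trailing FORMAT values may be dropped in a sample column; treat as unflagged.
        if ft_i < len(values) and set(values[ft_i].split(";")) & flags:
            values[gt_i] = "./."  # diploid missing call
            fields[col] = ":".join(values)
    return "\t".join(fields)


line = "1\t100\t.\tA\tG\t.\t.\t.\tGT:GQ:FT\t0/1:99:PASS\t0/1:50:SB1;MVC4\t1/1:80:PASS"
print(mask_flagged_genotypes(line).split("\t")[9:])
# ['0/1:99:PASS', './.:50:SB1;MVC4', '1/1:80:PASS']
```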
[Figure 1: Missingness. Two histograms: per-marker missingness (missing.snp) and per-subject missingness (missing.subject); y-axis: frequency.]

[Figure 2: GQ histogram. Title: Cleft Targeted Sequencing; x-axis: mean GQ per marker; y-axis: frequency.]
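The quantities plotted in Figures 1 and 2 can be computed from a markers-by-subjects genotype matrix. The sketch below assumes a hypothetical in-memory layout (rows are markers, columns are subjects, each entry a (genotype, GQ) pair with None for missing values); the function names only echo the figure labels and are not the code used to draw the figures. Mean GQ is taken over subjects that report a GQ.

```python
from statistics import mean


def missing_per_snp(matrix):
    """Fraction of missing calls for each marker (row)."""
    return [sum(gt is None for gt, _ in row) / len(row) for row in matrix]


def missing_per_subject(matrix):
    """Fraction of missing calls for each subject (column)."""
    n_markers = len(matrix)
    return [sum(row[s][0] is None for row in matrix) / n_markers
            for s in range(len(matrix[0]))]


def mean_gq_per_marker(matrix):
    """Mean GQ across subjects with a reported GQ; None if no subject has one."""
    out = []
    for row in matrix:
        gqs = [gq for _, gq in row if gq is not None]
        out.append(mean(gqs) if gqs else None)
    return out


# Usage: 2 markers x 3 subjects, with one missing call.
matrix = [
    [("0/0", 90), (None, None), ("0/1", 60)],
    [("0/1", 30), ("1/1", 50), ("0/0", 70)],
]
print(missing_per_snp(matrix))      # [0.333..., 0.0]
print(missing_per_subject(matrix))  # [0.0, 0.5, 0.0]
print(mean_gq_per_marker(matrix))   # [75, 50]
```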
3 Targeted Region Capture & Read Generation

Do we need to discuss the methods behind the physical targeting of the regions? I'm curious to know how we created the fragments for sequencing.

4 Sequence Alignment and Processing

Data are aligned to the GRCh37-lite reference sequence with BWA v0.5.9, using quality trimming (-q 5) to remove low-quality bases at the ends of reads. Data from individual runs are merged, if necessary, with Picard v1.46. All reads are deduplicated using Picard MarkDuplicates.

5 Sample QC

Coverage across the target regions is evaluated using RefCov, and more than 70% of targets must reach an average coverage of 20X in order to pass QC. If genotypes from another platform are available, a genotype concordance QC is performed by comparing genotypes called using Samtools to those from the outside platform. Any sample with an overall concordance below 90% is flagged. The columns of this QC report are:

1. SNPs called: SNPs reported by Samtools.
2. With Genotype: the SNPs called are compared to the imported SNPs by position, so only the SNPs in common with the external data can be compared.
3. MetMinDepth: SNP sites must have a depth of coverage of at least 20X at that position; sites with lower coverage are ignored in the concordance check.
4. Reference: how many SNP calls match the reference sequence (i.e., build 37).
5. RefMatch: how many of the SNP sites that match the reference sequence also match the external array data.
6. Variant: how many SNP calls differ from the reference sequence (i.e., build 37).
7. VarMatch: how many variant SNP sites match the external array data.

For both reference mismatches and variant calls, we also evaluate whether discordant calls changed from heterozygous to homozygous or vice versa. Finally, the percent concordance is calculated as

    % concordance = (RefMatch + VarMatch) / MetMinDepth

6 Relationship QC

All offspring are required to have a significant relationship with their parents.
To evaluate this, BEAGLE's fastibd command is used to calculate the identity by descent (IBD) between children and their expected parents. This is done at the family level using both common and private SNPs within the target region. Variant sites are included in the calculation if they meet the following criteria: the site is in the target region, is variant in at least one individual, and has at least 20X coverage in all individuals. After the fastibd evaluation of these sites, the number of shared markers between each parent-child pair is calculated. If every marker is shared,
then those two individuals share 50% of their genome (this is the maximum that fastibd can detect, since it doesn't consider both haplotypes together when comparing individuals). If less than 40% of the target region is shared between parent and child in this way, the family is flagged as failing QC. If a family fails the initial, family-level QC evaluation (i.e., one parent is not highly related to the child), then the entire family is subsequently evaluated as part of a pool containing all families that failed the initial QC. For this cross-family IBD assessment, sites are selected as follows: the site is in the target region and is variant in any individual, and individual genotypes are set to missing if coverage in that individual is below 20X. As with the initial QC, if less than 40% of the target region is shared between parent and child in this way, the family is flagged as failing QC. If this cross-family QC identifies high IBD sharing between two ostensibly unrelated samples, manual checking is performed to confirm a sample swap.

7 De novo and inherited variant calling

De novo and inherited variants were called using polymutt 0.11 (com/ernfrid/polymutt), with calling restricted to chromosomes containing target regions and all other options set to their defaults. GLF files were generated as input to polymutt using samtools-0.1.7a-hybrid, with BAQ applied as in the following command:

    samtools-hybrid view -uh some.bam \
      | samtools-hybrid calmd -Aur - refseq.fa 2> /dev/null \
      | samtools-hybrid pileup - -g r refseq.fa > output.glf

Polymutt has two modes of variant calling: one for standard calling and one for de novo mutation calling. The VCF files from both modes were merged into a single VCF for each family, and filters were applied. We used bam-readcount v0.4 with a minimum base quality of 15 (-b 15) to generate metrics (for both de novo and germline variant calls) and marked sites as filtered based on the following requirements:

1.
Minimum variant base frequency at the site of 5%
2. Percent of reads supporting the variant on the plus strand ≥ 1% and ≤ 99% (variants failing these criteria are filtered only if the reads supporting the reference do not show a similar bias)
3. Minimum variant base count of 4
4. Variant falls within the middle 90% of the aligned portion of the read
5. Maximum difference between the quality sum of mismatching bases in reads supporting the variant and reads supporting the reference (see flag MMQSD50, Table 2)
6. Maximum mapping quality difference between reads supporting the variant and reads supporting the reference (see flag MQD30, Table 2)
7. Maximum difference in aligned read length between reads supporting the variant base and reads supporting the reference base (see flag RLD25, Table 2)
8. Minimum average distance to the effective 3′ end of the read for variant-supporting reads of 20% of the sequenced read length
9. Maximum length of a flanking homopolymer run of the variant base of 5.

In addition to the above filters, a binomial test filter is applied to the de novo calls to remove likely false-positive de novo calls. The input to the test is generated by computing readcounts with base quality ≥ 15 and mapping quality ≥ 20 for all unaffected family members at a putative de novo mutation location. For each individual, reads are divided into those supporting the de novo allele and those supporting some other allele. Using an assumed error rate of 0.01, we calculate the probability that the reads came from a binomial distribution with p = 0.01, where p is the fraction of reads supporting the de novo allele. If the resulting p-value is less than 10^-4 for any one unaffected individual, the de novo prediction is marked as failing the filter: that relative carries more reads with the de novo allele than sequencing error would explain, so the allele is more likely inherited than de novo.

8 Generation of Multi-family SNP genotypes

Once all variant sites in all samples were predicted, sites were limited to the precise target space of the capture product, buffered by 500 bp on either side, and aggregated into a list of segregating sites for the cohort. Each segregating site was (re)genotyped in all samples using polymuttvcf-0.01 (polymutt pos). The resulting genotypes were added to sites missing from the original VCF for each sample in order to distinguish between missing data and homozygous reference calls. The resulting single-sample VCF files, containing genotypes for all segregating sites, were subsequently merged using joinx 1.6. All variant calls from this process are included in the final files.
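The binomial de novo filter described in Section 7 can be sketched in a few lines. This is a minimal illustration under assumptions not stated in the text: the test is taken to be one-sided (an excess of de novo allele reads in an unaffected relative), the read tuple format and all function names are hypothetical, and the real pipeline derives its counts from bam-readcount output rather than from per-read tuples.

```python
from math import comb


def binom_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def count_reads(reads, denovo_allele, min_bq=15, min_mq=20):
    """Return (de novo allele reads, total reads) among reads passing BQ/MQ thresholds.

    reads: iterable of (base, base_quality, mapping_quality) tuples (hypothetical format).
    """
    kept = [base for base, bq, mq in reads if bq >= min_bq and mq >= min_mq]
    return sum(base == denovo_allele for base in kept), len(kept)


def passes_binomial_filter(unaffected_counts, error_rate=0.01, alpha=1e-4):
    """unaffected_counts: (de novo allele reads, total reads) per unaffected member.

    The call fails if any unaffected member has more de novo allele reads than an
    error rate of error_rate plausibly explains (one-sided p-value below alpha).
    """
    return all(binom_upper_tail(alt, total, error_rate) >= alpha
               for alt, total in unaffected_counts)


# Usage: one stray read in a parent is tolerated; 10 of 30 reads is not.
print(passes_binomial_filter([(0, 30), (1, 40)]))   # True
print(passes_binomial_filter([(0, 30), (10, 30)]))  # False
```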
Part II Descriptive Statistics

9 SNPs and SNVs

Here we can break down the distribution of polymorphic markers (SNPs) vs. non-polymorphic markers (SNVs).

10 MAFs and Heterozygosity

Here we can display the distribution of minor allele frequencies and heterozygosity across ethnicities.
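The per-ethnicity summaries proposed above could start from alt-allele dosages (0/1/2) at each marker. The sketch below computes the minor allele frequency and observed heterozygosity per group for one marker, ignoring missing calls. The input layout, the ethnicity labels as plain strings, and the function names are assumptions for illustration only.

```python
from collections import defaultdict


def maf_and_het(dosages):
    """MAF and observed heterozygosity for one marker.

    dosages: alt-allele counts (0/1/2), None for a missing call.
    Returns (None, None) if no subject has a call.
    """
    called = [d for d in dosages if d is not None]
    if not called:
        return None, None
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(d == 1 for d in called) / len(called)
    return maf, het


def by_ethnicity(dosages, ethnicity):
    """Group one marker's dosages by subject ethnicity and summarize each group."""
    groups = defaultdict(list)
    for d, e in zip(dosages, ethnicity):
        groups[e].append(d)
    return {e: maf_and_het(ds) for e, ds in groups.items()}


# Usage: one marker, six subjects.
dosages = [0, 1, 2, 1, None, 0]
ethnicity = ["European", "European", "Chinese", "Chinese", "Chinese", "Filipino"]
print(by_ethnicity(dosages, ethnicity))
# {'European': (0.25, 0.5), 'Chinese': (0.25, 0.5), 'Filipino': (0.0, 0.0)}
```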
Part III Gen. Epi.

We begin with the common variants. Note that we refer to variants with an estimated population allele frequency of at least 0.01 as polymorphisms.

11 Polymorphisms (maf ≥ 0.01)

11.1 LD & Ethnicity

Here we can investigate LD structure and how it differs across ethnicities. Likewise with heterozygosity.

11.2 Linkage & Association

We begin with four different formulations of the classic Transmission Disequilibrium Test (TDT). The TDT tests for an increase in the rate at which an allele is transmitted from heterozygous parents to affected offspring. If that rate is significantly greater than the rate of 1/2 expected under Mendelian inheritance, then we conclude that transmission of the allele is favored over non-transmission among the affected offspring, and that the genetic marker is near the causal locus.

Clayton's regression-based test (allelic or genotypic; R package: snpstats). Given large-scale SNP data for families comprising both parents and one or more affected offspring, this function computes 1 df tests (the TDT test) and a 2 df test based on observed and expected transmissions of genotypes. Tests based on imputation rules can also be carried out.

Holger's gtdt and standard TDT (allelic or genotypic; R package: trio).

12 Rare Variants (maf < 0.01)

12.1 Scan Statistic for Rare Variants in Trios

We have received and compiled the C++ source code for scan-trios from Dr. Iuliana Ionita-Laza at the Columbia University Department of Biostatistics. Now we need to make the input files. The required input files are: pedigree, map, regions and weights. The pedigree file is slightly atypical, as the rows must be ordered to indicate trio structure. The program does not use the information in the pedigree columns to order the data, so the ordering must be done manually. I do not know what the pedigree files we have made using vcftools and plink look like.

12.2 de novo mutations
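As one possible starting point for de novo screening at the genotype level, and for the Mendelian-inconsistency check suggested in Section 2, here is a minimal sketch. It assumes biallelic autosomal sites, genotypes coded as alt-allele counts (0/1/2) with None for missing, and hypothetical names throughout. It is a naive genotype-level screen, not polymutt's likelihood-based de novo caller; candidates found this way would still need read-level filtering such as the binomial test in Section 7.

```python
def transmissible(g):
    """Alleles (0 = ref, 1 = alt) that a parent with alt-allele count g can transmit."""
    return {0: {0}, 1: {0, 1}, 2: {1}}[g]


def mendel_consistent(father, mother, child):
    """True if the child's genotype can arise from one allele of each parent."""
    return any(a + b == child
               for a in transmissible(father) for b in transmissible(mother))


def screen_trios(sites):
    """sites: (site_id, father, mother, child) tuples; returns flagged sites."""
    flagged = []
    for site_id, f, m, c in sites:
        if None in (f, m, c):
            continue  # cannot assess a trio with a missing call
        if not mendel_consistent(f, m, c):
            kind = "candidate de novo" if (f, m, c) == (0, 0, 1) else "other inconsistency"
            flagged.append((site_id, kind))
    return flagged


# Usage
sites = [("s1", 0, 0, 1), ("s2", 0, 1, 1), ("s3", 2, 2, 0), ("s4", None, 0, 1)]
print(screen_trios(sites))
# [('s1', 'candidate de novo'), ('s3', 'other inconsistency')]
```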
More informationChang Xu Mohammad R Nezami Ranjbar Zhong Wu John DiCarlo Yexun Wang
Supplementary Materials for: Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller Chang Xu Mohammad R Nezami Ranjbar Zhong Wu John
More informationHuman Genetic Variation. Ricardo Lebrón Dpto. Genética UGR
Human Genetic Variation Ricardo Lebrón rlebron@ugr.es Dpto. Genética UGR What is Genetic Variation? Origins of Genetic Variation Genetic Variation is the difference in DNA sequences between individuals.
More informationHuman Genetics and Gene Mapping of Complex Traits
Human Genetics and Gene Mapping of Complex Traits Advanced Genetics, Spring 2018 Human Genetics Series Thursday 4/5/18 Nancy L. Saccone, Ph.D. Dept of Genetics nlims@genetics.wustl.edu / 314-747-3263 What
More informationHaplotype phasing in large cohorts: Modeling, search, or both?
Haplotype phasing in large cohorts: Modeling, search, or both? Po-Ru Loh Harvard T.H. Chan School of Public Health Department of Epidemiology Broad MIA Seminar, 3/9/16 Overview Background: Haplotype phasing
More informationEnhanced Resolution and Statistical Power Through SNP Distributions Within the Short Tandem Repeats
Enhanced Resolution and Statistical Power Through SNP Distributions Within the Short Tandem Repeats John V. Planz, Ph.D. Associate Professor, Associate Director UNT Center for Human Identification UNT
More informationPopulation Genetics. If we closely examine the individuals of a population, there is almost always PHENOTYPIC
1 Population Genetics How Much Genetic Variation exists in Natural Populations? Phenotypic Variation If we closely examine the individuals of a population, there is almost always PHENOTYPIC VARIATION -
More informationUAB DNA-Seq Analysis Workshop. John Osborne Research Associate Centers for Clinical and Translational Science
+ UAB DNA-Seq Analysis Workshop John Osborne Research Associate Centers for Clinical and Translational Science ozborn@uab.,edu + Thanks in advance You are the Guinea pigs for this workshop! At this point
More informationModule 2: Introduction to PLINK and Quality Control
Module 2: Introduction to PLINK and Quality Control 1 Introduction to PLINK 2 Quality Control 1 Introduction to PLINK 2 Quality Control Single Nucleotide Polymorphism (SNP) A SNP (pronounced snip) is a
More informationARTICLE Haplotype Estimation Using Sequencing Reads
ARTICLE Haplotype Estimation Using Sequencing Reads Olivier Delaneau, 1 Bryan Howie, 2 Anthony J. Cox, 3 Jean-François Zagury, 4 and Jonathan Marchini 1,5, * High-throughput sequencing technologies produce
More informationApplication of Genotyping-By-Sequencing and Genome-Wide Association Analysis in Tetraploid Potato
Application of Genotyping-By-Sequencing and Genome-Wide Association Analysis in Tetraploid Potato Sanjeev K Sharma Cell and Molecular Sciences The 3 rd Plant Genomics Congress, London 12 th May 2015 Potato
More informationb. (3 points) The expected frequencies of each blood type in the deme if mating is random with respect to variation at this locus.
NAME EXAM# 1 1. (15 points) Next to each unnumbered item in the left column place the number from the right column/bottom that best corresponds: 10 additive genetic variance 1) a hermaphroditic adult develops
More informationProf. Dr. Konstantin Strauch
Genetic Epidemiology and Personalized Medicine Prof. Dr. Konstantin Strauch IBE - Lehrstuhl für Genetische Epidemiologie Ludwig-Maximilians-Universität Institut für Genetische Epidemiologie Helmholtz-Zentrum
More informationPopulation and Statistical Genetics including Hardy-Weinberg Equilibrium (HWE) and Genetic Drift
Population and Statistical Genetics including Hardy-Weinberg Equilibrium (HWE) and Genetic Drift Heather J. Cordell Professor of Statistical Genetics Institute of Genetic Medicine Newcastle University,
More informationGenome wide association studies. How do we know there is genetics involved in the disease susceptibility?
Outline Genome wide association studies Helga Westerlind, PhD About GWAS/Complex diseases How to GWAS Imputation What is a genome wide association study? Why are we doing them? How do we know there is
More informationPUBH 8445: Lecture 1. Saonli Basu, Ph.D. Division of Biostatistics School of Public Health University of Minnesota
PUBH 8445: Lecture 1 Saonli Basu, Ph.D. Division of Biostatistics School of Public Health University of Minnesota saonli@umn.edu Statistical Genetics It can broadly be classified into three sub categories:
More informationAlgorithms for Genetics: Introduction, and sources of variation
Algorithms for Genetics: Introduction, and sources of variation Scribe: David Dean Instructor: Vineet Bafna 1 Terms Genotype: the genetic makeup of an individual. For example, we may refer to an individual
More informationIntroduction to Quantitative Genomics / Genetics
Introduction to Quantitative Genomics / Genetics BTRY 7210: Topics in Quantitative Genomics and Genetics September 10, 2008 Jason G. Mezey Outline History and Intuition. Statistical Framework. Current
More informationImplementing direct and indirect markers.
Chapter 16. Brian Kinghorn University of New England Some Definitions... 130 Directly and indirectly marked genes... 131 The potential commercial value of detected QTL... 132 Will the observed QTL effects
More informationDNA concentration and purity were initially measured by NanoDrop 2000 and verified on Qubit 2.0 Fluorometer.
DNA Preparation and QC Extraction DNA was extracted from whole blood or flash frozen post-mortem tissue using a DNA mini kit (QIAmp #51104 and QIAmp#51404, respectively) following the manufacturer s recommendations.
More informationStructure, Measurement & Analysis of Genetic Variation
Structure, Measurement & Analysis of Genetic Variation Sven Cichon, PhD Professor of Medical Genetics, Director, Division of Medcial Genetics, University of Basel Institute of Neuroscience and Medicine
More informationEstimation problems in high throughput SNP platforms
Estimation problems in high throughput SNP platforms Rob Scharpf Department of Biostatistics Johns Hopkins Bloomberg School of Public Health November, 8 Outline Introduction Introduction What is a SNP?
More informationGenetics and Psychiatric Disorders Lecture 1: Introduction
Genetics and Psychiatric Disorders Lecture 1: Introduction Amanda J. Myers LABORATORY OF FUNCTIONAL NEUROGENOMICS All slides available @: http://labs.med.miami.edu/myers Click on courses First two links
More informationPackage snpready. April 11, 2018
Version 0.9.6 Date 2018-04-11 Package snpready April 11, 2018 Title Preparing Genotypic Datasets in Order to Run Genomic Analysis Three functions to clean, summarize and prepare genomic datasets to Genome
More informationJean-Simon Brouard 1, Brian Boyle 2, Eveline M. Ibeagha-Awemu 1 and Nathalie Bissonnette 1*
Brouard et al. BMC Genetics (2017) 18:32 DOI 10.1186/s12863-017-0501-y RESEARCH ARTICLE Low-depth genotyping-by-sequencing (GBS) in a bovine population: strategies to maximize the selection of high quality
More informationUsing VarSeq to Improve Variant Analysis Research
Using VarSeq to Improve Variant Analysis Research June 10, 2015 G Bryce Christensen Director of Services Questions during the presentation Use the Questions pane in your GoToWebinar window Agenda 1 Variant
More informationLecture 23: Causes and Consequences of Linkage Disequilibrium. November 16, 2012
Lecture 23: Causes and Consequences of Linkage Disequilibrium November 16, 2012 Last Time Signatures of selection based on synonymous and nonsynonymous substitutions Multiple loci and independent segregation
More informationGoal: To use GCTA to estimate h 2 SNP from whole genome sequence data & understand how MAF/LD patterns influence biases
GCTA Practical 2 Goal: To use GCTA to estimate h 2 SNP from whole genome sequence data & understand how MAF/LD patterns influence biases GCTA practical: Real genotypes, simulated phenotypes Genotype Data
More informationBiology 445K Winter 2007 DNA Fingerprinting
Biology 445K Winter 2007 DNA Fingerprinting For Friday 3/9 lab: in your lab notebook write out (in bullet style NOT paragraph style) the steps for BOTH the check cell DNA prep and the hair follicle DNA
More informationBST227 Introduction to Statistical Genetics. Lecture 3: Introduction to population genetics
BST227 Introduction to Statistical Genetics Lecture 3: Introduction to population genetics!1 Housekeeping HW1 will be posted on course website tonight 1st lab will be on Wednesday TA office hours have
More information