Using the Trio Workflow in Partek Genomics Suite v6.6


This user guide illustrates the use of the Trio/Duo workflow in Partek Genomics Suite (PGS) and discusses the basic functions available within the workflow. Please note that this user guide focuses on microarray data; NGS duo/trio analysis is discussed in the DNA-seq documentation (available at Help > On-line Tutorials).

The workflow requires files with genotype calls, such as Affymetrix .CHP files, genotyping text files, a project file exported from Illumina GenomeStudio, or an Illumina final report text file. For the purpose of this example, 10 samples genotyped on Affymetrix SNP 6.0 arrays were chosen (Figure 1).

Figure 1: Genotype calls imported into the Trio workflow.

Sample Quality Control

Following the sample import, please follow the workflow to perform the quality control (QC) steps. Based on the results, you might decide to omit some samples from the downstream steps.

The Sample QC option invokes the sample_qc spreadsheet (Figure 2), which shows one sample per row. The rate of missing genotype calls for each sample is given in the Sample NC Rate column. The rate is determined by dividing the number of no-calls (NC) by the total number of genotypes in the sample; an unusually high number in this column indicates that the overall genotyping quality of the sample is poor. As a rough guide, one can tolerate an NC rate of up to 5%.

The Sample Heterozygosity Rate can also be used as a quality indicator. As a rule of thumb, you might want to reconsider samples whose heterozygosity rate falls outside the interval mean ± 3 standard deviations of all the samples. To calculate the mean and standard deviation of the heterozygosity rate, please use Start > Descriptive > Column Statistics and select mean and standard deviation from the list of available Candidate Measures (not shown).
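The two QC criteria described above can also be checked outside PGS. The following is a minimal Python sketch, not PGS code: the genotype labels ("AA", "AB", "BB", "NC"), the function names, and the choice to compute the heterozygosity rate over called genotypes only are assumptions made for illustration.

```python
from statistics import mean, stdev

NO_CALL = "NC"  # hypothetical no-call label; not a PGS identifier


def sample_qc(calls):
    """Return (NC rate, heterozygosity rate) for one sample's genotype calls."""
    n_total = len(calls)
    n_nc = sum(1 for g in calls if g == NO_CALL)
    n_called = n_total - n_nc
    n_het = sum(1 for g in calls if g == "AB")
    # NC rate: no-calls divided by the total number of genotypes in the sample.
    nc_rate = n_nc / n_total
    # Assumption: heterozygosity is computed over called genotypes only.
    het_rate = n_het / n_called if n_called else 0.0
    return nc_rate, het_rate


def flag_samples(qc, max_nc=0.05, n_sd=3.0):
    """Map sample name -> list of warnings, given name -> (nc_rate, het_rate)."""
    het_rates = [het for _, het in qc.values()]
    mu, sd = mean(het_rates), stdev(het_rates)
    flags = {}
    for name, (nc_rate, het_rate) in qc.items():
        reasons = []
        if nc_rate > max_nc:
            reasons.append("NC rate above threshold")
        if abs(het_rate - mu) > n_sd * sd:
            reasons.append("heterozygosity outlier")
        if reasons:
            flags[name] = reasons
    return flags


# Made-up QC values for three samples: (NC rate, heterozygosity rate).
qc = {"s1": (0.01, 0.27), "s2": (0.08, 0.26), "s3": (0.02, 0.28)}
print(flag_samples(qc))
```

Please keep in mind that with very few samples the 3 standard deviation criterion cannot trigger: among n values, none can lie more than (n - 1)/sqrt(n) sample standard deviations from the mean, which is about 2.85 for n = 10.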

Figure 2: Sample QC spreadsheet showing the no-call (NC) rate and heterozygosity rate for each sample.

Data Analysis

The Data Analysis section of the workflow enables you to perform identity by state (IBS) analysis, duo analysis, and trio analysis. All three methods provide insight into relationships in population data and can be used, for instance, to test paternity, detect relatives, or detect allele sharing.

Generate Sample IBS

The Generate Sample IBS option creates two spreadsheets: ibs_frequency and ibs_matrix. We shall first focus on the former. The ibs_frequency spreadsheet (Figure 3) shows one pairwise comparison per row. The samples being compared are listed in the first column and then (for filtering purposes) given separately in the next two columns.

Figure 3: The ibs_frequency spreadsheet features one pairwise comparison of samples per row.

Columns #4 to #6 give the number of loci within each IBS class. For a pair of individuals with genotype information, IBS can be observed at a given locus with three possible outcomes, depending on the number of alleles that the individuals have in common. If the alleles are labeled A and B, the classification of IBS is as follows.

IBS2: two alleles in common (AA/AA, AB/AB, or BB/BB)
IBS1: one allele in common (AA/AB, AB/BB)
IBS0: no common alleles (AA/BB)

The mean IBS state and its variance are found in columns #7 and #8, respectively. With "IBS 0", "IBS 1", and "IBS 2" denoting the locus counts in columns #4 to #6, the mean is calculated as:

\[ \text{Total} = \text{IBS 0} + \text{IBS 1} + \text{IBS 2} \]

\[ \text{IBS Mean} = \frac{0 \cdot \text{IBS 0} + 1 \cdot \text{IBS 1} + 2 \cdot \text{IBS 2}}{\text{Total}} \]

The variance is the sum of the squared errors of each IBS state, weighted by the number of loci in that state and divided by the total:

\[ \text{sqerror}_0 = (0 - \text{IBS Mean})^2 \]

\[ \text{sqerror}_1 = (1 - \text{IBS Mean})^2 \]

\[ \text{sqerror}_2 = (2 - \text{IBS Mean})^2 \]

\[ \text{IBS Variance} = \frac{\text{IBS 0} \cdot \text{sqerror}_0 + \text{IBS 1} \cdot \text{sqerror}_1 + \text{IBS 2} \cdot \text{sqerror}_2}{\text{Total}} \]

For the analysis of genetic relatedness, one IBS sub-class is of special interest, namely the concordant heterozygotes (i.e. AB/AB), labeled IBS2* (column #9). The percent IBS2* (column #10) is the ratio of concordant heterozygotes to the sum of concordant heterozygotes and discordant homozygotes, that is, IBS2* / (IBS2* + IBS0). Column #11 (percent informative SNPs) gives the fraction of informative SNPs between the two individuals and is calculated as (IBS0 + IBS2*) / Total.

To ease within-population comparisons, you can use PGS to display a plot with percent IBS2* on the x-axis and percent informative SNPs on the y-axis (commonly referred to as an IBS2* plot). To invoke the plot, please select both the percent IBS2* and percent informative SNPs columns and then select the Scatter Plot icon. An example of the plot is shown in Figure 4. It can be used for fast identification of identical samples, siblings, and parent-child pairs. For an excellent discussion of the topic, please refer to Stevens EL et al. (2011).
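As a worked illustration of the column definitions above, the following Python sketch derives columns #7, #8, #10, and #11 from the IBS counts of one pairwise comparison. The function name and the returned keys are hypothetical, the values are returned as fractions rather than percentages, and the code is a sketch of the formulas as described here, not PGS's implementation.

```python
def ibs_summary(ibs0, ibs1, ibs2, ibs2_star):
    """Derive summary columns from the IBS locus counts of one sample pair.

    ibs0, ibs1, ibs2: number of loci in each IBS class (columns #4 to #6).
    ibs2_star: number of concordant heterozygous loci (AB/AB), a subset of ibs2.
    """
    total = ibs0 + ibs1 + ibs2
    # Mean IBS state: the states 0, 1, 2 weighted by their locus counts.
    ibs_mean = (0 * ibs0 + 1 * ibs1 + 2 * ibs2) / total
    # Variance: count-weighted squared deviation of each state from the mean.
    ibs_var = (
        ibs0 * (0 - ibs_mean) ** 2
        + ibs1 * (1 - ibs_mean) ** 2
        + ibs2 * (2 - ibs_mean) ** 2
    ) / total
    informative = ibs0 + ibs2_star
    return {
        "ibs_mean": ibs_mean,
        "ibs_variance": ibs_var,
        # Undefined when the pair has no informative loci at all.
        "frac_ibs2_star": ibs2_star / informative if informative else None,
        "frac_informative": informative / total,
    }


# Made-up counts for a single pairwise comparison.
print(ibs_summary(ibs0=10, ibs1=30, ibs2=60, ibs2_star=20))
```

For these made-up counts the mean IBS state is 1.5 and the variance is 0.45; 20 of the 30 informative loci are concordant heterozygotes, and 30 of the 100 loci are informative.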

Figure 4: IBS2* plot, used to interpret genetic relatedness. Each dot on the plot is one pairwise comparison of the samples.

Columns #12 and #13 provide the results of statistical tests that can also be used to assess the relatedness of the samples. If two individuals are from the same population, the expected ratio of IBS0 loci to IBS2* loci is 1 : 2. A significant p-value of binomial population (column #12) indicates fewer IBS2* loci than expected, in other words, that the individuals are from different populations. Column #13, conversely, shows the p-value of binomial relation (i.e. 1 - p(binomial population)); a low p-value indicates a higher than expected number of IBS2* loci, i.e. that the two samples are more likely to be related.

In addition, pairwise percent IBS2* comparisons of the samples are given in the ibs_matrix spreadsheet, shown in Figure 5.

Figure 5: The ibs_matrix spreadsheet shows a pairwise matrix of percent IBS2* values.

SNP Duo

The SNPDuo option of the workflow compares the genotypes of two individuals and reports how many alleles they have in common at each locus. The possibilities are:

IBS2: two alleles in common (AA/AA, AB/AB, or BB/BB)
IBS1: one allele in common (AA/AB, AB/BB)
IBS0: no common alleles (AA/BB)

If at least one individual has a no-call at a given locus, PGS indicates it with a question mark (?).

In the SNPDuo dialog, please specify the pairs to be analyzed. This can be done in several ways (Figure 6): a column can be used to identify the potential family members (Family), you can manually specify which samples form a duo (Manual), or you can use columns from an existing spreadsheet (From Spreadsheet).
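The per-locus coding used by the SNP duo analysis (0, 1, 2, or ? for a no-call) can be sketched in Python as follows. The genotype strings ("AA", "AB", "BB"), the "NC" no-call label, and the function names are assumptions made for this example; the sketch does not reproduce PGS's internal data format.

```python
from collections import Counter

NO_CALL = "NC"  # hypothetical no-call label; not a PGS identifier


def shared_alleles(genotype_1, genotype_2):
    """Return "0", "1", or "2" shared alleles, or "?" if either call is missing."""
    if NO_CALL in (genotype_1, genotype_2):
        return "?"
    # Multiset intersection counts alleles present in both genotypes,
    # e.g. AA vs AB share one A, AB vs AB share one A and one B.
    common = Counter(genotype_1) & Counter(genotype_2)
    return str(sum(common.values()))


def snpduo_row(calls_1, calls_2):
    """Code every locus of a duo; both call lists must be in the same SNP order."""
    return [shared_alleles(g1, g2) for g1, g2 in zip(calls_1, calls_2)]


# Made-up genotype calls for two samples at five SNPs.
sample_a = ["AA", "AB", "AA", "NC", "BB"]
sample_b = ["AA", "AB", "BB", "AA", "AB"]
print(snpduo_row(sample_a, sample_b))
```

Each element of the returned list corresponds to one SNP, in the same way that each SNP occupies one column of a duo's row in the SNPduo spreadsheet.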

Figure 6: SNPDuo setup dialog. Paired samples need to be specified.

PGS will then generate a SNPduo spreadsheet, with each duo shown in one row (an example of a single duo is shown in Figure 7). The samples are specified in the three leftmost columns, while each of the remaining columns corresponds to one SNP. The codes 0, 1, 2, and ? are used as explained at the beginning of this section. The results of the SNP duo analysis can be visualized, as discussed in the SNP Trio section.

Figure 7: Result of SNP duo analysis. Each column represents one SNP, while the cell entries show the number of alleles shared between the samples.

SNP Trio

The SNP Trio function enables the analysis of blocks of uniparental inheritance, based on SNP calls of child-mother-father trios. The following scenarios are possible:

NI (non-informative) (0): e.g. child AA, mother AA, father AA, or child AA, mother AA, father AB, etc.
BPI (biparental inheritance) (1): e.g. child AB, mother AA, father BB
UPI-P (paternal uniparental inheritance) (2): e.g. child BB, mother AA, father AB (both B alleles inherited from the father)
UPI-M (maternal uniparental inheritance) (3): e.g. child BB, mother AB, father AA (both B alleles inherited from the mother)
MI-S (single allele Mendelian inconsistency) (4): e.g. child AB, mother AA, father AA
MI-D (double allele Mendelian inconsistency) (5): e.g. child AA, mother BB, father BB

Upon invocation of the SNP Trio function, a dialog appears that is used to specify the trios (Figure 8). This can be done either by specifying the columns with mothers and fathers (Trios tab) or by manually selecting samples (Manual tab).

Figure 8: SNPTrio setup dialog. Family trios need to be specified.
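The six trio codes can be illustrated with a small Python classifier. Please note that the rule set below is inferred from the example genotypes listed for each scenario and is an assumption, not PGS's documented logic; the function name and the genotype strings ("AA", "AB", "BB") are likewise hypothetical, and no-calls are not handled.

```python
NI, BPI, UPI_P, UPI_M, MI_S, MI_D = range(6)  # codes 0 to 5, in the order listed

VALID = ("AA", "AB", "BB")


def classify_trio(child, mother, father):
    """Return the inheritance code (0 to 5) for one SNP of a trio."""
    for genotype in (child, mother, father):
        if genotype not in VALID:
            raise ValueError(f"unexpected genotype: {genotype!r}")

    if child in ("AA", "BB"):
        allele = child[0]
        mother_has = allele in mother
        father_has = allele in father
        if mother_has and father_has:
            return NI      # each parent could have contributed one copy
        if father_has:
            return UPI_P   # only the father carries the child's allele
        if mother_has:
            return UPI_M   # only the mother carries the child's allele
        return MI_D        # neither parent carries it: both alleles unexplained

    # Child is heterozygous (AB).
    if mother == father and mother in ("AA", "BB"):
        return MI_S        # one of the child's alleles is absent from both parents
    if {mother, father} == {"AA", "BB"}:
        return BPI         # one allele must come from each parent
    return NI


# The example genotypes from the list of scenarios (child, mother, father).
for trio in [("AA", "AA", "AB"), ("AB", "AA", "BB"), ("BB", "AA", "AB"),
             ("BB", "AB", "AA"), ("AB", "AA", "AA"), ("AA", "BB", "BB")]:
    print(trio, classify_trio(*trio))
```

Applied to every SNP of a trio, such a function yields one row of codes comparable in layout to a row of the SNPtrio spreadsheet.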

PGS will then generate a SNPtrio spreadsheet, with each trio shown in one row (an example of a single trio is shown in Figure 9). The samples are specified in the three leftmost columns, while each of the remaining columns corresponds to one SNP. The codes 0 to 5 are used as explained at the beginning of this section.

Figure 9: Result of SNP trio analysis. Each column represents one SNP, while the cell entries show the result of the uniparental inheritance analysis.

Furthermore, the results of the trio analysis can be visualized in the chromosome view (in the Visualization section of the workflow). The central part of the plot is the SNPTrio track, as illustrated in Figure 10. Each dot on the plot shows a SNP, while the possible inheritance scenarios are given on the y-axis.

Figure 10: SNPTrio track of the chromosome view. Each dot represents one SNP. The possible modes of inheritance are shown on the y-axis.

End of User Guide

This is the end of the user guide. If you need additional assistance, please call our technical support staff at +1-314-878-2329 or email support@partek.com.

References

Stevens EL et al. Inference of Relationships in Population Data Using Identity-by-Descent and Identity-by-State. PLoS Genetics 2011;7:e1002287.

Note Last revision: January 2012