Lab Retriever Manual. Applies to Lab Retriever version /5/2014


Proper use of this software assumes prior training and education in analyzing and exporting STR genotype data, formulating hypotheses, and the use of likelihood ratios. This manual does not provide, nor is it a substitute for, that training and education.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

This program is free software: you can redistribute it and/or modify it under the terms of the Creative Commons license.

You are free to:
   Share: copy and redistribute the material in any medium or format
   Adapt: remix, transform, and build upon the material

Under the following terms:
   Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
   NonCommercial: You may not use the material for commercial purposes.
   ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Before implementing for casework, users are advised to check results from this software using an independent calculation or software program.

Credits:
   Based on the original work of: David Balding, John Buckleton
   Research and development: Keith Inman, Kirk Lohmueller, Norah Rudin
   Programmers: Ken Cheng, Luke Inman-Semerau, Chris Robinson, Adam Kirschner
   Data Assistance: Allison Bricker

Introduction

Lab Retriever is a program that calculates likelihood ratios incorporating a probability of drop-out, P(DO). It is based on R code originally created by David Balding (Balding, D.J., Buckleton, J., Interpreting low template DNA profiles, Forensic Science International: Genetics 4 (2009)). The code has been rewritten in C++ and a graphical user interface (GUI) added for ease of use.

This manual provides basic instructions for using Lab Retriever. It assumes that the user has received the appropriate education and training to understand the underlying principles and is competent in their application.

I. Determine an empirical threshold
   A. An integral part of calculating a likelihood ratio incorporating a P(DO) is applying an empirical threshold to the data so that the information content is maximized.
   B. Many methods exist to estimate an empirical threshold. One easy procedure is to take twice the height of the maximum non-artifactual peak in one or more negative samples. The empirical threshold, by definition, can vary with any particular combination of hardware, software, and chemistry. Based on our experience with current systems (e.g. 3130, Identifiler, Identifiler Plus, PowerPlex 16), this result is frequently about 30 RFU.
   C. Re-analyze the evidence sample of interest, along with any associated negative controls, at the new empirically determined threshold. In some instances you may wish to turn off the stutter filter. This is appropriate if a minor component is in the same peak height range as stutter peaks from the major component.
      Note: create a profile in your .csv input data file that contains peaks in stutter positions in the evidence sample that are:
      1. Above the empirical analytical threshold (AT)
      2. Below the stutter threshold
      3. Also present in the suspected contributor

II. Determine an empirical P(DO)
   A. It is useful to derive an empirical estimate of the probability of drop-out, P(DO), for the sample or sub-sample peaks of interest. This makes a good starting point for your calculations.
   B. The most reliable approach to determining a P(DO) is to use the empirical end product, that is, the peak heights of the sample or sub-sample peaks of interest, as an indicator of the P(DO). Each laboratory can determine a P(DO) function from its internal validation data. Fortunately, the systems used in forensic DNA testing are highly standardized, so a universal P(DO) function derived from NIST data is a reasonable substitute.
   C. If you have not generated your own function, you can download a P(DO) calculator based on NIST data at
   D. To calculate P(DO):
      1. Calculate the average peak height:
         a. Add the peak heights of either all the peaks in the profile or of the relevant component.

         b. Divide by the number of peaks to obtain the average.
      2. Launch the Excel file that contains the P(DO) calculator found at
         a. It will open on the first tab, labeled Average RFU.
         b. Input the average peak height in the indicated cell.
         c. Press Return.
         d. Click on either the Identifiler or PowerPlex 16 tab to see the calculated P(DO).
            i. Results are displayed for an AT of 50 RFU and 30 RFU.
            ii. Use the value closest to your empirically determined AT.
         e. Use Save As to save your work in the location of your choice.

III. Prepare data for export from your genetic analysis program
   Note 1: The following instructions are for GeneMapper ID (GMID). The data can be prepared and exported from any genetic analysis program that can export a .csv file. The following instructions should be easy to adapt to other programs.
   Note 2: Although Lab Retriever is fairly forgiving in terms of input format, preparing your data properly streamlines the process. As with any computer program, garbage in, garbage out (GIGO) applies: Lab Retriever will return a result for properly formatted data even if the data themselves are incorrect. The user is responsible for confirming that the correct data is in the correct place in the input file.
   A. Export the data
      1. Select and open the HID Table.

         a. In Lab Retriever version it is no longer necessary to check Duplicate homozygous alleles prior to exporting your data.
      2. Select the Genotypes tab.
      3. Check:
         a. Sample File
         b. Sample Name
         c. Marker
         d. Allele
         e. Height
            i. Lab Retriever does not use the height value, but you should export it so that you can calculate an empirical threshold.
      4. Make sure you allow for a sufficient number of alleles to accommodate complex mixtures. Ten (10) is usually sufficient.

IV. Export the data
   Note: Lab Retriever will accept exported files that include all samples, ladders, controls, etc. However, it may be expedient to prepare a project containing only the samples of interest. This is not absolutely necessary, but it generates a simplified list from which to choose later in Lab Retriever. The samples in a single project can reside in different run folders. Samples need not all reside in the same project. Multiple .csv files can later be imported into Lab Retriever.
   A. If you do wish to simplify the project by removing ladders, controls, and other extraneous files:
      1. Options exist to simplify your data prior to import. You may either:
         a. Delete extraneous files in the genetic analysis program itself, or
         b. After export, delete extraneous files and shorten names in the .csv file (see below).

   B. To export the sample from GMID:
      1. Select the Genotypes tab.
         a. This is a good time to scan for and delete any missed off-ladder (OL) peaks.
      2. Under File, select Export Table.
         a. Under Export File As, select .csv.
         b. Navigate to an appropriate folder to save your exported file and name it in a way that is meaningful to you.
         c. Click Export File.

V. Import the data into Lab Retriever
   A. Launch the Lab Retriever program
      1. Both Windows and Mac versions are available and work in exactly the same way.
      2. Input Case ID, Sample ID, and Analyst in a way that is meaningful to you.

      3. Unless you have some particular reason to believe that the probability of drop-in, P(DI), is high, you may leave the default value of
         Alternatively, insert a laboratory-specific, empirically derived value.
      4. Input the estimated P(DO).
         a. The sample can be run multiple times with different P(DO) values.
         b. It is useful to check how the calculated LR responds to a range of values on either side of the empirically determined P(DO), to understand the effect of a specific P(DO) on the final LR.
      5. You can select a single race (AA, Cauc, Hisp) or leave the default selection of All.
      6. Input the desired value for FST (or θ).
         a. This parameter is intended to correct for population substructure that derives from distant ancestry. Formally, θ is a particular derivation of FST (Weir and Cockerham 1984), but the terms are considered approximately equivalent for this purpose.
      7. The Identical by Descent (IBD) drop-down menu may be used when the denominator (H2) is defined as a random close relative rather than a random unrelated person.
         a. IBD can be used when the numerator (H1) contains only one suspected contributor.
         b. Similarly, only one contributor in the denominator can share alleles IBD with the suspected contributor (i.e. be a relative of the suspected contributor), even when multiple unknown contributors are present. For example, if 3 unknown contributors are specified for H2, one will be a relative (as specified by the IBD field) of the suspected contributor and two will be random individuals, unrelated to each other and unrelated to the suspected contributor.
         c. The 0, 1 and 2 next to the IBD fields specify the number of alleles Identical by Descent between the suspected contributor and the hypothesized alternate contributor.
            i. The default IBD value is for unrelated individuals. This is represented as a 1 in the 0 IBD field to indicate that no alleles in the profile should be hypothesized to be IBD.
            ii. For example, a parent and child would, by standard inheritance rules, share 1 allele IBD at each locus. So the values to input for this hypothesized relationship would be: 0 in the 0 IBD field, 1 in the 1 IBD field, and 0 in the 2 IBD field.

            iii. Similarly for siblings, the standard inheritance rules would predict: 0.25 in the 0 IBD field, 0.5 in the 1 IBD field, and 0.25 in the 2 IBD field.
            iv. A list of common relationships is provided in Appendix III.
      8. Choose H1 and H2.
         a. The total numbers of UNKs in H1 and H2 are allowed to differ. It is up to the analyst to choose supportable hypotheses.
         b. The number of Assumed contributors is not limited.
         c. The total number of contributors in the sample is not limited.
         d. The total numbers of contributors in H1 and H2 are allowed to differ. It is up to the analyst to choose supportable hypotheses.
         e. The number of Suspected contributors in the numerator (H1) may be either one or two. It is up to the analyst to choose supportable hypotheses.
         f. The maximum number of unknown contributors (UNK) in the denominator (H2) is
      9. Click on the Load a File button:
         a. Navigate to the location of the exported .csv file and select it.
            i. A confirmation of file upload will appear.
               i. Currently Lab Retriever uses the collective set of 29 autosomal loci defined by the NIST 1036 population dataset. This includes the GlobalFiler and PowerPlex Fusion loci or any subset of them.

               ii. The loci reflected in Lab Retriever are determined by the loci typed in the evidence (Detected) profile.
                  a.) Please retain loci that were typed, but for which no result was obtained. Such loci contribute a small, but non-zero, weight to the LR.
            ii. You can upload multiple .csv files and choose your samples from amongst them. The program holds all uploaded files in cache while it is open.
         b. Click on the plus sign in the Detected column and select the Evidence profile. The sample file name appears in the drop-down box and allows you to select between duplicate injections (if any) with the same sample name.
            i. It will populate the column with the Detected alleles.
            ii. It will also populate the Unattributed column.
         c. Select the Assumed donor (if any).
            i. If the Assumed donor profile is in the same .csv file as the evidence, click on the plus sign in the Assumed column and select the profile of an Assumed contributor.
            ii. If the Assumed donor is in a different .csv file, click Load File again, navigate to the file containing the Assumed donor profile, and select it. Then click on the plus sign in the Assumed column and select the profile of an Assumed contributor.
            iii. Note that those alleles will be subtracted from the Unattributed column.
            iv. If there is more than one Assumed contributor, click the plus sign again and choose the appropriate sample. This will generate an additional column. Those alleles will also be subtracted from the Unattributed column.
         d. Select the Suspected donor.
            i. If the Suspected donor profile is in the same .csv file as the evidence, click on the plus sign in the Suspected column and select the profile of a Suspected contributor.
            ii. If the Suspected donor is in a different .csv file, click Load File again, navigate to the file containing the Suspected donor profile, and select it. Then click on the plus sign in the Suspected column and select the profile of a Suspected contributor.
            iii. If you wish to run two Suspected donors simultaneously, click on the plus sign in the Suspected column to bring up a second column in which a different Suspected profile may be chosen.
               Note: If you choose to run two Suspected donors simultaneously, we recommend that you also run each Suspected donor separately to determine their relative contributions to the final LR. Also consider running each separately and using one or the other as an Assumed contributor if warranted.
         e. You can clear a column by hovering over the sample name to bring up a red remove button. You can then select a different profile for that column.
         f. Column header cells containing sample names can be edited by clicking in them and selecting the text to be deleted or changed.
            i. This can be useful to simplify long names.

VI. Run Lab Retriever
   A. Click the Run! button.
      1. Grayed-out data, a gray progress bar, and a blue cursor ball indicate that the program is running.
      2. When the computations are complete, the output will appear on the right side of the screen.
         a. A sliding door obscures the Assumed and Suspected input columns. These can be revealed by clicking on the double arrow at the top of the Detected column.

      3. Click the save button to save the results as a .csv file.
         a. The file name will auto-populate with the date, sample file, and chosen hypotheses. You can edit the file name if you wish.
         b. Navigate to an appropriate location and save the file.
         c. Open the saved .csv file in Excel or another spreadsheet program to view the results.
      4. You can easily run the same sample configuration with multiple P(DO) values by changing just this parameter and rerunning the calculation.
      5. To remove all profiles and calculations and start over, click the Clear button.
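The arithmetic described in sections I and II, and the repeated runs described in step 4 above, can be sketched in a few lines of Python. This sketch is illustrative only and is not part of Lab Retriever: the function names, the 0.1 step size, and the example RFU values are assumptions made here. It takes twice the tallest non-artifactual negative-control peak as the empirical threshold, averages a set of peak heights for entry into the P(DO) calculator, and lists P(DO) values on either side of an estimate.

```python
from statistics import mean


def empirical_threshold(negative_control_peaks_rfu, multiplier=2.0):
    """Return multiplier x the tallest non-artifactual negative-control peak (RFU)."""
    if not negative_control_peaks_rfu:
        raise ValueError("at least one negative-control peak height is required")
    return multiplier * max(negative_control_peaks_rfu)


def average_peak_height(peak_heights_rfu):
    """Sum the peak heights and divide by the number of peaks."""
    return mean(peak_heights_rfu)


def pdo_sweep(estimate, step=0.1, n_each_side=2):
    """List P(DO) values around an estimate, keeping only values strictly between 0 and 1."""
    candidates = (round(estimate + k * step, 6)
                  for k in range(-n_each_side, n_each_side + 1))
    return [value for value in candidates if 0.0 < value < 1.0]


# Hypothetical example values, for illustration only.
threshold = empirical_threshold([9, 15, 12])
average = average_peak_height([80, 120, 100])
pdo_values = pdo_sweep(0.5)
```

Each value returned by `pdo_sweep` would be entered by hand in the P(DO) field for a separate run; the sketch does not compute a likelihood ratio.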

For technical support, or to offer comments or suggestions, please go to

Appendix I Selected References

Schneider, P.M., Gill, P., Carracedo, A., Editorial on the recommendations of the DNA commission of the ISFG on the interpretation of mixtures, Forensic Science International 160 (2006) 89.
Gill, P., et al., DNA commission of the International Society of Forensic Genetics: Recommendations on the interpretation of mixtures, Forensic Science International 160 (2006)
Buckleton, J.S., Curran, J.M., Gill, P., Towards understanding the effect of uncertainty in the number of contributors to DNA stains, Forensic Science International: Genetics 1 (2007)
Gill, P., Kirkham, A., Curran, J., LoComatioN: A software tool for the analysis of low copy number DNA profiles, Forensic Science International 166 (2007)
Balding, D.J., Buckleton, J., Interpreting low template DNA profiles, Forensic Science International: Genetics 4 (2009)
Tvedebrink, T., Eriksen, P.S., Mogensen, H.S., Morling, N., Estimating the probability of allelic drop-out of STR alleles in forensic genetics, Forensic Science International: Genetics 3 (2009)
Perlin, M.W., Sinelnikov, A., An Information Gap in DNA Evidence Interpretation, PLoS ONE 4(12) (2009)
Gill, P., Buckleton, J., A universal strategy to interpret DNA profiles that does not require a definition of low-copy-number, Forensic Science International: Genetics 4 (2010)
Haned, H., Forensim: An open-source initiative for the evaluation of statistical methods in forensic genetics, Forensic Science International: Genetics 5 (2011)
Perlin, M.W., et al., Validating TrueAllele DNA Mixture Interpretation, J Forensic Sci, Vol. 56, No. 6 (2011)
Haned, H., Egeland, T., Pontier, D., Pene, L., Gill, P., Estimating drop-out probabilities in forensic DNA samples: A simulation approach to evaluate different models, Forensic Science International: Genetics 5 (2011)
Tvedebrink, T., Eriksen, P.S., Mogensen, H.S., Morling, N., Statistical model for degraded DNA samples and adjusted probabilities for allelic drop-out, Forensic Science International: Genetics 6 (2012)
Carracedo, A., Schneider, P.M., Butler, J., Prinz, M., Focus issue: Analysis and biostatistical interpretation of complex and low template DNA samples, Forensic Science International: Genetics 6 (2012)
Gill, P., et al., DNA commission of the International Society of Forensic Genetics: Recommendations on the evaluation of STR typing results that may include drop-out and/or drop-in using probabilistic methods, Forensic Science International: Genetics 6 (2012)
Benschop, C.C.G., Assessment of mock cases involving complex low template DNA mixtures: A descriptive study, Forensic Science International: Genetics 6 (2012)
Biedermann, A., Bozza, S., Konis, K., Taroni, F., Inference about the number of contributors to a DNA mixture: Comparative analyses of a Bayesian network approach and the maximum allele count method, Forensic Science International: Genetics 6 (2012)
Haned, H., Slooten, K., Gill, P., Exploratory data analysis for the interpretation of low template DNA mixtures, Forensic Science International: Genetics 6 (2012)
Bright, J-A., McManus, K., Harbison, S., Gill, P., Buckleton, J., A comparison of stochastic variation in mixed and unmixed casework and synthetic samples, Forensic Science International: Genetics 6 (2012)
Kelly, H., Bright, J-A., Curran, J., Buckleton, J., The interpretation of low level DNA mixtures, Forensic Science International: Genetics 6 (2012)
Mitchell, A., et al., Validation of a DNA mixture statistics tool incorporating allelic drop-out and drop-in, Forensic Science International: Genetics 6 (2012)
Pfeifer, C.M., et al., Comparison of different interpretation strategies for low template DNA mixtures, Forensic Science International: Genetics 6 (2012)
Rakay, C.A., Bregu, J., Grgicak, C.M., Maximizing allele detection: Effects of analytical threshold and DNA levels on rates of allele and locus drop-out, Forensic Science International: Genetics 6 (2012)
Tvedebrink, T., et al., Allelic drop-out probabilities estimated by logistic regression: Further considerations and practical implementation, Forensic Science International: Genetics 6 (2012)
Lohmueller, K.E., Rudin, N., Calculating the Weight of Evidence in Low-Template Forensic DNA Casework, J Forensic Sci (2012)
Balding, D., likeLTD (likelihoods for low-template DNA profiles)
Westen, A.A., et al., Assessment of the stochastic threshold, back- and forward stutter filters and low template techniques for NGM, Forensic Science International: Genetics 6 (2012)
Bright, J-A., Taylor, D., Curran, J.M., Buckleton, J.S., Developing allelic and stutter peak height models for a continuous method of DNA interpretation, Forensic Science International: Genetics 7 (2013)
Taylor, D., Bright, J-A., Buckleton, J., The interpretation of single source and mixed DNA profiles, Forensic Science International: Genetics 7 (2013)
Gill, P., Haned, H., A new methodological framework to interpret complex DNA profiles using likelihood ratios, Forensic Science International: Genetics 7 (2013)
Balding, D.J., Evaluation of mixed-source, low-template DNA profiles in forensic science, PNAS early edition (2013)
Lohmueller, K.E., Rudin, N., Inman, K., Analysis of allelic drop-out using the Identifiler and PowerPlex 16 forensic STR typing systems, Forensic Science International: Genetics 12 (2014) 1-11
Steele, C.D., Balding, D.J., Statistical Evaluation of Forensic DNA Profile Evidence, Annu. Rev. Stat. Appl.
Timken, M.D., Klein, S.B., Buoncristiani, M.R., Stochastic sampling effects in STR typing: Implications for analysis and interpretation, Forensic Science International: Genetics 11 (2014)
Weir, B.S., Cockerham, C. Clark, Estimating F-Statistics for the Analysis of Population Structure, Evolution, Vol. 38, No. 6 (Nov., 1984), pp
Forensic DNA Evidence Interpretation, eds. Buckleton, J., Triggs, C.B., Walsh, S.J. (2005) pg

Appendix II Technical data

1. Lab Retriever is based on code derived from the equations published in: Balding, D.J., Buckleton, J., Interpreting low template DNA profiles, Forensic Science International: Genetics 4 (2009)
2. Theta is a variable designed to compensate for population sub-structure. It is hard-coded at
3. Alpha is a variable designed to compensate for drop-out rates applied to homozygotes as compared with heterozygotes. It is hard-coded at
4. For rare alleles, the frequency is implemented at a minimum allele count of 5, that is, a minimum frequency of 5/(2N), where N is the size of the database.
5. Allele frequency data is taken from the NIST 1036 U.S. Population Dataset, found at
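The rare-allele minimum in item 4 can be illustrated with a short Python sketch. It is a minimal reading of the 5/(2N) rule, not Lab Retriever's own code: the function names are hypothetical, and N = 1036 is used only as an example database size (the population-specific subsets are smaller).

```python
def minimum_allele_frequency(n_individuals):
    """Frequency corresponding to a minimum allele count of 5 among 2N alleles."""
    if n_individuals <= 0:
        raise ValueError("database size must be positive")
    return 5 / (2 * n_individuals)


def floored_frequency(observed_frequency, n_individuals):
    """Raise a rare allele's frequency to the 5/(2N) minimum; leave others unchanged."""
    return max(observed_frequency, minimum_allele_frequency(n_individuals))


# Illustrative database size only (assumption): 1036 individuals, 2072 alleles.
floor = minimum_allele_frequency(1036)
rare = floored_frequency(0.0005, 1036)    # below the minimum, so raised to it
common = floored_frequency(0.10, 1036)    # above the minimum, so unchanged
```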

Appendix III

Buckleton et al
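Section V, step 7 asks for the probabilities that the suspected contributor and the hypothesized relative share 0, 1, or 2 alleles IBD. As a hedged illustration, the Python sketch below lists the standard textbook values for a few common relationships, assuming no inbreeding. The dictionary, the function name, and the choice of relationships are assumptions made here, and the values should be checked against the published table before use in casework.

```python
# Probabilities of sharing (0, 1, 2) alleles identical by descent,
# standard values for non-inbred relationships.
IBD_PROBABILITIES = {
    "unrelated": (1.0, 0.0, 0.0),
    "parent-child": (0.0, 1.0, 0.0),
    "full siblings": (0.25, 0.5, 0.25),
    "half siblings": (0.5, 0.5, 0.0),
    "grandparent-grandchild": (0.5, 0.5, 0.0),
    "uncle/aunt-nephew/niece": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
}


def ibd_fields(relationship):
    """Return the values for the 0, 1, and 2 IBD fields for a named relationship."""
    p0, p1, p2 = IBD_PROBABILITIES[relationship]
    if abs(p0 + p1 + p2 - 1.0) > 1e-9:
        raise ValueError("IBD probabilities must sum to 1")
    return {"0 IBD": p0, "1 IBD": p1, "2 IBD": p2}


sibling_fields = ibd_fields("full siblings")
```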