CRISPR-Cas9 genome editing

Part 1: You would like to rapidly generate two different knockout mice using CRISPR-Cas9. The genes to be knocked out are Pcsk9 and Apoc3, both involved in lipid metabolism. In each case, you would like to take advantage of non-homologous end joining (NHEJ) to introduce frameshift mutations into the coding sequence of the gene.

You begin by choosing the gene exons within which to introduce mutations. You use the UCSC Genome Browser (www.genome.ucsc.edu) to assess the exon-intron structure of each gene. You use four tracks to show each gene:

(1) UCSC Genes
(2) Ensembl Genes
(3) RefSeq Genes
(4) Other RefSeq Genes (this shows orthologs from other species)

Introns are portrayed as lines with arrows showing the direction of transcription. Exons are portrayed as bars. Within the exons, the thick bars indicate coding sequence, and the thin bars indicate noncoding sequence (i.e., 5' UTR and 3' UTR). You note that in the mouse genome, both genes are transcribed from the "minus" strand (i.e., in the reverse direction).

Pcsk9 exon/intron structure

The first two rows and the last row show different annotated transcripts of the mouse Pcsk9 gene. The other rows show transcripts of Pcsk9 orthologs from other species. In general, most of the transcripts share the same exons, as well as the thicker portions of the exons, which are the coding sequence. (For an example of a non-coding part of an exon, look at exon 1 in the first transcript: it has a thicker portion that is coding sequence and a thinner portion that is the 5' untranslated region.) Exon 1 is shared by almost all of the transcripts, including all of the mouse transcripts, so it would be reasonable to target exon 1 with CRISPR-Cas9. Exon 2 is shared by even more transcripts, so it would also be reasonable to target exon 2. Many of the later exons are also shared by all of the transcripts, but the closer one targets to the end of the gene, the more likely it is that a functional portion of the protein will still be produced, so that the result is not a true knockout.

Apoc3 exon/intron structure

In this case, the first eight rows and the last four rows show different annotated transcripts of the mouse Apoc3 gene. The other rows show transcripts of Apoc3 orthologs from other species. In contrast to Pcsk9, there is significant heterogeneity among these transcripts. Within exon 1, which is not shared by many of the transcripts, there is no coding sequence (only thin bars). While exon 2 does have coding sequence (mostly thick bar), it is only shared by three transcripts. Almost all transcripts share exon 3, which is mostly coding sequence, so it would be reasonable to target the thick part of exon 3 with CRISPR-Cas9. Arguably exon 4 would be safer, since 100% of the transcripts share exon 4, but exon 4 is also the next-to-last exon, so there is the risk of targeting too late in the gene.

You see that for at least one of the genes, there are several different annotated transcripts. This raises the possibility that putting a frameshift mutation into the "first" exon of the longest transcript may not actually knock out the gene, since it would not affect all of the transcripts. A similar situation could arise if you targeted an alternatively spliced exon. At the same time, you want to target as early as possible in the coding sequence so that a frameshift mutation will truncate/disrupt as much of the protein as possible.

A major point is that there is often no unambiguously correct answer. One has to use one's best judgment and hope for a clean knockout.

1. Which exon (counting from right to left, since the gene is transcribed in the reverse direction) would you target in Pcsk9 to ensure that you knock out the gene? [Use the exon numbering at the bottom of the figure. You may have to zoom in on the picture with your browser to see all of the details.]

As related above, 1 or 2 would be reasonable.

2. Which exon (counting from right to left, since the gene is transcribed in the reverse direction) would you target in Apoc3 to ensure that you knock out the gene? [Use the exon numbering at the bottom of the figure. You may have to zoom in on the picture with your browser to see all of the details.]

As related above, 3 is perhaps the best answer, though an argument can be made for 4.

Part 2: You choose an exon in Pcsk9 to target. The first 250 basepairs of the coding sequence (in the sense direction) within the exon are:

ATGTCCTTCCCGAGGCCGCGCGCACCTCTCCTCGCCCCGATGGGCACCCACTGCT
CTGCGTGGCTGCGGTGGCCGCTGTTGCCGCTGTTGCCGCCGCTGCTGCTGCTGT
TGCTGCTACTGTGCCCCACCGGCGCTGGTGCCCAGGACGAGGATGGAGATTATG
AAGAGCTGATGCTCGCCCTCCCGTCCCAGGAGGATGGCCTGGCTGATGAGGCCG
CACATGTGGCCACCGCCACCTTCCGCCGTTGCT

(The entire exon is 255 basepairs in length.) This is actually the first 250 basepairs of the coding portion of exon 1 of Pcsk9.

To search for potential CRISPR guide RNAs within this sequence, go to crispr.mit.edu. This server will identify guide RNAs with protospacers adjacent to an appropriate PAM (NGG), as well as calculate an off-target score. Enter a search name of your choosing and an email address to which to send a link to the results, click on "other region", click on "mouse (mm9)", and copy and paste the 250-bp sequence above into the "sequence" box. Note that the maximum sequence length allowed by the CRISPR Design server is 250 bp.

The server takes a while to return the full results, but it immediately shows a Results page with two choices: "Guides & offtargets" and "Nickase analysis". Click on "Guides & offtargets". The server will instantly show you the full list of possible guide RNAs (protospacer in black, PAM in green), as well as their relative positions within the exonic sequence you entered. It will take longer to calculate off-target "quality scores" for all of the guide RNAs. Refresh the page periodically to see an updated list.

See the appendix at the end of this document for the output of the CRISPR Design server for the Pcsk9 sequence. The guides highlighted by a green bar to the left are considered to be good with respect to off-target effects. Yellow is mediocre; red is poor.
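The first thing the server does, listing every 20-nt protospacer that sits immediately 5' of an NGG PAM on either strand, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up here, the sketch does no off-target scoring whatsoever, and it is not the server's actual implementation.

```python
# Illustrative sketch only: enumerate 20-nt protospacers followed by an NGG PAM
# on both strands of the 250-bp Pcsk9 sequence. No off-target scoring is done.

PCSK9_EXON1 = (
    "ATGTCCTTCCCGAGGCCGCGCGCACCTCTCCTCGCCCCGATGGGCACCCACTGCT"
    "CTGCGTGGCTGCGGTGGCCGCTGTTGCCGCTGTTGCCGCCGCTGCTGCTGCTGT"
    "TGCTGCTACTGTGCCCCACCGGCGCTGGTGCCCAGGACGAGGATGGAGATTATG"
    "AAGAGCTGATGCTCGCCCTCCCGTCCCAGGAGGATGGCCTGGCTGATGAGGCCG"
    "CACATGTGGCCACCGCCACCTTCCGCCGTTGCT"
)


def reverse_complement(seq):
    """Reverse-complement an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_guides(seq, length=20):
    """Return (protospacer, PAM, strand) for every NGG PAM with a full-length
    protospacer upstream of it, scanning the sense and antisense strands."""
    guides = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for i in range(length, len(s) - 2):
            pam = s[i:i + 3]
            if pam[1:] == "GG":
                guides.append((s[i - length:i], pam, strand))
    return guides


guides = find_guides(PCSK9_EXON1)
for protospacer, pam, strand in guides:
    print(protospacer, pam, strand)
```

Each protospacer is reported 5' to 3' as it would appear in the guide RNA, so guides on the antisense strand are already reverse-complemented relative to the sequence you pasted into the server.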
As one moves one's mouse over each of the guides, the top 20 predicted off-target sites are shown to the right. The number of mismatches and the genomic location of each predicted off-target site are indicated. The off-target sites in coding regions are indicated in the column "UCSC gene". Choosing guide RNAs remains more of an art than a science. Considerations are:

(1) picking guide RNAs with very high off-target "quality scores" that predict (but do NOT guarantee) fewer off-target mutations
(2) picking guide RNAs based on position in the exon, i.e., you may want to aim for the middle of the exon rather than either end of the exon, so that you do not disrupt splicing and cause unintended consequences
(3) avoiding guide RNAs that are too GC-rich (which tend to have more off-target effects) or that contain too many of the same base in a row (especially multiple Ts, as 5 to 6 Ts is enough to prematurely terminate transcription of the guide RNA)

3. Based on the results returned by the CRISPR Design server, choose four 20-bp protospacer sequences and enter them below. [If the server is taking too long to return the results, you can go on to the next page and come back to finish this page later.]

There is no correct answer. This is largely a judgment call. The top four scoring protospacers, according to the off-target analysis of the CRISPR Design server, are:

CTACTGTGCCCCACCGGCGC
AGTGGGTGCCCATCGGGGCG
TCCTGGGCACCAGCGCCGGT
AGGAGAGGTGCGCGCGGCCT

but these are by no means the only choices.

Part 3: You obtain vectors that express Cas9 and each guide RNA, transfect them into a cultured mouse cell line (NIH 3T3 cells), wait several days, and then harvest the genomic DNA from the cells. For each guide RNA condition, you use PCR to amplify the region around the target site and then perform a CEL nuclease-based assay to determine the efficacy of mutagenesis (the more on-target genomic mutations are present in the cells, the more cleavage of the PCR amplicon will be observed, with the cleavage products being of predictable sizes based on the position of the target site within the amplicon).

Below are the results from two of the guide RNAs you tested, on an agarose gel. The lanes are:

(1) DNA ladder
(2) control cells
(3) control cells
(4) cells that received the first guide RNA
(5) cells that received the second guide RNA
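Why the cleavage products have predictable sizes can be made concrete with a small calculation. The sketch below assumes a hypothetical 500-bp amplicon with the target site 180 bp from one end, and hypothetical band intensities; none of these numbers come from the gel in this exercise. The indel estimate uses a formula commonly applied to mismatch-nuclease assays, 100 x (1 - sqrt(1 - fraction cleaved)), which is offered here as an assumption and not as the method used in this exercise.

```python
import math


def expected_fragments(amplicon_len, cut_offset):
    """Sizes (bp) of the two products when the amplicon is cleaved at the target site."""
    return cut_offset, amplicon_len - cut_offset


def estimate_indel_percent(uncleaved, product_a, product_b):
    """Rough indel frequency (%) from band intensities in arbitrary units."""
    fraction_cleaved = (product_a + product_b) / (uncleaved + product_a + product_b)
    return 100 * (1 - math.sqrt(1 - fraction_cleaved))


# Hypothetical amplicon: 500 bp, target site 180 bp from the left primer.
print(expected_fragments(500, 180))

# Hypothetical band intensities: uncleaved 600, cleavage products 200 and 200.
print(round(estimate_indel_percent(600, 200, 200), 1))
```

With these made-up inputs the two bands would be expected at 180 bp and 320 bp, and a lane with relatively more signal in those bands gives a higher estimate.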

The arrows indicate the two cleavage products. You choose the first guide RNA shown above, since the amount of cleavage products relative to the amount of uncleaved PCR amplicon is higher. (The more cleavage, the higher the proportion of mutant alleles.)

Below is the target sequence for the CRISPR guide RNA (protospacer + PAM) aligned to the gene (which is portrayed as being transcribed in the forward direction). The arrow indicates the predicted CRISPR-Cas9 cleavage site.

You arrange to have Cas9 mRNA and in vitro transcribed guide RNA injected into single-cell mouse embryos. Three weeks after implantation of the embryos into surrogate mothers, you obtain several litters of pups. You prepare genomic DNA from tail samples, perform PCR to amplify the region around the target site, and then arrange for the PCR amplicons to undergo Sanger sequencing.

From the first litter of mice, most of the electropherograms show wild-type sequences, an example of which is shown below. However, you obtain interesting electropherograms for two of the mice, shown below.

Looking closely at the electropherograms from the two mice, you realize that they are each heterozygous for a wild-type allele and a mutant allele. The arrows show where the sequence traces diverge, after which there are overlapping peaks reflecting contributions from both alleles. The reason for the divergence is that the mutant alleles have small insertions/deletions (indels). By closely comparing each mouse's electropherogram with a wild-type electropherogram, you can figure out the size of the indel in each mutant allele.

4. For mouse #1, is there an insertion or deletion in the mutant allele, and how many basepairs in size is the indel?

Wild-type electropherogram

Electropherogram from mouse #1

The trick to identifying the size of the indel is to closely compare the traces where they start to diverge (the location of the arrow). Before the arrow, the three bases are black-blue-green (G-C-A). For the next base, the normal (wild-type) base is blue (C). If one looks closely at the overlapping peaks for the first base after the arrow in the mouse #1 electropherogram, there are blue (C) and red (T) peaks. Since the wild-type allele is blue (C), the mutant allele must be red (T). Going to the next base, the overlapping peaks are green (A) and black (G). The base in the wild-type trace is green (A), so the mutant allele must be black (G). Going to the next base, there is only one peak, red (T). This means that both the wild-type and mutant alleles are red (T). Continuing this analysis, the mutant allele after the arrow reads red-black-red-black-black-blue (T-G-T-G-G-C).

One can take this mutant allele sequence and ask if it is found in the wild-type sequence. It is! However, it is shifted by two bases from the location of the arrow: the wild-type sequence from that location reads C-A-T-G-T-G-G-C. This means that the mutant allele has the C-A that starts the wild-type sequence deleted, so that it instead starts T-G-T-G-G-C. One can conclude that the mutant allele has a 2-basepair deletion.
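The comparison described above amounts to sliding the mutant read along the wild-type sequence until the two match. A minimal Python sketch of that idea follows; the function name is made up for illustration, and it only handles the simple case of a clean deletion that begins exactly where the traces diverge.

```python
def deletion_size(wild_type, mutant_read):
    """Return how many bases must be dropped from the start of wild_type
    for it to begin with mutant_read, or None if no such shift exists."""
    for shift in range(len(wild_type) - len(mutant_read) + 1):
        if wild_type[shift:shift + len(mutant_read)] == mutant_read:
            return shift
    return None


# Bases read from the arrow onward in the mouse #1 comparison.
wild_type = "CATGTGGC"
mutant = "TGTGGC"

print(deletion_size(wild_type, mutant))  # 2
```

An insertion would need the opposite search (sliding the wild-type read along the mutant sequence), which this sketch deliberately leaves out.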

5. For mouse #2, is there an insertion or deletion in the mutant allele, and how many basepairs in size is the indel?

Wild-type electropherogram

Electropherogram from mouse #2

This can be determined in the same way as above. The mutant allele, from the location of the arrow, reads: T-G-G-C-C-A-C-C. The wild-type allele reads: A-G-G-C-C-G-C-A-C-A-T-G-T-G-G-C-C-A-C-C. Thus, there is a 12-basepair deletion.

6. Of the two mice, which has the indel that is more likely to result in knockout of the gene?

- Mouse #1
- Mouse #2
- Both indels are equally likely to result in gene knockout

Mouse #1, because a 2-basepair deletion represents a frameshift; the 12-basepair deletion in mouse #2 would only represent an in-frame deletion of 4 codons, which would cleanly remove 4 amino acids from the protein and so may not knock out its function.
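The reasoning behind this answer is simple arithmetic on the reading frame: an indel whose length is not a multiple of three shifts the frame, while a multiple of three adds or removes whole codons' worth of sequence. A minimal sketch, with a function name chosen here purely for illustration:

```python
def classify_indel(size_bp):
    """Classify an indel by its effect on the reading frame."""
    if size_bp % 3 == 0:
        return f"in-frame, {size_bp // 3} codon(s) affected"
    return "frameshift"


for mouse, size in (("mouse #1", 2), ("mouse #2", 12)):
    print(f"{mouse}: {size} bp -> {classify_indel(size)}")
```

The check only looks at indel length; an in-frame deletion can still matter if the removed residues happen to be essential, which is why the answer says it "may not" knock out function.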

Part 4: You decide that, in addition to making a Pcsk9 knockout mouse, you want to make a knock-in mouse model with the D374Y gain-of-function mutation associated with familial hypercholesterolemia. To do so, you would like to take advantage of homology-directed repair (HDR), using a single-strand DNA oligonucleotide carrying the mutation as the repair template with which to introduce the mutation into the mouse genome. You identify 240 basepairs of genomic sequence surrounding the codon for D374 in the mouse ortholog of Pcsk9:

TCACAGTCGGGGCCACGAATGCCCAGGACCAGCCAGTTACCTTGGGGACTTTGG
GGACTAATTTTGGACGCTGTGTGGATCTCTTTGCCCCCGGGAAGGACATCATCGG
AGCGTCCAGTgacTGCAGCACATGCTTCATGTCACAGAGTGGGACCTCACAGGCTG
CTGCCCACGTGGCCGgtgagtcaccaccccaccatcatcctgccaccatagccctttgacaggcagcagggt
ctg

The codon is shown in lower-case letters in the middle of the sequence (gac). Otherwise, upper-case letters indicate exonic sequence and lower-case letters indicate intronic sequence.

The first order of business is to design a guide RNA. Ideally, you want the guide RNA to result in CRISPR-Cas9 cleaving the DNA very close to the site of the mutation you want to introduce (within a few basepairs). As you look closely at the genomic sequence above, there is only one protospacer + PAM sequence that fits this criterion. Keep in mind that the protospacer + PAM can be either on the sense strand or the antisense strand. If on the sense strand, the PAM is NGG. If on the antisense strand, the PAM as read on the sense strand will be the reverse-complement of NGG, or CCN.

7. What is the protospacer sequence for the single guide RNA that would cleave near the site of the desired mutation? [Make sure it is written in the right orientation, i.e., exactly as it needs to be incorporated into the guide RNA.]

The sequence immediately around the codon to be targeted is:

ACATCATCGGAGCGTCCAGTgacTGCAGCACATGCTTCATGTC

There is only one NGG (PAM on forward strand) or CCN (PAM on reverse strand) near the codon: the CCA within TCCAGT, immediately upstream of the codon (shown in bold in the original). The protospacer (on the reverse strand) is the 20 bases that follow the CCA, GTgacTGCAGCACATGCTTC (underlined in the original). (The codon is in lower-case letters.) To properly read the protospacer, one must use the reverse-complement of the underlined sequence:

GAAGCATGTGCTGCAgtcAC (the adjacent PAM would be TGG)

The next order of business is to design the single-strand DNA oligonucleotide. To introduce the D374Y mutation, use the table below to decide which mutation to incorporate into the oligonucleotide. (Note: in the table below, the RNA base U is used instead of the DNA base T.)

8. Since the original codon is GAC, into which codon would you change it to create the D374Y mutation, if you are only allowed to change one base?

The only codon that encodes Y (tyrosine, Tyr) and is different by one base from GAC is TAC.

Now that you have designed the single-strand DNA oligonucleotide, you plan to arrange to have Cas9 mRNA, in vitro transcribed guide RNA, and the synthesized single-strand DNA oligonucleotide injected into single-cell mouse embryos. In principle, two types of mutations could happen:

(1) CRISPR-Cas9 cleaves at the target site, and HDR with the single-strand DNA oligonucleotide results in knock-in of the D374Y mutation. This is the desired outcome.
(2) CRISPR-Cas9 cleaves at the target site, and NHEJ occurs, resulting in an indel of unpredictable size that knocks out or otherwise disrupts the gene. This is an undesired outcome (unless, possibly, you are trying to simultaneously make a Pcsk9 knockout mouse).
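The answers to questions 7 and 8 can be checked mechanically. The sketch below reverse-complements the sense-strand site to obtain the protospacer as it would be written in the guide RNA, and lists the tyrosine codons (TAT and TAC in the standard genetic code) that differ from GAC at exactly one position. The helper names are illustrative, not part of the exercise.

```python
# Case-preserving complement so the lower-case codon stays visible.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse-complement a DNA string, keeping upper/lower case."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


# Question 7: the 20 sense-strand bases that follow the CCA, and the CCA itself.
sense_site = "GTgacTGCAGCACATGCTTC"
sense_pam = "CCA"

protospacer = reverse_complement(sense_site)
pam = reverse_complement(sense_pam)
print(protospacer, pam)

# Question 8: tyrosine codons one base away from GAC.
tyr_codons = ("TAT", "TAC")
one_step = [codon for codon in tyr_codons if hamming("GAC", codon) == 1]
print(one_step)
```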

Because NHEJ occurs at significantly higher frequency than HDR, (2) is more likely to occur than (1), but you hope to obtain at least one mouse in which (1) has occurred on at least one Pcsk9 allele.

As you review your strategy, a horrible thought occurs to you: what if (1) happens, as desired, but within the same embryo CRISPR-Cas9 then re-cleaves the knock-in allele and results in (2), thereby disrupting the knock-in allele? You might end up with no usable knock-in mice.

9. Does the protospacer sequence overlap the site of the D374Y mutation? If so, does the single-nucleotide mismatch (between the knock-in mutation introduced into the genome and the protospacer sequence in the guide RNA) guarantee that re-cleavage of the knock-in mutant allele will not occur?

Yes, the protospacer includes the targeted codon (in lower-case letters above). This will likely reduce the rate at which CRISPR-Cas9 will retarget the mutant sequence. However, a single mismatch may not be enough to ensure that CRISPR-Cas9 will never retarget the sequence.

You remain concerned that CRISPR-Cas9 will re-cleave the knock-in mutant allele in the genome, reducing your chances of getting a clean knock-in mouse. You decide to put an extra mutation into the single-strand DNA oligonucleotide so that both the D374Y mutation and the extra mutation are incorporated into the genome, ensuring enough mismatching between the mutant allele and the protospacer so that re-cleavage of the mutant allele does not occur.

You decide the best strategy is to introduce the extra mutation into the PAM sequence (i.e., change one of the Gs in the NGG sequence), which should definitely disrupt CRISPR-Cas9 action. However, you do not want to change the amino acid sequence beyond the desired D374Y mutation. Thus, the PAM mutation needs to be a synonymous mutation. Here is the 27-nucleotide sequence spanning the D374 codon (gac), grouped by codon:

GGA GCG TCC AGT gac TGC AGC ACA TGC

10. Which two base changes could you introduce that would result in both the desired D374Y mutation as well as a synonymous change that nevertheless disrupts the PAM? Retype the entire 27-nucleotide sequence above with the two changed bases.

There are multiple possibilities. One possibility (with the changes in bold and underlined in the original; here they are the G of TCG and the T of Tac):

GGA GCG TCG AGT Tac TGC AGC ACA TGC
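The "multiple possibilities" can be enumerated. The sketch below builds the standard genetic code, applies the GAC to TAC change, and then tries every substitution at the two sense-strand C positions that correspond to the two Gs of the antisense-strand PAM, keeping only those that leave the protein sequence unchanged. The position indices and helper names are assumptions made for this illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids."""
    dna = dna.upper()
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# The 27-nt window: GGA GCG TCC AGT GAC TGC AGC ACA TGC
window = "GGAGCGTCCAGTGACTGCAGCACATGC"

# D374Y: change the first base of GAC (0-based index 12) to T.
knock_in = window[:12] + "T" + window[13:]

# The PAM reads CCA on the sense strand (indices 7-9); its two Cs pair with
# the two Gs of the antisense-strand NGG.
pam_c_positions = (7, 8)

options = []
for pos in pam_c_positions:
    for base in "ACGT":
        if base == window[pos]:
            continue
        candidate = knock_in[:pos] + base + knock_in[pos + 1:]
        if translate(candidate) == translate(knock_in):
            options.append(candidate)

print(translate(window), "->", translate(knock_in))
for candidate in options:
    print(" ".join(candidate[i:i + 3] for i in range(0, 27, 3)))
```

Only changes at the third position of the TCC codon survive the filter, giving TCA, TCG, or TCT. One caveat worth weighing: TCT turns the antisense-strand PAM into TAG, and NAG is generally regarded as a weak alternative PAM for the commonly used Cas9, so TCG or TCA may be the more cautious choices.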

Now you are satisfied that you have an optimal targeting strategy. You design a complete 200-nucleotide single-strand DNA oligonucleotide with the two changed bases. You place the two mutations at the center of the oligonucleotide, so that you have homology arms of equal size flanking the mutations.

11. Using cut-and-paste from the 240-nucleotide sequence at the top of this page, type in the full 200-nucleotide single-strand DNA oligonucleotide sequence (incorporating the two mutations) below.

There are different possibilities, depending on where one chooses the boundaries of the 200-nucleotide sequence. One possibility:

GCCCAGGACCAGCCAGTTACCTTGGGGACTTTGGGGACTAATTTTGGACGCTGTG
TGGATCTCTTTGCCCCCGGGAAGGACATCATCGGAGCGTCGAGTTacTGCAGCAC
ATGCTTCATGTCACAGAGTGGGACCTCACAGGCTGCTGCCCACGTGGCCGgtgagtc
accaccccaccatcatcctgccaccatagccct

The mutations should be kept as close to the middle of the sequence as possible.
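Building the oligonucleotide by hand is error-prone, so it can help to construct it programmatically and compare it with the answer. The sketch below applies the two base changes to the 240-nucleotide genomic sequence and takes a 200-nucleotide slice; the starting offset of 20 is simply the boundary that reproduces the example answer above, and the variable names are illustrative.

```python
GENOMIC = (
    "TCACAGTCGGGGCCACGAATGCCCAGGACCAGCCAGTTACCTTGGGGACTTTGG"
    "GGACTAATTTTGGACGCTGTGTGGATCTCTTTGCCCCCGGGAAGGACATCATCGG"
    "AGCGTCCAGTgacTGCAGCACATGCTTCATGTCACAGAGTGGGACCTCACAGGCTG"
    "CTGCCCACGTGGCCGgtgagtcaccaccccaccatcatcctgccaccatagccctttgacaggcagcagggt"
    "ctg"
)

EXPECTED_OLIGO = (
    "GCCCAGGACCAGCCAGTTACCTTGGGGACTTTGGGGACTAATTTTGGACGCTGTG"
    "TGGATCTCTTTGCCCCCGGGAAGGACATCATCGGAGCGTCGAGTTacTGCAGCAC"
    "ATGCTTCATGTCACAGAGTGGGACCTCACAGGCTGCTGCCCACGTGGCCGgtgagtc"
    "accaccccaccatcatcctgccaccatagccct"
)

# Locate the edit sites from their sequence context instead of hard-coding indices.
site = GENOMIC.index("TCCAGTgac")
pam_pos = site + 2    # third base of TCC (synonymous, PAM-disrupting change)
codon_pos = site + 6  # first base of gac (D374Y change)

edited = list(GENOMIC)
edited[pam_pos] = "G"    # TCC -> TCG
edited[codon_pos] = "T"  # gac -> Tac
edited = "".join(edited)

START = 20  # boundary chosen to match the example answer
oligo = edited[START:START + 200]

print(len(oligo), oligo == EXPECTED_OLIGO)
# 1-based positions of the two changes within the oligonucleotide
print(pam_pos - START + 1, codon_pos - START + 1)
```

With this boundary the two changes fall at positions 96 and 100 of the 200-nucleotide oligonucleotide, which is about as central as they can be.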

Using ENCODE data

Part 1: You are studying human genetic variation with respect to the APOA5 gene, involved in lipid metabolism (most notably triglycerides). Association studies have found that the single nucleotide polymorphism (SNP) rs662799 in the promoter region of the APOA5 gene is strongly associated with blood triglyceride levels in individuals of European descent. Although your initial hypothesis is that rs662799 is the causal variant, i.e., the SNP affects APOA5 function (probably by changing its expression) and thus influences blood triglyceride levels, you recognize that rs662799 is strongly linked to a number of nearby SNPs. You seek to use ENCODE data to prioritize among the SNPs to choose one SNP as the focus of your planned functional studies.

You first use HaploReg, a Web server that summarizes information from ENCODE and other data sources, to quickly assess which SNPs are worth further consideration. You can access HaploReg at www.broadinstitute.org/mammals/haploreg/haploreg.php. Enter rs662799 in the box labelled "Query". Then click the "Submit" button. You will see a list of SNPs along with many columns of information.
[Here is a link that may be useful in orienting you to HaploReg: www.genome.gov/pages/research/encode/ashg_2013_using_haploreg_regulomedb_to_mine_encode_data.pdf]

The most relevant information for the purpose of this case:

(1) Promoter histone marks - this indicates if the location of the SNP in the genome has been found to have histone marks suggestive of promoter activity in any cell types
(2) Enhancer histone marks - this indicates if the location of the SNP in the genome has been found to have histone marks suggestive of enhancer activity in any cell types
(3) DNAse - this indicates if the location of the SNP in the genome has been found to display DNase I hypersensitivity (which indicates "open" chromatin suggestive of transcriptional activity or transcription regulatory activity, such as enhancer activity) in any cell types
(4) Proteins bound - this indicates whether any proteins were found to bind to the location of the SNP in the genome by the ChIP-seq technique
(5) eQTL tissues - this indicates if the SNP genotype was found to be associated with the expression level of any gene in any tissue types
(6) Motifs changed - this indicates whether the reference vs. alternate (or major vs. minor) SNP alleles are predicted to alter any transcription factor binding sites at the location of the SNP in the genome
(7) GENCODE genes/dbSNP func annot - these two columns indicate whether there is any gene in the vicinity of the SNP and, if so, the region of the gene (5'-UTR, 3'-UTR, intron, outside of the gene, etc.)

The SNP you entered (rs662799) is highlighted in red in the list of SNPs. If you want to see more specific information on this SNP (or any of the SNPs), click on the SNP name.

1. Based on the data in the columns described above, and the extra data available when you click on rs662799, is there evidence from ENCODE that rs662799 lies in a region of genomic DNA with enhancer activity (in at least some types of cells)?

Yes. According to HaploReg (v2), at that location there are enhancer histone marks in 6 cell types, DNase I hypersensitivity in 5 cell types, and the predicted disruption of 5 types of transcription factor binding sites. If one clicks on the SNP name and looks at the detailed view, "Enhancer" or "Enh" is listed a large number of times.

2. Based only on the data in the columns described above, how many SNPs (including rs662799) have strong evidence from ENCODE - that is, evidence in several categories - that they lie in regions of genomic DNA with transcription regulatory activity (in at least some types of cells)?

Four SNPs: rs2266788, rs2072560, rs651821, rs662799

You next check your pared-down list of SNPs (based on your answer to question #2) using RegulomeDB, another Web server that summarizes information from ENCODE and other data sources. You can access RegulomeDB at www.regulomedb.org. Enter your list of SNPs, one per row, in the box. Then click the "Submit" button. You will see a list of the SNPs ranked by the "RegulomeDB Score". To interpret the scores:

[eQTL = expression quantitative trait locus; this indicates if the SNP genotype was found to be associated with the expression level of any gene in any tissue types]
[TF = transcription factor]

In general, the lower the score, the better the evidence that the SNP will affect a transcription regulatory site in the genome. Thus, the SNP at the top of your list has the best evidence. Clicking on the score itself will bring up additional information.

[Here is a link that may be useful in orienting you to RegulomeDB: www.genome.gov/pages/research/encode/ashg_2013_using_haploreg_regulomedb_to_mine_encode_data.pdf]

3. Based on the score given by RegulomeDB, which single SNP has the most evidence that it affects a transcription regulatory site in the genome?

The scores:

rs651821    1f
rs2072560   4
rs662799    4
rs2266788   5

rs651821 has the best score and thus the best evidence.

Having chosen your "best" SNP, you want to prove that it lies in a transcription regulatory site in the genome. You hypothesize that the alternate allele (minor allele) creates a transcription factor binding site, whereas the reference allele (major allele) disrupts the site and prevents binding. To test this hypothesis, you decide to use CRISPR-Cas9 to specifically disrupt the alternate allele (minor allele) in a cell line that is homozygous for the alternate allele and assess whether this results in a change in the expression of APOA5.

Reference allele sequence:
GGTGAGCACGGCAGCCATGCTTGCCATTA[C]CTGCTCTGAGAAGACAGGTGGAGGGAGGC

Alternate allele sequence:
GGTGAGCACGGCAGCCATGCTTGCCATTA[T]CTGCTCTGAGAAGACAGGTGGAGGGAGGC

Bonus question: using the sequences of the genomic site with either of the two alleles of your "best" SNP, as shown above, design a protospacer that will specifically cleave the site with the alternate allele, and not the site with the reference allele.

This is in fact the sequence surrounding rs651821. The only PAM close to the SNP site is the CCA that ends three bases upstream of the SNP site; read on the opposite strand, it is TGG:

GGTGAGCACGGCAGCCATGCTTGCCATTA[T]CTGCTCTGAGAAGACAGGTGGAGGGAGGC

The protospacer is the 20 bases immediately following the CCA (TTA[T]CTGCTCTGAGAAGACA, the underlined sequence). It spans the alternate allele, and so CRISPR-Cas9 using this guide RNA should preferentially target the alternate allele over the reference allele. To properly read the protospacer, one must use the reverse-complement of the underlined sequence:

TGTCTTCTCAGAGCAG[A]TAA (the adjacent PAM would be TGG)
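The search described in this answer can be sketched in a few lines of Python. This is a minimal illustration, not part of the original exercise: it scans both strands of the alternate-allele sequence for NGG PAMs whose 20-base protospacer covers the SNP, then picks the candidate that puts the SNP closest to the PAM. The function names, the 20-base protospacer length, and the "closest to the PAM" selection rule are assumptions made for this sketch; the sequence is written without the space that appears near its 3' end in the handout, on the assumption that the space is a line-wrap artifact (it lies outside every candidate protospacer and does not affect the result).

```python
# Sketch: list SpCas9 (NGG) protospacers that overlap a SNP, on both strands.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def candidate_guides(seq, snp_index, spacer_len=20):
    """Return (strand, protospacer, pam, snp_distance_from_pam) tuples.

    snp_distance_from_pam is 1 when the SNP is the base next to the PAM.
    """
    hits = []
    n = len(seq)
    # Forward strand: protospacer is seq[i-20:i], PAM is seq[i:i+3] = NGG.
    for i in range(spacer_len, n - 2):
        if seq[i + 1:i + 3] == "GG" and i - spacer_len <= snp_index < i:
            hits.append(("+", seq[i - spacer_len:i], seq[i:i + 3], i - snp_index))
    # Reverse strand: a CCN on the forward strand is an NGG on the other strand;
    # the protospacer is the reverse complement of the 20 bases after the CCN.
    for i in range(0, n - spacer_len - 2):
        start, end = i + 3, i + 3 + spacer_len
        if seq[i:i + 2] == "CC" and start <= snp_index < end:
            hits.append(("-", reverse_complement(seq[start:end]),
                         reverse_complement(seq[i:i + 3]), snp_index - start + 1))
    return hits

LEFT = "GGTGAGCACGGCAGCCATGCTTGCCATTA"
RIGHT = "CTGCTCTGAGAAGACAGGTGGAGGGAGGC"
ALT = LEFT + "T" + RIGHT          # alternate (minor) allele
SNP_INDEX = len(LEFT)             # 0-based position of the SNP

hits = candidate_guides(ALT, SNP_INDEX)
best = min(hits, key=lambda h: h[3])   # SNP closest to the PAM
for hit in hits:
    print(hit)
print("chosen:", best)
```

Under these assumptions the scan finds four protospacers that overlap the SNP, but only the reverse-strand guide TGTCTTCTCAGAGCAGATAA (PAM TGG) places the SNP within a few bases of the PAM; in the other three the SNP sits 13 or more bases away from it.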

CRISPR Design: Job "Bootcamp", Guides & Offtargets

Input: mm9 chr4 +106136304-106136554
All guides scored by inverse likelihood of offtarget binding.

guide #1
quality score: 94
guide sequence: CTACTGTGCCCCACCGGCGC TGG
on-target locus: chr4:+106136418
number of offtarget sites: 53 (12 are in genes)

guide       score   sequence
Guide #1    94      CTACTGTGCCCCACCGGCGC TGG
Guide #2    93      AGTGGGTGCCCATCGGGGCG AGG
Guide #3    88      TCCTGGGCACCAGCGCCGGT GGG
Guide #4    87      AGGAGAGGTGCGCGCGGCCT CGG
Guide #5    87      GGCTGATGAGGCCGCACATG TGG
Guide #6    87      CCTGGGCACCAGCGCCGGTG GGG
Guide #7    87      CGCACCTCTCCTCGCCCCGA TGG
Guide #8    86      GCACCTCTCCTCGCCCCGAT GGG
Guide #9    86      GTCCTGGGCACCAGCGCCGG TGG
Guide #10   83      CTCGTCCTGGGCACCAGCGC CGG
Guide #11   83      AGGTGCGCGCGGCCTCGGGA AGG
Guide #12   83      GTGCCCATCGGGGCGAGGAG AGG
Guide #13   81      GGAGAGGTGCGCGCGGCCTC GGG
Guide #14   80      CATAATCTCCATCCTCGTCC TGG
Guide #15   79      GCAACGGCGGAAGGTGGCGG TGG
Guide #16   78      GATGCTCGCCCTCCCGTCCC AGG
Guide #17   77      ATAATCTCCATCCTCGTCCT GGG
Guide #18   76      CCCCACCGGCGCTGGTGCCC AGG
Guide #19   76      AGAGCAGTGGGTGCCCATCG GGG
Guide #20   76      CATGTGCGGCCTCATCAGCC AGG
Guide #21   72      CGGCGCTGGTGCCCAGGACG AGG
Guide #22   72      GGTGGCGGTGGCCACATGTG CGG
Guide #23   69      GCTGGTGCCCAGGACGAGGA TGG
Guide #24   69      ACTGCTCTGCGTGGCTGCGG TGG
Guide #25   68      TTGCTGCTACTGTGCCCCAC CGG
Guide #26   68      GCTCGCCCTCCCGTCCCAGG AGG
Guide #27   68      CAGAGCAGTGGGTGCCCATC GGG
Guide #28   66      GGCCATCCTCCTGGGACGGG AGG
Guide #29   65      TGGGCACCCACTGCTCTGCG TGG
Guide #30   64      GCCATCCTCCTGGGACGGGA GGG
Guide #31   63      CCCACTGCTCTGCGTGGCTG CGG
Guide #32   62      CCGCAGCCACGCAGAGCAGT GGG
Guide #33   60      GGGGCGAGGAGAGGTGCGCG CGG
Guide #34   60      ATCAGCCAGGCCATCCTCCT GGG
Guide #35   59      CCCGTCCCAGGAGGATGGCC TGG
Guide #36   57      ACCGCAGCCACGCAGAGCAG TGG
Guide #37   55      GCAGAGCAGTGGGTGCCCAT CGG
Guide #38   54      GGAGGATGGCCTGGCTGATG AGG
Guide #39   54      CATCAGCCAGGCCATCCTCC TGG
Guide #40   53      GGCGGCAACAGCGGCAACAG CGG
Guide #41   53      CCAGGCCATCCTCCTGGGAC GGG
Guide #42   50      GCCCTCCCGTCCCAGGAGGA TGG
Guide #43   49      GCCAGGCCATCCTCCTGGGA CGG
Guide #44   38      AGCAGCAGCGGCGGCAACAG CGG
Guide #45   8       AGCAACAGCAGCAGCAGCGG CGG
Guide #46   3       AGCAGCAACAGCAGCAGCAG CGG

top 20 genome-wide off-target sites

sequence                  score   mismatch
ACCCTGGGCCCCACCGGCGCAGG   0.9     4MMs [1:2:
AAACTGCGCGCCACCGGCGCAGG   0.9     4MMs [1:2:7
CCGCTGTACCCCACCGGCTCCGG   0.5     4MMs [2:3:8
CTTCCGTGCACCACAGGCGCGAG   0.4     4MMs [3:5:1
CTACGGTGCTCTACCGGCTCAAG   0.2     4MMs [5:10:1
CTACTTCTCCCCACCGGCTCTGG   0.2     4MMs [6:7:8
CTTCTGTTCCCCACCGTCCCAGG   0.2     4MMs [3:8:1
CTCCTGTGCCCCACCTGCCCAAG   0.2     3MMs [3:16
CTACCTTGCTCCACCGGTGCAGG   0.2     4MMs [5:6:1
CTCCCGTGCCCCGCCGGTGCAAG   0.1     4MMs [3:5:1
CTCCTGAGCCCTACCGGGGCAGG   0.1     4MMs [3:7:1
CCACTGGGCCCCACAGGCCCCAG   0.1     4MMs [2:7:1
ACACTGTGCCCCACCAGCCCCAG   0.1     4MMs [1:2:1
AGACTGTGCCCCACAGGTGCCAG   0.1     4MMs [1:2:1
CTCCTGTGCTCCACCTGCCCAAG   0.1     4MMs [3:10:1
CTGCTGTGCACCACCAGCCCTGG   0.1     4MMs [3:10:1
CTACTGTGCCCCACAGTCCCAAG   0.1     3MMs [15:17
CTCCTGTGCCGCACAGGCACAAG   0.1     4MMs [3:11:1
CTACTGGGCGCCACCGAGGCCGG   0.1     4MMs [7:10:1
CTGCTGTGCCCCACAGTCGAAGG   0.1     4MMs [3:15:1
CCACTGAGCCCCACCAGCCCTGG   0.1     4MMs [2:7:1
CTCCTGGGCCCCAGCGTCGCGGG   0.1     4MMs [3:7:1
TTAGTGTGCCCCACCAGTGCAGG   0.1     4MMs [1:4:1
CCACTCTGCCCCACCTGCCCAGG   0.1     4MMs [2:6:1
TTACTGTGGCCCACGGGTGCTAG   0.1     4MMs [1:9:1
CTTCTGTGCCCCACTGCCACCAG   0.1     4MMs [3:15:1
CTTCTGTGCCTCACCTGCCCTAG   0.0     4MMs [3:11:1
CTACTGTGCCCCTCATGCGCCAG   0.0     3MMs [13:15
CTACTGTGGGCCACTGGTGCTGG   0.0     4MMs [9:10:1
CAACTGTTCCCCATCAGCGCTGG   0.0     4MMs [2:8:1
CTATTGTGCCCCCACGGCGGGAG   0.0     4MMs [4:13:1
CTACTGTGCAGCACTGGGGCCAG   0.0     4MMs [10:11:
CCACTGTGCCCCTCCTGCACAGG   0.0     4MMs [2:13:1
CTACTGCTCCCCACCAGGGCTGG   0.0     4MMs [7:8:1
CTACTGTCCCCCACCTGCAAAGG   0.0     4MMs [8:16:1
CTACTGAGCCCCTCAGGCCCCAG   0.0     4MMs [7:13:1
CTACAGTGCCCCACCAACACCAG   0.0     4MMs [5:16:1
CTACTGTGCCCGCCCGCCCCCGG   0.0     4MMs [12:13:
CTGCTCTGCCCCACCAGAGCTGG   0.0     4MMs [3:6:1
CTACTGTGACCCACTGGCAGAGG   0.0     4MMs [9:15:1
CTACTGTGTCCCTCAGGCTCGGG   0.0     4MMs [9:13:1
CTACTGTGACCCACTGACTCCAG   0.0     4MMs [9:15:1
CTCCTGTGCCCCAGGGTCGCAGG   0.0     4MMs [3:14:1
CTACTGTGCCCACCAGGCCCTGG   0.0     4MMs [12:13:
CTACTATGCCCCACTGTAGCAGG   0.0     4MMs [6:15:1
CTACTGTGCCTCACCCTCCCAGG   0.0     4MMs [11:16:
CTACTGTGACCCAGCGCCTCAGG   0.0     4MMs [9:14:1
CTACTGTGCCCTACCATCACAAG   0.0     4MMs [12:16:
CTACTGTGCCCTAACGGCTTAAG   0.0     4MMs [12:14:
CTACTGTGCCCCACCGCACACAG   0.0     4MMs [17:18:

Zhang Lab, MIT 2013
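The mismatch column of the off-target list can be sanity-checked by comparing a listed site against the guide, base by base. The sketch below assumes that the listed sites belong to guide #1 (the guide selected in the output above), that the last three bases of each site are its PAM, and that mismatch positions are counted from 1 at the 5' end of the guide; none of these conventions is stated in the output itself, and the function name is made up for illustration.

```python
# Sketch: recompute mismatch positions between a guide and off-target sites.
GUIDE_1 = "CTACTGTGCCCCACCGGCGC"  # guide #1 without its TGG PAM

def mismatch_positions(guide, site):
    """Return 1-based positions (5' to 3') where the site differs from the guide."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    return [pos for pos, (g, s) in enumerate(zip(guide, site), start=1) if g != s]

# Three sites copied from the off-target list (20-base site + 3-base PAM).
off_targets = [
    "ACCCTGGGCCCCACCGGCGCAGG",
    "AAACTGCGCGCCACCGGCGCAGG",
    "CTCCTGTGCCCCACCTGCCCAAG",
]

for site in off_targets:
    protospacer, pam = site[:20], site[20:]
    positions = mismatch_positions(GUIDE_1, protospacer)
    print(site, pam, f"{len(positions)}MMs", positions)
```

With these assumptions, the three sites give 4, 4, and 3 mismatches, which agrees with the "4MMs" and "3MMs" labels and with the leading positions still visible in the list.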