CRISPRseek Workshop Design of target-specific guide RNAs in CRISPR-Cas9 genome-editing systems

Size: px
Start display at page:

Download "CRISPRseek Workshop Design of target-specific guide RNAs in CRISPR-Cas9 genome-editing systems"

Transcription

1 April 2008 CRISPRseek Workshop Design of target-specific guide RNAs in CRISPR-Cas9 genome-editing systems Lihua Julie Zhu August 1st 2014

2 Outline Background and Motives CRISPRseek Functionality Dependency Installation Get help and Reference Demo Exercise

3 Genome editing plays an increasingly central role in biomedical research Targeted gene mutation Study the function of individual gene Create transgenic animals Model disease Targeted transgene addition Engineer crops to be disease-resistant Gene therapy Erase a genetic mutation Correct faulty genes that cause genetic diseases like Huntington s disease Introduce a mutation to induce the resistance to virus infection, such as HIV infection (CCR5) HIV infection results in the death of immune system cells, particularly CD4+ T-cells leading to AIDS CCR5 is a co-receptor for HIV entry into T-cells and, if CCR5 is not expressed on their surface, HIV infects them with lower efficiency.

4 OVERVIEW OF GENOME EDITING METHODS Transgenesis Randomness Homologous Recombination More precise Lower efficiency Engineered nucleases (molecular scissors) Cut DNA at targeted site in the genome High efficiency Take advantage of endogenous DNA repair machinery

5 Targeted DNA breaks mobilize DNA repair mechanism J. Keith Joung and Jeffry D. Sander January; 14(1):49-55.

6 Earlier Engineered Nucleases Zinc Finger Nuclease - ZFN (two subunits) TALE Nuclease - TALEN (two subunits) Zinc finger array 1 FokI FokI Zinc finger array 2 FokI FokI TALE domain 1 TALE domain 2 site 1 (9-12 bps) spacer (6 bps) site 2 (9-12 bps) TALEN binding site 1 top strand (15-20 bps) Spacer (15-20 bps) TALEN binding site 2 bottom strand (15-20 bps)

7 CRISPR-CAS9 SYSTEM Clustered Regularly Interspaced Short Palindromic Repeats and CRISPR-associated (Cas) proteins An adaptive immune defense system found in archaea and bacteria. It provides resistance to potential invaders, such as plasmids and phages, by recognizing and cleaving foreign nucleic acids In 2012, it was shown that CRISPR-cas9 can be easily modified to activate or deactivate target genes with high efficiency and low expense CRISPR Therapeutics Editas Medicine

8 GENOME EDITING WITH CRISPR-CAS9 SYSTEM Target DNA Protospacer adjacent motif (PAM)

9 grna design: minimize off-target cleavage CAS9 protein 20 bp guide sequence DNA target cut site guide RNA (derived from CRISPR crna + tracrna) GG PAM Effects of single mismatches on cleavage efficiency guide position PAM position normal cleavage no cleavage P. Hsu, et al., Nature Biotechnology July 2013

10 Paired nickases will reduce off-target rates - Nickase only cleaves one strand: low mutation rate - Requires two nickases to create mutation - Paired nickases can have high mutation rate with greater specificity * F. Zhang, et al Cell Aug 28, 2013

11 Identify INDELs by restriction enzyme digestion For example sgrna atgaacagctcgacttctgcagg PCR to amplify Target locus No indel Cleavage site Pst1 PstI Slide courtesy of Huan Yang PAM PstI digestion - + PstI with indel e.g., one A insertion A Loss of PstI

12 CHALLENGES Minimize off-target cleavage Respond rapidly to CRISPR-cas9 technology Cas9 from different species New off-target analysis data Novel configurations Different methods for synthesis and delivery of nucleases to cells - Impose different constraints on the grnas Design nucleases to analyze closely related sequences Cleave one allele but the other Cleave both with similar efficiency Cleave endogenous DNA but not donor DNA

13 CRISPRSEEK Facilitate design of CRISPR-Cas9 genome-editing systems Identify target-specific grnas for input sequences Identify grnas common to two related sequences Identify grnas unique to one of two related sequences

14 FUNCTIONALITY Find potential grnas in input sequences. Output grnas in fasta or genbank format Output grnas in paired configuration Output grnas annotated with restriction enzyme cut sites Off-target searching and scoring in a given genome for each grna. Searching for off-targets with user defined maximum mismatch allowed Using position-dependent mismatch penalty scoring system from experimental data Filtering grnas with minimum off-target cutting Retrieve genomic sequences flanking off-target sites and indicating whether the off-target sites are in critical region of the gene such as exon. Compare two input sequences to identify grnas that specifically target one of the sequences or both.

15 DEPENDENCY BiocGenerics S4 generic functions needed by many Bioconductor packages. BSgenome Supplies infrastructure for efficiently representing, accessing and analyzing whole genome Biostrings Implements functions for pattern matching, sequence alignment and string manipulation

16 INSTALLATION Install R base/ Windows: OS X: Source (Linux): sources.html

17 INSTALLATION CNT D CRISPRseek can be installed from R as: source(" bioclite("crisprseek") All the dependent packages will be installed automatically BiocGenerics Biostring BSgenome The organism-specific package BSgenome.Hsapiens.UCSC.hg19, TxDb.Hsapiens.UCSC.hg19.knownGene need to be installed to run the code snippets in the vignette. bioclite("bsgenome.hsapiens.ucsc.hg19") bioclite("txdb.hsapiens.ucsc.hg19.knowngene")

18 MAIN FUNCTIONS offtargetanalysis Find potential grnas for the input sequence Output grnas in fasta or genbank format Output grnas in paired configuration Output grnas with restriction enzyme cut sites Searching and scoring off-target sites for each grna Retrieve genomic sequences flanking off-target sites and indicating whether the off-target sites are in critical region of the gene such as exon Output off-target details including mismatch position and cleavage score Output grnas with topn off-target cutting scores

19 MAIN FUNCTIONS CNT D compare2sequences Find potential grnas for both input sequences Output grnas in fasta or genbank format Output grnas in paired configuration Output grnas with restriction enzyme cut sites Assign relative cleavage score to each grna for both input sequences Facilitate identification of grnas that specifically target one of the two input sequences Facilitate identification of grnas that target both input sequences

20 DNA TARGET SITES S. PYOGENES guide sequence /grna in CRISPRseek PAM sequence CCACTGTGTGCACTTCATCCTGG

21 DEFAULT PARAMETERS OFFTARGETANALYSIS offtargetanalysis(inputfilepath, format = "fasta", findgrnas = TRUE, exportallgrnas = c("all", "fasta", "genbank", "no"), findgrnaswithrecutonly = TRUE, REpatternFile, minrepatternsize = 6, overlap.grna.positions = c(17, 18), findpairedgrnaonly = TRUE, min.gap = 0, max.gap = 20, grna.name.prefix = "grna", PAM.size = 3, grna.size = 20, PAM = "NGG", BSgenomeName, chromtosearch = "all", max.mismatch = 4, PAM.pattern = "N[A G]G$", grna.pattern = ", min.score = 0.5, topn = 100, topn.offtargettotalscore = 10, annotateexon = TRUE, txdb, outputdir, fetchsequence = TRUE, upstream = 200, downstream = 200, weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), overwrite = FALSE) Default setting for CRISPR-cas9 system in S. pyogenes grnaf1_rs362331tstart22end44

22 Constraint Guide Sequence By setting grna.pattern to require or exclude specific features within the target site. For example, synthesis of grnas in vivo from host U6 promoters is more efficient if the first base is guanine and grna synthesis in vitro using T7 promoters is most efficient when the first two bases are GG These features can be specified by setting grna.pattern = "^G" and "^GG" respectively Another example is that five consecutive uracils in any position of a grna will affect transcription elongation by RNA polymerase III. To avoid premature termination during grna synthesis using U6 promoter, we can set grna.pattern = "^(?:(?!T{5,}).)+$". In addition, some studies have identified sequence features that broadly correlate with lower nuclease cleavage activity, such as uracil in the last 4 positions of the guide sequence To avoid uracil in these positions, we can specify grna.pattern = "[ACG]{4,}.{3}$ Regular expression or IUPAC Extended Genetic Alphabet to represent grna pattern. Type help(translatepattern) for a list of IUPAC Extended Genetic Alphabet

23 DEFAULT PARAMETERS OFFTARGETANALYSIS offtargetanalysis(inputfilepath, format = "fasta", findgrnas = TRUE, exportallgrnas = c("all", "fasta", "genbank", "no"), findgrnaswithrecutonly = TRUE, REpatternFile, minrepatternsize = 6, overlap.grna.positions = c(17, 18), findpairedgrnaonly = TRUE, min.gap = 0, max.gap = 20, grna.name.prefix = "grna", PAM.size = 3, grna.size = 20, PAM = "NGG", BSgenomeName, chromtosearch = "all", max.mismatch = 4, PAM.pattern = "N[A G]G$", grna.pattern = ", min.score = 0.5, topn = 100, topn.offtargettotalscore = 10, annotateexon = TRUE, txdb, outputdir, fetchsequence = TRUE, upstream = 200, downstream = 200, weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), overwrite = FALSE) Default setting for CRISPR-cas9 system in S. pyogenes

24 DEFAULT PARAMETERS OFFTARGETANALYSIS offtargetanalysis(inputfilepath, format = "fasta", findgrnas = TRUE, exportallgrnas = c("all", "fasta", "genbank", "no"), findgrnaswithrecutonly = TRUE, REpatternFile, minrepatternsize = 6, overlap.grna.positions = c(17, 18), findpairedgrnaonly = TRUE, min.gap = 0, max.gap = 20, grna.name.prefix = "grna", PAM.size = 3, grna.size = 20, PAM = "NGG", BSgenomeName, chromtosearch = "all", max.mismatch = 4, PAM.pattern = "N[A G]G$", grna.pattern = ", min.score = 0.5, topn = 100, topn.offtargettotalscore = 10, annotateexon = TRUE, txdb, outputdir, fetchsequence = TRUE, upstream = 200, downstream = 200, weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), overwrite = FALSE) Default setting for CRISPR-cas9 system in S. pyogenes REpatternFile <- system.file('extdata', 'NEBenzymes.fa', package= 'CRISPRseek')

25 DEFAULT PARAMETERS OFFTARGETANALYSIS offtargetanalysis(inputfilepath, format = "fasta", findgrnas = TRUE, exportallgrnas = c("all", "fasta", "genbank", "no"), findgrnaswithrecutonly = TRUE, REpatternFile, minrepatternsize = 6, overlap.grna.positions = c(17, 18), findpairedgrnaonly = TRUE, min.gap = 0, max.gap = 20, grna.name.prefix = "grna", PAM.size = 3, grna.size = 20, PAM = "NGG", BSgenomeName, chromtosearch = "all", max.mismatch = 4, PAM.pattern = "N[A G]G$", grna.pattern = ", min.score = 0.5, topn = 100, topn.offtargettotalscore = 10, annotateexon = TRUE, txdb, outputdir, fetchsequence = TRUE, upstream = 200, downstream = 200, weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), overwrite = FALSE) Default setting for CRISPR-cas9 system in S. pyogenes Position specific penalty score matrix is experimentally determined by Feng Zhang s lab at MIT Alternative weights = c(0, 0, 0, 0, 0, 0.311, 0.329, 0, 0.47, 0.011, 0.335, 0.395, 0.584, 0.829, 0.762, 0.795, 0.67, 0.816, 0.74, 0.656) Hsu et al., 2013

26 DEFAULT PARAMETERS OFFTARGETANALYSIS offtargetanalysis(inputfilepath, format = "fasta", findgrnas = TRUE, exportallgrnas = c("all", "fasta", "genbank", "no"), findgrnaswithrecutonly = TRUE, REpatternFile, minrepatternsize = 6, overlap.grna.positions = c(17, 18), findpairedgrnaonly = TRUE, min.gap = 0, max.gap = 20, grna.name.prefix = "grna", PAM.size = 3, grna.size = 20, PAM = "NGG", BSgenomeName, chromtosearch = "all", max.mismatch = 4, PAM.pattern = "N[A G]G$", grna.pattern = ", min.score = 0.5, topn = 100, topn.offtargettotalscore = 10, annotateexon = TRUE, txdb, outputdir, fetchsequence = TRUE, upstream = 200, downstream = 200, weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), overwrite = FALSE) Default setting for CRISPR-cas9 system in S. pyogenes Position specific penalty score matrix is experimentally determined by Feng Zhang s lab at MIT Alternative weights = c(0, 0, 0, 0, 0, 0.311, 0.329, 0, 0.47, 0.011, 0.335, 0.395, 0.584, 0.829, 0.762, 0.795, 0.67, 0.816, 0.74, 0.656)

27 OUTPUT FILES grnas (genbank, fasta) Restriction Site Overlap (tab delimited) Paired Sites (tab delimited) Off-Target sites (tab delimited) Summary (tab delimited)

28 DEFAULT PARAMETERS COMPARE2SEQUENCES compare2sequences(inputfile1path, inputfile2path, format = "fasta", findgrnaswithrecutonly = FALSE, REpatternFile, minrepatternsize = 6, overlap.grna.positions = c(17, 18), findpairedgrnaonly = FALSE, min.gap = 0, max.gap = 20, grna.name.prefix = "grna", PAM.size = 3, grna.size = 20, PAM = "NGG", PAM.pattern = "N[A G]G$", max.mismatch =4, outputdir, weights = c(0, 0, 0.014, 0, 0, 0.395, 0.317, 0, 0.389, 0.079, 0.445, 0.508, 0.613, 0.851, 0.732, 0.828, 0.615, 0.804, 0.685, 0.583), overwrite = FALSE)

29 compare2sequences might output no match or more than one match to the alternative input sequence for each grnas identified for each input sequence depending on max.mismatch (default is 4 mismatches allowed) Solution Caveat Suggest sort the output by grnapluspam and scorediff to examine possible multiple off-target sites in the alternative sequence, if you aim to identify grnas to target one of the two input sequences only.

30 NEW FEATURES AND FUTURE PLAN Model effect of PAM variants, e.g., NAG vs. NGG Model effect of different base changes, e.g., A->G vs. C->G Annotate off-targets with gene name Speed up off-target search

31 REFERENCE AND HELP?CRISPRseek in a R session browsevignettes( CRISPRseek") Zhu LJ*, Holmes BR, Aronin N and Brodsky MH*. (2010) [* denotes co-corresponding author] CRISPRseek: a Bioconductor package to identify target-specific guide RNAs for CRISPR-Cas9 genomeediting systems. PloS One (Accepted) bioconductor <bioconductor@stat.math.ethz.ch>

32 ACKNOWLEDGEMENT The Bioconductor core members Hervé Pagès Martin Morgan University of Massachusetts Michael Brodsky (PGFE and PMM) Neil Aronin (RNA Therapeutics Institute and Department of Medicine) Broad Institute of MIT and Harvard Benjamin Holmes Feng Zhang

33 DEMO & EXERCISE

34 RSTUDIO IN AMAZON CLOUD We will be using an Amazon Machine Instance (AMI) on an Ubuntu Linux machine that is pre-configured with RELEASE version of Bioconductor (2.14) compute- 1.amazonaws.com User name: ubuntu Password: bioc BioC2014/CRISPRdemo.Rmd