TALENs (Transcription Activator-Like Effector Nucleases)


The fundamental rationale behind TALENs and ZFNs is similar: combine a sequence-specific DNA-binding peptide domain with a nuclease domain to induce double-strand breaks in DNA. In TALENs, the DNA-binding domain is derived from proteins encoded by certain plant-pathogenic bacteria of the genus Xanthomonas. When the bacterium infects a plant cell, it secretes proteins called TALEs (Transcription Activator-Like Effectors), which bind to specific promoters in the host genome to regulate transcription. The binding is sequence-specific, and the control of transcription is part of the establishment of infection. Interestingly, a subset of the genes activated by the effector proteins is involved in host defense mechanisms. In the evolutionary battle between host and pathogen, the host has evolved strategies to exploit the effector proteins to signal the presence of the pathogen and initiate appropriate defensive actions.

Each effector protein contains several effector (repeat) modules, each ~34 amino acids long, arranged as a tandem array in the central part of the protein. The N-terminal part of the protein is required for its secretion from the bacterium into the host cell, while the C-terminal part is required for its entry into and localization in the plant cell nucleus. Within the otherwise conserved sequence of each module, positions 12 and 13 vary according to the base-pair specificity of that module. These residues are called RVDs, or repeat variable diresidues. In other words, the RVDs may be regarded as a code for the recognition of specific base pairs (A-T, T-A, G-C or C-G). The RVD recognition Table is shown below.
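The idea of the RVDs as a recognition code can be illustrated with a short Python sketch. The mapping used here (NI for A, HD for C, NN for G, NG for T) is the commonly cited optimal code and is an assumption of this example, since the Table itself is not reproduced in this text. The names `BASE_TO_RVD` and `rvds_for_target` are hypothetical and exist only for illustration.

```python
# Commonly cited optimal RVD for each base (an assumption of this sketch;
# the source's own Table is not reproduced in the text).
BASE_TO_RVD = {"A": "NI", "C": "HD", "G": "NN", "T": "NG"}


def rvds_for_target(target):
    """Return one RVD per base of the target sequence, in the order given."""
    seq = target.upper()
    unknown = set(seq) - set(BASE_TO_RVD)
    if unknown:
        raise ValueError(f"unsupported bases: {sorted(unknown)}")
    return [BASE_TO_RVD[base] for base in seq]


if __name__ == "__main__":
    # One repeat module (identified by its RVD) is chosen per base.
    print(" ".join(rvds_for_target("GATTACA")))
```

A lookup table is enough here because each module is treated as reading exactly one base; the sketch ignores any context effects between neighboring modules.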

The structure of the 23.5 repeat modules from an effector protein of a rice pathogen has been solved in the bound form with the target DNA (see Figure on the next page). Each repeat module consists of two alpha-helices joined by a loop harboring the RVD residues. The two helices within a repeat module pack as a left-handed bundle. However, the superhelical structure formed by the ensemble of repeat units has a right-handed chirality. This handedness is the same as that of B-form DNA, and enables the TALE modules to make base-specific contacts through the major groove of DNA.

The structure shows the sequence-specific contact provided by each module in accordance with the Table above. Note that the Table shows the optimal contacts; there is some degeneracy in the code. For example, NN specifies a G-C contact (according to the Table) but can also make an A-T contact. The modules are in register to interact sequentially with consecutive bases of the sense strand of the DNA (the strand whose sequence corresponds to that of the transcribed RNA, except for the T to U change in the RNA) (see Figure below). The alignment of the repeat modules from the amino to the carboxyl direction is collinear with the 5′ to 3′ direction of the DNA strand with which contacts are made. The structure in the figure below shows a 36 bp B-form DNA bound by 23.5 repeat modules of a TALE protein, which recognizes ~23 bp. For the individual modules primarily responsible for sequence-specific contacts, there is a one-to-one correspondence between each repeat module and its cognate base pair.

The first residue of the RVD pair (residue #12) is not directly involved in DNA recognition. It is hydrogen bonded to the carbonyl oxygen of residue 8, which constrains the loop so that residue #13 is positioned to contact the base. In many instances, residue #13 contacts the base through van der Waals interactions, although hydrogen bonding with N-7 of guanine is also seen.
The N* RVD pair is peculiar in that it is missing the #13 residue. Two consecutive Gly residues follow the RVD pair in all of the repeat modules, so the first of these Gly residues becomes the de facto residue 13. The specificity of N* is somewhat lax. Perhaps it helps the recognition of C-G in which the C residue is methylated. Methylation of C residues is often encountered in the CpG islands found in eukaryotic genomes.
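The degeneracy of the code, and the collinearity between the repeat array (amino to carboxyl) and the contacted strand (5′ to 3′), can be sketched in Python as follows. Each RVD is mapped to the set of bases it is assumed to accept. The NN entry (G, with A tolerated) follows the text. The N* entry (C or T) is only one way of modeling its "somewhat lax" specificity and is an assumption, as are the remaining entries, which follow the commonly cited code rather than the Table, which is not reproduced here. The names `RVD_ACCEPTS`, `binds` and `find_sites` are hypothetical.

```python
# RVD -> bases it is assumed to accept on the contacted (sense) strand.
RVD_ACCEPTS = {
    "NI": {"A"},
    "HD": {"C"},
    "NG": {"T"},
    "NN": {"G", "A"},  # degenerate: G is optimal, A is tolerated (per the text)
    "N*": {"C", "T"},  # assumption: lax specificity modeled as C or T
}


def binds(rvds, site):
    """True if every RVD (N- to C-terminal) accepts its base (5'->3')."""
    site = site.upper()
    if len(rvds) != len(site):
        return False
    # An RVD missing from RVD_ACCEPTS raises KeyError rather than guessing.
    return all(base in RVD_ACCEPTS[rvd] for rvd, base in zip(rvds, site))


def find_sites(rvds, sense_strand):
    """Return 0-based start positions of every window the array could bind."""
    n = len(rvds)
    seq = sense_strand.upper()
    return [i for i in range(len(seq) - n + 1) if binds(rvds, seq[i:i + n])]


if __name__ == "__main__":
    # NN tolerates A, so both GAT and AAT are reported as candidate sites.
    print(find_sites(["NN", "NI", "NG"], "CGATAATGG"))
```

Representing each RVD as a set of accepted bases, rather than a single base, is what lets the same scan report the additional candidate sites that a degenerate module such as NN would tolerate.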

Just preceding the start of the consensus array of repeat modules (amino-terminal to them), there appear to be two degenerate (or pseudo) modules, which cooperate to confer specificity for the T that precedes the target sequence in a large number of cases. In a minority of cases, this position is C. The last repeat of the effector domain is incomplete, and is referred to as a half-repeat. This accounts for the 23.5 modules in the effector used for the structure determination. Based on the specificity of individual modules, one can assemble modules in various combinations to obtain a desired binding specificity.

Fusion of TALEs to FokI nuclease domain

Once the repeats have been chosen to define the binding specificity, the next step is to fuse them to the FokI nuclease domain (the same domain as that described in the context of ZFNs). Generally speaking, each site starts with a T, which is contacted by the two cryptic (degenerate) modules that precede the authentic modules. The length of the recognition target may range from base pairs. The final half-repeat plus a few additional residues linked to the FokI nuclease domain completes the TALEN construct. To allow dimerization of the nuclease domain, two TALENs must be designed to target opposite strands of the DNA. These two sites are separated by bp to accommodate the extra amino acid sequences present between the last functional repeat module and the nuclease domain.

A general picture of the targetable nucleases that we have dealt with is shown on the next page. The CRISPR-Cas system recognizes the DNA target by one-to-one pairing of complementary bases. The ZFNs mediate target interaction through the recognition of a triplet unit by each zinc finger. Finally, the TALENs recognize their targets by a one-to-one recognition between a repeat module and a base pair.
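The paired-site layout described above for TALENs (a T preceding each site, two sites on opposite strands, and a spacer between them) can be sketched in Python as a simple scan of one strand. The text does not state the site length or spacer length, so both are left as parameters that the caller must supply, and the small values in the usage lines are illustrative only. The function names are hypothetical, and the sketch requires a T before each site, ignoring the minority of cases in which that position is C.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_talen_pairs(top_strand, site_len, spacer_len):
    """Find candidate paired sites on a DNA sequence.

    Layout searched on the top strand (5'->3'):
        T + left site + spacer + right site (as seen on top strand) + A
    The trailing A is the complement of the T that precedes the right
    site on the bottom strand. site_len and spacer_len are supplied by
    the caller; the source text gives no values for them.
    Returns (start, left_site, spacer, right_site) tuples, where
    right_site is written 5'->3' on the bottom strand.
    """
    seq = top_strand.upper()
    total = 1 + site_len + spacer_len + site_len + 1
    pairs = []
    for i in range(len(seq) - total + 1):
        window = seq[i:i + total]
        if window[0] != "T" or window[-1] != "A":
            continue
        left = window[1:1 + site_len]
        spacer = window[1 + site_len:1 + site_len + spacer_len]
        right_on_top = window[1 + site_len + spacer_len:-1]
        pairs.append((i, left, spacer, reverse_complement(right_on_top)))
    return pairs


if __name__ == "__main__":
    # Illustrative lengths only: 3-base sites and a 4-base spacer.
    print(find_talen_pairs("GGTGACAAAACCGA", site_len=3, spacer_len=4))
```

Each reported site could then be turned into an RVD array, one module per base, for the left and right TALEN respectively. Reporting the right-hand site as a reverse complement reflects that the second TALEN reads the opposite strand in its own 5′ to 3′ direction.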

In the ZFNs and TALENs, peptide modules mediate the recognition specificity for the DNA targets. The FokI nuclease domain, once dimerized, will cleave the two strands of DNA. In the CRISPR-Cas9 system, DNA recognition is mediated by the complementarity between the guide RNA and DNA with assistance from the Cas9 protein and the PAM sequence. The Cas9 protein cleaves one DNA strand using its HNH nuclease domain and the other strand using the RuvC nuclease domain.