SUPPLEMENTARY INFORMATION


doi: /nature16191

SUPPLEMENTARY DISCUSSION

We analyzed the current database of solved protein structures to assess the uniqueness of our designed toroids in terms of global similarity and bundle handedness.

DALI searches: We submitted representative models from each of the design families to the DALI server¹. For the left-handed bundles, top Z-scores ranged from 3 to 5, with relatively short alignments that did not cover substantial portions of either the design or the matched PDB structure. For the right-handed design dtor_6x33r, on the other hand, the top Z-scores exceeded 11.0 and the top alignments were longer, although sequence identities were low (5-20%) and RMSDs were high (above 6 Å). Top-scoring matches were found primarily to right-handed helical bundles assembled from hairpins (as is dtor_6x33r); however, visual inspection of the top few matches did not reveal any toroidal (closed) structures.

Handedness of alpha-toroids of known structure: We sought to establish whether there exist left-handed, closed alpha-solenoids (i.e., alpha-toroids) in the current database of solved structures. To minimize the likelihood of missing a similar structure, we consulted multiple databases. We visually inspected representatives from the SCOPe² alpha/alpha toroid fold (a.102), the CATH³ alpha/alpha barrel architecture, the ECOD⁴ alpha/alpha toroid topology, and the RepeatsDB⁵ alpha-barrel classification, and did not find any alpha-toroids with a clear left-handed bundle. We did encounter a few alpha-helical barrel structures (for example, PDB ID 1okc) for which it is difficult to assign a handedness because the alpha-helices composing the barrel follow an up-down path rather than twisting around a bundle axis. In other words, the helices do not form a true solenoid/super-helical structure, a fact which is supported by the SCOPe and ECOD classifications of 1okc.

Handedness of alpha-helical repeat bundles: To gain insight into the handedness of alpha-helical repeat bundles more generally, we performed a combined manual and automated analysis of alpha-helical bundles gathered from several databases. The SCOPe database contains two folds already classified by handedness: a.118, the alpha-alpha superhelix fold, described as having a right-handed superhelix, and a.298, the left-handed alpha-alpha superhelix fold. The a.118 fold class is composed of 24 superfamilies, which include canonical alpha-helical repeats such as the Armadillo and TPR families. By contrast, the a.298 fold includes only the TAL effector-like family, composed of DNA-binding domains from plant-pathogenic bacteria (and designed variants thereof). In the ECOD database, alpha superhelices are collected together into a single top-level grouping. We downloaded representative structures and domain boundaries (410 total) for this grouping and analyzed them using Rosetta to determine the handedness of the bundle. Visual inspection of all domains identified as potentially left-handed revealed that the only proteins containing multiple complete turns of the solenoid belonged to the mitochondrial mTERF and TAL effector families. In ECOD these two homology-level families are grouped together at the "possible homology" ("X") level, and indeed there are similarities in their overall structures and modes of DNA binding. Finally, we analyzed representatives of the alpha-solenoid (III.3) grouping in RepeatsDB (48 domains) and found no left-handed bundles. This analysis of bundle handedness is not comprehensive, depending as it does on manually curated structural classifications and visual inspection. Nevertheless, we expect the overall conclusion to be valid: left-handed alpha-helical tandem repeat bundles are substantially less common than right-handed bundles in the current database of solved structures.
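
As a rough illustration of how the handedness of a coiled path can be assigned automatically, the sketch below takes the sign of the dihedral angles formed by consecutive points along the path. This is a generic geometric approach and is not the Rosetta-based analysis used in this study, whose details are not given here. The function names (`dihedral`, `path_handedness`, `helix`) are hypothetical, and the input is assumed to be an ordered trace of points (for example, helix end points) that samples each turn of the coil with several points.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1n = b1 / np.linalg.norm(b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = b0 - np.dot(b0, b1n) * b1n
    w = b2 - np.dot(b2, b1n) * b1n
    x = np.dot(v, w)
    y = np.dot(np.cross(b1n, v), w)
    return np.arctan2(y, x)

def path_handedness(points, tol=1e-6):
    """Classify an ordered 3D point trace as 'right', 'left' or 'undetermined'.

    Assumption: the trace samples each turn with several points, so that
    consecutive quadruplets have a well-defined twist.
    """
    pts = np.asarray(points, dtype=float)
    if len(pts) < 4:
        raise ValueError("need at least four points")
    angles = [dihedral(*pts[i:i + 4]) for i in range(len(pts) - 3)]
    mean_sin = float(np.mean(np.sin(angles)))
    if abs(mean_sin) < tol:
        return "undetermined"  # e.g. a planar up-down path with no net twist
    return "right" if mean_sin > 0 else "left"

def helix(n_points, step_deg, rise, right_handed=True):
    """Synthetic helix around the z axis, used only to exercise the classifier."""
    t = np.radians(step_deg) * np.arange(n_points)
    sign = 1.0 if right_handed else -1.0
    return np.column_stack([np.cos(t), sign * np.sin(t), rise * t])

right = path_handedness(helix(12, 60, 0.5, right_handed=True))
left = path_handedness(helix(12, 60, 0.5, right_handed=False))
flat = path_handedness([(0, 0, 0), (1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 0, 0)])
print(right, left, flat)
```

A planar zigzag has no net twist, so the classifier reports it as undetermined rather than forcing a label.
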
Structure of the toroid dimers in solution: Two of the crystallized toroid designs, dtor_3x33l_2-2 and dtor_6x35l, form stable dimers in solution. To investigate the nature of these dimeric interfaces, we examined the crystal packing interactions in our solved structures. For the 3x33L design, the same monomer-monomer packing interaction (Extended Data Fig. 6(a)) is seen in both crystal forms (P and P ), which leads us to believe that this mode of association provides a plausible model for the manner in which the monomers associate in solution. In the single crystal structure of the 6x35L design, two monomers stacking in a head-to-head manner and stabilized by electrostatic interactions (Extended Data Fig. 6(b)) appear to provide the most likely model of the solution dimer. An alternative in either case would be for the monomers to associate as head-to-tail dimers to form a single larger ring, akin to the tetrameric ring formed by the 3-repeat dtor_9x31l_sub construct, with concomitant breakage of the intra-monomer interactions between terminal repeats that are observed in the crystalline state. Given the stable, well-packed nature of the 3x and 6x designs, we hypothesized that such an association mode would be disfavored due to loss of favorable packing interactions and energetic strain caused by the altered curvature of a larger ring. To evaluate this hypothesis in the case of the 3-repeat construct 3x33L_2-2, we performed multiple independent symmetric folding simulations in order to model the structure of a 6-repeat ring composed of two copies of the 3-repeat construct; for comparison, we used the same simulation protocol to model the 3-repeat construct forming a 3-repeat ring as designed. Analysis of packing quality and per-residue energies in these simulations suggests that the 6-repeat dimerization mode is indeed significantly less favorable than the 3-repeat conformation: mean packing scores (lower is better) were -0.62±0.97 and 3.16±1.13 for the 3x and 6x ring simulations, respectively, and mean per-residue energies (lower is better) were -1.81±0.16 and -1.51±0.10, respectively.
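
To put the two reported comparisons on a common scale, the difference between the 3x and 6x ring simulations can be expressed in units of the pooled spread. The sketch below does this under two assumptions that the text does not state: that the ± values are standard deviations, and that the two sets of simulations are of equal size. The function name `standardized_difference` is hypothetical.

```python
import math

def standardized_difference(mean_a, sd_a, mean_b, sd_b):
    """Difference of means (b minus a) in units of the pooled standard deviation.

    Assumes equal group sizes, so the pooled variance is the plain average
    of the two variances.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2.0)
    return (mean_b - mean_a) / pooled_sd

# Values as reported for the 3x (a) and 6x (b) ring simulations.
packing = standardized_difference(-0.62, 0.97, 3.16, 1.13)
energy = standardized_difference(-1.81, 0.16, -1.51, 0.10)

print(f"packing score: 6x ring worse by {packing:.2f} pooled SDs")
print(f"per-residue energy: 6x ring worse by {energy:.2f} pooled SDs")
```

Under these assumptions the 6x ring is worse by roughly 3.6 pooled standard deviations in packing score and roughly 2.2 in per-residue energy. Because the number of simulations is not given, this is a descriptive measure only and not a significance test.
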

SUPPLEMENTARY DATA

Amino acid sequences of the protein designs referenced in this study.

>dtor_9x31l_sub repeat_num= 3 repeat_len= 31 protein_len= 93
YYSGTTVEEAYKLALKL
>dtor_3x33l_1 repeat_num= 3 repeat_len= 33 protein_len= 99
GKSPTEALLKLIAEAKGITETEAKEEAEKALKEGKSPTEALLKLIAEAKGITETEAKEEAEKALKEGKSPTEALLK
LIAEAKGITETEAKEEAEKALKE
>dtor_3x33l_1-1 repeat_num= 3 repeat_len= 33 protein_len= 99
GKSPTEALLKLIAEAKGITSTEAKEEAIKALKEGKSPTEALLKLIAEAKGITELEAKVLAEKALKEGKSPTEALLK
LIAEAKGITETEAKLEAEKALKE
>dtor_3x33l_2 repeat_num= 3 repeat_len= 33 protein_len= 99
GKSPTEVLLELIAEASGTTKEEVKEKFLKELSKGKSPTEVLLELIAEASGTTKEEVKEKFLKELSKGKSPTEVLLE
LIAEASGTTKEEVKEKFLKELSK
>dtor_3x33l_2-1 repeat_num= 3 repeat_len= 33 protein_len= 99
GKSPTEVLLELIAEASGTTKEEVKRKFLKELSKGKSPTEVLLELIAEASGTTKAEVKREFLWELSLGKSPTEVLLE
LIAEASGTTKEEVKEKFLAELEK
>dtor_3x33l_2-2 repeat_num= 3 repeat_len= 33 protein_len= 99
GKSPTEVLLELIAEASGTTREEVKEKFLKELRKGKSPTEVLLELIAEASGTTKEEVKEKFLKELSFGKSPTEVLLE
LIAEASGTTKEEVKKKFWKELSL
>dtor_3x33l_2-3 repeat_num= 3 repeat_len= 33 protein_len= 99
GKSPTEVLLELIAEASGTTKEEVKEKFLKELSKGKSPTEVLLELIAEASGTTKEEVKEKFLKELSKGKSPTEVLLE
LIAEASGTTKREVKRWFLFELRK
>dtor_3x33l_2-4 repeat_num= 3 repeat_len= 33 protein_len= 99
GKSPTEVLLELIAEASGTTKAEVKLKFLFELSFGKSPTEVLLELIAEASGTTKEEVKEKFLKELFKGKSPTEVLLE
LIAEASGTTKEEVKEKFLKELSK
>dtor_3x33l_3 repeat_num= 3 repeat_len= 33 protein_len= 99
GYSTTEALLILIAEASGTTVEQQKQRFKELVKKGYSTTEALLILIAEASGTTVEQQKQRFKELVKKGYSTTEALLI
LIAEASGTTVEQQKQRFKELVKK
>dtor_6x33r_1 repeat_num= 6 repeat_len= 33 protein_len= 198
GDKTAIAQILAIKASAKGDETELERALRYAKKVGDKTAIAQILAIKASAKGDETELERALRYAKKVGDKTAIAQIL
AIKASAKGDETELERALRYAKKVGDKTAIAQILAIKASAKGDETELERALRYAKKVGDKTAIAQILAIKASAKGDE
TELERALRYAKKVGDKTAIAQILAIKASAKGDETELERALRYAKKV
>dtor_6x33r_1-1 repeat_num= 6 repeat_len= 33 protein_len= 198
GDKTAIAQILAIKASAKGDETELERALRYAVKVGDKTAIAQILAIKASAKGDETELEQALRYAKFVGDKTAIAQIL
AIKASAKGDELELTRALAYAKKVGDKTAIAQILAIKASAKGDETELERALRYAKLVGDKTAIAQILAIKASAKGDE
TELERALRYAKYVGDKTAIAQILAIKASAKGDEPELEYALAYAKKV
>dtor_6x33r_1-2 repeat_num= 6 repeat_len= 33 protein_len= 198
GDKTAIAQILAIKASAKGDETELERALRYAKKVGDKTAIAQILAIKASAKGDETELERALIFAEAVGDKTAIAQIL
AIKASAKGDETELERALRYAKKVNDKTAIAQILAIKASAKGDETELDRALWYAKKVGDKTAIAQILAIKASAKGDE
TELERALRYAKKVGDKTAIAQILAIKASAKGDETELERALLYAKKV
>dtor_6x33r_1-3 repeat_num= 6 repeat_len= 33 protein_len= 198
GDKTAIAQILAIKASAKGDETELERALRYAKKVGDKTAIAQILAIKASAKGDETELERALAYARLVGDKTAIAQIL
AIKASAKGDETELERALRYAEKVGDKTAIAQILAIKASAKGDEQELEAALIYAKKVGDKTAIAQILAIKASAKGDE
TELERALRYAKKVGDKTAIAQILAIKASAKGDETELERALWYAKKV
>dtor_6x33r_2 repeat_num= 6 repeat_len= 33 protein_len= 198
GDRSAIATAYIALAEYLGDKEALLKAIEIAIKLGDRSAIAEAYIALARYLGDKEALLKAIEIAIKLGDRSAIATAY
IALAEYLGDKEALLKAIEIAIKLGDRSAIAEAYIALARYLGDKEALLKAIEIAIKLGDRSAIATAYIALAEYLGDK
EALLKAIEIAIKLGDRSAIAEAYIALARYLGDKEALLKAIEIAIKL
>dtor_6x33r_3 repeat_num= 6 repeat_len= 33 protein_len= 198
GDKSALAQILAIYASAYGDTTLFLRALKLAKEVGDKSALAQILAIYASAYGDTTLFLRALKLAKEVGDKSALAQIL
AIYASAYGDTTLFLRALKLAKEVGDKSALAQILAIYASAYGDTTLFLRALKLAKEVGDKSALAQILAIYASAYGDT
TLFLRALKLAKEVGDKSALAQILAIYASAYGDTTLFLRALKLAKEV
>dtor_6x33r_4 repeat_num= 6 repeat_len= 33 protein_len= 198
GDLELYIRVLAIVAEAEGDKTKLELALKLALKKGDLKLYIEVLAIVAEAEGDKTKLELALKLALKKGDLELYIRVL
AIVAKAEGDKTKLELALKLALKKGDLKLYIEVLAIVAEAEGDKTKLELALKLALKKGDLELYIRVLAIVAEAEGDK
TKLELALKLALKKGDLKLYIEVLAIVAKAEGDKTKLELALKLALKK
>dtor_6x35l repeat_num= 6 repeat_len= 35 protein_len= 210
VSLEQALKILKVAAELGTTVEEAVKRALKLKTKLGVSLEQALKILEVAAELGTTVEEAVKRALKLKTKLGVSLEQA
LKILEVAAKLGTTVEEAVKRALKLKTKLGVSLEQALKILKVAAELGTTVEEAVKRALKLKTKLGVSLEQALKILEV
AAELGTTVEEAVKRALKLKTKLGVSLEQALKILEVAAKLGTTVEEAVKRALKLKTKLG
>dtor_6x35l(semet) repeat_num= 6 repeat_len= 35 protein_len= 210
VSLEQALKILKVAAELGTTVEEAVKRALKLKTKLGVSLEQALKILEVAAELGTTVEEAVKRALKLKTKLGVSLEQA
LKILEVAAKLGTTVEEAVKRALKLKTKLGVSLEQALKILKVAAELGTTVEEAVKRALKLKTKLGVSLEQALKILEV
AAELGTTVEEAVKRAMKLKTKLGVSLEQALKILEVAAKLGTTVEEAVKRALKLKTKLG
>dtor_9x31l repeat_num= 9 repeat_len= 31 protein_len= 279
YYSGTTVEEAYKLALKLGISVEELLKLAKAAYYSGTTVEEAYKLALKLGISVEELLKLAEAAYYSGTTVEEAYKLA
LKLGISVEELLKLAKAAYYSGTTVEEAYKLALKLGISVEELLKLAKAAYYSGTTVEEAYKLALKLGISVEELLKLA
EAAYYSGTTVEEAYKLALKL
>dtor_12x31l repeat_num= 12 repeat_len= 31 protein_len= 372
YYSGTTVEEAYKLALKLGISVEELLKLAKAAYYSGTTVEEAYKLALKLGISVEELLKLAEAAYYSGTTVEEAYKLA
LKLGISVEELLKLAKAAYYSGTTVEEAYKLALKLGISVEELLKLAKAAYYSGTTVEEAYKLALKLGISVEELLKLA
EAAYYSGTTVEEAYKLALKLGISVEELLKLAKAAYYSGTTVEEAYKLALKLGISVEELLKLAKAAYYSGTTVEEAY
KLALKLGISVEELLKLAEAAYYSGTTVEEAYKLALKLGISVEELLKLAKAAYYSGTTVEEAYKLALKL
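
The headers of the sequence records carry enough information for a quick consistency check: repeat_num multiplied by repeat_len should equal protein_len, and the sequence itself should contain protein_len residues. The sketch below illustrates such a check on two of the records above. The function names (`parse_records`, `check_record`) are hypothetical, and the parser assumes that each `>` header sits on its own line with the sequence on the lines that follow.

```python
import re

EXAMPLE = """
>dtor_9x31l_sub repeat_num= 3 repeat_len= 31 protein_len= 93
YYSGTTVEEAYKLALKL
>dtor_3x33l_1 repeat_num= 3 repeat_len= 33 protein_len= 99
GKSPTEALLKLIAEAKGITETEAKEEAEKALKEGKSPTEALLKLIAEAKGITETEAKEEAEKALKEGKSPTEALLK
LIAEAKGITETEAKEEAEKALKE
"""

def parse_records(text):
    """Split FASTA-like text into (name, header_fields, sequence) tuples."""
    records = []
    for block in text.split(">")[1:]:
        header, *seq_lines = block.strip().splitlines()
        name = header.split()[0]
        # Header fields look like "repeat_num= 3"; allow optional whitespace.
        fields = {k: int(v) for k, v in re.findall(r"(\w+)=\s*(\d+)", header)}
        sequence = "".join(line.strip() for line in seq_lines)
        records.append((name, fields, sequence))
    return records

def check_record(fields, sequence):
    """Return a list of inconsistencies between the header and the sequence."""
    problems = []
    if fields["repeat_num"] * fields["repeat_len"] != fields["protein_len"]:
        problems.append("repeat_num * repeat_len != protein_len")
    if len(sequence) != fields["protein_len"]:
        problems.append(
            f"sequence has {len(sequence)} residues, "
            f"header says {fields['protein_len']}"
        )
    return problems

report = {name: check_record(fields, seq) for name, fields, seq in parse_records(EXAMPLE)}
for name, problems in report.items():
    print(name, "OK" if not problems else "; ".join(problems))
```

For these two records, dtor_3x33l_1 passes both checks, while dtor_9x31l_sub is flagged because only 17 residues appear in this transcription against a header value of 93.
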

References

1 Holm, L. & Rosenstrom, P. Dali server: conservation mapping in 3D. Nucleic Acids Res 38, W545-W549, doi: /nar/gkq366 (2010).
2 Fox, N. K., Brenner, S. E. & Chandonia, J. M. SCOPe: Structural Classification of Proteins-extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Res 42, D , doi: /nar/gkt1240 (2014).
3 Sillitoe, I. et al. CATH: comprehensive structural and functional annotations for genome sequences. Nucleic Acids Res 43, D , doi: /nar/gku947 (2015).
4 Cheng, H. et al. ECOD: an evolutionary classification of protein domains. PLoS Comput Biol 10, e , doi: /journal.pcbi (2014).
5 Di Domenico, T. et al. RepeatsDB: a database of tandem repeat protein structures. Nucleic Acids Res 42, D , doi: /nar/gkt1175 (2014).