Large-Scale Protein-Protein Interaction Detection Approaches: Past, Present and Future

Biotechnology & Biotechnological Equipment
To cite this article: N. Chepelev, L. Chepelev, M.D. Alamgir & A. Golshani (2008) Large-Scale Protein-Protein Interaction Detection Approaches: Past, Present and Future, Biotechnology & Biotechnological Equipment, 22:1
Taylor and Francis Group, LLC. Published online: 15 Apr

REVIEW

LARGE-SCALE PROTEIN-PROTEIN INTERACTION DETECTION APPROACHES: PAST, PRESENT AND FUTURE

N. Chepelev, L. Chepelev, M.D. Alamgir and A. Golshani
Carleton University, Ottawa Institute of Systems Biology, Department of Biology, Ottawa, Ontario, Canada
Correspondence to: Ashkan Golshani

ABSTRACT

Protein-protein interaction elucidation is of immense importance to biology, medicine, and related fields. It is now recognized that understanding various diseases, such as different types of cancer and Alzheimer's disease, requires an integrated view of protein interaction networks. To aid in deciphering these networks, a number of methods have been developed, including yeast two-hybrid analysis, tandem affinity purification tagging, and protein microarray technologies. In this article, we discuss some of the most important trends and technologies of the past, reflect on their present, and explore some exciting future directions in the field of large-scale protein-protein interaction detection. We argue that the future of the protein interaction elucidation field lies in the development of novel and/or improved high-throughput techniques that generate reproducible and, most importantly, quantitative data.

Keywords: protein interaction map, proteome, TAP tag, Y2H analysis, protein chip

Introduction

Much like a human society, where the overall behavior of a nation is determined by multitudes of daily interactions of its numerous citizens with each other and their surroundings, the overall state of the cell is determined by the interactions and activity of its constituent parts, a large proportion of which are proteins. Since the sequencing and annotation of the genomes of a number of organisms, we have learned much about the roles and structures of many proteins, and yet a large number of proteins still have no function or description assigned to them.
The interactions that a given protein may undergo with other proteins may define to a large extent its role within the cell. Further, a number of pathways require the concerted and extremely well-choreographed action of multimeric proteins in order to robustly maintain homeostasis of the cell or to trigger an appropriate response to an outside stimulus. Thus, the elucidation of protein interactions is a central problem in biology today. Unless we completely understand the complex interaction patterns of the tens of thousands of proteins that constitute our proteome, we cannot hope to even attempt to efficiently combat some of the most important diseases (2, 65), such as cancer, Alzheimer's disease, and even ageing itself (13), let alone gain an integrated understanding of the living cell.

The sheer number of possible interactions that proteins in a human cell may undergo soon made researchers worldwide realize that there are probably more different possible interactions than there are researchers in the field. Thus, high-throughput approaches for the elucidation of protein-protein interactions have rapidly gained appreciation since the introduction of the yeast two-hybrid method in 1989 (23). Since then, a number of advances in the field of high-throughput interaction profiling have resulted in improved specificity, selectivity, and applicability of the original yeast two-hybrid method. New methods have gained firm footing among proteome researchers, including the tandem affinity purification (TAP) approach combined with mass spectrometry (MS) (27). Other revolutionary technologies, such as protein microchip-based interaction detection systems, have been improved and are close to being ready for genome-scale studies that have the potential to produce data of unprecedented quality.
Yeast Two-Hybrid Methods

Introduction

Since their introduction by Fields and Song in 1989 (23), yeast two-hybrid-based techniques have enjoyed extensive use in protein-protein interaction research. In light of the intimidating amount of research and review literature available on the subject, we will only consider the basics of the technique and the studies which, in our opinion, characterize the overall vector of development of the field.

In short, the technique is based on the modular nature of transcription factors, which usually contain a DNA-binding domain (DBD), responsible for the recognition of and specific binding to the upstream activation sequence (UAS) of the target gene, and a transcriptional activator domain (AD), responsible for the recruitment of transcriptional machinery. In the two-hybrid approach, the domains are separated and the activator domain is fused with one protein of interest, sometimes referred to as prey (Y), while the DNA-binding domain is fused to another protein, referred to as bait (X). An interaction between the two hybrid proteins is expected to result in the assembly of the two domains, initiating transcription of a reporter gene. The reporter may catalyze the production of a quantifiable colored compound or impart colony survival, for example (Fig. 1).

The existence of mating type α and type a yeast haploids makes this approach highly scalable, since all that is needed to bring the prey and the bait together in one cell is to mate opposite types carrying the appropriate plasmids. Thus, it is theoretically possible to explore the entire binary interaction space for a given genome. In reality, it is estimated that the number of interactions for a proteome of size n is about 3n-5n if directionality is considered (31). Practically, however, the systematic combination of every prey with every single bait through an individual mating reaction would require n^2 mating reactions, where n is the number of proteins in the system assayed (not excluding homodimer formation), a feat never accomplished before for a large genome.

Fig. 1. Summary of the principles behind the yeast two-hybrid assay. Presence of an interaction between proteins of interest X and Y will result in effective transcriptional factor reconstitution and subsequent activation of transcription of a reporter gene.

This systematic combination of every prey with every bait protein is referred to as the matrix approach. The matrix approach is sensitive and can provide quantitative information via quantification of the reporter (66). Another approach is the library screen, where one bait protein is screened against a whole library of preys in a single mating reaction. The reporter in this approach would be a gene that grants survival. The surviving colonies are thus the ones where the interaction occurs. These colonies are picked off the plate where the mating reaction occurs and characterized through sequencing, with the help of interaction sequence tags. In high-throughput applications, both strategies are succeeded by mating pools of preys with pools of bait proteins, and subsequent selection of surviving colonies with an interaction.
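To make the scaling argument concrete, the following minimal Python sketch compares the n^2 matings of an exhaustive matrix screen with the 3n-5n interactions expected for a proteome of size n. The function names and the proteome size of 6000 (roughly yeast-sized) are illustrative assumptions, not values taken from a particular study.

```python
def matrix_matings(n):
    """Individual matings needed to pair every bait with every prey
    (homodimer combinations included), i.e. n squared."""
    return n * n


def expected_interactions(n):
    """Lower and upper bounds given by the 3n-5n estimate quoted in the text."""
    return 3 * n, 5 * n


if __name__ == "__main__":
    n = 6000  # assumed proteome size, roughly that of yeast
    matings = matrix_matings(n)
    low, high = expected_interactions(n)
    print(f"matings required: {matings}")
    print(f"expected interactions: {low} to {high}")
    # Even at the upper bound, only a tiny fraction of matings is positive.
    print(f"fraction of positive matings (upper bound): {high / matings:.4%}")
```

Under these assumptions, fewer than one in a thousand pairwise matings would be expected to report an interaction, which is why pooling strategies are attractive.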
There are several computational methods designed to select the best pools to use in order to capture the relevant 5n interactions most efficiently (93).

The elegance, simplicity, and scalability of the yeast two-hybrid system are perhaps paralleled only by the number of reasons the result of a given screen may have a low degree of confidence associated with it. Most of these limitations stem from the enormous diversity, individuality, and complexity of proteins studied within a standard framework, and many limitations can be overcome to a greater or lesser degree (22).

First of all, the hybrid protein interaction has to take place within the nucleus, possibly hindering the ability to study proteins with strong targeting sequences, including secreted and membrane proteins. A modification of the technique, the split-ubiquitin method, has been used to surmount this problem. Secondly, the process of fusion itself, though it may involve the introduction of a spacer between the transcription factor domain and the protein of interest, may alter the folding of that protein and/or block the site of interaction. Besides the fusion process, non-native proteins expressed in yeast may be inappropriately modified, resulting in altered or non-functional interaction domains. Some proteins may possess interaction domains capable of binding non-specifically to a vast number of other proteins. Some of the interacting proteins may also naturally be spatiotemporally separated, making it difficult to distinguish a biologically relevant interaction from an artifact. Further, it is assumed that all the hybrid proteins within a given study have equal half-lives and synthesis rates, which translates into inadequate representation of the interactions of a number of highly regulated proteins. The expressed proteins themselves may alter the survival and metabolism of the transfected cells through interference with important signaling pathways, thus altering reporter production.
A solution to this is to fragment proteins into constituent domains (5). It is also possible that some proteins may activate transcription without the need to bind to a partner. In the studies with bacteriophage T7 (5) and Caenorhabditis elegans proteins (49), it was estimated that about four percent of studied proteins are capable of this. The opposite may also be true: other interactions may in fact not be binary, but mediated by one or more endogenous proteins.

All of these factors combined can lead to extremely large false positive and false negative rates, as well as problems with reproducibility of results. For example, the results of seminal studies on yeast proteome-scale interaction mapping by the independent groups of Ito (38) and Uetz (96) have only a 20% overlap. In another case, for a Drosophila interaction map focused on cell-cycle regulators (89) compared to a global Drosophila map (28), there was a 2% overlap for the proteins that were common to the two datasets. For the human interactome maps by Rual et al. (80) and Stelzl et al. (90), the overlap is less than one percent. The low level of overlap may be due to differences in techniques, to incomplete probing of the possible interaction space, or to a high false positive rate. One study, for example, has estimated that the false positive rates could be as large as 44-91% for the yeast two-hybrid interaction data sets obtained from the studies by Ito and Uetz (61). Another study estimates up to 50% of all yeast two-hybrid interactions to be spurious (101).

There are ways to overcome the problem of false positives, including introducing multiple reporters, carrying out the assay in stringent conditions so that only the strongest interactions are seen, and performing a large number of experimental repetitions. These and more strategies are discussed in detail by Fields (22).

And yet, given the statistics above, one does not see the proponents of two-hybrid research disposing of their equipment and moving on to a different methodology. Though it was always recognized that additional, more detailed experiments addressing the nature of the newly identified interactions are necessary, it was also recognized that two-hybrid approaches have features unparalleled by other techniques: cost-efficiency, scalability, versatility, the in vivo milieu of the interacting protein pairs, and high specificity and sensitivity, which are constantly being improved. On the cost-efficiency side, two-hybrid approaches do not require protein purification: in fact, all that is necessary is the cDNA sequence for the protein of interest. As an additional bonus, plasmid libraries from the two-hybrid studies can be reused for further, closer investigations of selected interactions. As for sensitivity, even weak interactions may be detected, since a single interacting protein pair may result in the expression of a number of reporter protein molecules, which in turn may amplify the signal even further. Most importantly, however, the scalability and versatility of this technique have resulted in its applicability for whole-proteome studies, as well as for studies in fields such as drug discovery (6, 8).

Interesting Modifications of the Two-Hybrid System

A number of organisms have been adapted for two-hybrid screens. The requirement to perform the screen for interactions for a given organism in the cells of that same organism stems from the fact that some post-translational modifications inherent to the organism in question will not be carried out by yeast.
Needless to say, the lack of an appropriate modification may have profound effects on the structure of the resulting protein. This would in turn result in the introduction of false positives and false negatives into the data set.

A feature of yeast that greatly facilitates two-hybrid experiments is the possibility of mating two different haploid strains to obtain one diploid strain containing the genetic material of both initial strains. Manipulation of haploids in organisms other than yeast may be problematic (consider mice, for example). Thus, the most commonly used technique to bring the bait and the prey plasmids together in one non-yeast cell is the transient transfection of that cell with two vectors, one encoding the bait and the other the prey. Of course, a normalization plasmid is also used as a reference point for expression levels (19).

For example, Ehlert et al. (19) have developed an Arabidopsis thaliana protoplast two-hybrid system (P2H) in order to determine the interactions of group C and S basic leucine zipper transcription factors. The authors used high-copy plasmids optimized for plant cells to transiently transfect A. thaliana protoplasts (plant cells without a cell wall). Auto-activating proteins were also screened for using a third plasmid. This setup allowed the authors to arrive at a quantitative description of protein-protein interaction strength through the determination of reporter product levels. The results of this P2H approach were compared to yeast two-hybrid (Y2H) results for the same baits and preys within the same study. It was found that about one-fifth of the interactions found using the P2H approach were not found by Y2H screens, and vice versa. This discrepancy was explained in terms of the inability of yeast to properly process the plant proteins. Though only a few interactions were tested in this study, this finding casts serious doubt over all other studies using yeast cells to screen non-yeast protein interactions.
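A comparison of this kind, like the dataset overlaps quoted earlier, reduces to set operations on unordered protein pairs. The sketch below is only an illustration: the protein labels P1 to P5 and both interaction lists are hypothetical placeholders, not data from Ehlert et al. or any other study, and the overlap measure used (shared pairs divided by all pairs seen in either screen) is one assumed convention among several.

```python
def normalize(pairs):
    """Treat each interaction as an unordered pair, so (A, B) equals (B, A).
    A homodimer (A, A) becomes a one-element frozenset."""
    return {frozenset(pair) for pair in pairs}


def compare_screens(screen_a, screen_b):
    """Count shared and screen-specific interactions between two screens."""
    a, b = normalize(screen_a), normalize(screen_b)
    union = a | b
    shared = a & b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        # Shared pairs over all pairs seen in either screen.
        "overlap_fraction": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical results for the same baits and preys in two systems.
    p2h = [("P1", "P2"), ("P1", "P3"), ("P2", "P4"), ("P3", "P4"), ("P5", "P5")]
    y2h = [("P2", "P1"), ("P3", "P1"), ("P4", "P2"), ("P3", "P4"), ("P1", "P4")]
    print(compare_screens(p2h, y2h))
```

In this toy example, one of the five interactions in each screen is missing from the other, mirroring the one-fifth discrepancy described above.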
In another interesting study, a mammalian two-hybrid system was used to monitor protein-protein interactions in living mice (74). In this study, the authors transiently transfected 293T cells with three plasmids: bait, prey, and reporter. This time, though, the expression of the bait and the prey was inducible with TNF-α, which targeted the NF-κB promoter on either the bait or the prey plasmids. The reporter plasmid contained the firefly luciferase gene under the control of the standard GAL4 promoter. Thus, the induction of the expression of either interacting protein by addition of TNF-α resulted in the expression of firefly luciferase. The 293T cells were then introduced into nude mice, along with substrates for luciferase activity, and TNF-α. Mice were then monitored with a visible-range CCD camera, as well as a bioluminescent CCD device. Thus, protein-protein interactions were observed in a living mouse.

Fig. 2. A variant of the reverse two-hybrid system used in order to determine whether a drug can disrupt the binding of X and Y and thus grant survival to a cell through the inhibition of expression of URA3, which is responsible for conversion of 5-fluoroorotic acid into the toxic 5-fluorouracil.

Another avenue of development in the two-hybrid screen field is finding out the conditions under which two proteins do not interact (86, ). This information may be important in drug discovery to arrive at an agent that disrupts an undesirable interaction. A number of reverse two-hybrid screens were developed for this purpose (Fig. 2). In short, an interaction between the two proteins in question would lead to the expression of a gene whose product can metabolize otherwise harmless compounds into toxic ones, causing cell death. A disruption of such an interaction by a given compound would lead to cell survival.

A similar approach was used to find small-molecule inhibitors of yeast Sir2 protein, a homologue of human SIRT1, which is responsible for inhibiting p53 DNA repair activities and for activating BCL6, which leads to suppression of differentiation and contributes to tumorigenesis (6). In this approach, however, the URA3 gene is introduced into URA3-deficient cells in a growth medium lacking uracil. Interaction of Sir2 with its binding partners causes inhibition of URA3 expression and growth arrest or death, while the inhibition of Sir2 associations allows URA3 to be expressed and growth to be restored.

One final modification that deserves attention is the one-hybrid approach. Experiments of this type may provide valuable information about the interactions of transcription factors with particular promoter sequences. In this case, the bait is the promoter, and the prey is a suspected transcription factor fused with a transcription activation domain. The interaction between the two drives the expression of a reporter (Fig. 4). A recent study by Lopato and co-workers (52) has employed this approach to identify several plant transcription factors. An alternative strategy is to fuse a constant DBD with a variable AD to characterize the transcriptional activation capacity of a set of proteins. Ito and Uetz (94), for instance, have performed such a screen on a genome scale for yeast and identified 451 possible transcriptional activators among about 6000 yeast genes. Additional methods as well as their applications are summarized in Table 1.

Fig. 3. A variant of the split-ubiquitin method. A. The mutated N-terminal domain of ubiquitin, NUB, cannot interact with CUB. B.
If, however, two proteins (X and Y) fused to either domain can interact, this interaction will reconstitute ubiquitin, resulting in its cleavage from the reporter protein by ubiquitin-specific proteases (UBP), leaving the reporter capable of carrying out its tasks.

Finally, it has been noted that the original yeast two-hybrid method is incapable of characterizing interactions of membrane proteins or proteins that have strong localization sequences to areas of the cell other than the nucleus. The split-ubiquitin method overcomes this difficulty (Fig. 3) (39, 59, 67, 102). In this setup, ubiquitin is split into an N-terminal part and a C-terminal part. The N-terminal part is mutated such that it is incapable of interacting and assembling with the C-terminal part to reconstitute the ubiquitin molecule. The C-terminal part of ubiquitin is fused with a reporter protein. The proteins whose interactions need to be identified are fused to either part of this split ubiquitin molecule. If the two proteins in question interact, the N- and C-terminal parts of ubiquitin are brought close enough to reconstitute ubiquitin. UBP then recognizes and cleaves the reconstituted ubiquitin from the reporter protein, which is then quantified. This approach is capable of characterizing interactions between proteins irrespective of where these proteins are localized. For example, Fields and co-workers were able to successfully characterize the interaction profiles of 705 membrane proteins, finding 1985 interactions, of which 131 were very high-confidence ones (59). In this particular setup, the reporter was a transcription factor activating the expression of HIS3, which allows for the survival of the cells on histidine-free medium. A number of similar studies using this system were performed as well.

Fig. 4. One-hybrid approach.
In this case, one mating partner contains a plasmid with a promoter of interest (UAS) controlling the expression of a reporter, while another partner harbors a plasmid with the Y-AD fusion protein capable of activating transcription upon binding to UAS. Shown is the result of mating two partners.

Quantitative Data from Two-Hybrid Experiments

A modification of two-hybrid assays can be used to arrive at a thermodynamic descriptor of the strength of the interactions studied. In one possible experimental setup we can imagine, the interaction of the two hybrid proteins will result in the expression of an enzyme which will produce quantifiable output, such as a color change. The rate of this change, normalized with respect to the number of cells, can be used to directly calculate the amount of the reporter enzyme per cell. Two-hybrid methods employ hybrid proteins that are usually placed on a plasmid under the control of a promoter. The mean transcription rate from this system can be determined using a number of techniques, such as the observation of expression of a fluorescent protein fused with the protein of interest. Taken together with the mean degradation rates, this would provide the mean levels of the bait and prey. Since the amount of chromogen-producing enzyme is directly proportional to the transcription levels of the reporter protein, which are in turn directly proportional to the amount of dimers in the cell, all the information for a quantitative interaction description is present.

TABLE 1
Examples of yeast two-hybrid derived techniques

Method | Some Possible Variations | Sample Applications | References
Yeast Two-Hybrid | Mammalian, Plant, adapted for many organisms. | Find binary interactions in organisms other than yeast. | 8, 9, 48, etc.
Reverse Two-Hybrid | Systems based on TetR, URA3, CYH2, others. | Studies of drug-induced interaction disruption. | 95-97, 85
Three-Hybrid | Kinase, Protein, Peptide, Small Ligand, and RNA three-hybrid systems. | Probing binary interactions modulated by proteins and other molecules. | 52, 84, 105
One-Hybrid | Reverse One-Hybrid | Protein-DNA interactions. | 17, 91
Split-Ubiquitin | Cytosolic, Membrane-Associated | Screening proteins not localized to nucleus, such as membrane proteins. | 41, 63, 70, 99
Tethered Catalysis | Acetylase, kinase, etc. | Interactions dependent on protein modifications. | 31
SOS Recruitment | Reverse Ras System | Membrane protein screen. | 2

In the simplest, textbook case for the association of proteins A and B, equation (1) holds. In reality, the actual association kinetics may be much more complex and may involve a number of steps. Also, activities, not concentrations, of proteins should be considered. This example is for demonstration purposes only.

    [P] = k_1 [AB] = k_1 K_a [A][B], \quad \text{where} \quad K_a = \frac{[AB]}{[A][B]}    (1)

Here, [P] is the normalized concentration of the observed colored product, k_1 is the proportionality constant relating it to the dimer concentration, and K_a is the association equilibrium constant for the two proteins A and B reversibly forming a dimer AB. If one prefers, K_a can be used to estimate the free energy change of this reaction through the simple formula (2). Here, R is the gas constant and T is the temperature.

    \Delta G = -RT \ln K_a    (2)

These two parameters, K_a and ΔG, can then be used for quantitative comparisons of the interaction strength of various proteins. Similar schemes have been employed to quantitatively assess interactions (19, 94). However, such information is not available for proteome-scale experiments due to the use of survival-based library screens.
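As a rough numerical illustration of this textbook scheme, the Python sketch below converts a normalized reporter signal into a dimer concentration, then into an association constant and a free energy change. It assumes the simple proportionality P = k1 [AB] and the standard relation ΔG = -RT ln K_a; the function names, the value of k1, and all concentrations are hypothetical, chosen only to show the arithmetic.

```python
import math

R = 8.314  # gas constant in J/(mol*K)


def dimer_from_product(product, k1):
    """Dimer concentration [AB], assuming the signal obeys P = k1 * [AB]."""
    return product / k1


def association_constant(dimer, free_a, free_b):
    """K_a = [AB] / ([A][B]) for the reversible association A + B <-> AB."""
    return dimer / (free_a * free_b)


def free_energy_change(k_a, temperature=298.15):
    """Delta G = -R * T * ln(K_a), in J/mol.
    K_a is taken relative to a 1 M standard state so the logarithm is unitless."""
    return -R * temperature * math.log(k_a)


if __name__ == "__main__":
    # Hypothetical values: signal of 2e-7 (arbitrary units) with k1 = 2,
    # and free bait and prey concentrations in mol/L.
    dimer = dimer_from_product(2e-7, 2.0)              # 1e-7 M
    k_a = association_constant(dimer, 1e-7, 1e-6)      # about 1e6 per M
    dg = free_energy_change(k_a)
    print(f"K_a = {k_a:.3g} M^-1, Delta G = {dg / 1000:.1f} kJ/mol")
```

A more negative ΔG corresponds to a tighter interaction, so values computed this way for different bait-prey pairs could be ranked directly, within the limits of the simplifying assumptions stated above.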
Yeast Two-Hybrid Proteome-Scale Interaction Maps

Though the yeast two-hybrid system has been used to elucidate a vast number of protein-protein interactions in separate studies, the most impressive application of two-hybrid screens by far has been in deriving the interaction maps of the entire proteome of a given organism. It can be argued that, above all, it is the amenability of this technique to parallelization that has propelled it to widespread use and that has prompted the exploration of the various modifications described above. Often, it seems that this ability of high-throughput application of two-hybrid methods eclipses any deficiencies these methods may exhibit in terms of data quality. Thus, returning to the discussion of the direction of the protein-protein interaction elucidation field, we may say that in order for a technique to be amenable to the challenges of tomorrow, the first and foremost requirement is the possibility to adapt it to high-throughput, proteome-scale studies.

For two-hybrid methods, such studies have had humble beginnings: in 1996, the interactome of Escherichia coli bacteriophage T7 was mapped, opening the door to larger-scale studies (5). This was one of the first studies of the sort to realize the high-throughput potential of the yeast two-hybrid approach. In this study, the construction of prey and bait libraries required T7 genes to be fragmented. Though this was not necessarily intentional from the beginning of the study, the approach taken here may provide an interesting avenue of research. Rather than constructing linkage maps where the smallest unit is a protein, studies of domain-domain interactions can be performed. Of course, there are questions as to what constitutes a particular domain and whether domains in the context of other domains within a protein act differently from isolated ones.
However, taking the point of view that protein function is a linear combination of the functions of constituent domains, a domain-based approach has the ability to limit the number of binary combinations to be tested, at the same time providing interaction information.

In 2000, two groups, that of Ito (37, 38) (4549 interactions) and that of Uetz (97) (967 interactions), successfully carried out proteome-scale two-hybrid studies of yeast protein-protein interactions, albeit with little overlap, as mentioned earlier. The core data set obtained by Ito may be regarded as a higher-confidence one due to the fact that Ito used three different reporter genes to identify an interaction, compared to one reporter for the Uetz data set. These were succeeded by a C. elegans map (4027 interactions) (49) and, a little later, by a D. melanogaster map (20405 interactions) in 2004 (28, 89). Finally, in early 2005, the interaction map for a portion of the human proteome was constructed using the yeast two-hybrid method, revealing 2800 (80) and 3186 (90) interactions in two different studies. Perhaps the future yet holds more detailed and thorough tests based on the two-hybrid system, as well as human interaction maps derived from two-hybrid tests in human cell lines.

Indeed, these interaction maps produced vast amounts of valuable biological data. However, there is much room for improvement of the mentioned interaction maps and methods. Unfortunately, these large-scale studies only provided qualitative information, that is, at best the degree to which one may trust an interaction. Even that information comes with false positives mixed in. In addition, in cases when two studies were carried out on the same organism, the datasets did not significantly overlap. Of course, this may be due to the fact that not the entire binary combination space was probed in these studies: recall that the expected number of protein interactions is 3n-5n for a tested protein pool of size n. The study to come closest to these figures was that of Giot et al. on D. melanogaster, at approximately 2n interactions (28). Therefore, it may be concluded that even though the two-hybrid approach is highly adaptable to various high-throughput applications, it is not a technique in which the elucidation of the entire interactome map is a simple, faultless, or completed task. Even so, this technique is so powerful that it may have affected other approaches, such as fluorescence resonance energy transfer techniques based on bait and prey hybrids fused with different-color fluorescent proteins, for example (12).

Tandem Affinity Purification Coupled with Mass Spectroscopy

Introduction

We have seen that the two-hybrid methods attempt to employ deliberate binary combinations to explore the interaction space of a set of proteins. A different strategy to solve this problem is to purify all protein complexes from a living cell, subsequently characterizing their constituent parts.
This is the strategy that lies at the heart of TAP-tagging approaches (77). Thus, the task of screening a proteome of size n only requires n experiments to be carried out. This makes the problem of proteome-scale interaction mapping somewhat more straightforwardly tractable than with two-hybrid approaches, if the cost of a single experiment is neglected.

Originally, the purification of protein complexes relied on antibodies specific for a given protein of interest (POI), for which the interaction partners had to be determined. This allowed for elution of the POI in mild conditions which were not expected to disrupt protein complexes. On the other hand, raising antibodies was costly, and antibody-based purifications often contained a significant level of impurities (27). Also, antibody specificity and selectivity relied on a number of variables, such as the presence of post-translational modifications or alteration of antibody structure as a result of cross-linking on supporting beads. A more generalized technique capable of higher degrees of purification was necessary. Thus arose numerous methods in which the POI was tagged with an amino acid sequence which would then be used to retain the POI and all the proteins complexed with it on a purification column (21). This solved many problems associated with antibody-based purifications, but unfortunately introduced new ones having to do with the effect of a tag on the proper folding and functioning of a protein. Indeed, in one study, viable colonies could not be obtained in about one-fifth of the cases when essential proteins were tagged (24). Another study states that of 6466 yeast open reading frames (ORFs), only about one-third could be tagged and purified (25). It was realized by Rigaut and co-workers that the use of two-step purifications (hence the term "tandem") with two tags could yield higher levels of purification than a one-step procedure (76).
The overall sequence of operations employed for TAP tag-based complex purification and identification, as proposed by Rigaut et al. (77) and elaborated by other studies, is shown in Fig. 5. First of all, the nucleotide sequence encoding the TAP tag is inserted at the end of the open reading frame to be investigated. In yeast, where homologous recombination is facile, this insertion may be performed right on the chromosome of interest. In other organisms, such as mammals, plasmids carrying the open reading frame with the tag construct are often employed. The tag itself consists of a calmodulin-binding peptide sequence (CBP), a sequence recognized and cleaved by tobacco etch virus protease (TEV site), and an immunoglobulin-binding peptide (IgGBD).

A column with immunoglobulin beads would retain the TAP-tagged protein and associated complexed proteins through IgGBD. After washing the column, TEV proteases are applied to cleave the IgGBD from the rest of the protein and leave the POI free for elution. At this point, some impurities are expected to be present in the solution containing the POI. Passing this solution through a column coated with calmodulin beads in the presence of free Ca2+ would lead to the capture of the POI through the CBP moiety. A washing of this column is then expected to result in a very pure solution of the POI and associated proteins. The POI is then eluted by chelation of the free Ca2+ with EGTA, for instance. It must be noted that Rigaut and co-workers also proposed that the IgGBD and CBP moieties could be present on two different proteins whose ability to interact would need to be evaluated.

The resulting purified protein complex is then separated by any method of choice, such as gel electrophoresis or isoelectric focusing. Individual separated proteins are digested, most often with trypsin, to obtain a number of fragment peptides. These peptides are then separated, usually with some flavor of liquid chromatography, and analyzed on a mass spectrometer.
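The digestion step can be mimicked in silico, which is also how identification software predicts the peptides it expects to see. The minimal Python sketch below applies the commonly used simplified trypsin rule (cleavage after lysine, K, or arginine, R, unless the next residue is proline, P). The function name and the short input sequence are hypothetical, and real digests also produce missed cleavages that this sketch ignores.

```python
import re


def tryptic_digest(sequence):
    """Split a protein sequence into peptides using a simplified trypsin rule:
    cleave after K or R, except when the following residue is P."""
    # Zero-width pattern: position preceded by K or R and not followed by P.
    fragments = re.split(r"(?<=[KR])(?!P)", sequence)
    # A match at the very end of the sequence yields an empty string; drop it.
    return [peptide for peptide in fragments if peptide]


if __name__ == "__main__":
    # Hypothetical sequence; the R before P is not a cleavage site.
    for peptide in tryptic_digest("AKRPGKLMR"):
        print(peptide)
```

Splitting on a zero-width pattern with `re.split` requires Python 3.7 or later.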
With the help of software, peptide sequences and protein identities are obtained from the mass spectrum. Clearly, this approach readily lends itself to automation, from PCR-based homologous recombination in yeast to mass spectrometric analysis, making TAP tag-assisted complex purification an ideal technique for proteome-scale interaction analysis. The feature of the TAP-MS analysis that is not captured by the two-hybrid assays is the regulation

of the expression of POI with its native promoter, at its native chromosomal locus, and localized to its appropriate compartment, at least in yeast.

Fig. 5. TAP tag-assisted protein complex purification and analysis (see the text)

Furthermore, it has been suggested that a set of binary interactions obtained from a two-hybrid screen cannot necessarily be used to derive a set of complexes (27). As a conceptual example, the existence of an interaction of protein A with protein B and with protein C does not necessarily imply that a complex ABC will form. A binding site on A may be obstructed by B such that C does not bind, for example. Such a scenario would lead to two complexes, AB and AC. On the other hand, because multiple washings are performed and conditions change from column to column, transient interactions, which may sometimes be as important as strong interactions, may not be as readily detected as they are with the two-hybrid systems. TAP-MS does not provide

directionality of the interaction either. Therefore, it was suggested that the two methods are perfectly complementary. That said, the error rate for studies based on TAP-MS in yeast (15%) was calculated to be three to five times lower than that of the two-hybrid approaches (27).

The Evolution of TAP-MS Approaches

As the method is applied to new challenges, more and more of the problems associated with it are being solved. It has already been mentioned that tagging may have a negative impact on the folding and activity of the tagged protein. Further, the tag itself may be buried within a complex, leaving it refractory to purification. It is expected that 5-20% of proteins could potentially have complications due to tagging (70). Classically, proteins have been tagged at the C-terminus. Therefore, an obvious solution to this problem has been to attach the tag at the N-terminus. Puig and co-workers have successfully designed such a tag and have observed instances where N-terminal TAP-tagging reversed the loss of function observed upon C-terminal tagging (70). A further limitation of the technique is that it was originally optimized for yeast, where growth of cultures in four-liter containers is expedient and cost-effective. Unfortunately, this is not the case for mammalian cell lines, or for cell lines that form a monolayer, for example. Burckstummer and co-workers asked the logical question of whether it is possible to make the interaction of the tag with the column stronger and thus improve protein recovery from a limited number of cells (11). To that end, protein A of the original tag was replaced with protein G, and CBP was replaced with streptavidin-binding peptide (SBP). This approach, termed GS-TAP, resulted in an approximately tenfold increase in the yield of tagged protein, and thus the possibility of a tenfold decrease in the starting amount of cells without compromising detection.
As mentioned earlier, TAP tagging in yeast is simple due to the ease of homologous recombination. Since this is not a luxury inherent to all eukaryotic cells, researchers often consider two options: viral gene transfer or transient, plasmid-assisted transfection. The former option was used with success in the study that developed the GS-TAP tag, while the latter option may result in complications, including the loss of native gene expression regulation. In some cases, overexpression of the POI is a significant perturbation to the cellular signaling systems. This may result in toxicity or altered cellular states, as manifested by increased levels of heat shock proteins and chaperones (27). When using a system in which POI expression from a plasmid is moderated, however, competition with the endogenous POI may result in decreased recovery of some complex components. A way around this problem has been the selective knockdown of the native protein through RNA interference (29). Another approach, developed by Zhou and co-workers, allows a tagged gene to be knocked in in mice in a relatively rapid manner (106). Bacterial artificial chromosome recombineering is used to insert the desired sequences at either end of the gene of interest. The resultant construct is captured into a plasmid through gap repair. Mouse embryonic stem cells are then transfected with this plasmid in its linearized form, and colonies that have undergone homologous recombination to knock in the tagged gene are selected and verified. The embryonic stem cells are then used to produce knock-in mice through tetraploid embryo complementation. This procedure cuts the normal knock-in mouse production time to one-third: while the normal production time is nine months, this procedure can be used to obtain a viable knock-in mouse in three months. Other organisms, such as plants, have been studied using modified or specialized TAP-tagging approaches.
In short, the basic principle in these studies is the replacement of the original three elements of the TAP tag with elements that are expected to perform better for a given system, followed by evaluation of the resultant system. For instance, Rubio et al. (81) have reported a novel modified TAP tag approach for use in A. thaliana. The TEV protease site was replaced with a low temperature-active rhinovirus 3C protease site, and the CBP was replaced with a six-histidine sequence and nine myc repeats. When tagged with this modified TAP tag, a subunit of the COP9 signalosome complex, CSN3, was shown to be functional through the rescue of a csn3 mutant. Further, the study of genes involved in light signaling resulted in successful expression and purification of 88% of the genes analyzed. Very recently, Gregan and co-workers have reported a protocol for TAP tag-assisted protein purification from human cells (30). This procedure involves the use of a mouse-derived bacterial artificial chromosome (BAC) containing the construct of the POI fused with the desired TAP tag. The BAC in this report carried a mouse gene, Sgo1, responsible for proper separation of sister chromatids. When the endogenous human Sgo1 in HeLa cells was depleted through RNA interference, an increase in precociously split sister chromatids was observed. The integration of the murine BAC carrying the murine Sgo1 gene into the genome of HeLa cells abolished this condition. This showed that the murine Sgo1, which is resistant to human Sgo1-targeted siRNA, was functional in human cells, and therefore that it was functionally close to the human Sgo1. A number of human proteins interacting with Sgo1 were also clearly identified. This protocol, along with a number of other similar studies, and after further optimization, opens a door to the exciting prospect of studies on the human interactome.
Large-Scale Studies Using TAP-MS

Though not as numerous as the two-hybrid studies, large-scale studies using TAP-MS have provided a plethora of relatively high-quality interaction information. Proteome-scale studies have primarily focused on yeast, partially due to the ease of genetic manipulation of this eukaryotic organism, though smaller-scale interaction studies have been reported for other organisms as well.

The first truly large-scale study using the TAP-MS approach was that by Gavin and co-workers in 2002 (24). In this study, 1739 yeast genes were processed, but only 589 genes, or about a third, were successfully purified in the end. Of these 589 proteins, 78% were seen to form a complex with other proteins, while the remaining 22% were purified alone. This was attributed to a number of reasons, among which were the possibility that the complexes were too weak to hold together to the end of purification, or that the particular complexes for these 22% of proteins were not necessary and thus were not formed by the cells at the time they were harvested. Another possible reason was the interference of the tag itself with the proper functioning of the proteins, as mentioned earlier. The interactions observed were reproducible in 70% of cases, meaning that the error rate for that study may have been as high as 30%. Binding partners were termed promiscuous and eliminated if they were observed in more than 20 different purifications. An interesting observation made in this study was the preferential interaction of proteins that were orthologous to human proteins among themselves. This observation supported the possibility, suggested earlier, of the existence of a proteome that carries out fundamental functions for eukaryotic cells. In two other, much more recent studies, yeast was the subject of attention again. This time, the authors dissected its interactome on a much larger scale. One of the studies, by Gavin and co-workers (25), was a follow-up on the one described above, except that the entire proteome of yeast was analyzed. Only 1993 proteins were successfully purified of the 6466 open reading frames considered. However, a complex contains more than one protein. Therefore, if in a complex A/B/C, proteins B or C are somehow refractory to tagging, a successful purification of protein A still characterizes the complex well.
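The promiscuity filter described for the 2002 study (24) lends itself to a short illustration. The sketch below is an assumption-laden reading of the rule as stated in the text, namely that a binding partner is discarded if it appears in more than 20 different purifications. The data structure (a mapping from bait to the set of co-purified proteins), the function name, and all protein names are hypothetical and are not taken from the study.

```python
from collections import Counter


def remove_promiscuous(purifications, max_purifications=20):
    """Drop preys observed in more than `max_purifications` purifications.

    purifications: dict mapping a bait name to an iterable of co-purified
    protein names (hypothetical representation of the raw results).
    Returns (filtered dict of sets, set of promiscuous proteins).
    """
    # Count each prey once per purification, even if listed repeatedly.
    counts = Counter(
        prey for preys in purifications.values() for prey in set(preys)
    )
    promiscuous = {p for p, c in counts.items() if c > max_purifications}
    filtered = {
        bait: set(preys) - promiscuous
        for bait, preys in purifications.items()
    }
    return filtered, promiscuous


if __name__ == "__main__":
    # Toy data: "STICKY" co-purifies with all 25 baits, "EDGE" with exactly 20.
    data = {f"bait{i}": {"STICKY", f"prey{i}"} for i in range(25)}
    for i in range(20):
        data[f"bait{i}"].add("EDGE")
    filtered, promiscuous = remove_promiscuous(data)
    print(sorted(promiscuous))
    print(sorted(filtered["bait0"]))
```

Because the text specifies "more than 20", a protein seen in exactly 20 purifications is kept; a different cut-off can be passed through `max_purifications`.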
Another feature observed was the asymptotic decline in novel complexes identified as the study progressed, indicating that saturation had been reached. All in all, 257 novel complexes were identified, and the purification of membrane proteins shed some light on the otherwise inaccessible area of membrane protein interactions. The other study was published back-to-back with that of Gavin and co-workers, and employed similar methodology (46). Of the 547 complexes identified, 275 were indicated as novel. Interestingly, a number of membrane-associated proteins were successfully purified using the same procedure as that used for the rest of the proteins. This was attributed to the addition of 0.1% Triton X-100 micelles. The authors also identified possible sites of post-translational modifications, to be reported at a later time. These studies, and a number of the smaller-scale experiments performed in mammalian and human cells discussed above, give hope that TAP tagging-assisted, high-confidence protein complex datasets will one day become available to help solve the mysteries of the human cell. It must be noted here that other methods, such as affinity purification based on a single epitope tag, have already been used successfully in large-scale studies of the human interactome (21).

Protein Microarray-Based Technologies

The rise of TAP tag-based interaction detection as a feasible alternative to pre-existing methods owes much to the sheer cost of protein purification and expression, especially from mammalian samples. Techniques involving TAP tags bypass the costly protein purification step altogether, thus eliminating associated costs. An alternative approach to minimizing cost is to develop a technique that minimizes the amount of precious purified protein used. Protein microarray-based technologies have recently emerged at the forefront of protein-protein interaction studies, offering dramatic decreases in assay times as well as in the sample sizes required.
The development of rapid protein expression and purification protocols and the robotic tools used for DNA microarray production have been highly conducive to protein microarray development. Further, the analysis of thousands of protein interactions on a genome-wide scale and the quantitative characterization of protein binding are greatly simplified. Protein microarrays consist of a protein library arrayed onto a solid support. This immobilized library is probed with one or more target proteins and their interactions are analyzed with various detection systems. Of the three major types of protein microarrays (analytical, functional, and reverse phase), functional microarrays are used to investigate protein-protein interactions as well as protein-DNA, -RNA, and -small molecule interactions (34). The basic steps of interaction analysis using protein microarrays are summarized in Fig. 6.

Library Construction and Protein Purification

First, proteins are expressed and purified. These steps are the most challenging, as some proteins, such as trans-membrane and secreted proteins, are notoriously difficult to purify, and full-length cDNA collections for some organisms, such as humans, are still incomplete. Current cloning methods include gap repair-mediated recombination, ligation-independent cloning, and Gateway cloning systems (68). Despite the relatively inexpensive and efficient expression of proteins in E. coli, homologous protein expression systems (e.g. yeast protein expressed in yeast) are preferred (68) to avoid protein misfolding and lack of post-translational modifications, and to minimize solubility problems. Insect cells and baculovirus tend to be the system of choice for eukaryotic protein expression; proteins expressed in insect cells undergo modifications similar to those of mammalian cells (69). Researchers often have to take advantage of several expression systems and vector constructs to achieve optimal levels of expression (69).
The Gateway cloning system is useful in this respect as it allows easy and rapid shuttling of ORFs from one expression vector to another. It was used for the cloning of over C. elegans ORFs (75). An alternative method of protein expression, based on DNA attachment to a slide followed by in vitro transcription and translation and named NAPPA (nucleic acid programmable protein array), has been described (72). GST-fused proteins attach directly to a glutathione-coated slide, bypassing the need for protein purification and storage. However, NAPPA requires cDNA cloning and plasmid immobilization, and yields low-density microarrays (1). A more advanced cell-free expression

approach allows protein production from even unpurified PCR products and consists of spotting of DNA template followed by transcription and translation mix deposition on a microchip with subsequent spot rehydration to initiate separate expression reactors (1).

Fig. 6. Basic steps of interaction detection with microarrays (see text for details)

This avoids the need for cDNA cloning into expression vectors while still permitting the introduction of protein tags and yields high-density microarrays of up to spots per chip. In addition, a high-throughput method of fluorescence labeling of full-length cDNA products has been reported (45). The labeling relies on fluorophore-tagged puromycin in conjunction with either a release factor-free expression system or exclusion of stop codons. While translation is stalled, puromycin binds to the nascent polypeptide at its C-terminus, resulting in polypeptide release from the ribosome. In general, the weakness of cell-free protein expression approaches is the lack of post-translational modifications and of appropriate folding of some proteins (7). However, it was suggested that the addition of the chaperonin GroEL can alleviate those problems (45). A viable alternative to using full-length proteins is a domain-centered approach. Peptides representing the functional domain under investigation can be created easily with common peptide synthesizers. Using interaction domains for PPI studies is justified by the fact that protein binding domains are usually restricted to short amino acid sequences. Thus, domains or synthetic peptides representing a protein binding site often show the same biological activity as the protein itself (88). However, exposing the entire domain outside of its native protein context may open additional interaction possibilities not normally observed for the protein in question.
Nonetheless, this approach saves time and improves the data quality by minimizing protein oxidation, misfolding and proteolysis (63). In addition, expressing domains rather than entire proteins increases the chances of producing and isolating soluble proteins in sufficient quantities

in bacteria (92). To date, a number of seminal PPI studies have taken that route (42, 63, 91, 92). Proteins are usually expressed with a common fusion tag. A number of different fusion tags exist, ranging from short peptide sequences (FLAG, hemagglutinin, myc, Hisx6) to small proteins including green fluorescent protein (GFP), calmodulin-binding protein and glutathione S-transferase (GST). Tags are useful in several respects. First, a set of commonly tagged proteins can be rapidly purified due to the good affinity of a tag for the purification column resin. A popular affinity tag, Hisx6, a sequence of six histidines, binds well to Ni columns. Also, fusion tags allow uniform attachment of fused proteins to microarray surfaces. The employment of two tags can facilitate protein attachment to the chip and/or normalization of the signal to the amount of protein actually present at each spot. Zhu et al. (107) expressed a nearly complete yeast proteome library (5800 ORFs, ~80% of all yeast proteins) with N-terminal Hisx6-GST fusion tags. Hisx6 bound the proteins to a Ni-coated slide and the GST tag enabled quantification of each protein with labeled anti-GST antibody. Fusion tags such as maltose-binding protein may enhance the solubility of the fused protein (44), while immunoprecipitation and detection of the tagged proteins by Western blotting is possible with the epitope tags FLAG, myc and hemagglutinin. Notably, it is this initial step of protein microarray development, namely library construction, that is unequivocally recognized as the bottleneck step. To bypass the complexities associated with this step, researchers can acquire protein chips, which are now commercially available from such companies as Invitrogen and Protometrix.

Protein Immobilization

A number of surfaces and chemistries can be applied for arraying a protein collection on a chip (57, 95).
These include glass slides coated with non-specific covalent crosslinkers such as aldehyde, epoxy and bovine serum albumin-N-hydroxysuccinimide (BSA-NHS). Aldehyde-presenting surfaces facilitate protein crosslinking via Schiff base formation between protein primary amines and the aldehyde carbonyl. The resulting covalent attachment is random and might lead to disruption of the tertiary structure in some proteins (7). Unreacted aldehyde is quenched to avoid false positives arising from covalent attachment of the target to the remaining aldehyde groups. BSA can be used for this purpose (54, 92). To maximize the chance of arraying proteins in a functional state, non-covalent site-specific attachment is preferred, in which the proteins are oriented in a common direction away from the slide such that the majority of their surface is fully accessible and the possibility of structural distortion is minimized. Slides coated with nickel or streptavidin provide such specificity of protein attachment. Ni-coated slides have been used to array GST-Hisx6-tagged proteins with an efficiency 10 times higher than that of random aldehyde-based arraying, as quantified by anti-GST antibody (107). In addition, higher signal-to-noise ratios were observed for Ni slides compared to aldehyde slides, probably due to a lower extent of protein denaturation and uniform orientation of the protein binding domains away from the microarray surface (7). Interestingly, the assumption that the binding of Hisx6-labeled proteins to Ni chips is dependent on the His tag has recently been challenged (1). The binding of Hisx6-tagged green fluorescent protein (GFP) and untagged GFP to a Ni surface was compared by probing immobilized proteins with anti-GFP and anti-penta-His antibodies.
Anti-GFP antibodies produced similar signals, while the anti-penta-His antibody signal was seen only with Hisx6-GFP, suggesting that nonspecific binding to Ni slides, rather than uniform attachment of the Hisx6 tag to Ni, may be responsible for protein binding (1). Other limitations of utilizing Ni-His affinity in functional microarrays arise from the fact that the Ni-His interaction can be disrupted by washing, prolonged storage or commonly used reagents including EDTA and DTT (57). Slides coated with streptavidin bind biotinylated proteins very tightly, and this system was used by Procognia to produce the first commercially available functional microarray in 2002 (9). In addition, three-dimensional surfaces such as epoxy-modified silicone elastomer microwells (108) can be used for microarrays. These prevent cross-contamination and have increased binding capacity due to increased surface area (57). The arraying of expressed proteins onto a slide is carried out using microspotting, non-contact ink jetting, microcontact printing, photo-induced adsorption, or proximal or distal electrospray deposition (60), under humid conditions and in high glycerol content buffers to minimize evaporation, which can lead to protein denaturation and activity loss (57). Needless to say, proteins attached to a chip have to retain their binding function for successful interaction with their partners. There are a number of reasons to expect that immobilized proteins are functional. Proteins immobilized on a solid support have been used for over forty years (87). Immobilized enzymes retained their activity and demonstrated enhanced stability, which, along with their reusability and the ease of separation from products, has led to their widespread industrial use. Later on, immobilized antibodies were shown to retain their native function, allowing the development of the enzyme-linked immunosorbent assay (ELISA) (20).
More recently, the retention of native binding of proteins on the microarray has been demonstrated by detecting 33 new along with 6 known calmodulin-binding proteins (107). Merkel et al. (40) used a reciprocal interaction test in which a set of previously characterized human protein interacting pairs was examined for retention of binding when each interacting partner was presented either in surface-bound form or free in solution in two separate experiments. Over 90% of all binding partners demonstrated reciprocal binding, and a similar result was reported for randomly selected yeast proteins (40). Therefore, the majority of immobilized proteins are most likely functional. Once again, as with the majority of techniques examined here, there is a possibility that immobilization on a solid support or the use of fusion tags may alter a protein structure,