Proteomics in Drosophila melanogaster Kathryn S. Lilley and Delia R. Griffiths Date received (in revised form): 14th April 2003


Kathryn Lilley is the group leader of the Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge.

Delia Griffiths is currently a final year student in the Biochemistry Department at the University of Cambridge.

Keywords: proteomics, Drosophila melanogaster, LC-MS/MS, two-dimensional polyacrylamide gel electrophoresis, difference gel electrophoresis, isotope coded affinity tagging

Kathryn S. Lilley, Cambridge Centre for Proteomics, Department of Biochemistry, University of Cambridge, Building O, Downing Site, Downing Street, Cambridge, CB2 1QW. Tel: +44 (0) Fax: +44 (0) E-mail: k.s.lilley@bioc.cam.ac.uk

Abstract
To be able to understand cellular mechanisms, we require fully integrated data sets combining information about gene expression, protein expression, post-translational modification states, sub-cellular location and complex formation. Proteomics is a very powerful technique that can be applied to interrogate changes at the protein level. Studying the proteome effectively requires specialised facilities within research institutes. Here, we describe the setting up and operation of such a facility, providing a resource for the Arabidopsis and Drosophila research communities.

INTRODUCTION
Drosophila melanogaster is one of the most intensively studied eukaryotes to date, being a model system for many cellular processes in higher organisms, including humans. A landmark in the study of this organism was the publication of the whole genome shotgun assembly sequence in March 2000. 1 The prediction of open reading frames from this genome and the sequenced genomes of many other organisms has enabled the comprehensive identification of many putative protein sequences.
Generally, these proteins can be arranged into three categories: those of known function; those with recognisable motifs and hence a vague idea of function; and those with no sequence similarities to any protein. 2 Many proteins reside in this latter functional vacuum, which could represent as much as 30% of the predicted proteins. 2

Determining protein function is key to understanding cellular mechanism. Studying how protein expression is modulated in response to a given set of circumstances, such as infection, disease, developmental stage, senescence or response to drugs, will facilitate the elucidation of many pathways and thus provide a mechanism for diagnosis and therapy. mRNA profiling studies give information about changes in gene expression in response to a particular biological perturbation. The extrapolation that changes in the level of a certain transcript result in corresponding changes in protein expression cannot necessarily be made. 3 Studying cellular mechanisms at the protein level, as well as giving information about the relative abundances of proteins, adds a dimension to the data that cannot be obtained by transcriptomics studies alone, such as sub-cellular location, secretion, protein complex components and post-translational modifications. To understand cellular mechanisms fully, we require fully integrated data sets combining all this information.

The concept of the proteome was first discussed some time ago, 4 and the field of proteomics has matured along with the study of the protein content of cells. The term proteomics is now, however, becoming more generally applied to many different disciplines.
These include: quantitative proteomics, the study of the relative abundance of protein or protein isoforms in a given sample, whether it is an isolated complex or a whole tissue extract; structural proteomics, the development of experimental approaches to define the primary, secondary and tertiary structure of proteins; and functional proteomics, the development and application of global experimental approaches to assess protein function.

106 © HENRY STEWART PUBLICATIONS. BRIEFINGS IN FUNCTIONAL GENOMICS AND PROTEOMICS. VOL 2. NO JULY 2003

Studying the proteome in an effective

manner on the scale dictated by the huge quantity of genomics data available requires specialised high-throughput techniques, equipment and skilled operators. The very nature of the work lends itself to specialised facilities within research institutes. What follows is a description of one such facility, the Cambridge Centre for Proteomics (CCP). This was set up using funds from the Biotechnology and Biological Sciences Research Council (BBSRC) Investigating Gene Function (IGF) Initiative and provides proteomics resources for both the Arabidopsis and Drosophila research communities. Here, we discuss the rationale behind funding such a facility, the equipment required, the specialised techniques employed and the types of questions that are being addressed within this facility by the Drosophila community.

BBSRC IGF INITIATIVE
In light of the vast amounts of genome sequence information that were available, or about to be made available, the BBSRC launched its Investigating Gene Function (IGF) Initiative. The aim was to fund consortia to develop high-throughput genomic techniques for communities working with key organisms, making available methods and resources through which the connection between genes and important functions could be discovered, for example by providing access to microarrays and proteomics. A small number of awards were made to consortia, including two groups focusing on Arabidopsis and Drosophila, respectively. Both of these consortia had demonstrated a broad community-based strategy built on wide access and complementary, coordinated activities.
The proteomics elements of the Arabidopsis and Drosophila consortia were combined in one facility, CCP, based in the Biochemistry Department at the University of Cambridge, Cambridge, UK.

Setting up CCP
The types of equipment and techniques employed in a proteomics facility depend on the types of investigations to be carried out. In the case of CCP, it was envisaged that quantitative proteomics would be an essential part of the facilities available. Generally, whatever the rationale of the investigation or the number of proteins involved, the study of the proteome can be broken down into the following stages of analysis:

1. Separation of proteins, traditionally by one-dimensional (1D) or two-dimensional polyacrylamide gel electrophoresis (2D-PAGE).
2. Analysis of comparative expression to assess the relative abundance of the proteins present.
3. Identification of protein species, generally by digestion of proteins to peptides and further analysis using mass spectrometric methodologies.
4. Confirmatory experiments to confirm a protein's implied function or involvement in the process.

The CCP started its operations in October 2000 with the above stages in mind. State-of-the-art equipment was purchased and seven members of staff recruited: one manager, two postdoctoral research workers and four dedicated technicians. Samples can be submitted for full 2D-PAGE separation, image analysis and identification; alternatively, they can be submitted already separated, typically by electrophoretic techniques, for identification by mass spectrometry (MS). Once a set of proteins has been identified, the necessary confirmatory experiments are generally carried out in collaborating laboratories.

TECHNIQUES EMPLOYED
Quantitative 2D-PAGE
The techniques used within the CCP are largely based around separation of

proteins by either 1D or 2D gel electrophoresis. Bands on 1D gels, or spots from 2D gels, are then taken and the protein components within the pieces are identified by one of two MS methods.

2D-PAGE has been widely used over the past three decades to resolve and investigate the abundance of several thousand proteins in a single sample. 5 This has enabled identification of the major proteins in a tissue or sub-cellular fraction by MS. In addition, 2D-PAGE has been used to compare relative abundances of proteins in related samples, such as those from altered environments or from mutant and wild type, thus allowing the response of classes of proteins to be determined.

To date, the majority of comparative protein profiling studies have produced qualitative data, which enable the investigator to determine only whether a particular protein shows an increase or a decrease in expression. This provides no measure of the extent of the expression change, and the approach is therefore unsuitable for the clustered data analysis that ultimately gives an insight into functionality. Quantitative proteomics allows coexpression patterns to be studied; proteins showing similar expression trends suggest membership of the same functional groups.

Quantitation has been hampered by a series of factors. First, silver staining, being more sensitive than Coomassie staining methods, has been widely used for high sensitivity protein visualisation on 2D-PAGE, but it is unsuitable for quantitative analysis as it has a limited dynamic range (although, more recently, the Sypro family of post-electrophoretic fluorescent stains has offered better dynamic range and sensitivity than silver staining).
Secondly, no two gels run identically, and corresponding spots between two gels have to be matched prior to quantification.

To overcome some of these issues and to acquire robust quantitative data in situations where collaborators are interested in the changes in protein profiles between, for example, wild type and mutant Drosophila, the CCP has employed difference gel electrophoresis (DIGE). This method was first described in 1997 by Minden's group 6 and involves the pre-electrophoretic labelling of samples with one of three spectrally distinct fluors, Cyanine-2 (Cy2), Cyanine-3 (Cy3) or Cyanine-5 (Cy5). More than one set of samples, each labelled with a different Cy dye, can then be run in one gel and viewed individually by scanning the gel at different wavelengths, thus circumventing problems with spot matching between gels. Image analysis programs can then be used to generate volume ratios for each spot, which essentially describe the intensity of a particular spot in each test sample, and thus enable expression differences to be identified and quantified. Figure 1 shows a schema for the labelling of two samples whose protein profiles are to be compared.

This methodology has been commercialised by Amersham Biosciences, and all three Cy dyes have been available from this supplier since July. Cy dyes are synthesised with a reactive N-hydroxysuccinimide ester group, which forms a covalent bond with the epsilon-amino group of lysine side chains. Labelling reactions are set up such that the stoichiometry of protein to dye results in only 1-2% of the total number of lysine residues being labelled. The dyes also carry a net charge of +1, so that the isoelectric point (pI) of the protein is maintained on labelling. The three dyes are also mass matched, each labelling event adding approximately 500 Da to the mass of the protein.
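The volume ratios described above can be illustrated with a minimal Python sketch. It is not the algorithm of any particular commercial package: the function names, the median-centring step used to remove a global dye or loading bias, and the two-fold threshold are all assumptions made for illustration, and the spot volumes are invented numbers.

```python
import math
import statistics


def normalised_log_ratios(cy3_volumes, cy5_volumes):
    """Per-spot log2(Cy5/Cy3) volume ratios, centred on their median.

    Both lists hold positive spot volumes for the same spots, in the
    same order, taken from two scans of a single gel.
    """
    if len(cy3_volumes) != len(cy5_volumes):
        raise ValueError("both channels must describe the same spots")
    raw = [math.log2(v5 / v3) for v3, v5 in zip(cy3_volumes, cy5_volumes)]
    # Assumption: most spots do not change, so the median log ratio
    # estimates the global offset between the two channels.
    offset = statistics.median(raw)
    return [r - offset for r in raw]


def changed_spots(log_ratios, threshold=1.0):
    """Indices of spots whose abundance differs at least 2**threshold-fold."""
    return [i for i, r in enumerate(log_ratios) if abs(r) >= threshold]


if __name__ == "__main__":
    cy3 = [100.0, 200.0, 400.0, 800.0, 50.0]     # hypothetical volumes
    cy5 = [200.0, 400.0, 800.0, 1600.0, 800.0]   # channel is 2x brighter overall
    ratios = normalised_log_ratios(cy3, cy5)
    print(ratios)
    print(changed_spots(ratios))
```

Working in log space keeps increases and decreases symmetric about zero, which is convenient when the same comparison is later repeated with the dyes swapped.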
The benefits of this system, particularly when using internal standardisation, are described in the recent publication of Alban et al. 7 Labelling with Cy dyes is the most sensitive method of protein detection currently available, with detection of 125 pg of a single protein and a linear response in protein concentration over at least five orders of magnitude. 8,9 By comparison, the limit of detection with silver stain is in the region of 1 ng of protein, with a

dynamic range of no more than two orders of magnitude. 10

Figure 1: Schema for labelling of two samples with protein profiles which are to be compared using the 2D DIGE technique

After electrophoresis, gels must be scanned using an appropriate imaging system. Typically, comparative analysis of images from a gel results in spots with relative intensities, or normalised spot volumes, expressed as ratios. To achieve these ratios, the following processes are applied to raw data images. Analysis of data can be achieved using a variety of commercial packages. Amersham Biosciences supplies DeCyder software, currently the only analysis software specifically designed for use with DIGE and capable of using one of the Cy dye-labelled images as an internal standard, simplifying gel-to-gel analysis.

Mass spectrometric techniques
Once a set of protein spots from a DIGE or standard 2D-PAGE gel (or, alternatively, bands from a 1D gel) has been chosen for identification, the next step is excision of the gel pieces and proteolytic digestion. Both of these procedures are easily automated by robotics. The protease most frequently employed is trypsin, which cleaves the peptide bond on the C-terminal side of lysine and arginine residues. In the case of DIGE, the minimal labelling results in only 1-2% of lysine residues being modified and therefore does not significantly reduce the number of available tryptic sites. Mass spectra of the resulting peptides are then automatically acquired by matrix-assisted laser desorption/ionisation-time of flight (MALDI-TOF) mass spectrometry and processed to generate peptide mass maps.

Peptide mass fingerprinting (i.e. correlating the masses of protein digest products with protein databases for protein identification) was first developed by several groups in 1993. 11-14 However, there are occasions when fingerprinting alone is not sufficient for protein identification, because (a) the full-length protein sequence does not appear in a database or (b) the map represents a mixture of proteins. Mann and Wilm 15 and Eng et al. 16 later developed peptide sequencing techniques using nanospray/liquid chromatography tandem mass spectrometry (LC-MS/MS), which compared database peptide

sequences with MS/MS data. The advantages of these methods are (1) specificity, (2) the fact that mixtures of peptides can be identified and (3) the fact that, if no sequence is obtained from a database search, the peptide can be sequenced de novo.

The combination of mass mapping and sequencing has allowed protein identification to be both rapid (MALDI-TOF) and specific (MS/MS). The CCP utilises both methodologies. Peptides are first screened by peptide mass fingerprinting. If no successful identification is made, the peptides are then subjected to LC-MS/MS.

EQUIPMENT REQUIRED
To be able to carry out the above techniques, the CCP has invested in the following equipment:

2D gel apparatus (Amersham Biosciences);
Master Imager and Typhoon 9400 (Amersham Biosciences) gel imagers with appropriate filters to be able to image Cy2, Cy3 and Cy5;
Robotics for automated excision of spots from 2D-PAGE gels (Ettan Spot Picker; Amersham Biosciences) and digestion to tryptic fragments (MassPrep Station; Micromass);
Two mass spectrometers: a high throughput MALDI-TOF mass spectrometer for peptide mass fingerprinting (TofSpec-2E; Micromass) and an LC-MS/MS mass spectrometer (Qtof; Micromass) for generation of fragmentation data.

WHAT TYPE OF QUESTIONS CAN BE ASKED
To date, contributors of samples to the BBSRC IGF Consortium proteomics facility have generally fallen into three main categories:

1. Those wishing to identify binding partners or substrates of a particular Drosophila protein. This type of sample is usually prepared by antibody precipitation and subsequent separation of co-precipitating proteins by 1D gel. Bands from the 1D gel track are then excised and digested to tryptic fragments, which are identified by LC-MS/MS as they are likely to be mixtures.
2. Those interested in the identification of as many protein components as possible in a given organ or isolated fraction from Drosophila, such as the Malpighian tubule.
3. Those who require quantitation of protein expression between two or more samples. These include tissue-specific differences, or differences caused by mutation or by treatment with an external stimulus such as drug administration. This type of question requires DIGE analysis to produce robust quantitative data.

An example of the first approach has recently been published. 17 Figure 2 shows a 2D DIGE gel as an example of the third category. Here, we are interested in the tissue-specific differences between the gametes of Drosophila aged 0-12 hours after eclosion. The samples were prepared from Drosophila testes in the Cy3 channel and Drosophila ovaries in the Cy5 channel. When performing comparisons of protein profiles using the DIGE technique, a reverse labelling should also be performed (here, ovaries labelled with Cy3 and testes with Cy5) to discount dye-specific labelling issues.

POINTS TO BE CONSIDERED WHEN DESIGNING DROSOPHILA PROTEOMICS EXPERIMENTS
The main point to consider when designing a proteomics experiment is that somewhere in the order of the top 10% of

abundant soluble proteins in any sample can be visualised by 2D-PAGE. Membrane proteins and proteins with extremes of pI or molecular weight are not easily resolved by 2D-PAGE. 18 In order to study less abundant proteins, prefractionation, sub-fractionation or some other method of enrichment prior to 2D-PAGE is advised. Membrane proteins can be difficult to study in a comparative manner; the section below describes alternative strategies for looking at proteins with poor solubility in the types of buffers compatible with 2D-PAGE.

Figure 2: These images are from a single pH 2D DIGE gel, scanned twice using excitation and emission parameters consistent with Cy3 and Cy5. The proteins were isolated from Drosophila melanogaster testes labelled with Cy3 (left hand panel) and ovaries labelled with Cy5 (right hand panel). Approximately 50 testes and 50 ovaries were dissected from adult Drosophila virgins 0-12 hours after eclosion and homogenised in a lysis buffer (10 mM Tris/HCl pH 8.5, 7 M urea, 2 M thiourea, 5 mM magnesium acetate, 4% amidosulphobetaine-14). 50 µg of protein was extracted from each gamete using this protocol and the total amount was labelled with either Cy3 or Cy5. The images were acquired using a 2D Master Imager (Amersham Biosciences). The images were exported as 16 bit TIF images for analysis

Drosophila embryos do not lend themselves to quantitative analysis using 2D-PAGE or DIGE because levels of the very abundant yolk proteins, termed the vitellogenins (CG2985, CG2979 and CG11129), are so variable between embryos. This leads to difficulty in normalising the amounts of proteins used in a comparative experiment.
Approaches involving immunoprecipitation of complexes can result in antibody chains dominating 1D gels; it is advisable, therefore, to use an immobilised antibody if possible. Finally, genetic backgrounds should always be matched if one is looking for differences between mutant and wild type, and it is important always to use all males or all females in differential protein expression experiments so that sex-specific variations can be eliminated.

If considering preparing samples for DIGE, certain criteria have to be fulfilled. For efficient labelling to take place, a protein concentration of 5-10 mg/ml is required in order to achieve low volume labelling reactions. A typical analytical gel requires 50 µg of protein per labelling reaction, which equates to approximately two to three heads, one thorax, one to two abdomens, 50 gametes or 50 embryos. Samples can be directly homogenised in an appropriate lysis buffer such as 10 mM Tris, pH 8.5/8 M urea/4% CHAPS or 2% ASB-14/5 mM magnesium acetate. Any part of Drosophila can be used but, in the experience of the CCP, samples from whole adult

Drosophila benefit from a protein precipitation step using one of the above methods prior to labelling with Cy dyes.

THE FUTURE
To ensure that the CCP is in a position to match the proteomics demands of the Arabidopsis and Drosophila research communities, it has been necessary to bring on board new technologies that address some of the issues associated with gel-based approaches to separation and quantitation, namely the problems of detecting lower abundance protein species and membrane proteins. Two additional methodologies are in the process of being set up within the CCP.

The first is 2D LC, a non-gel-based separation technique. In this approach, a whole sample is digested and the resulting peptides are separated first on the basis of their charge (strong cation exchange) and then by their hydrophobicity and size (reverse phase). Both separations can take place in series within the same hybrid chromatography column. The separated peptides are then sequenced by LC-MS/MS. The power of this approach as a separation technique was first demonstrated by Washburn and his coworkers. 19 Digesting a total yeast protein extract, they were able to detect 1,500 proteins, including membrane proteins and proteins of known low abundance.

The second methodology is a quantitative technique that is complementary to DIGE. Isotope coded affinity tagging (ICAT) is a non-gel-based method that utilises quantitation at the mass spectrometric level. Currently, this method, as developed by Gygi et al., 20 deals with pair-wise comparisons of samples. Two samples are labelled separately with ICAT reagents through a sulphydryl linkage to cysteine. The reagent also contains a biotin tag attached to the iodoacetamide group via a linker containing either nine 12C atoms (light tag) or nine 13C atoms (heavy tag).
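To make the cysteine-directed labelling concrete, the following Python sketch digests a protein sequence in silico with the trypsin rule given earlier (cleavage on the C-terminal side of lysine and arginine), keeps only the cysteine-containing peptides that an ICAT reagent could tag, and reports the spacing expected between the light and heavy forms. It is a simplified illustration, not the published procedure: the function names and the example sequence are hypothetical, missed cleavages and other refinements of trypsin specificity are ignored, and the only physical constant used is the mass difference between 13C and 12C (about 1.003355 Da).

```python
C13_MINUS_C12 = 1.003355   # Da, mass difference between 13C and 12C
CARBONS_PER_TAG = 9        # nine labelled carbons per tag, as stated in the text


def tryptic_peptides(sequence):
    """Split a one-letter protein sequence after every K or R (simplified)."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])   # C-terminal peptide
    return peptides


def icat_pairs(sequence, charge=1):
    """List (peptide, cysteine count, heavy-minus-light m/z spacing)."""
    pairs = []
    for peptide in tryptic_peptides(sequence):
        n_cys = peptide.count("C")
        if n_cys == 0:
            continue   # untagged peptides are lost on the affinity column
        mass_shift = n_cys * CARBONS_PER_TAG * C13_MINUS_C12
        pairs.append((peptide, n_cys, mass_shift / charge))
    return pairs


if __name__ == "__main__":
    example = "MACDEKGHIRLCCNPKSTVW"   # hypothetical sequence
    for peptide, n_cys, spacing in icat_pairs(example, charge=2):
        print(peptide, n_cys, round(spacing, 4))
```

A protein whose digest yields no cysteine-containing peptide returns an empty list here, which mirrors why such proteins are invisible to this kind of tagging.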
One sample is labelled with the heavy tag, the other with the light tag. The samples are then pooled and subjected to a trypsin digestion. The pooled peptides are run through an avidin affinity column to simplify the mixture. They are then separated by LC and analysed by mass spectrometry. Nine mass units should separate identical peptides from the two samples and, because both sets of peptides are subjected to identical conditions throughout digestion, separation and analysis, the ratio of peaks should correspond to the relative protein levels in the two samples.

The strength of this approach to quantitation is its potential to give information about membrane proteins and lower abundance proteins. Serious drawbacks include its inability to give information on proteins that do not contain cysteine residues, thought to be approximately 10% of the Drosophila proteome, and also its inability to detect post-translational modifications. A 2D-PAGE approach will give some insight into the latter, provided that the modification leads to a change in the overall pI of the protein.

Similar approaches to investigating quantitative changes in the proteome using stable isotope labelling and alternative separation techniques are emerging in this quickly changing field. The CCP is in a position to evaluate such techniques with time and offer them as alternative methodologies to the Drosophila research community. Currently, the combination of the ICAT and multidimensional LC-MS/MS methods seems to be the most robust approach.

It is clear that no single separation or quantitation technique will provide all the answers necessary to give a global picture of the proteome. Combinations of prefractionation, enrichment techniques and mass spectrometric and gel-based quantitation will lead to more rigorous strategies for examining cellular mechanisms at the protein level.

Acknowledgments
The authors would like to thank the BBSRC for its support.

References
1. Adams, M. D., Celniker, S. E., Holt, R. A. et al. (2000), The genome sequence of Drosophila melanogaster, Science, Vol. 287, pp.
2. Gabor Miklos, G. L. and Maleszka, R. (2001), Protein functions and biological contexts, Proteomics, Vol. 1, pp.
3. Gygi, S. P., Rochon, Y., Franza, B. R. and Aebersold, R. (1999), Correlation between protein and mRNA abundance in yeast, Mol. Cell. Biol., Vol. 19, pp.
4. Wasinger, V. C. et al. (1995), Progress with gene-product mapping of the mollicutes: Mycoplasma genitalium, Electrophoresis, Vol. 16, pp.
5. Gorg, A., Obermaier, C., Boguth, G. et al. (2000), The current state of two-dimensional electrophoresis with immobilized pH gradients, Electrophoresis, Vol. 21, pp.
6. Unlu, M., Morgan, M. E. and Minden, J. S. (1997), Difference gel electrophoresis: A single gel method for detecting changes in protein extracts, Electrophoresis, Vol. 18, pp.
7. Alban, A., David, S. O., Bjorkesten, L. et al. (2003), A novel experimental design for comparative two-dimensional gel analysis: Two-dimensional difference gel electrophoresis incorporating a pooled internal standard, Proteomics, Vol. 3, pp.
8. Lilley, K. S., Razzaq, A. and Dupree, P. (2002), Two-dimensional gel electrophoresis: Recent advances in sample preparation, detection and quantitation, Curr. Opin. Chem. Biol., Vol. 6, pp.
9. Tonge, R., Shaw, J., Middleton, B. et al. (2001), Validation and development of fluorescence two-dimensional differential gel electrophoresis proteomics technology, Proteomics, Vol. 1, pp.
10. Lopez, M. F., Berggren, K., Chernokalskaya, E. et al. (2000), A comparison of silver stain and SYPRO Ruby protein gel stain with respect to protein detection in two-dimensional gels and identification by peptide mass profiling, Electrophoresis, Vol. 21, pp.
11. Henzel, W. J., Billeci, T. M., Stults, J. T. et al. (1993), Identifying proteins from 2-dimensional gels by molecular mass searching of peptide-fragments in protein-sequence databases, Proc. Natl. Acad. Sci. USA, Vol. 90, pp.
12. Pappin, D. J. C., Hojrup, P. and Bleasby, A. J. (1993), Rapid identification of proteins by peptide-mass fingerprinting, Curr. Biol., Vol. 3, pp.
13. Yates, J. R. 3rd, Speicher, S., Griffin, P. R. et al. (1993), Peptide mass maps: A highly informative approach to protein identification, Anal. Biochem., Vol. 214, pp.
14. Mann, M., Hojrup, P. and Roepstorff, P. (1993), Use of mass-spectrometric molecular-weight information to identify proteins in sequence databases, Biol. Mass Spectrom., Vol. 22, pp.
15. Mann, M. and Wilm, M. (1994), Error tolerant identification of peptides in sequence databases by peptide sequence tags, Anal. Chem., Vol. 66, pp.
16. Eng, J. K., McCormack, A. L. and Yates, J. R. (1994), An approach to correlate tandem mass-spectral data of peptides with amino-acid-sequences in a protein database, J. Am. Soc. Mass. Spectrom., Vol. 5, pp.
17. Robertson, A. S. et al. (2003), Characterization of the necrotic protein that regulates the toll-mediated immune response in Drosophila, J. Biol. Chem., Vol. 278, pp.
18. Zuo, Z., Echan, L., Hembach, P. et al. (2001), Towards global analysis of mammalian proteomes using sample prefractionation prior to narrow pH range two-dimensional gels and using one-dimensional gels for insoluble large proteins, Electrophoresis, Vol. 22, pp.
19. Washburn, M. P., Wolters, D. A. and Yates, J. R. III (2001), Large-scale analysis of the yeast proteome by multidimensional protein identification technology, Nat. Biotechnol., Vol. 19, pp.
20. Gygi, S. P., Rist, B., Gerber, S. A. et al. (1999), Quantitative analysis of protein mixtures using isotope coded affinity tags, Nat. Biotechnol., Vol. 17, pp.