Structural proteomics - a new driving force in drug discovery

From the identification of first inhibitors to lead optimisation, having access to high-resolution protein structures is a key advantage in the search for new therapeutic agents.

Judd Berman, Christian Burks, Raymond Hui and Molly B Schmid
Affinium Pharmaceuticals, Inc

Substantial international efforts are underway to determine the detailed tertiary structures of proteins on a genomic scale. Effective integration of insights based on the structures of macromolecular targets and their small molecule ligand interactions will enable the rapid optimisation of ligand potency, the generation of alternative template hypotheses and the timely discovery of drugs. Assembling protein targets into structurally similar 'families' provides a powerful framework to leverage knowledge from the expanding database of tertiary structures.

Bold beginnings: structural proteomics

The availability of genome sequences and technical advances in cloning, protein expression and protein structure determination have spurred many efforts to determine protein tertiary structures on a genome-wide scale. The motivation for these efforts stems in part from the belief that high-resolution structural information provides unique advantages in the efficient discovery of new medicines. Structural proteomics efforts are large-scale programmes, requiring substantial funding and recruitment of inter-disciplinary personnel. These large programmes provide the inter-disciplinary expertise, as well as automation and liquid-handling capabilities, to undertake high-throughput protein purification and crystallisation for high-resolution structure determination by X-ray crystallography.
The diversity of expertise required is substantial: molecular biologists for the cloning and expression of proteins, protein biochemists for the purification of proteins, and structural biologists for the crystallisation and crystallography of proteins. Importantly, the interface of these scientists with computational and medicinal chemists is critical for transforming structural data into drug discovery knowledge. In addition, personnel to provide automation, engineering, database and informatics expertise are also critical for these large-scale efforts.

Structural proteomics initiatives will contribute substantially to the total number of protein structures that are available in the public domain. Some of the publicly and privately funded structural proteomics initiatives are listed in Table 1. These initiatives aim to solve, by 2010, the 10,000 protein structures thought necessary to cover all of 'protein fold space'. With such information in hand, homology modelling of most other proteins should be possible. In 2002, over 3,300 structures were added to the Protein Data Bank (PDB, http://www.rcsb.org/pdb), bringing the total number of structures to 21,659 (August 5 2003 release, the figure increases weekly). Many of these structures are somewhat 'overlapping' (duplications, single amino acid changes and differing ligand complexes), and the number of vastly differing proteins in the PDB is approximately 3,100 (1).

44 Innovations in Pharmaceutical Technology

Table 1. Structural genomics websites

Structural genomics research center                            Website
US National Institutes of Health Protein structure initiative  http://www.nigms.nih.gov/psi/
Berkeley Structural Genomics Center                            http://www.strgen.org/
TB Structural Genomics Consortium                              http://www.doe-mbi.ucla.edu/tb/
Structural Genomics of Pathogenic Protozoa Consortium          http://depts.washington.edu/sgpp/
The Southeast Collaboratory for Structural Genomics            http://www.secsg.org/
Northeast Structural Genomics Consortium                       http://www.nesg.org/
The Midwest Center for Structural Genomics                     http://www.mcsg.anl.gov/
New York Structural Genomics Research Consortium               http://www.nysgrc.org/
The Joint Center for Structural Genomics                       http://www.jcsg.org/scripts/prod/home.html
Center for Eukaryotic Structural Genomics                      http://www.uwstructuralgenomics.org/
RIKEN Structural Genomics Initiative                           http://www.rsgi.riken.go.jp
Ontario Center for Structural Proteomics                       http://www.uhn.ca
ESF Functional Genomics Programme                              http://www.functionalgenomics.org.uk
The Wellcome Trust Structural Genomics Consortium              http://www.wellcome.ac.uk/en/genome/thegenome/hg03n002.html
Structural Proteomics in Europe                                http://www.spineurope.org/

Applying structural proteomics to drug discovery

For drug discovery purposes, the absolute number of known protein structures is not important. Rather, obtaining the high-resolution structure of 'my favourite target' is critical in the drug discovery setting.
In general, project teams focus their energies on a particular target or small group of targets, the structures of which must be solved if the advantages of structure-guided drug discovery are to be reaped. Affinium has evolved a Selective Multiplex approach, in which both standard and proprietary methods are applied in an algorithmic fashion to obtain structures for a large percentage of the proteins selected for structure determination. Starting with high-quality purified protein in sufficient amounts, first-pass genomic-style efforts yield solved structures for only a few per cent of proteins (2). With current practices, the Selective Multiplex approach has raised the success rate of producing diffraction-quality crystals and high-resolution structures to as much as 80% of the proteins studied.

Selective Multiplex utilises sequential and parallel approaches to obtain diffraction-quality protein crystals. Results from Affinium and others have shown that some proteins crystallise readily in a number of different conditions, while other proteins require very specific conditions (3, 4). For proteins that do not crystallise readily, a wide variety of methods have evolved over the past 50 years of macromolecular crystallisation efforts. Methods such as changing the protein by truncation, point mutation or separation of protein domains have successfully provided diffraction-quality protein crystals for otherwise problematic proteins. Other methods involve changing crystallisation conditions, refolding the protein, co-expression to stabilise interacting proteins, the stabilisation of proteins during and after purification by a variety of methods, and crystallisation with small molecule ligands. The high-throughput methods of structural proteomics efforts can be applied to these troublesome proteins, allowing efficient and parallel assessment of variant protein constructs and crystallisation efforts.
Algorithms that provide an optimal approach to efficiently obtain diffraction-quality crystals for a large fraction of proteins provide an advantage for drug discovery efforts. Methods that assess the behaviour of specific proteins can guide these algorithms, making them more efficient. Further algorithm development relies on mining the databases of information arising from structural proteomics projects, as well as the Biological Macromolecule Crystallization Database (http://www.bmcd.nist.gov), to predict protein behaviour during crystallisation.

Why bother - what are the advantages of high-resolution protein structures in the drug discovery process?

To put it simply, having access to a high-resolution structure prevents the researcher from working blind. From identification of first inhibitors to lead optimisation, having a protein structure can improve efficiency in the search for new therapeutic agents.

Where to start - using protein structure to begin the discovery of new drugs

The widespread availability of protein structures will allow the characteristics of the active sites of potential targets to become criteria for target prioritisation. Active sites vary substantially in their sizes, shapes and amino acid compositions. Structure-guided rules for target prioritisation may provide additional opportunities to further improve the efficiency of drug discovery by steering project teams towards the selection of targets with a high likelihood of yielding inhibitors that have drug-like properties.

The availability of high-quality proteins and high-resolution structures (<2.5 Å resolution) can guide the initial identification of inhibitors using computational and/or experimental approaches. Virtual screening relies on computational methods to identify and select compounds or fragments that can bind to a given target protein. Current high-throughput methods can screen a large virtual compound library; the success of these methods is estimated at 10-50%, based on comparisons with experimental results (5). Improvements in the accuracy and efficiency of virtual screening algorithms should be expected as the number of protein structures on relevant drug targets grows, and the availability of protein structures motivates drug discovery teams to explore virtual screening as an alternative to costly experimental high-throughput screening.

In spite of the imperfect success rates, current virtual screening algorithms can provide guidance for designing new compound libraries and pre-filtering existing compound libraries for experimental screens. Algorithms that assess the in silico binding of ligands with protein structures can be used to bias compound libraries to provide a larger percentage of compounds with a likelihood of binding the target protein.
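
As a rough illustration of biasing a library in this way, the sketch below ranks compounds by a predicted binding score and keeps only the best-scoring fraction for experimental screening. The ScoredCompound type, the bias_library and hit_rate helpers, the score values and the sign convention (more negative means better predicted binding) are all assumptions made for this example. They do not come from any particular docking program, and the data are invented rather than taken from the work cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredCompound:
    """A library compound with a predicted binding score (hypothetical units)."""
    name: str
    docking_score: float  # assumption: more negative = better predicted binding


def bias_library(compounds, keep_fraction=0.2):
    """Return the best-scoring fraction of a library, best first."""
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ranked = sorted(compounds, key=lambda c: c.docking_score)
    if not ranked:
        return []
    n_keep = max(1, round(len(ranked) * keep_fraction))
    return ranked[:n_keep]


def hit_rate(compounds, confirmed_binders):
    """Fraction of the given compounds that are experimentally confirmed binders."""
    if not compounds:
        return 0.0
    hits = sum(c.name in confirmed_binders for c in compounds)
    return hits / len(compounds)


# Invented example data: ten compounds, two of which bind in the (hypothetical) assay.
library = [
    ScoredCompound("cmpd-01", -9.1),
    ScoredCompound("cmpd-02", -4.2),
    ScoredCompound("cmpd-03", -6.5),
    ScoredCompound("cmpd-04", -3.0),
    ScoredCompound("cmpd-05", -7.8),
    ScoredCompound("cmpd-06", -5.1),
    ScoredCompound("cmpd-07", -8.4),
    ScoredCompound("cmpd-08", -2.7),
    ScoredCompound("cmpd-09", -6.9),
    ScoredCompound("cmpd-10", -4.8),
]
confirmed_binders = {"cmpd-01", "cmpd-05"}

if __name__ == "__main__":
    subset = bias_library(library, keep_fraction=0.2)
    print("kept:", [c.name for c in subset])
    print("hit rate, full library:", hit_rate(library, confirmed_binders))
    print("hit rate, biased subset:", hit_rate(subset, confirmed_binders))
```

In this toy data set the biased subset is enriched in binders but is not perfect: it keeps one false positive and misses one true binder, which mirrors the imperfect success rates noted above.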
Libraries can arise from employing virtual screening, or by designing analogs of known ligands of target proteins and family members. Here the use of target families (for example, kinases, proteases, and so on) in library design has proven to be quite successful.

Where to go - finding the path to more efficient drug discovery

Once experimental hit compounds have been identified, the traditional process of lead optimisation is often slow and difficult. In the discovery of new medicines, the hurdles are extremely high and a successful agent must possess a wide range of different properties: onset of activity, few and minor undesirable side effects, and the physical, chemical and pharmacological properties that allow appropriate formulation and sufficient residence time within the body to achieve sustained efficacy and patient compliance with the given treatment regimen. The knowledge gained by detailed examination of target structures and target-inhibitor co-structures provides a greatly advantaged basis from which to optimise across these requirements in parallel.

Figure 1. Concept to structure: process pipeline for determining new target 3D structure.

Determining structure-activity relationships (SAR: which parts of the ligand are necessary for activity) has historically been a trial-and-error process. Portions of the molecule are removed or modified, and experimentally tested for biological activity. This information is used to build a map of the regions of the compound that are necessary for biological activity and that interact with the target protein. Designing compounds, synthesising them and then testing them can take weeks just to obtain the results for a single compound. When these results show inactivity, it can be inferred that the altered portion of the molecule interacts with the target protein.
However, understanding the molecular basis of the interaction (hydrophobic, charge, hydrogen bonding and so on) requires new rounds of hypothesis creation, synthesis and experimental assays. Ultimately, this work infers an interaction map between the compound and the biological target based on the data that are collected. Thousands of person-hours can be spent determining even the initial SAR for a lead series, and for well-studied molecules such as the fluoroquinolones (which inhibit the bacterial DNA gyrase enzyme), hundreds of person-years can be spent determining the detailed SAR for the series.

Lead optimisation is more efficient with protein structures in hand, as regions of the lead that are involved in binding to the target protein are visualised and ideas to alter properties of a compound become readily apparent. Beginning a drug discovery programme with knowledge of the tertiary structure of target proteins, binding sites, and even better, with bound small molecule ligands can short-circuit many of the dead-ends that blind SAR generates. Critically, tertiary structures provide detailed insights into the precise nature of the molecular forces governing interactions. The utility of protein structure in the lead optimisation process has been demonstrated in the design of many compounds, notably antiviral agents (HIV protease inhibitors and influenza virus neuraminidase inhibitors). Moreover, a defined understanding of the interactions between small molecules and target proteins provides a roadmap to alternative templates or scaffolds to interact with the target proteins. These alternatives are particularly important when searching for back-up compounds and appropriate intellectual property coverage.

Figure 2. Affinium Pharmaceuticals crystal gallery.

All in the family

Many human proteins of therapeutic importance are members of protein families (proteases, kinases and so on). Viewing ligand discovery from a family-centric perspective provides substantial advantages in solving protein structures, as well as efficiencies in assembling downstream methods such as assay development and small molecule template designs (6). This approach adds a new and valuable dimension to traditional disease-centric single target approaches. Drug discovery efforts aim to identify ligands with highly selective effects on the therapeutic target but not on other members of the protein family.
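
One simple way to put a number on this selectivity goal is a fold-selectivity ratio: the inhibition constant against each other family member divided by the inhibition constant against the intended target. The sketch below assumes measured or predicted Ki values in nM. The protein names, the Ki values and the 100-fold default threshold are hypothetical choices for illustration, not figures from this article.

```python
def selectivity_ratios(ki_nm, target):
    """Return {family_member: Ki(member) / Ki(target)} for every non-target protein.

    ki_nm maps protein name -> inhibition constant in nM (lower = tighter binding).
    A larger ratio means the compound binds that family member more weakly
    than it binds the target.
    """
    target_ki = ki_nm[target]
    if target_ki <= 0:
        raise ValueError("Ki values must be positive")
    return {
        name: ki / target_ki
        for name, ki in ki_nm.items()
        if name != target
    }


def is_selective(ki_nm, target, min_fold=100.0):
    """True if the compound is at least min_fold weaker on every other family member.

    The 100-fold default is an arbitrary illustrative threshold.
    """
    return all(r >= min_fold for r in selectivity_ratios(ki_nm, target).values())


# Invented Ki values (nM) for one compound across a hypothetical protein family.
example_ki_nm = {"kinase_A": 5.0, "kinase_B": 900.0, "kinase_C": 250.0}

if __name__ == "__main__":
    print(selectivity_ratios(example_ki_nm, target="kinase_A"))
    print(is_selective(example_ki_nm, target="kinase_A"))
```

For a broad-spectrum anti-infective the same table of Ki values would be read the other way round: the design goal is tight binding to every ortholog, so the weakest binder, not the ratio, becomes the figure to watch.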
The availability of high-resolution structures for members of a protein family provides the opportunity to apply computational methods to design selective inhibitors that strongly bind the desired target, and that weakly bind other members of the family.

Anti-infectives and adaptive inhibitors

Anti-infectives pose special challenges in the discovery of new drugs. The most efficient design of broad-spectrum anti-infectives requires insight into the binding sites of orthologous proteins. Strong binding between a small molecule inhibitor and each member of the related orthologous proteins must occur, despite differences in the molecular details of their binding sites. In contrast, the design of selective agents for mammalian targets requires the ability to discriminate between close structural variants.

Although genomics provides us with a genome-scale view of the similarities among microbial species and viral isolates, subtle differences in the binding pocket's shape and amino acid composition can determine whether a small molecule will productively interact with a protein. When viewed at the resolution of a small molecule interacting with a specific collection of orthologous proteins, the differences between these proteins may become more apparent and more significant for the discovery of broad-spectrum inhibitors and compounds targeted against differing isolates. Recently, the design and synthesis of Plasmodium falciparum protease inhibitors has provided an interesting case in point (7). Here, a fundamental design principle sought ligands containing conformationally flexible parts for the variable regions of the target binding site, while establishing strong H-bonding and polar interactions with evolutionarily conserved target structural elements. These insights, garnered from a structure-guided drug discovery programme, may provide new opportunities for controlling the selectivity and spectrum of anti-infective agents.

Figure 3. Schematic of the Selective Multiplex approach.

In addition, for anti-infective medicines, the agent should provoke minimal resistance, so that the clinician can be assured of the agent's efficacy for current and future patients. The combination of protein structure and genetic methods may make it possible to predict and document potential resistance mutation steps, and to design molecules that will anticipate the evolutionary moves available to the microbe. A design process that effectively anticipates the likelihood of emergence of resistance to ligands would be a fundamental advance in the discovery of new anti-infective therapeutic agents.

Integrating structural biology into the drug discovery process

In the past, the pace of structural biology has lagged behind the drug discovery project team's pace of new compound discovery. Affinium's Chem-Time process, adapted from the company's expertise in structural proteomics, provides structural biology and computational chemistry with the speed and success rates required to become integrated partners with medicinal chemistry and biology in the drug discovery process. Improvements in the rates of success, reproducibility and speed of obtaining diffraction-quality protein-ligand co-crystals have provided substantial advantages for a project team to use structural and binding modes to design new compounds in real time. Such capabilities can save substantial time typically spent by project team members in synthesising and testing new compounds that fail to bind the target protein. Scientists can spend a greater fraction of their efforts on the most attractive new compounds. Reaping the best of these integrated processes allows structural biologists to determine the structures of protein-ligand complexes within the 7-10 day medicinal chemistry cycle time.
Conclusion

The widespread availability of protein structures will enable the much-needed ability to break through the lead optimisation bottlenecks that now limit the discovery of new classes of ligands and subsequently drugs. Importantly, new methods of understanding the structures of orthologous proteins will provide valuable insights in designing compounds destined to be medicines against a wide spectrum of molecular targets.

Acknowledgements

Our thanks to John D Mendlein for insightful comments on the manuscript, and to our many colleagues at Affinium Pharmaceuticals, Inc who continue to innovate and develop methods for structure-guided drug discovery.

Judd Berman, PhD, joined Affinium as Senior Vice President, Chemistry, in August 2002. Prior to this, he was Director and Site Head for High-Throughput Chemistry at GlaxoSmithKline (Research Triangle Park). His previous roles included Department Head, Glaxo Wellcome, where he specialised in metabolic diseases, target class research and combinatorial chemistry. Previously, he had held progressively senior roles within Glaxo Research, where he had responsibility for teams dedicated to specific initiatives in metabolic disorders, cancer research and benign prostatic hyperplasia. He was also a Senior Scientist at the Merrell Dow Research Institute, working on cardiovascular diseases. The author and inventor of more than forty articles and patents, Dr Berman received his PhD in organic chemistry from the University of California, San Diego.

Christian Burks, PhD, joined Affinium as Chief Scientific Officer in March 2002. Before this, he served as Vice President and Chief Informatics Officer at Exelixis, Inc, where he was responsible for a department of 40 staff focused on genome informatics, computational target discovery and computational drug discovery in the context of comparative genomics and model system genetics. Prior to Exelixis, Dr Burks served in various positions at Los Alamos National Laboratory from 1982 to 1996, and was part of the team that created the global DNA sequence database, GenBank, a project that he went on to head. As Group Leader of the Theoretical Biology and Biophysics Group, he led a team of 70 scientists focused on genome informatics, structural biology, mathematical cell biology and molecular biology database resources. The author of more than 50 published articles, reviews and books, Dr Burks received his PhD in molecular biophysics and biochemistry from Yale University.

Raymond Hui, PhD, joined Affinium as Vice President, Engineering and Informatics, in September 2000.
Previously, he was Senior Robotics Engineer and Robot Product Manager at CRS Robotics Corp, Senior Research Scientist at the Canadian Space Agency, and an independent consultant in the area of lab automation. While at the Canadian Space Agency, he was the Canadian Representative on the European Space Agency Advisory Group on Automation and Robotics. Dr Hui holds a PhD in mechanical engineering from the University of Toronto and an MSc from the Institute of Aerospace Studies, University of Toronto.

Molly B Schmid, PhD, joined Affinium as Senior Vice President, Preclinical Programs, in December 2001. Dr Schmid most recently served as Senior Director, Functional Genomics & Bioinformatics, at Genencor International, a global biotechnology company with products for the healthcare, agriculture and industrial chemicals markets. Previously, she was Vice President, Research Alliances, and Director, Discovery Biology, with Microcide Pharmaceuticals, a small molecule drug discovery company (now Essential Therapeutics). From 1986 to 1994, she was Assistant Professor of Molecular Biology at Princeton University, where her lab investigated bacterial chromosome structure and function; her research programme led to the identification of Topoisomerase IV in Salmonella typhimurium, as well as a genetic strategy for identifying new antimicrobial targets. Dr Schmid is a Fellow of the American Academy of Microbiology, and was a Searle/Chicago Community Trust Scholar and a Damon Runyon-Walter Winchell Fellow. She received her PhD in Biology from the University of Utah.

References

1. Vitkup D, et al. (2001). Completeness in structural genomics. Nat Struct Biol, 8(6), 559-66.
2. Christendat D, et al. (2000). Structural proteomics of an archaeon. Nat Struct Biol, 7(10), 903-9.
3. Kimber MS, et al. (2003). Data mining crystallization databases: knowledge-based approaches to optimize protein crystal screens. Proteins, 51(4), 562-8.
4. Page R, et al. (2003). Shotgun crystallization strategy for structural genomics: an optimized two-tiered crystallization screen against the Thermotoga maritima proteome. Acta Crystallogr D Biol Crystallogr, 59(Pt 6), 1028-37.
5. Abagyan R and Totrov M (2001). High-throughput docking for lead generation. Curr Opin Chem Biol, 5(4), 375-82.
6. Drews J (2000). Drug discovery: a historical perspective. Science, 287(5460), 1960-4.
7. Nezami A, et al. (2003). High-affinity inhibition of a family of Plasmodium falciparum proteases by a designed adaptive inhibitor. Biochemistry, 42(28), 8459-64.