IPRO: An Iterative Computational Protein Library Redesign and Optimization Procedure


Biophysical Journal Volume 90 June 2006 4167–4180

Manish C. Saraf,* Gregory L. Moore,† Nina M. Goodey,‡ Vania Y. Cao,‡ Stephen J. Benkovic,‡ and Costas D. Maranas*

*Department of Chemical Engineering, The Pennsylvania State University, University Park, Pennsylvania 16802; †Xencor Inc., Monrovia, California 91016; and ‡Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802

ABSTRACT A number of computational approaches have been developed to reengineer promising chimeric proteins one at a time through targeted point mutations. In this article, we introduce the computational procedure IPRO (iterative protein redesign and optimization procedure) for the redesign of an entire combinatorial protein library in one step using energy-based scoring functions. IPRO relies on identifying mutations in the parental sequences, which when propagated downstream in the combinatorial library, improve the average quality of the library (e.g., stability, binding affinity, specific activity, etc.). Residue and rotamer design choices are driven by a globally convergent mixed-integer linear programming formulation. Unlike many of the available computational approaches, the procedure allows for backbone movement as well as redocking of the associated ligands after a prespecified number of design iterations. IPRO can also be used, as a limiting case, for the redesign of a single or handful of individual sequences. The application of IPRO is highlighted through the redesign of a 16-member library of Escherichia coli/Bacillus subtilis dihydrofolate reductase hybrids, both individually and through upstream parental sequence redesign, for improving the average binding energy. Computational results demonstrate that it is indeed feasible to improve the overall library quality as exemplified by binding energy scores through targeted mutations in the parental sequences.
BACKGROUND AND INTRODUCTION

The ability to proactively modify protein structure and function through a series of targeted mutations is an open challenge that is central in many different applications. These include, among others, enhanced catalytic activity (1–3) and stability (4,5), creation of gene switches for the control of gene expression for use in gene therapy and metabolic engineering (6,7), signal transduction (8,9), genetic recombination (10), motor protein function, and regulation of cellular processes (see Bishop et al. (11) for a review). This task is complicated by the fact that proteins rely on complex networks of subtle interactions to enable function (12–14). Therefore, the effect of a mutation is difficult to assess a priori, requiring the capture of its direct or indirect effect on many neighboring amino acids. As a result, most protein engineering paradigms involve the synthesis and screening of multiple protein candidates (protein library) as a way to enhance the odds of identifying proteins with the desired functionality level. These directed evolution design paradigms (15–20) typically involve juxtaposition of repeated library generation and screening (Fig. 1). On the other hand, most computational approaches for guiding protein design are focused on the downstream redesign of single parental sequences or promising hybrids (Fig. 1). Notable exceptions include the work of Bogarad and Deem (21) and efforts by Saven (22) that describe computational methods for protein library design.

A number of computational models and techniques have been developed (see Moore and Maranas (23) for review) to aid in the in silico evaluation of protein redesign candidates. Typically these techniques attempt to find single or multiple amino acid sequences that are compatible with a given three-dimensional structure specific to a targeted function (e.g., enzymatic activity). The protein fold is usually represented by the Cartesian coordinates of its backbone atoms, which are fixed in space so that the degrees of freedom associated with backbone movement are neglected. More recent approaches (24–29) allow for some backbone movement.
Candidate protein designs are generated by selecting amino acid side chains (using atomistic detail) along the backbone design scaffold. For simplicity, side chains are usually only permitted to assume a discrete set of statistically preferred conformations referred to as rotamers (see Dunbrack (30) for a review of current rotamer libraries). Thus, a protein design consists of both a residue and a rotamer assignment for each amino acid position. To evaluate how well a possible design fits a given fold, rotamer/backbone and rotamer/rotamer interaction energies for all the rotamers in the rotamer library are tabulated. These energies are approximated using standard force fields (e.g., CHARMM (31), DREIDING (32), AMBER (33), and GROMOS (34)). Scoring functions customized for protein design (35–37) (see Gordon et al. (38) for a review) typically include van der Waals interactions, hydrogen bonding, electrostatics, and solvation, along with entropy-based penalty terms for flexible side chains (e.g., arginine) (39–42). Because activity levels or other performance objectives are very difficult to compute directly, alternative surrogates of hybrid fitness, such as stability or binding affinity, are employed in most studies. The use of these indirect objectives further motivates designing a combinatorial library rather than a single hybrid to improve the chances of success.

FIGURE 1 (a) Promising hybrid sequences from the library are selected for downstream redesign that involves either random or site-directed mutagenesis. (b) Illustration of the upstream parental sequence redesign. Note that the mutations in the parental sequences propagate downstream into the combinatorial library, effectively designing the combinatorial library at once, thereby improving the overall quality of the library.

Submitted December 8, 2005, and accepted for publication February 2, 2006. Address reprint requests to Costas D. Maranas, Tel.: 814-863-9958; Fax: 814-865-7846; E-mail: costas@psu.edu. © 2006 by the Biophysical Society 0006-3495/06/06/4167/14 $2.00 doi: 10.1529/biophysj.105.079277

Even for a small 50-residue protein, an enormous number (i.e., 153^50 ≈ 10^109 assuming a 153-rotamer library (43)) of designs is possible. Both stochastic and deterministic search strategies have been used to tackle the computational challenge of finding the globally optimum design within this vast search space. Despite these challenges, a number of success stories of combinatorial design for many different applications have been reported (42,44–50) in the last few years, demonstrating the feasibility of using computation to guide protein redesign. Briefly, successes include manyfold improvements in enzyme activity and thermostability (50–52), improved enantioselectivity (53–55), enhanced bioremediation (56–58), and even the design of genetic circuits (6,7,10) and vaccines (59–61).

It is increasingly becoming apparent, however, that instead of computationally generating a set of distinct protein redesigns, it is more promising to use computations to shape the statistics of an entire combinatorial library. This allows one to assess and then steer diversity toward the most promising regions of sequence space (62). This paradigm is more likely to succeed compared to constructing protein designs one at a time. At the other extreme, construction of combinatorial libraries based on mutation and/or recombination without any guidance from models/computations is a daunting task because only an infinitesimally small fraction of the diversity afforded by DNA and protein sequences can be examined regardless of the efficiency of the screening procedure.
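To make the scale of the search space quoted above concrete, the short calculation below reproduces the 153^50 estimate for a 50-residue protein with a 153-rotamer library. It is only an arithmetic check; the variable names are illustrative and not part of IPRO.

```python
import math

ROTAMERS_PER_POSITION = 153  # rotamer library size quoted in the text
N_POSITIONS = 50             # residues in the small example protein

# Work in log10 so the result can be reported in scientific notation.
log10_designs = N_POSITIONS * math.log10(ROTAMERS_PER_POSITION)
exponent = math.floor(log10_designs)
mantissa = 10 ** (log10_designs - exponent)

print(f"153^50 is roughly {mantissa:.1f} x 10^{exponent}")
```

The exponent of 109 matches the order of magnitude given in the text, which is why exhaustive enumeration of full-sequence designs is not an option.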
In response to these challenges, in this article we introduce a new computational procedure IPRO (iterative protein redesign and optimization) that allows for the upstream redesign of parental sequences (Fig. 1). The key idea here is that the residue changes within the parental sequences will propagate in the combinatorial library, effectively introducing mutations within the hybrid sequences in the library (see Fig. 1). Judicious selection of these mutations in the parental sequences can simultaneously relieve unfavorable interactions or clashes (63–65) within the hybrid sequences and therefore enhance the overall quality of the library in one step, mirroring the experimental protocol design. Note that even though IPRO is geared toward parental sequence redesign, it can be used, as a limiting case, for the redesign of a single or handful of individual sequences. The key feature of the IPRO protocol is the cycling between sequence design, ligand redocking, and backbone movement of a set of sequences representative of the combinatorial library. The goal of the sequence design here is to choose mutations within the parental sequences, and therefore in the hybrid sequences, that optimize the average binding energy/score (or alternative surrogates of design objectives) of the hybrid sequences in the library. The genetic algorithm of Desjarlais and Handel (66) and the Monte Carlo minimization protocol of Kuhlman and co-workers (41) involve similar sequence design and backbone perturbation moves. However, they only allow for the design of a single sequence at a time and involve full-scale optimization over rotamers for only a local backbone perturbation. On the other hand, IPRO allows for the design of the entire combinatorial library and involves optimization over the local perturbation region using a globally convergent mixed-integer linear programming (MILP) formulation. In addition, IPRO allows for the redocking of the associated ligands (e.g., substrate, cofactors, solvent, etc.) after a prespecified number of design iterations. In the next section, we describe in detail the IPRO procedure and introduce the globally convergent mixed-integer linear program that drives residue redesign.
We also discuss the methods used for generating and identifying hybrid Escherichia coli/Bacillus subtilis dihydrofolate reductase (DHFR) and B. subtilis/Lactobacillus casei DHFR enzymes containing single crossover positions and assay for DHFR activity. Next, we provide an example application of IPRO to highlight the features and types of output obtained with IPRO. The study involves the computational identification of parental redesigns that are likely to improve a single crossover E. coli/B. subtilis DHFR combinatorial library composed of 16 hybrids (64). We conclude by discussing the implications of our results and some of the modeling and algorithmic enhancements that we are currently incorporating to further improve the IPRO framework.

MATERIALS AND METHODS

The IPRO procedure

The IPRO procedure is composed of four parts (see Fig. 2):

a. A set of hybrid sequences matching the members of the combinatorial library, if ≲100, is generated. For larger libraries, only a representative sample of the diversity of the combinatorial library is considered.
b. For each hybrid sequence, an initial structure is computationally generated. This is a critical step as the efficacy of the identified redesigns depends heavily on the accuracy of the modeled structures.
c. A set of positions, ranging from a single residue position to the entire sequence length, to be targeted for redesign is compiled. Note that the larger the number of design positions is, the more expansive the search space becomes, leading to higher computational requirements. Typically we only consider between 3 and 20 design positions that include residue positions within or in the neighborhood of the active site. In addition, restrictions on the type of allowable residue redesigns (e.g., hydrophobic, charged, etc.) can be imposed for each redesign position.
d. Next, a set of residue changes is identified in the parental sequences, which upon propagation among the combinatorial library members, leads to the optimization of the average library score (e.g., binding energy or stability (35–37)). This optimization step is carried out globally using a MILP model within a local perturbation window, whereas simulated annealing is used to accept or reject the residue redesigns associated with each backbone perturbation step.
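The way a parental residue change in part d propagates into the library built in part a can be illustrated with a minimal sketch. It assumes two aligned, equal-length parents and only A-to-B single-crossover hybrids; the toy sequences and function names are hypothetical and are not the DHFR sequences or IPRO code.

```python
def single_crossover_hybrids(parent_a: str, parent_b: str) -> list[str]:
    """Return all A/B single-crossover hybrids of two aligned, equal-length parents."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must be aligned to the same length")
    # Crossover c: residues [0, c) come from parent A, residues [c, N) from parent B.
    return [parent_a[:c] + parent_b[c:] for c in range(1, len(parent_a))]


def mutate(sequence: str, position: int, residue: str) -> str:
    """Return a copy of `sequence` with `residue` placed at 0-based `position`."""
    return sequence[:position] + residue + sequence[position + 1:]


# Toy parental sequences (hypothetical, chosen only for illustration).
parent_a = "MKTAYIAK"
parent_b = "MRSGFLVR"

library_before = single_crossover_hybrids(parent_a, parent_b)

# Upstream redesign: mutate position 2 of parent A, then rebuild the library.
redesigned_a = mutate(parent_a, 2, "W")
library_after = single_crossover_hybrids(redesigned_a, parent_b)

# Hybrids whose crossover lies downstream of the mutated position inherit it.
changed = [idx for idx, (old, new) in enumerate(zip(library_before, library_after))
           if old != new]
print(f"{len(changed)} of {len(library_after)} hybrids inherit the parental mutation")
```

A single parental substitution therefore alters several library members at once, which is the effect the upstream redesign exploits.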
Generating a set of sequences representative of the combinatorial library

A set of hybrid sequences is selected to exhaustively or statistically represent the combinatorial library. This step begins with the sequence/structural alignment (67) of the parental sequences. A statistical description of the combinatorial library is obtained by considering the specifics of the combinatorialization protocol. For example, in case of DNA shuffling, models such as eShuffle (68) or those developed by Maheshri and Schaffer (69) can be used to estimate the library diversity. Alternatively, for an oligonucleotide ligation-based protocol such as GeneReassembly (70), SISDC (71), and degenerate homoduplex recombination (72), a statistically unbiased sample of fragment concatenations is constructed that broadly captures the diversity of the resulting combinatorial library. In the limiting case when there is only a single starting sequence to be redesigned, IPRO reverts to the traditional single protein sequence design procedure. Note, however, that the concept of designing for the optimum of the average of a library of sequences can also find utility in this case when not a unique but rather an ensemble of putative structures is available for the protein to be redesigned. The ensemble of modeled structures then plays the role of the combinatorial library when fed to IPRO. By optimizing with respect to the ensemble average of the putative structures, a more robust redesign strategy is likely to be obtained.

Generation of starting hybrid protein structures

The initial putative structures of the hybrid proteins forming the library are obtained by splicing fragments of the parental structures consistent with its sequence (see Fig. 3). The coordinates of the fragment structures are taken from the structural alignment of the parental sequences. The fold at the junction point(s) typically involves a kink as a result of the ad hoc concatenation of the parental structures, which becomes even more prominent in case of insertions. This is smoothened by allowing the backbone around the junction point to move.
The backbone φ and ψ angles of seven residues on either side of the crossover position(s) are allowed to vary and their new positions are determined through energy minimization. In the current implementation of IPRO, we use the CHARMM (73) energy function and molecular modeling environment. Note that during the energy minimization, the bond lengths (b), bond angles (χ1, χ2, etc.), and internal coordinates of the side chains are restrained to their original values (b_o, χ_o) by penalizing any deviations (see Eqs. 1 and 2). The bond stretching is penalized using Hooke's law formula (Eq. 1) and the distortions in the bond angles are penalized using the harmonic function (Eq. 2). In addition, distances between certain key atoms can also be restrained using Eq. 1. Note that because less energy is required to distort an angle than to stretch a bond, the force constant associated with bond angle distortions is accordingly smaller:

$$\Delta E_{\text{bond length penalty}} = \sum_{\text{bonds}} 1000\ \tfrac{\text{kcal}}{\text{mol}\,\text{Å}^2}\,(b - b_o)^2 \tag{1}$$

$$\Delta E_{\text{bond angle penalty}} = \sum_{\text{angles}} 60\ \tfrac{\text{kcal}}{\text{mol}\,\text{rad}^2}\,(\chi - \chi_o)^2 \tag{2}$$

FIGURE 2 Four key steps involved in the IPRO procedure. Details of each of these steps are described separately in the text.

Alternative methods to parental fragment splicing and relaxation for modeling the hybrid structures include techniques such as homology modeling (74,75) and ab initio structure prediction methods (75,76). After the structure of the hybrid protein is modeled, the missing hydrogen atoms are added to the hybrid protein in accordance with the standard procedure used in CHARMM (31). Finally, the positions of the associated ligands are identified using crystallographic data (whenever available) in conjunction with the ZDOCK docking software (77,78). Notably the ZDOCK software allows for the user-specified rough placement of the docked molecule, thus significantly reducing the computational expense of the docking calculations.

FIGURE 3 This figure highlights the key steps for constructing the initial structure of a hybrid protein from a set of parental structures with known crossover position(s). These involve i), backbone splicing, ii), backbone relaxation at the crossover position, and iii), ligand redocking. These steps are repeated for different crossover positions to generate the combinatorial library.

Selecting design positions

The selection of the set of positions that will be allowed to mutate (i.e., candidate redesign positions) for each of the parental sequences is largely dependent on the design objective and associated surrogate criterion. Typically, design objectives involve one or more of the following: i), protein stability, ii), binding affinity, iii), specific activity, and iv), substrate specificity. Protein stability is associated with the ability of the protein to fold correctly under a set of conditions. Generally, unfavorable interactions present within the protein such as electrostatic repulsion, hydrogen bond disruption, steric clashes, or a combination of these tend to prevent these proteins from folding correctly (63). A number of structure or sequence data based (SCHEMA (79), SIRCH (65), and clashmap (63)) and functionality based (FamClash (64)) scoring strategies can be used to quantify the extent of such unfavorable interactions in each hybrid. Residue positions that participate in a disproportionate number of such clashing interactions serve as design positions.
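Returning briefly to the restraint terms of Eqs. 1 and 2 used during backbone relaxation, the sketch below evaluates both penalties with one helper. Only the two force constants come from the text; the function name and the sample bond lengths and angles are hypothetical, and the code is independent of CHARMM.

```python
from typing import Sequence

K_BOND = 1000.0   # kcal/(mol*A^2), force constant of Eq. 1
K_ANGLE = 60.0    # kcal/(mol*rad^2), force constant of Eq. 2


def harmonic_penalty(values: Sequence[float],
                     references: Sequence[float],
                     k: float) -> float:
    """Sum k * (x - x0)^2 over paired current and reference values."""
    if len(values) != len(references):
        raise ValueError("values and references must have the same length")
    return sum(k * (x - x0) ** 2 for x, x0 in zip(values, references))


# Hypothetical geometry: two bonds stretched by 0.01 A, one angle bent by 0.04 rad.
bond_penalty = harmonic_penalty([1.53, 1.34], [1.52, 1.33], K_BOND)
angle_penalty = harmonic_penalty([1.95], [1.91], K_ANGLE)

print(f"bond length penalty: {bond_penalty:.3f} kcal/mol")
print(f"bond angle penalty:  {angle_penalty:.3f} kcal/mol")
```

Because the bond constant is much larger than the angle constant, a small stretch costs as much as a considerably larger angular distortion, consistent with the remark that angles are easier to distort than bonds.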
On the other hand, when binding affinity, specificity, or specific activity is the design objective, residues within or in the neighborhood of the binding site are chosen as candidates for design. In general, the design positions are either the clashing residues, binding pocket residues, or a combination of both. In most cases, the set of candidate design positions is subsequently revised (either upward or downward) by using information, found in some cases in the literature, about the direct or indirect impact of different residues on the presence, absence, or extent of functionality.

Iterative protein optimization steps

FIGURE 4 IPRO is an iterative protein redesign software that includes the following steps: i), A local region of the protein (1–5 consecutive residues as shown in black circles) is randomly selected for perturbation. The backbone torsion angles of these residues are perturbed by up to ±5°. ii), All amino acid rotamers consistent with these torsion angles are selected at each position from the Dunbrack and Cohen rotamer library (86). Rotamer-backbone and rotamer-rotamer energies are calculated for all the selected rotamers using a suitable energy function (87). iii), A mixed-integer linear programming formulation is used to select the optimal rotamer at each of these positions such that the binding energy is minimized. iv), The backbone of the protein is relaxed through energy minimization to allow it to adjust to these new side chains. v), The ligand position is readjusted with respect to the modified backbone and side chains using the ZDOCK (78) docking software. vi), The binding energy of the protein-ligand complex is evaluated and the move is accepted or rejected using the Metropolis criterion.

The optimization procedure of IPRO involves iterating between sequence design, backbone optimization, and ligand redocking (see Fig. 4). This iterative procedure involves six main steps as follows:

i. Backbone perturbation. Different backbone conformations are sampled by iteratively perturbing small regions of the backbone that are randomly chosen during each cycle along the length of the sequence (N).
For this purpose, a segment (from one to five contiguous residues (k to k′) excluding proline) of the protein sequence is randomly chosen for perturbation. Because the special structure of proline makes the polypeptide backbone more rigid, prolines, whenever present, are considered part of the backbone. The φ and ψ angles of the positions within the perturbation window are perturbed by up to ±5° from their current values. The probability distribution of the perturbation (between −5° and +5°) follows a Gaussian distribution with a mean of zero and a standard deviation of 1.65°. This ensures that smaller perturbations are chosen more often (64% chance that the perturbations are between −1.65° and +1.65°) compared to larger ones that in most cases are found to result in steric clashes. Note that the backbone conformations of both parental and hybrid sequences are perturbed during each cycle. Although the perturbation positions are the same for every hybrid and parental sequence, the perturbation magnitude in the backbone angles may vary. This allows different parental and hybrid sequences to assume diverse backbone conformations to better accommodate the differing side chains.

ii. Rotamer-rotamer/rotamer-backbone energy tabulation. Given the backbone conformations determined in Step i and the rotamers and rotamer combinations permitted at each position, this step involves the calculation of the interaction energies of all rotamer-backbone and rotamer-rotamer combinations within an interaction-dependent cutoff distance (cutoff distance for van der Waals = 12 Å, hydrogen bonds = 3 Å, and solvation = 9 Å). This energy tabulation must be performed separately for each hybrid and parental structure. The computational expense is reduced by only updating the parts of the table that are affected by the current perturbation. These values are then fed as parameters to the side-chain/sequence optimization model.
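The backbone perturbation of Step i, together with the Metropolis acceptance test named in Fig. 4, can be sketched as follows. This is a minimal illustration under stated assumptions: the ±5° limit is enforced by rejection sampling, the temperature is a free parameter in the same units as the energies, proline exclusion is not modeled, and all function names and the toy backbone are hypothetical.

```python
import math
import random

MAX_PERTURBATION_DEG = 5.0   # perturbations are limited to +/-5 degrees
SIGMA_DEG = 1.65             # standard deviation quoted in the text


def sample_perturbation(rng: random.Random) -> float:
    """Draw an angle change from N(0, SIGMA_DEG), redrawing values outside +/-5 degrees."""
    while True:
        delta = rng.gauss(0.0, SIGMA_DEG)
        if abs(delta) <= MAX_PERTURBATION_DEG:
            return delta


def perturb_window(phi_psi, start, length, rng):
    """Return a copy of the (phi, psi) list with `length` residues from `start` perturbed."""
    perturbed = list(phi_psi)
    for i in range(start, start + length):
        phi, psi = perturbed[i]
        perturbed[i] = (phi + sample_perturbation(rng),
                        psi + sample_perturbation(rng))
    return perturbed


def metropolis_accept(e_old, e_new, temperature, rng):
    """Accept downhill moves always and uphill moves with Boltzmann probability."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / temperature)


rng = random.Random(0)                      # fixed seed for reproducibility
backbone = [(-60.0, -45.0)] * 12            # toy helical (phi, psi) values in degrees
length = rng.randint(1, 5)                  # one to five contiguous residues
start = rng.randrange(0, len(backbone) - length + 1)
new_backbone = perturb_window(backbone, start, length, rng)
print(f"perturbed residues {start}..{start + length - 1}")
```

In a full cycle the energies passed to `metropolis_accept` would be the binding scores before and after redesign, relaxation, and redocking; here they are left to the caller.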

Potein Libay Redeign and Optimization 4171 iii. Side-chain/equence optimization. Thi tep optimize the amino acid choice and confomation (otame) fo the given backbone tuctue ove a 10 15 eidue window that include the petubation poition and five eidue poition flanking it on eithe ide (ee Fig. 5). Specifically, the deign poition within the petubation egion ae pemitted to change amino acid type, wheea the flanking eidue poition (five eidue on eithe ide) can only change otame but not the eidue type. Thi entail two dicete deciion: 1), identifying the choice of amino acid at any given poition; and 2), electing the otame of the choen amino acid that minimize the elected uogate objective function. To model thee dicete deciion, IPRO daw upon the MILP optimization model fomulation that ue binay vaiable to mathematically epeent thee dicete deciion. Fo claity of peentation, we will fit decibe the MILP fomulation fo the pecial cae, i.e., edeign of a ingle paental equence. Thi deciption will then eve a the tating point fo the moe geneal combinatoial libay deign optimization fomulation. In both cae, the et of allowed ide-chain confomation and amino acid choice at any poition i encoded within et (R i and R ih, epectively), whee i denote the eidue poition and h denote a hybid equence in the combinatoial libay in cae of paental equence edeign. Poition within the petubation window but outide the et of edeign candidate ae eticted to the oiginal amino acid type but can change thei otame tate. All othe eidue poition outide the petubation window ae fixed and cannot change eithe eidue type o otame. A expected, the paental equence edeign poblem i much moe complex than the ingle hybid deign. Thi i becaue a ubtituted eidue need not aume the ame otame confomation in each libay membe. In othe wod, the hybid ae tied togethe at the equence level, but not neceaily at the otame level. 
Stating with the imple MILP fomulation fo the deign of a ingle hybid equence, we fit outline the et, paamete, and vaiable ued in the model a decibed below: Set k; k9 2f1; 2;...:; Ng ¼et of tating and ending poition fo petubation; k, k9 i; j 2fkÿ5; k ÿ 4;...; k;...; k9;...; k914; k915g ¼ et of poition fo petubation ; 2f1; 2;...:; Rg ¼et of otame R i ¼ et of otame available at poition i: Binay vaiable 1; if otame i elected at poition i X i ¼ 0; othewie: Continuou vaiable 8 1; if otame ; >< ae elected imultaneouly Z ij ¼ at poition i; j; epectively >: 0; othewie: FIGURE 5 Deign poition within the petubation egion (hown in oange) ae pemitted to change amino acid type, wheea the flanking eidue poition (five eidue on eithe ide hown in geen) can only change otame but not the eidue type. Poition outide thi 10 15 eidue window (gay) ae fixed and cannot change eithe otame o eidue type. Paamete E b ¼ ubtate-backbone enegy ¼ otame-backbone enegy of otame at poition i E b i E i ¼ otame-ubtate enegy of otame at poition i E ij ¼ otame-otame enegy of otame ; at poition i; j epectively: Baed on the above defined et, vaiable, and paamete, the ingle equence deign poblem (SSDP) i implemented a the following MILP fomulation, which i a pecial cae of the quadatic aignment poblem (80): i Minimize ÿ X i 3 E b 1 i E i 1 i i j. i ÿ X i 3 E i (3) Z ij 3E 1 ij Eb #E cutoff (4) X i ¼ 1; " i; 2 R i (5) X i ¼ 0 " i; uch that E i i 2 R i (6) Z ij ¼ X i 3X j " i; ; j; ; 2 R i ; 2 R j : (7) The objective function (Eq. 3) hee entail the minimization of the binding coe between the ubtate and the potein a an example. The objective function can be changed depending on the deign equiement. In many cae, (e.g., binding coe) the objective function doe not encode infomation about the inteaction in the entie potein. Theefoe, the minimization tep may lead to mutation o otame change that adveely affect the oveall tability of the potein. Containt Eq. 
4 i included to afeguad againt thi by equiing that the total enegy of the potein be below a pepecified cutoff value, E cutoff. The veatility of the adopted MILP modeling deciption enable the incopoation of thi explicit tability equiement that i abent in mot othe famewok popoed fo potein deign/edeign. In the ame Biophyical Jounal 90(11) 4167 4180

4172 Saaf et al. piit, additional enegy-baed equiement can be impoed to enue, fo intance, etention of impotant hydogen bond between a dono and an accepto. Containt Eq. 5 enue that only one otame i elected at any given poition i along the equence. Note that the otame may be that of the oiginal eidue o of othe eidue, depending on whethe o not poition i i a deign poition. Containt Eq. 6 pevent any otame fom being elected at poition i that have ufficiently high enegy value ð.d i Þ that peclude them fom the optimal olution. Thi otame elimination pocedue fomalize the backgound optimization concept popoed by Looge and Hellinga (81) and allow fo eliminating otame that ae guaanteed not to be pat of the optimal olution (ee Looge and Hellinga (81) fo detail). Thi concept allow u to a pioi tim down the each pace and theefoe educe the computational time. Containt Eq. 7 detemine which otame and ae imultaneouly elected at poition i and j, epectively. Thi i encoded with vaiable Z ij, which i equal to one only if both vaiable X i and X j ae equal to one. Thi implie that Z ij i equal to the poduct of the two binay vaiable. Thee nonlinea tem ae then ecat into an equivalent linea fom by umming Z ij ove and, epectively, a hown below: i; j 2fkÿ5; k ÿ 4;...; k;...; k9;...; k914; k915g ¼ et of poition fo petubation in hybid h a 2f1; 2;...; 19g ¼et of amino acid excluding poline ; 2f1; 2;...:; Rg ¼et of otame R ih ¼ et of otame available at poition i in hybid h: Binay vaiable 1; if otame i elected at poition i in hybid h X ih ¼ 0; othewie: 1; if amino acid a i elected at poition i in hybid h Y iah ¼ 0; othewie: Z ij ¼ Z ij ¼ ½X i 3 X j Š¼X i 3 ½X j Š¼X i "i; ; j. i; 2 R i ; 2 R j : (8) ½X i 3 X j Š¼X j 3 ½X i Š¼X j "i; j. i; ; 2 R i ; 2 R j (9) 0 # Z ij # 1 "i; ; j. i; ; 2 R i ; 2 R j : (10) By eplacing containt Eq. 7 with containt Eq. 8 10, the lineaity of the SSDF fomulation i peeved. The complete MILP fomulation fo SSDP include containt Eq. 3 10 excluding containt Eq. 7. 
Unlike the single sequence protein design formulation SSDP, the hybrid library design problem (HLDP) involves the simultaneous optimization of the hybrids (h) comprising the combinatorial library. Because the hybrid sequences in the combinatorial library are derived from the parental sequences, their amino acid composition must be restricted to the amino acid types present in the corresponding parental sequences after the targeted mutations. To this end, we introduce parameters ($v_{i'ap}$, $aa_{irh}$) that link the amino acid type a selected at a given position i′ in parental sequence p to those present in the hybrid sequences at the corresponding position i. In case of insertions and deletions, the positions i and i′ in the hybrid and parental sequences, respectively, may not be the same. Therefore, one needs to keep track of both the parental sequence p and what position i′ in that sequence corresponds to a given position i in a hybrid sequence h. Specifically, parameter $v_{i'ap}$ is equal to one if amino acid a occurs at position i′ in parental sequence p, whereas parameter $aa_{irh}$ stores the amino acid type of rotamer r at position i in hybrid h. In addition, binary variable ($Y_{iah}$) is introduced and set to be equal to one if amino acid a is selected at position i in hybrid sequence h. Unlike amino acid type changes, which are propagated throughout the entire library, rotamer choices can differ between hybrids and/or parental sequences. These new complexities give rise to the following additional sets, parameters, and variable definitions.
Sets
$p \in \{1, 2, \ldots, P\}$ = set of parental sequences
$h \in \{1, 2, \ldots, H\}$ = set of hybrids
$i' \in \{1, 2, \ldots, N_p\}$ = set of positions in parental sequence p
$k, k' \in \{1, 2, \ldots, N_h\}$ = set of starting and ending positions for perturbation in hybrid h; $k < k'$

Continuous variables
$Z_{irjsh}$ = 1 if rotamers r, s are selected at positions i, j in hybrid h; 0 otherwise

Parameters
$E^{sb}_{h}$ = substrate-backbone energy of hybrid h
$E^{rb}_{irh}$ = rotamer-backbone energy of rotamer r at position i in hybrid h
$E^{rs}_{irh}$ = rotamer-substrate energy of rotamer r at position i in hybrid h
$E^{rr}_{irjsh}$ = rotamer-rotamer energy of rotamers r, s at positions i, j in hybrid h
$aa_{irh}$ = amino acid type of rotamer r at position i in hybrid h
$v_{i'ap}$ = 1 if amino acid a occurs at position i′ in parental sequence p; 0 otherwise

By building on the SSDP formulation using the new additional sets, variables, and parameters, the problem of parental sequence redesign and associated HLDP is modeled as the following MILP formulation:

$$\text{Minimize}\quad \frac{1}{H} \sum_{h} \sum_{i} \sum_{r \in R_{ih}} X_{irh} \times \left(E^{rs}_{irh}\right) \tag{11}$$

$$\sum_{h} \Big[ \sum_{i} \sum_{r \in R_{ih}} X_{irh} \times \left(E^{rb}_{irh} + E^{rs}_{irh}\right) + \sum_{i} \sum_{j > i} \sum_{r \in R_{ih}} \sum_{s \in R_{jh}} Z_{irjsh} \times E^{rr}_{irjsh} + E^{sb}_{h} \Big] \le H \cdot E_{\mathrm{cutoff}} \tag{12}$$

$$\sum_{r \in R_{ih}} X_{irh} = 1 \quad \forall\, i, h \tag{13}$$

$$X_{irh} = 0 \quad \forall\, i, r, h \ \text{such that}\ E_{irh} > d_{ih};\ r \in R_{ih} \tag{14}$$

$$Z_{irjsh} = X_{irh} \times X_{jsh} \quad \forall\, i, r, j, s, h;\ r \in R_{ih};\ s \in R_{jh} \tag{15}$$

$$\sum_{a} Y_{iah} = 1 \quad \forall\, i, h \tag{16}$$

$$Y_{iah} = \sum_{r \in R_{ih}:\ aa_{irh} = a} X_{irh} \quad \forall\, (i, a, h) \tag{17}$$

$$Y_{iah} = v_{i'ap} \quad \forall\, i, h, k, p \ \text{such that position } i \text{ corresponds to position } i' \text{ in the parental sequence } p \tag{18}$$

Slightly modified versions of Eqs. 11–15 were also present in the SSDP formulation. Briefly, Eq. 11 is the objective function of HLDP involving the minimization of the average surrogate score (e.g., binding energy) of the hybrids in the library. Constraint Eq. 12 ensures the stability of the hybrid sequences in the library by imposing an energy cutoff. Constraint Eqs. 13 and 14 ensure selection of only one rotamer r at any given position i in any hybrid sequence h while eliminating any rotamers with a high enough energy to preclude them from the optimal solution. Equation 15 is identical to Eq. 7 in SSDP. Constraint Eq. 16 ensures that only one amino acid type a is permitted at any given position i in a hybrid h. Constraint Eq. 17 determines the amino acid type ($Y_{iah}$) of the rotamer selected at position i in a hybrid h. Finally, Eq. 18 ensures that amino acid type a at position i in the hybrid sequence h is the same as the amino acid type at position i′ in parental sequence p. This is in accordance with position i of hybrid h being retained from position i′ of parental sequence p. Equation 15, as in the case of Eq. 7, involves the product of two binary variables. It is exactly recast into a linear form in the same manner as shown below.

$$\sum_{s \in R_{jh}} Z_{irjsh} = \sum_{s \in R_{jh}} \left[X_{irh} \times X_{jsh}\right] = X_{irh} \times \Big[\sum_{s \in R_{jh}} X_{jsh}\Big] = X_{irh} \quad \forall\, i, r, j > i, h;\ r \in R_{ih} \tag{19}$$

$$\sum_{r \in R_{ih}} Z_{irjsh} = \sum_{r \in R_{ih}} \left[X_{irh} \times X_{jsh}\right] = X_{jsh} \times \Big[\sum_{r \in R_{ih}} X_{irh}\Big] = X_{jsh} \quad \forall\, i, j > i, s, h;\ s \in R_{jh} \tag{20}$$

$$0 \le Z_{irjsh} \le 1 \quad \forall\, i, r, j > i, s, h;\ r \in R_{ih};\ s \in R_{jh} \tag{21}$$

Formulation HLDP is composed of Eqs. 11–21 excluding constraint Eq. 15. We use the CPLEX MILP solver accessed through the GAMS modeling environment to solve both SSDP and HLDP. This optimization step is integrated with CHARMM using a FORTRAN 90 interface.

iv. Backbone relaxation.
The optimization step described above may lead to a number of new residues and/or rotamers for the hybrid structures. These new side chains and/or conformations may no longer be optimally interacting with the previous backbone. To remedy this, a backbone relaxation step is included here allowing for dihedral angles to vary, whereas the bond lengths and angles are constrained to their original values using Eqs. 1 and 2. Note that each hybrid structure undergoes a separate backbone relaxation procedure to optimize the backbone conformation with respect to its associated rotamers. Here the side-chain conformations are fixed while the backbone torsion angles are optimized over the same 10–15 residue window using the adopted basis-set Newton-Raphson algorithm within CHARMM and the same energy function used for sequence design (41). A maximum of 4000 steps are allotted for backbone relaxation through energy minimization.

v. Ligand redocking. Because of the alterations in the backbone and the change of rotamers/residue types, the location of the ligand may need to be adjusted with respect to the new structure. Therefore, the ligands are redocked separately for each of the hybrids and parental sequences using the ZDOCK docking software (77,78). This redocking step is performed only after a number of prespecified design cycles to cut down on computational requirements. Tight bounds are introduced into ZDOCK to constrain ligand placement in only the relevant pocket or active site. The ligand redocking step using the ZDOCK software is integrated with the backbone relaxation and side-chain optimization steps using a FORTRAN interface.

vi. Accepting/rejecting moves. After the redocking step, the average score of the hybrid library is calculated and the perturbation imparted in Step i is accepted or rejected on the basis of the difference between the final and starting average scores according to the Metropolis criterion. We have also experimented with a temperature-lowering schedule as it pertains to simulated annealing without finding significant differences in the results. The procedure is repeated for 200–10,000 iterations depending on the complexity and size of the design study.
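A minimal sketch of the accept/reject rule in Step vi, assuming the standard Metropolis form in which a move that raises the average library score by an amount delta is accepted with probability exp(-delta / T). The temperature parameter (expressed in the same units as the score), the function name, and the injectable random source are assumptions for illustration; the text does not state the temperature that was used.

```python
import math
import random


def metropolis_accept(score_before, score_after, temperature, rng=random):
    """Decide whether to keep a perturbation under the Metropolis criterion.

    score_before, score_after : average library scores (lower is better).
    temperature               : assumed effective temperature, in score units.
    rng                       : any object exposing random() in [0, 1).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    delta = score_after - score_before
    if delta <= 0:
        # Moves that do not worsen the average score are always accepted.
        return True
    # Worsening moves are accepted with Boltzmann probability exp(-delta / T).
    return rng.random() < math.exp(-delta / temperature)
```

Passing a seeded `random.Random` instance as `rng` makes a design run reproducible. A temperature-lowering schedule of the kind mentioned above would simply pass a smaller `temperature` on later iterations.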
Upon completion, IPRO provides a set of low energy solutions and associated mutations to be performed within the parental sequences whose propagation to the hybrid library improves the average score of the library. Due to the decomposable structure of the parental sequence redesign problem, most of the computations can be done in parallel with little information cross-flow. Specifically, hybrid structure refinement, backbone relaxation, backbone perturbation, calculation of rotamer-backbone and rotamer-rotamer energies, and ligand docking for each hybrid are performed on separate processors. After the rotamer-backbone and rotamer-rotamer energy calculations for each hybrid, the information is fed as parameters to the master processor, which subsequently solves the MILP model (i.e., SSDP or HLDP) to determine the optimal residues at each of the design positions in the parental sequence(s). The choice of the residue/rotamers determined using the MILP for each of the hybrids is then passed to the slave processors for further backbone relaxation and ligand docking. All computational studies listed in this article were performed on a Linux PC cluster using a 3.06 GHz Xeon CPU/4 GB RAM.

Hybrid construction and functional screening

Construction of DHFR hybrid libraries

Previously constructed plasmids pAZE-BE and pAZE-EB (64) were used in this work to construct plasmids for the generation of the L. casei-B. subtilis DHFR libraries in both orientations (pAZE-LB and pAZE-BL). First, the E. coli DHFR fragments containing residues 1–120 and 31–159 were removed from pAZE-EB and pAZE-BE plasmids by NdeI/BamHI and PstI/SpeI restriction digests, respectively. The L. casei DHFR fragments 1–124 and 30–162 were obtained by NdeI/BamHI and PstI/SpeI restriction digests of pAZE-EL and pAZE-LE plasmids (gift from Alex R. Horswill, University of Iowa). The L. casei DHFR fragment 1–124 was then inserted into the cut pAZE-EB by ligation, taking advantage of the complementary NdeI and BamHI sites. Analogously, the L. casei DHFR fragment containing residues 30–162 was inserted into the cut pAZE-BE by ligation. Plasmids pAZE-LB (L. casei residues 1–124, B. subtilis residues 31–159) and pAZE-BL (B. subtilis residues 1–121, L. casei residues 30–162) were confirmed by sequencing at the Nucleic Acid Facility of The Pennsylvania State University. To construct the hybrid libraries, plasmids pAZE-LB and pAZE-BL were linearized at a unique SalI site between the L. casei and B. subtilis DHFR fragments. The incremental truncation for the creation of hybrid enzymes (ITCHY) method was used to construct libraries of hybrid L. casei-B. subtilis DHFRs in both orientations (82). Libraries were transformed and stored in E. coli strain DH5α.

Selection and determination of specific activities of active DHFR hybrids

The plasmids containing the hybrid DHFR genes were purified and electroporated into modified E. coli strain MH829, which has a deletion of the DHFR (folA) gene. Transformed cells were washed twice in minimal media A and plated on minimal media A agar plates supplemented with 0.5% glycerol, 0.6 mM arginine, 50 μg/ml thymidine, 25 μg/ml kanamycin, 100 μg/ml ampicillin, 1 mM MgSO4, and 100 μM isopropyl β-D-thiogalactoside. The plates were allowed to grow for 5 days at room temperature and colonies were picked and restreaked onto the same media and grown at 30°C for 24 h. The selectants were sequenced at the Nucleic Acid Facility of The Pennsylvania State University to identify crossover positions and confirm the absence of insertions, deletions, or mutations. The specific activities of hybrid DHFRs were determined in cell-free lysates as previously described (64). Briefly, the plasmid pAZE was used to express all DHFR hybrids. To increase expression levels, the lacI gene was destroyed on all plasmids by EcoRV and SfoI restriction digest. Plasmids were transformed into the strain MH829, and 50 ml cultures were grown at 30°C in Luria Broth with 100 μg/ml ampicillin, 50 μg/ml thymidine, and 0.5 mM isopropyl β-D-thiogalactoside. Cultures were grown to OD600 of 1.0, centrifuged, washed with 25 ml of buffer (20 mM Tris, pH 7.7, 2 mM DTT), and resuspended in 1 ml of buffer. The cells were broken by sonication and insoluble material was removed by centrifugation. The lysates were assayed at 25°C in MTAN buffer at pH 7.0 using the Cary 100 Bio UV-Vis spectrophotometer by Varian (Palo Alto, CA). Cell-free lysate was preincubated with 100 μM cofactor NADPH and the reaction was initiated by adding substrate dihydrofolate to 100 μM. Reaction progress was monitored by following absorbance at 340 nm (NADPH absorbance maximum) (Δε = 13,200 M⁻¹ cm⁻¹).

APPLICATION EXAMPLE

DHFR library characterization and analysis

The construction, identification, and characterization of the sixteen E. coli/B. subtilis DHFR hybrids discussed above were described previously (64). E. coli and B. subtilis DHFRs share a 28% sequence identity at the protein level. Below is discussed the isolation and characterization of 10 B. subtilis/L. casei DHFR hybrids used here to validate the computationally derived overall binding scores. The B. subtilis/L. casei DHFR hybrid library was constructed from the B. subtilis/L. casei DHFR pair sharing a 36% sequence identity at the protein level. A previously developed (64) genetic selection utilizing an E. coli strain containing a complete deletion of chromosomal DHFR (folA) was used to select hybrid enzymes with DHFR activity from the library. For this reason, it was necessary to use inactive DHFR fragments to make the ITCHY libraries, which limited the crossover window to residues 31–121. The combined library put through the selection included ~2.1 × 10^6 members. There are (90 × 3)^2 or 72,900 possible hybrid proteins. To determine the number of library members that must be examined for complete library coverage, the number of hypothetical members is typically multiplied by 10. Since we examined >729,000 members, complete library coverage can be assumed. From the DHFR enzymes that passed the selection, 40 hybrids were randomly chosen and sequenced. Only two contained insertions; the remaining 38 were free of insertions, deletions, and mutations. Ten out of 38 hybrids were chosen for this study based on their even distribution of crossover positions over the 90 amino acid crossover position window (see Table 1). The crossover position in the B. subtilis/L. casei hybrids is defined as the last residue (by alignment position) of B. subtilis DHFR. It is clear from the number of active DHFR hybrids identified that 36% sequence identity on the amino acid level between two DHFR proteins can be sufficient for the generation of active hybrids.

TABLE 1 Crossover positions for the E. coli/B. subtilis and B. subtilis/L. casei DHFR hybrids and their specific activities (μmol/min/mg)

E. coli/B. subtilis                      B. subtilis/L. casei
Crossover position   Specific activity   Crossover position   Specific activity
0                    20.22               0                    0.197 ± 0.114
32                   2.17                32                   0.915 ± 0.086
35                   0.39                40                   0.067 ± 0.008
46                   0.17                53                   0.001 ± 0.000
49                   0.12                62                   0.025 ± 0.004
53                   0.12                85                   0.001 ± 0.000
55                   0.12                103                  0.003 ± 0.001
62                   0.09                114                  0.035 ± 0.16
73                   0.01                123                  0.063 ± 0.005
79                   0.15                160                  6.622 ± 0.157
81                   0.06
96                   0.10
100                  0.36
108                  0.70
122                  0.84
159                  1.43

The errors in the specific activity for the B. subtilis/L. casei hybrids are given at 95% confidence interval. The crossover positions for the E. coli/B. subtilis and B. subtilis/L. casei hybrids are defined as the last residue position (in alignment) of the E. coli and B. subtilis DHFR sequences, respectively.

Specific activities (μmol/min/mg) of the B. subtilis/L. casei hybrid enzymes were measured to compare these values to the overall binding scores obtained using the SSDP formulation. Note that the listed specific activities are crude lysate activities. This means that total lysates of cells expressing the hybrid of interest, not the purified hybrids, are used in the assay. Specific activity is the amount of product formed by an enzyme in a given amount of time per milligram of enzyme. Experimentally, specific activity here is the amount of cofactor NADPH converted to NADP+ by a DHFR hybrid in 1 min per milligram of total protein in the crude lysate. The specific activities (μmol/min/mg) are quantified by measuring the decrease in absorbance at 340 nm (NADPH absorbance maximum) during the enzymatic reaction to determine how many μmoles of NADPH are converted to NADP+ per minute using the extinction coefficient of NADPH (13,200 M⁻¹ cm⁻¹). The resulting value is then divided by the milligrams of total protein in the crude lysate, which is determined by the colorimetric Bradford assay. The B. subtilis/L. casei hybrids with the highest activities were found to have crossover positions close to the N- or C-terminus. These hybrid proteins consist mostly of one DHFR (i.e., B. subtilis or L. casei) and have only a short amino acid sequence replaced by the sequence of the other DHFR at either the N- or the C-terminus. Consequently, these hybrids have a relatively small number of new interactions since a large percentage of the sequence is retained from one species. The hybrids with the lowest activities have their crossover positions in the central region of the crossover position window, between amino acids 53 and 103. This region belongs to the adenosine binding subdomain of DHFR, which is involved in binding of the cofactor NADPH (83). These hybrids contain long sequence fragments from both B. subtilis and L. casei DHFRs and are thus expected to have many new interactions not present in the wild-type proteins. Similar results were seen for the E. coli/B. subtilis DHFR hybrids; the lowest specific activities were found for the hybrids with crossover positions in the central region consisting of amino acids 55–96.

IPRO analysis of DHFR libraries

In this section, we provide a step-by-step application of the IPRO procedure, starting with the SSDP formulation, to test whether it is feasible to improve the computationally derived overall binding scores of two separate DHFR hybrid systems: i), 16 E. coli/B. subtilis, and ii), 10 B. subtilis/L. casei hybrid DHFR sequences. These results are contrasted against the experimentally determined specific activity values to check whether the trends observed for the specific activity can be explained using the computed binding scores. First we apply the SSDP formulation to individually design each one of the 16 E. coli/B. subtilis DHFR hybrids considering two different sets of design positions, followed by the HLDP formulation, which is used to optimize the average binding energy of the 16 E. coli/B. subtilis DHFR hybrids. Starting with Step a, IPRO first generates the sequences for the 16 E. coli/B. subtilis and 10 B. subtilis/L. casei DHFR hybrids corresponding to the crossover positions shown in Table 1. This simply involves splicing of the parental sequence fragments consistent with the given crossover positions. Putative structures for two different sets of DHFR hybrids are generated as described in Step b.
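The sequence splicing of Step a can be sketched as follows, assuming the two parental sequences are supplied as equal-length aligned strings with "-" marking gaps and that the crossover position is the last alignment column taken from the N-terminal parent (the convention stated in the note to Table 1). The function name and the short toy sequences are hypothetical and are not real DHFR sequences.

```python
def splice_hybrid(n_parent_aligned, c_parent_aligned, crossover):
    """Build a hybrid sequence from two aligned parental sequences.

    n_parent_aligned : aligned sequence contributing the N-terminal fragment.
    c_parent_aligned : aligned sequence contributing the C-terminal fragment.
    crossover        : last alignment column (1-based) taken from the
                       N-terminal parent; 0 returns the C-terminal parent.
    """
    if len(n_parent_aligned) != len(c_parent_aligned):
        raise ValueError("aligned parental sequences must have equal length")
    if not 0 <= crossover <= len(n_parent_aligned):
        raise ValueError("crossover lies outside the alignment")
    # Columns 1..crossover come from the N-terminal parent, the rest from
    # the C-terminal parent.
    spliced = n_parent_aligned[:crossover] + c_parent_aligned[crossover:]
    # Alignment gaps are not residues, so they are dropped from the hybrid.
    return spliced.replace("-", "")
```

With the toy alignment `"MISLIAAL"` / `"MKT-VAGM"`, a crossover of 3 takes the first three columns from the first parent and the remainder from the second, and the gap column contributes no residue.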
The alignment of the parental structures required for Step b is performed using the combinatorial extension method (84). An approximate structure of each of the hybrid sequences is constructed by concatenating the corresponding parental structure fragments obtained from the aligned structures. The structures of the E. coli (PDB code: 1RX2) and L. casei (PDB code: 1AO8) parental sequences were obtained from the Protein Data Bank (85), while the structure of the B. subtilis DHFR was provided to us by Dr. Gregory A. Petsko at Brandeis University (personal communication). Each one of these putative structures was refined by allowing the backbone around the junction point (14-residue window) to relax through energy minimization, and subsequently the hydrogen atoms were added as described in Step b. Although no residue changes are made, SSDP is used to drive side-chain movement (rotamer changes and/or backbone relaxation) for best binding. The optimized binding scores (kcal/mol) for these hybrid sequences were then contrasted against the experimentally measured specific activities (μmol/min/mg). The specific activity values of the B. subtilis/L. casei and E. coli/B. subtilis hybrids (64) are shown in Table 1. The calculated binding score in each case is found to be linearly correlated to the natural log of the specific activities, suggesting that binding energy is a good predictor of specific activity (see Fig. 6, a and b, corresponding to E. coli/B. subtilis and B. subtilis/L. casei DHFR hybrid sequences, respectively). Specifically, 72.7% of the variance in the specific activity trends for the E. coli/B. subtilis DHFR hybrids and 75.4% for the B. subtilis/L. casei DHFR hybrids is explained by the log-linear relation with the binding score. The next step involves the redesign of each one of the sixteen E. coli/B. subtilis DHFR hybrid sequences individually using the SSDP formulation to enhance their computationally derived binding energies.
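The log-linear relation between binding score and specific activity reported above can be reproduced in outline with an ordinary least-squares fit of ln(specific activity) against binding score. The sketch below is an illustration under stated assumptions, not the authors' analysis code: the function name is hypothetical, the fitting method is assumed to be simple linear regression, and the per-hybrid binding scores are not listed in the text, so any input scores would have to come from elsewhere.

```python
import math


def log_linear_fit(binding_scores, specific_activities):
    """Least-squares fit of ln(activity) = intercept + slope * score.

    Returns (slope, intercept, r_squared), where r_squared is the fraction
    of variance in ln(activity) explained by the binding score.
    Activities must be strictly positive so that the logarithm is defined.
    """
    if len(binding_scores) != len(specific_activities):
        raise ValueError("inputs must have the same length")
    y = [math.log(a) for a in specific_activities]
    n = len(y)
    mean_x = sum(binding_scores) / n
    mean_y = sum(y) / n
    s_xx = sum((x - mean_x) ** 2 for x in binding_scores)
    s_xy = sum((x - mean_x) * (yi - mean_y)
               for x, yi in zip(binding_scores, y))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * x)) ** 2
                 for x, yi in zip(binding_scores, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot
```

An `r_squared` of 0.727 from such a fit would correspond to the statement that 72.7% of the variance is explained. Hybrids with a measured activity of exactly zero would need separate handling before taking the logarithm.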
Two separate sets of design positions were considered, as required in Step c, for mutation: i), positions that were identified to be involved in clashes (63,64), and ii), all residues within the binding pocket (i.e., within 4 Å distance from the substrate) that are likely to contribute directly to the binding score. Clashing positions for each one of the hybrid structures were determined using the clashmap (63) and FamClash (64) procedures. Positions that were frequently involved in clashes were identified and considered for redesign. The same design positions were considered for all the hybrid sequences to identify any significant patterns in the residue substitutions. On average, 20 design positions were considered in either case, and each run was submitted to an individual processor for a total of 1000 iterations for binding score minimization using SSDP. Interestingly, out of 20 positions considered for redesign, we found that only 7 positions (results shown in Table 2) are mutated away from the wild-type. The maximum number of mutations introduced in any one hybrid sequence did not exceed four mutations (see Table 2). Notably, a number of mutations are prevalent in all designs. Also, many residues that are within or close to the binding pocket persist at the wild-type even though they are treated as design candidates.

FIGURE 6 Plot of the natural log of the specific activities against the binding score for two different types of DHFR hybrids (a) E. coli/B. subtilis and (b) B. subtilis/L. casei. Along each point is shown the corresponding hybrid sequence with its crossover position.

TABLE 2 Individual redesign of the (a) clashing positions and (b) binding site residues for the E. coli/B. subtilis hybrid DHFR sequences

(a)
Position    30   62   63   96   97   98   103
B. sub      Y    V    T    G    A    Q    L
E. coli     W    L    S    G    G    R    F

Hybrid      Substitutions
0           F
33          F K
36          F Q
47          F
50          F K
54          F
56          F
62          F/A
73          F T K M
79          F
81          F
96          F
101         H K
109         H/F K
123         F Q L
160         F A K L

(b)
Position    57   61   63   64   65   67   68
B. subt     R    V    S    S    A    D    S
E. coli     R    I    T    S    Q    G    T

Hybrid      Substitutions
0           T R R/Q R/D R/F
33          T R Q R E
36          R R/Q R/D R/Y
47          T R Q E Q
50          I K Q K R
54          R Q
56          N R K T Q
62          R H K D
73          K A R R Q
79          A R H F
81          A R R F
96          T R R/Q F
101         R R F
109         N R R F
123         R T Y
160         A R R F

Substitutions for each hybrid are listed in left-to-right column order; unmutated positions are omitted. The original B. subtilis and E. coli residues are shown in bold, and underlined, respectively. Positions with consistent mutations are 30, 64, and 68 (for crossover after position 63). Note that position 0 corresponds to the B. subtilis parental sequence, whereas 160 corresponds to the E. coli sequence.

Redesigning the clashing positions (a total of 17 positions) provides approximately the same improvement (−6.9 kcal/mol) in the average binding score as designing only the binding pocket residues (−6.2 kcal/mol), which include 22 residues. This means that, at least in this study, relieving clashes can indirectly improve binding to the same extent as active site residue redesign. The binding scores of the hybrid sequences before and after design for the two sets of design positions are compared in Fig. 7, a and b, respectively. Notably, when only clashing residue positions are considered for redesign, most of the improvement in the binding score of the hybrid sequences (average score, −149.0 kcal/mol) is found to be the result of a single mutation in the B. subtilis DHFR sequence fragment (S64R) and two mutations in the E. coli sequence fragment (S64R and T68F). On the other hand, when only binding pocket residues are considered for redesign, a single mutation in the E. coli (W30F) and a single mutation in the B. subtilis (Y30F) DHFR sequence fragment appear to contribute most to the improvement in the binding score (average score, −148.3 kcal/mol). Not surprisingly, these mutations are found to be consistently occurring in the design of most of the hybrid sequences (see Table 2). Many alternate mutations leading to the same binding score improvement are found, particularly for design positions 65, 67, and 68 (see part b in Table 2).

FIGURE 7 Binding score profile before and after redesign of the E. coli/B. subtilis DHFR hybrids using the SSDP framework when (a) only clashing residue positions are considered and (b) only binding pocket residues are considered for redesign.

The results highlighted above describe the application of the SSDP optimization formulation, which enables the one-by-one optimization of each one of the 14 hybrids. Note that mutations predicted for the same position can vary for different hybrids. Next, we describe the application of HLDP, which unlike the SSDP formulation enforces the same set of mutations for all hybrids. The objective here is to contrast the overall results obtained from the two optimization formulations. Both the clashing positions and residues within the binding pocket are considered simultaneously. The HLDP formulation was run on a 16-node Linux PC cluster with 3.06 GHz Xeon CPUs/4 GB RAM, with one node assigned to each sequence (14 hybrid sequences and 2 parental sequences). One of these nodes served as the master node that solved the HLDP framework every iteration. This procedure was run for a total of 48 h, which permitted on average 315 design iterations. The energy profile of the library before and after the redesign of the parental sequences is shown in Fig. 8. Note that even though we obtained an improvement in the binding scores (see Table 3) for all hybrid sequences, this may not always be the case, as the improvement in the average binding score of the library may in some cases be due to a handful of hybrid sequences. We find that the most prevalent mutations based on the SSDP results are again present. HLDP identified mutations at only three positions in the parental sequences (positions 30, 64, and 68) that yielded an average binding score of −149.0 kcal/mol. Notably, this is very close to the average binding score of the library where each sequence is individually redesigned. Whereas the upstream parental redesign using HLDP requires in total only five mutations in the parental sequences, the downstream hybrid sequence design involves up to four different mutations for each hybrid sequence. This example, therefore, demonstrates that upstream parental sequence redesign can indeed optimize all resulting hybrids in one step in contrast to one-by-one redesign of the hybrid sequences.
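How a fixed set of parental mutations propagates to each hybrid can be illustrated with a short sketch. It uses the five parental mutations named in this section (Y30F and S64R in B. subtilis; W30F, S64R, and T68F in E. coli) and assumes that an E. coli/B. subtilis hybrid with crossover position c takes alignment positions up to and including c from E. coli and the rest from B. subtilis. The function and variable names are hypothetical, and insertion/deletion offsets between hybrid and parental numbering are ignored for simplicity.

```python
# Parental mutations named in the text, keyed by alignment position.
PARENT_MUTATIONS = {
    "E. coli": {30: "F", 64: "R", 68: "F"},   # W30F, S64R, T68F
    "B. subtilis": {30: "F", 64: "R"},        # Y30F, S64R
}


def hybrid_mutations(crossover, n_terminal="E. coli",
                     c_terminal="B. subtilis", mutations=PARENT_MUTATIONS):
    """Return {position: new residue} inherited by a hybrid from its parents.

    crossover : last alignment position contributed by the N-terminal parent
                (0 means the hybrid is the C-terminal parent itself).
    """
    inherited = {}
    positions = sorted(set(mutations[n_terminal]) | set(mutations[c_terminal]))
    for pos in positions:
        # Each position is inherited from exactly one parent.
        parent = n_terminal if pos <= crossover else c_terminal
        if pos in mutations[parent]:
            inherited[pos] = mutations[parent][pos]
    return inherited
```

Under these assumptions, hybrids whose crossover lies before position 68 inherit only the substitutions at positions 30 and 64, while hybrids with later crossovers also inherit the E. coli T68F substitution.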
Examination of the resulting structures of the redesigned sequences reveals that most of the improvement in the average binding score of the library results from a new salt bridge between the substituted arginine at position 64 and the cofactor NADPH (Fig. 9 a). Moreover, substitution of tyrosine and tryptophan at position 30 with a smaller aromatic residue, phenylalanine, perhaps reduces steric hindrance with the substrate DHF (Fig. 9 b). We also find that the designs identified using the IPRO procedure are consistent with the residue types observed in the DHFR protein family sequences (at position 30, F = 15.73%; and at position 64, R = 57.98%). It is important to note that no information on the protein family sequences was a priori provided to the IPRO model.

TABLE 3 Redesign of parental E. coli and B. subtilis DHFR

Position    30   64   68
B. sub      Y    S    S
E. coli     W    S    T
0           F    R
33          F    R
36          F    R
47          F    R
50          F    R
54          F    R
56          F    R
62          F    R
73          F    R    F
79          F    R    F
81          F    R    F
96          F    R    F
101         F    R    F
109         F    R    F
123         F    R    F
160         F    R    F

FIGURE 8 Binding score profile before and after redesign of parental E. coli and B. subtilis DHFR sequences using the HLDP framework. Both clashing residue positions and the binding pocket residues are considered for design.

FIGURE 9 (a) Substitution of serine with an arginine at position 64 stabilizes the binding with the cofactor NADPH due to formation of a new salt bridge. (b) Substitution of tyrosine and tryptophan at position 30 with a smaller aromatic residue phenylalanine perhaps reduces steric hindrance with the substrate DHF.

SUMMARY AND DISCUSSION

In this article, we introduced the computational framework IPRO for the computational design of protein combinatorial libraries. IPRO identifies targeted mutations in the parental sequences that when propagated in the combinatorial library