Neuro-Fuzzy Modeling of Nucleation Kinetics of Protein

Similar documents
Atmospheric Ice Nucleation

Management Science Letters

Diffusional Transformations in Solids

Assessment of Nucleation Kinetic Mechanisms in Gas Hydrate Crystallization Processes

Seader & Henley, Separation Process Principles f01_11

8. Principles of Solidification

Chapter 10, Phase Transformations

Flood Forecasting using Adaptive Neuro-Fuzzy Inference System (ANFIS)

Kinetics. Rate of change in response to thermodynamic forces

Preliminary Studies on In Situ Monitoring of Lactose Crystallization Using Focused Beam Reflectance Measurement

Microfluidics to produce and manipulate microcrystals

Crystal Growth from Melts

A Balanced Nucleation and Growth Model for Controlled Precipitations

Module 29. Precipitation from solid solution I. Lecture 29. Precipitation from solid solution I

CRISTALIZATION UNIT YUSRON SUGIARTO

Geometric Semantic Genetic Programming Algorithm and Slump Prediction

Point Defects. Vacancies are the most important form. Vacancies Self-interstitials

Keywords: Breach width; Embankment dam; Uncertainty analysis, neuro-fuzzy.

New Methods of Editing and Imputation

Preliminary studies on in situ monitoring of lactose crystallization using focused beam reflectance measurement

FACTS: FUZZY ASSESSMENT AND CONTROL FOR TEMPERATURE STABILIZATION

Metallurgy - Lecture (2) Solidification

CHAPTER 4: Oxidation. Chapter 4 1. Oxidation of silicon is an important process in VLSI. The typical roles of SiO 2 are:

WATER QUALITY PREDICTION IN DISTRIBUTION SYSTEM USING CASCADE FEED FORWARD NEURAL NETWORK

Part II : Interfaces Module 3 : Nucleation of precipitates from a supersaturated matrix

Properties of Matter. Chemical Properties and Effects on Pollutant Fate. Characteristics of Chemical Changes. Physical Characteristics

Engineering Genetic Circuits

Part III : Nucleation and growth. Module 4 : Growth of precipitates and kinetics of nucleation and growth. 4.1 Motivating question/phenomenon

SiC crystal growth from vapor

Final exam. Please write your name on the exam and keep an ID card ready.

E45 Midterm 01 Fall 2007! By the 0.2% offset method (shown on plot), YS = 500 MPa

STAT 2300: Unit 1 Learning Objectives Spring 2019

A FUZZY-BASED MODEL TO DETERMINE CUI CORROSION RATE FOR CARBON STEEL PIPING SYSTEMS

Learning Objectives. Chapter Outline. Solidification of Metals. Solidification of Metals

Introduction to Point Defects. Version 2.1. Andrew Green, MATTER John Humphreys, UMIST/University of Manchester Ross Mackenzie, Open University

FUZZY MODELING OF SOIL COMPACTION DUE TO AGRICULTURAL MACHINE TRAFFIC

Diffusion coefficients for crystal nucleation and growth in deeply undercooled glass-forming liquids

Investigation on Nucleation Kinetics and Photoluminescence Characterization of a Nonlinear Optical L-arginine Maleate dihydrate Single Crystal

Slurry concentration [Vol.%]

Phase Transformations Workshop David Laughlin

Phase Field Modeling of the Effects of Gradient Energy Coefficients on. the Void-matrix Interface Thickness during Czochralski Silicon Crystal

Chapter 7 Mass Transfer

Predicting gas usage as a function of driving behavior

MICROSTRUCTURAL EVOLUTION IN MATERIALS

This is a repository copy of Study on Nucleation Kinetics of Lysozyme Crystallization.

ROUTE CHOICE MODELLING USING ADAPTIVE NEURAL FUZZY INTERFACE SYSTEM AND LOGISTIC REGRESSION METHOD

Fuzzy Logic based Short-Term Electricity Demand Forecast

Stochastic modeling and control of particle size in crystallization of a pharmaceutical

Gene Expression Data Analysis

Learning theory: SLT what is it? Parametric statistics small number of parameters appropriate to small amounts of data

Predicting Toxicity of Food-Related Compounds Using Fuzzy Decision Trees

Oral Delivery of Drugs

Modeling of Steam Turbine Combined Cycle Power Plant Based on Soft Computing

JBS Protein Crystallization Starter Kit

Neural Networks and Applications in Bioinformatics. Yuzhen Ye School of Informatics and Computing, Indiana University

The Material (in) Dependency of Impurity Affected Thin Film Growth

Structural Bioinformatics (C3210) Conformational Analysis Protein Folding Protein Structure Prediction

Nucleation of Iron Dust from Type II Supernovae

Neural Networks and Applications in Bioinformatics

27-302, Microstructure & Properties II, A.D. Rollett. Fall Homework 2, due Weds, Nov. 6 th

The New Software for Research and the Modelling of Grown-in Microdefects in Dislocation-Free Silicon Single Crystals

Presentation Overview

Wastewater Effluent Flow Control Using Multilayer Neural Networks

Corrosion Rates. OLI model basics Polarization curves

A Study in Hen Egg White Lysozyme Crystal Growth Christopher J. Hodgson 99, Sean N. Maduck 00, Dean S. Rahman 01

Crystallization Methods and Protein Crystal Properties. (adapted from )

Evaluation of Groundwater quality (Cases study; marvdashtand Arsenjan, Fars province, Iran)

Data Mining and Applications in Genomics

Comparative Study on Prediction of Fuel Cell Performance using Machine Learning Approaches

AP Statistics Scope & Sequence

Getting Started with OptQuest

Winsor Approach in Regression Analysis. with Outlier

Control Steam Turbine Combined Cycle Power Plant Based on Soft Computing

A test for diffusional coherency strain hypothesis in the discontinuous precipitation in Mg Al alloy

Drug Discovery in Chemoinformatics Using Adaptive Neuro-Fuzzy Inference System

Chapter 5. Differential Scanning Calorimetry

COMPRESSIVE STRENGTH MODELING OF SCC USING LINEAR REGRESSION AND ARTIFICIAL NEURAL NETWORK APPROACH

Evapotranspiration Prediction Using Adaptive Neuro-Fuzzy Inference System and Penman FAO 56 Equation for St. Johns, FL, USA

Unit Operations for Bioprocess Engineers

SUPERCRITICAL ANTISOLVENT PRECIPITATION: ATOMIZATION AND PRODUCT QUALITY

Chemical Engineering 160/260 Polymer Science and Engineering. Lecture 17: Kinetics and Thermodynamics of Crystallization February 26, 2001

Diffusion, osmosis. Physics-Biophysics 1. I. Diffusion Medical Biophysics book: chapter III/ October Tamás Huber

Clovis Community College Class Assessment

MODELING OF MICRO-EXPLOSION FOR MULTICOMPONENT DROPLETS

Business Analytics & Data Mining Modeling Using R Dr. Gaurav Dixit Department of Management Studies Indian Institute of Technology, Roorkee

CME 300 Properties of Materials. ANSWERS Homework 2 September 28, 2011

Single-cell sequencing

LYOPHILIZATION. Introduction to Freeze-Drying Third of Four Lectures. Freezing, Annealing, Primary, and Secondary Drying. J. Jeff Schwegman, Ph.D.

Neutron Flux Effects on Embrittlement in Reactor Pressure Vessel Steels

Chapter 7 IMPACT OF INTERCONNECTING ISSUE OF PV/WES WITH EU ON THEIR RELIABILITY USING A FUZZY SCHEME

Business Quantitative Analysis [QU1] Examination Blueprint

So.. Let us say you have an impure solution containing a protein of interest. Q: How do you (a) analyze what you have and (b) purify what you want?

Machine Learning Logistic Regression Hamid R. Rabiee Spring 2015

Measurement of Enzyme Kinetics by UV-Visible Spectroscopy

Distinguish between different types of numerical data and different data collection processes.

CHAPTER SEVEN EXPERIMENTAL RESULTS EFFECT OF THE STEEL S COMPOSITION 7.1 EFFECT OF ANNEALING TREATMENT ON STEEL B

Lecture 2. Crystals: Theory and Practice. Dr. Susan Yates Wednesday, February 2, 2011

Finite Element Thermal Analysis of Three Dimensionally Printed (3DP ) Metal Matrix Composites

Model construction of earning money by taking photos

Preparation of InGaAs Starting Materials with the Gradient InAs Concentration

Transcription:

Neuro-Fuzzy Modeling of Nucleation Kinetics of Protein DEVINDER KAUR, RAJENDRAKUMAR GOSAVI Electrical and Computer Science and Chemical Engineering University of Toledo 280, W. Bancroft, Toledo, OH 43606 USA http://www.eecs.utoledo.edu/~dkaur/ Abstract:- The estimation of the nucleation kinetics of protein using a predictive model will greatly help in reducing the number of parameters for protein nucleation studies. In this paper a Fuzzy Inference Model has been developed using the observed data for nuclear kinetics. The model was trained using neuro fuzzy inference system. The results obtained from the fuzzy model were compared with the experimentally observed values from the data collected. Results show that a better fit can be obtained using the fuzzy logic technique and the nucleation kinetics can be very well estimated at the conditions studied. Key Words:- Neuro-Fuzzy Modeling, Nucleation Kinetics, ANFIS Introduction Nucleation is the first step in the formation of solute crystals in a supersaturated solution. One can say that the nucleation process forms the templates of solid phase on which further growth of the solid phase takes place. Nucleation of crystals is first required in solutions with no crystalline phase present. An understanding of the factors that affect nucleation kinetics is important in any crystallization or precipitation process in order to control the outcome of that process (i.e. crystal size distribution). Supersaturation levels, solution hydrodynamics, temperature, solute concentration, nonidealities and impurities in the protein solution can affect the nucleation rates of protein crystals. The creation of a new phase in another phase that is not at equilibrium starts with nucleation. That is to say, the random formation of clusters of growth units (ions, atoms, molecules or molecular aggregates), that are able to grow further into the new phase. Nucleation is generally classified into two categories: homogeneous nucleation where external surfaces are absent and heterogeneous nucleation where the surfaces in contact with the mother phase play a key role in the formation of a second phase. While the growth process of globular proteins was studied extensively, knowledge about the nucleation process is still very limited. There are various ways to determine the nucleation rates. One approach is to allow a system to nucleate at a given solution conditions and change the conditions so that nucleation is suppressed and only growth takes place. The number of grown crystals then provides information about nucleation kinetics. An initial rate method is used to obtain the nucleation kinetic rates for the desired conditions. There are various instances where use of Fuzzy logic in protein science can be observed. Fuzzy logic has been used to predict protein s sub cellular locations from their dipeptide compositions []. It has also been successfully used [2] in molecular modeling where a new method which produces fuzzy structural vectors to predict secondary structural class of a protein from its sequence. Also the classification of protein based on the adaptive Neuro-fuzzy inference system [6] enhanced the protein secondary structure prediction. A hybrid GMC-fuzzy algorithm is used [3] in ph control of the enzymatic hydrolysis of cheese whey in a reaction carried out by alcalase immobilized on agarose gel. A strategy has been discussed [4] that uses fuzzy logic-based factors to modify algebraic

rate laws and eventually obtain kinetic models that are suitable for metabolic pathway modeling without complete enzyme mechanisms. A fuzzy model of kinetics of enzymic Penicillin-G conversion has been described. [5]. In this paper comparison of the model fit of nucleation data with that of the fuzzy logic method has been done. The nucleation data with certain conditions has been modeled using non linear regression technique [7]. Since the non linear regression and weighted least square regression methods did not give satisfactory fit on the nucleation data, the nucleation data was split into two regions namely homogenous nucleation and heterogenous nucleation. Data fitting was done on both these regions to achieve a better estimation of nucleation kinteci data. The description of use of Anfis for the data fittting to give a better fit is provided. The results indicate that the nucleation rates in the range of the conditions studied could be estimated with a decent 0% error of the actual nucleation data. 2 ANFIS: ANFIS stands for Adaptive Neuro-Fuzzy inference system (Anfis). ANFIS provides an optimization scheme to find the parameters in the fuzzy system that best fit the data. ANFIS applies two techniques in updating the parameters. For premise parameters that define membership functions, ANFIS employs gradient descent to fine tune them. For consequent parameters that define the coefficients of each output equations, ANFIS uses the least squares method to identify them. It learns from the data and adapts itself to the trend in the data. Anfis then fits a model for the data. A schematic of ANFIS is shown in Fig. The Fuzzy logic system that has been modeled can be denoted by a schematic as shown in Figure 2. The schematics for ANFIS look similar to that of the neural network except that in ANFIS at layer 4 there are external parameters that affect the optimization process. Salt concentration Protein concetration Fig : Schematic of ANFIS. Anfis (Sugeno Model) Nucleation rate Fig 2: Schematic of the fuzzy inference system. 3 What is protein Nucleation? Crystallization is a rate process similar to chemical reaction. To obtain the rate of reaction, the method of initial rates is often used [Ref Scott Fogler, Livenspeil]. The reaction is started at known initial reactant concentrations and the concentration of the product is measured with respect to time to obtain the reaction rate data. A similar procedure for crystallization requires measurement of the number of nuclei formed with respect to time in a given volume of crystallizing solution during the initial stages of nucleation where the concentration of the solution is known. Since nuclei are very small and by definition a critical nucleus will grow into a crystal of detectable size, counting the number of crystals formed at discreet time intervals should yield the nucleation rate. The assumption underlying this technique is that 2

every nucleus formed will grow into a crystal and can be detected. This method offers several advantages over the methods discussed above. The only challenge lies in detecting and counting crystals as they are forming and when they are very small, about a few microns in size. Since the amount of growth that takes place is very low, the concentration of the solution does not change significantly during the initial stages of crystallization. No knowledge of growth kinetics or particle size distribution is necessary. Figure 3 is an example of how a typical nucleation rate data would look like. This nucleation data has been obtained for lysozyme protein for 2%NaCl salt concentration. The slope of the line as shown in the plot gives the nucleation rate for lysozyme at the given conditions of salt concentration and protein concentration. The nucleation data was obtained for various salt concentrations. Number of crystals per ml. 4000 3500 3000 2500 2000 500 000 500 y = 52.59x - 3552.9 R 2 = 0.9708 0 0 20 40 60 80 00 20 Time, min Figure 3.Typical nucleation rate data set. Number density of particles is plotted against time [7]. For the empirical modeling of the nucleation rate kinetics, the classical nucleation theory was used [7]. The classical nucleation theory expresses the homogeneous nucleation rate J for a spherical nucleus as 3 2 2v ktσ G a 6πσ v J = N exp exp 3 3 2 h kt 3k T ( ln S) () where ν is the molecular volume, k is the Boltzmann constant, T is absolute temperature, σ is the surface energy per unit area of the nuclei, h is the Planck's constant, N is the number density of monomeric species in the solution, G a is the energy barrier to diffusion from bulk solution to the cluster and S is the supersaturation, often expressed as (C/C*), the ratio of bulk concentration of solute to equilibrium solubility. It should be noted that the activity coefficients at bulk and equilibrium concentrations are assumed to be approximately equal by considering (C/C*) as the driving force for phase change instead of the ratio of activities. One can lump the parameters in the above equation and transform it into the two-parameter empirical expression below: B J = A C exp (2) * 2 [ ln( C C )] 4. Model Fit Using Adaptive Neuro Fuzzy Logic The data set that was studied for anfis are the experimental protein nucleation rates at 2%, 2.5%, 3% and 4% NaCl concentrations at ph of 4.5 and temperature 4 o C. The input parameters were normalized. The fuzzy system generated a set of ifthen rules after evaluating the data. These rules are of the following format: If (Salt is Low) and (Protein is Low) then (Nucleation Rate is Value) From the data first the fuzzy inference system was created using the genfis command. Then the fismat created was trained with anfis command. Genfis in combination with anfis learns from the data and then fits the data to the best fit possible hence this procedure is termed as the adaptive neuro fuzzy system of inference. 3

Nucleation Rate (crystals ml - min - ) 000 00 0 4.0% 3.0% 2.5% ln(j/c) 2.0% 3.0 2.0 4.0%.0 0.0 -.0-2.0-3.0 0.0 0.2 0.4 0.6 0.8-2 [ln(c/c*)] 0 0 20 30 40 50 60 70 80 Protein Concentration, (mg/ml) Figure 4: The symbols represent individual nucleation rate measurements for 2% to 4% NaCl concentration as indicated on the graph. The lines represent the nucleation rate model Deviation of model predictions towards the lower protein concentration region is evident when the model is linearized and plotted as in the inset for 4% NaCl nucleation rate data[7]. be seen that the model fit follows the data points with a reasonable accuracy. Nucleation Rate 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0. Plot of Nucleation data 4% NaCl 3.5% NaCl 2.5% NaCl 2% NaCl 0 0. 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Protein Concentration Figure 6: a) Nucleation Rate before using anfis Nucleation Rate, (crystals ml - min - ) 000 00 0 4.0% 3.0% 2.5% 2.0% Nucleation Rate 0.9 0.8 0.7 0.6 0.5 0.4 0.3 Final fit of the plot 4% NaCl 3% NaCl 2.5% NaCl 2% NaCl 0 0 20 30 40 50 60 70 Protein Concentration, (mg/ml) 0.2 0. Figure 5: Model fit to data split into two regimes for 2 to 4% NaCl concentration. Solid symbols and lines indicate data points in the homogeneous (high protein concentration) range and open symbols with dashed lines belong to the heterogeneous (low protein concentration) range[7]. After treating with genfis it was observed that the model gave a reasonable fit with very small RMSE values on training data. Figures 6.a and 6.b show how well the model fits using genfis. In figure 6.b it can 0 0. 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Protein concentration Figure 6: b) Nucleation with model fit using anfis The error on the training data was obtained in figure 0. This graph can be used to determine the number of epochs required to obtain the minimum error. In this case the anfis used 00 epochs. The model fit obtained using various epochs lead to nucleation estimate that are not much different than each other. 4

Observed J Model J [7] Model J, FL* 2.79 2.7.8 3.66 3.90 3.35 7.35 7.28 7.4 9.25 9.0 9.5 4.04 4.5 20.23 9.58 6.93 20.23 20.7 6.93 2.67 2.0 8.67 5.70 28.76 27.53 28.56 29.0 27.53 28.56 28.32 27.53 28.56 60.52 63.79 58.70 60.7 63.79 58.70 57.64 63.79 58.70 28.83 28.00 30.74 32. 28.00 30.74 SSE** 06.54 82.76 R 2 0.9956 0.9966 Table.: Model estimates for 4% NaCl Observed J Model J [7] Model J, FL 3.32 2.95 2.47 7.62 8.74 8.62 8.2 8.74 8.62.37 0.03 9.5 8.33 5.46 0.87 8.25 7.8.90 8.6 7.8.90 6.36 0.30 2.75 6.09 0.30 2.75 6.26 0.30 2.75 7.04 20. 8.09 27.6 29.84 28.82 27.45 29.84 28.82 28.28 29.84 28.82 79.53 7.07 77.49 78.48 7.07 77.49 78.58 7.07 77.49 34.9 45.20 36.30 35.7 45.20 36.30 36.46 45.20 36.30 332.65 347.29 349.47 368.48 347.29 349.47 347.45 347.29 349.47 SSE 262.09 735.58 R 2 0.9956 0.9975 Table.2: Model estimates for 3% NaCl Observed J Model J [7] Model J, FL 6.92 5.44 6.60 6.8 5.44 6.60 2.25 3.55 2.0.96 3.55 2.0 28.42 27.06 27.75 26.75 27.06 27.75 55.629 52.39 56.78 58.56 52.39 56.78 8.02 2.60 6.33 3.89 2.60 6.33 242.4 244.35 245.92 249.8 244.35 245.92 SSE 63.28 42.52 R 2 0.998 0.9995 Table.3: Model estimates for 2.5% NaCl Observed J Model J [7] Model J, FL.5.02.5 3.42 3.9 3.72 3.74 3.9 3.72 7.44 8.98 6.89 8.66 8.98 6.89 7.7 9.38 7.52 8.4.2 0.3 9.2.2 0.3 9.47.2 0.3 6.34 4.47 4.63 20.824 20.4 2.04 20.789 20.4 2.04 20.373 20.4 2.04 22.97 23.0 24.7 25.74 23.0 24.7 23.54 23.0 24.7 23.49 23.99 25.3 28.3 2.04 25.3 30.86 4.7 28.9 24.74 22.45 36.26 44.69 30.58 42.82 47.72 30.58 42.82 48.33 30.58 42.82 43.76 40.95 47.72 47.67 40.95 47.72 47.85 40.95 47.72 58.48 58.96 59.66 57.82 58.96 59.66 65.76 77.38 79.48 27.42 2.40 3.93 27.25 2.40 3.93 28.87 2.40 3.93 37.65 40.39 36.85 35.03 46.08 4.42 3.08 46.08 4.42 3.48 46.08 4.42 95.75 96.80 94.89 94.7 96.80 94.89 94.52 96.80 94.89 285.46 254.0 273.74 287.6 254.0 273.74 260.06 254.72 274.44 259.6 254.72 274.44 278.7 254.72 274.44 295.8 322.65 295.30 SSE 6385.59 2039.03 R 2 0.9833 0.9947 Table.4: Model estimates for 2% NaCl * Fuzzy Logic ** Sum of Square of Errors 5

# of Epochs Training data RMSE 50 0.0479 70 0.0476 75 0.0483 80 0.0476 85 0.0480 00 0.0474 50 0.0472 200 0.0470 300 0.090 500 0.089 Table 2: Comparison of RMSE values for various epochs. 0.022 0.02 0.02 0.09 Training Error trnerr of 2% to 4%; for any protein concentration at that salt concentration below 62 mg/ml the nucleation kinetics rate can be estimated.. This implies that the fuzzy logic can be confidently used to estimate the nucleation rate at the conditions studied. If more data are available at various salt concentrations like 4% to 0% and below % apart from what has been used in this work, this fuzzy logic exercise could be extensively used to estimate the nucleation rate of protein at various salt concentrations. The future work in this area will comprise of involving more parameters such as temperature, ph and additive concentrations to estimate the nucleation kinetics of the protein. References:. Y. Huang, Y. Li, Prediction of protein subcellular locations using fuzzy k-nn method, Bioinformatics, 20(), 2-28, 2004. Error 0.08 0.07 0.06 2. J. Boberg, T. Salakoski, M. Vihinen. Accurate prediction of protein secondary structural class with fuzzy structural vectors, Protein Engineering, 8(6), 505-2, 995. 0.05 0.04 0 0 20 30 40 50 60 70 80 90 00 Number of epochs Figure 7: Training Data Error vs. Number of Epochs The Root Mean Square Error on the fit plotted in Fig. 7 was found to be 0.0474. Table 2 denotes the training data RMSE for various epochs studied. The regressed values obtained from the earlier work [7] and those obtained from the fuzzy logic technique were used to compare the fit using R 2 values. The comparison is provided in table.,.2,.3 and.4. 5 Conclusions: A fuzzy model for the protein nucleation was created from the data and then the model learnt using adaptive neuro fuzzy modeling generating a set of rules, with the help of which it can estimate the nucleation rate at any conditions of NaCl concentration or protein concentration in the range studied. Thus for any salt concentration in the range 3. R. Sousa, G. P. Lopes, G. A. Pinto, P. I. F. Almeida, R. C. Giordano. GMC-fuzzy control of ph during enzymatic hydrolysis of cheese whey proteins, Computers & Chemical Engineering, 28(9), 66-672, 2004. 4. B. Lee, J. Yen, L. Yang, J. C. Liao. Incorporating qualitative knowledge in enzyme kinetic models using fuzzy logic, Biotechnology and Bioengineering, 62(6), 722-729, 999. 5. R. Babuska, H. J. L. Van Can, H. B. Verbruggen. Fuzzy modeling of enzymic penicillin-g conversion, Proceedings of the World Congress, International Federation of Automatic Control, 3th, San Francisco, June 30-July 5, 996. 6. J. A. Hering, P. R. Innocent, P. I. Haris,. Neuro-fuzzy structural classification of proteins for improved protein secondary structure prediction, Proteomics, 3(8), 464-475, 2003. 7. V. Bhamidi, S. Varanasi, C. A. Schall. Measurement and Modeling of Protein Crystal Nucleation Kinetics, Crystal Growth & Design, 2(5), 395-400, 2002. 8. D. M. Himmelblau. Process Analysis by Statistical Methods, Wiley: New York, 970. 6