Design of Experiments Example 009 Tabandeh et al (2008) Chemical Engineering Science (63)


ROK Bioconsulting 2016

Tabandeh et al (2008) Chemical Engineering Science (63): Response surface methodology for optimizing the induction conditions of recombinant interferon beta during high cell density culture.

Background and objective

In this example the authors aim to optimize the induction conditions to maximise the production of human Interferon Beta (hinf-β) from recombinant E. coli. This tutorial shows how to process and analyse experiments with a large number of responses in parallel, followed by multi-response optimization.

A recombinant E. coli was engineered with an inducible gene for an anti-viral drug, human Interferon Beta (hinf-β). hinf-β production starts after sufficient biomass (DCWInduction) has grown in a bioreactor run according to a fed-batch protocol. The authors had previously developed the fed-batch bioreactor protocol, which feeds glucose at a constant flow rate. The biomass is induced by addition of the lactose analogue IPTG (IPTG Concentration). The resulting hinf-β is produced as intracellular inclusion bodies, which are purified from the E. coli biomass after bioreactor harvest.

Model set up

The DOE experiment has already been set up in a MODDE file, Tabandeh.mips, which can be obtained from ronan@rokbioconsulting.com. The data can also be provided in an Excel file if you are using other DOE software. Table 1 and Table 2 show the design factor and response names, abbreviations and measurement units.

Factors (Parameters)     Abbreviation   Units   Ranges
DCW at induction         A              g/l     50 to 70
IPTG concentration       B              mM      1 to 3

Table 1. Factor names and ranges

Response (Parameters)     Abbreviation   Units    Notes
DCW at Harvest            X              g/l      Final Dry Cell Weight at harvest
rhinf-β at Harvest        P              g/l      Product concentration
Yieldx/s                  Yxs            g/g      Biomass yield per g of substrate glucose
Yieldp/s                  Yps            mg/g     Product yield per g of substrate glucose
Yieldp/x                  Ypx            mg/g     Product yield per g of biomass
Biomass Productivity      Qx             g/l.h    Productivity of biomass per hour
Specific Productivity     Qp             mg/g.h   Productivity of hinf-β per g biomass per hour
Volumetric Productivity   Qv             g/l.h    Productivity of hinf-β per hour

Table 2. Response variables

To set up the design in other software packages, note that the authors use a two-factor, three-level response surface (RSM) DOE design with 4 center points, resulting in 13 experiments. Sort the DOE design into standard order in your DOE software and then copy the data from the green area of the Excel file TABANDEH CCC data file.xls.

Evaluation of Raw data

Figure 1 shows the worksheet with two factors and 8 responses with results.

Figure 1 (column groups: Factors, Responses)

To start the analysis, select Fit model from the Home menu (Figure 2).

Figure 2

For a smaller number of responses, the analysis wizard (Figure 3) is typically used to analyse each response in sequence.

Figure 3

However, a more efficient way to analyse multiple responses is to view the diagnostic plots for all the responses together. Start in the Worksheet menu and check that the Select Responses dropdown is set to all responses (Figure 4).

Figure 4
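For readers building the design outside MODDE, the run list described above can be sketched in a few lines of Python. This is only one reading that reproduces the stated run count: it assumes the 13 experiments are the 9 points of a three-level grid plus 4 additional center replicates, and that "standard order" means factor A varies fastest. Both are assumptions, as are all function and key names; the factor ranges come from Table 1.

```python
from itertools import product

# Factor ranges from Table 1 as (low, high).
RANGES = {"DCW_induction_g_per_l": (50.0, 70.0), "IPTG_mM": (1.0, 3.0)}
N_EXTRA_CENTERS = 4  # assumption: replicates added on top of the grid's own center run


def coded_to_real(coded, low, high):
    """Map a coded level (-1, 0, +1) onto the real factor range."""
    mid = (low + high) / 2.0
    half = (high - low) / 2.0
    return mid + coded * half


def build_design():
    levels = (-1, 0, 1)
    # product() varies its last argument fastest, so B is unpacked first
    # to make factor A the fastest-varying column.
    grid = [(a, b) for b, a in product(levels, levels)]
    runs = grid + [(0, 0)] * N_EXTRA_CENTERS
    (a_lo, a_hi), (b_lo, b_hi) = RANGES.values()
    return [
        {
            "run": i + 1,
            "A": a,
            "B": b,
            "DCW": coded_to_real(a, a_lo, a_hi),
            "IPTG": coded_to_real(b, b_lo, b_hi),
        }
        for i, (a, b) in enumerate(runs)
    ]


design = build_design()
for row in design:
    print(row)
```

The run order produced by a particular DOE package may differ, so check the factor columns against the Excel file before pasting in the responses.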

Use the Select Responses dropdown (Figure 5) to select either a single response or multiple responses to analyse. For the moment, make sure that all responses are selected.

Figure 5

The first step is to assess experimental replication and identify any unusual response values by examining the replicate plots. Start the analysis by clicking on the replicate plots in the Worksheet menu (Figure 4). The plots in Figure 6 show the experimental responses for each run. Replicated responses are shown by blue markers. In this experiment, only the design center points were replicated. As the range of the center points is much smaller than the range of all the other points, we conclude that reproducibility for each response is good, with no obvious design point outliers.

Figure 6

The next step is to assess the summary of fit to see how well each model fits the responses and whether each model is valid. Select the Summary of fit plot from the Analyze menu (Figure 7).
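The visual replicate check described above can be expressed as a simple ratio: the range of the replicated center points divided by the range of all runs. The sketch below is not how MODDE calculates its reproducibility statistic; the function name is hypothetical and the response values are made-up placeholders, not data from the paper.

```python
def replicate_spread_ratio(values, replicate_runs):
    """Return (range of replicated runs) / (range of all runs).

    values: dict mapping run number -> response value
    replicate_runs: run numbers that repeat the same factor settings
    """
    all_values = list(values.values())
    rep_values = [values[r] for r in replicate_runs]
    overall_range = max(all_values) - min(all_values)
    if overall_range == 0:
        raise ValueError("response does not vary across the design")
    return (max(rep_values) - min(rep_values)) / overall_range


# Hypothetical response values for illustration only.
example = {1: 60.0, 2: 72.0, 3: 81.0, 4: 66.0, 5: 75.0, 6: 75.5, 7: 74.8, 8: 75.2}
ratio = replicate_spread_ratio(example, replicate_runs=[5, 6, 7, 8])
print(f"replicate range / overall range = {ratio:.3f}")
```

A small ratio corresponds to the situation in Figure 6, where the replicates cluster tightly compared with the spread caused by changing the factors.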

Figure 7

Figure 8 shows the Summary of fit plot for all the responses. While the R2 value and reproducibility for each response suggest that the variance in the data is captured, the low Q2 and model validity for nearly all the responses show that there is a potential issue with the dataset or the model fit.

Figure 8

A residuals plot shows how far each experimental point lies from the model, so outlying results can be identified quickly. To investigate whether the data contains any outliers, select the Residuals plot from the Analyze menu (Figure 9). In Figure 10, experiment 2 lies outside the 4 SD limits for many of the responses. The outlier points have been selected manually so that they are shown in red. The authors do not refer to any outliers in their analysis, so it is not possible to determine a root cause. These outliers could be the result of a data entry error, an incorrect response calculation, or a failed experiment or analysis. The root cause would typically be investigated and corrected if possible.

Figure 9
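The screening step above, looking for runs that fall outside the 4 SD limits in several responses at once, can be sketched as follows. The sketch assumes the residuals have already been expressed in standard deviation units by the DOE software; the function name and the numbers are hypothetical and serve only as an illustration.

```python
def flag_outliers(residuals_by_response, limit=4.0):
    """Return {run: [responses whose residual lies outside +/- limit SD]}.

    residuals_by_response: {response name: {run number: residual in SD units}}
    """
    flagged = {}
    for response, residuals in residuals_by_response.items():
        for run, value in residuals.items():
            if abs(value) > limit:
                flagged.setdefault(run, []).append(response)
    return flagged


# Hypothetical residuals in SD units, for illustration only.
residuals = {
    "X": {1: 0.4, 2: -5.1, 3: 1.2},
    "P": {1: -0.8, 2: 4.6, 3: 0.3},
    "Qp": {1: 1.9, 2: 3.2, 3: -0.5},
}
flagged = flag_outliers(residuals)
print(flagged)
```

A run that is flagged for several responses, as experiment 2 is in Figure 10, points to a problem with the whole experiment rather than with a single measurement.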

Figure 10

For the purposes of this example we will not carry out a root cause analysis and will exclude experiment 2 from all response analyses. Select the Worksheet tab and select the row with experiment 2 (Figure 11). Use the Incl/Excl dropdown menu to exclude the experiment. All the responses for that row will be greyed out. MODDE will automatically re-fit the models and update the active plots.

Figure 11

Switch back to the Residuals plot tab that you opened earlier and you will see that it has been updated (Figure 12). All the response results are now well within the 4 SD limits.

Figure 12

Check the updated Summary of fit plot from earlier (Figure 13). R2, Q2, model validity and reproducibility are acceptable for DCW, rhinf-β, Specific Productivity and Volumetric Productivity. The models for Yieldx/s and Biomass Productivity have a low Q2, indicating that there is random noise in these responses. The model for Yieldp/x has a low model validity, suggesting that there is an issue with the model lack of fit statistics. This issue will be investigated later.

Figure 13
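R2 and Q2 in the Summary of fit plot share the same form: one minus a sum of squared errors divided by the total sum of squares about the mean. R2 uses the fitted values, while Q2 uses predictions for each run made by a model fitted without that run (the PRESS sum of squares). The sketch below assumes those leave-one-out predictions are already available; the function name and all numbers are illustrative assumptions.

```python
def explained_fraction(observed, predicted):
    """Return 1 - sum((y - yhat)^2) / sum((y - mean(y))^2)."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_error = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_error / ss_total


# Hypothetical values for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]            # model fitted to all runs
leave_one_out = [1.3, 1.7, 3.5, 3.4]     # each run predicted without itself

r2 = explained_fraction(observed, fitted)
q2 = explained_fraction(observed, leave_one_out)
print(f"R2 = {r2:.3f}, Q2 = {q2:.3f}")
```

Q2 is never larger than R2 for the same model in practice, and a large gap between the two is the pattern seen for the noisy responses in Figure 13.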

Figure 14

The next step is to review the coefficients for each model. The coefficients show which model terms are significant in explaining each of the responses. Select the Coefficients plot from the Analyze menu (Figure 14). Figure 15 shows the Coefficients plots for all the responses.

Figure 15

As the yield and Biomass Productivity models showed model fit issues in the Summary of fit plot (Figure 13), we will exclude these responses from the subsequent analysis and come back to them later. To focus on the well modelled responses (DCW, rhinf-β, Specific Productivity and Volumetric Productivity), select these responses from the Select Responses dropdown menu as shown in Figure 16.

Figure 16
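The coefficients in these plots belong to a full quadratic model in the two factors, with a constant, two main effects, one interaction term and two squared terms. As a rough illustration of how such a model is fitted, the sketch below solves the least-squares normal equations in plain Python for factors in coded units (-1 to +1). The coefficient values are hypothetical, the function names are assumptions, and MODDE's own scaling and centering of coefficients may differ.

```python
def model_row(a, b):
    """Terms of the full quadratic model: 1, A, B, A*B, A*A, B*B."""
    return [1.0, a, b, a * b, a * a, b * b]


def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * p for x, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_quadratic(points, y):
    """Least-squares coefficients from the normal equations (X'X) c = X'y."""
    rows = [model_row(a, b) for a, b in points]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical coefficients used to generate a noise-free response.
true_coeffs = [80.0, 2.0, -1.0, 3.0, -4.0, 0.5]
points = [(a, b) for b in (-1, 0, 1) for a in (-1, 0, 1)]
y = [sum(c * t for c, t in zip(true_coeffs, model_row(a, b))) for a, b in points]

estimated = fit_quadratic(points, y)
print([round(c, 6) for c in estimated])
```

With real, noisy data the estimates come with confidence intervals, which is what the error bars in the Coefficients plots represent; those are not computed here.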

The number of Coefficient plots will be updated to four plots (Figure 17). The coefficient patterns for rhinf-β concentration and Volumetric Productivity are very similar. As expected, for the responses rhinf-β and Volumetric Productivity, the main effects of factors A and B (A = DCWInduction, B = IPTG Conc) are not significant when compared to the interaction term (A*B) and the squared terms (A*A and B*B). All coefficients in the models for harvest DCW and specific productivity are significant. The interaction term (A*B) and the A² term (A*A) are particularly important in all the models and show the interdependency of the factors and the non-linear behaviour that is typical for biological systems.

Figure 17

The interactions and model curvature can be reviewed using the interaction plots available from the Analyze menu (Figure 18).

Figure 18

These interaction plots provide a visual assessment of model curvature and of the strength and nature of the interactions between model parameters (Figure 19). For the harvest DCW response, the IPTG concentration has very little effect when the induction DCW is high. However, when the induction DCW is low, the final harvest DCW is decreased when high concentrations of IPTG are used to induce rhinf-β production. This would be expected: cells induced with higher IPTG concentrations produce rhinf-β at a higher rate, and so put less energy into producing biomass. Inducer concentration also has a small positive effect on specific productivity. While the inducer is absolutely necessary for rhinf-β expression, the inducer concentration range used in the design may not have been large enough for it to have a significant effect. As IPTG is a costly raw material, these results suggest that inducer concentrations could be reduced.

Figure 19

Where there are multiple interactions present in the model, each interaction can be selected by right-clicking on the plot and opening the plot properties menu (Figure 20).

Figure 20

The next step is to evaluate how well the experimentally observed values compare to the model predictions. Select the Observed vs predicted plot from the Analyze menu. Figure 21 shows Observed vs predicted plots for four of the responses. The dotted line is a parity line showing where the predicted result is equal to the experimentally observed value. As the experimental points for all of the responses sit very close to the parity line, we can conclude that each model is able to predict the experimental values used to create it. The parity plot for DCW does show a cluster around 84 g/l predicted, and these measurements should be checked for errors. While this internal model validation shows promising results, it is important to run external validation experiments using model predictions, typically from the high, middle and low end of the model. The authors did not carry out additional verification runs, so it is not possible to evaluate external model validation. Model predictions will be assessed in a subsequent section.

Figure 21

Investigating issues identified with the Yp/x model

In Figure 13, the Summary of fit plot for the Ypx model showed a low model validity score. This low model validity suggested that there was significant lack of fit. Significant lack of fit can be caused by experiments showing larger than usual residuals or by selecting an incorrect model. To investigate whether there is significant lack of fit (LoF), select the ANOVA table from the Analyze menu (Figure 22).

Figure 22

In the Select Responses dropdown, select Yp/x and, for comparison, DCW (Figure 23). ANOVA tables for DCW and Yp/x will be shown, similar to Figure 24.

Figure 23

Figure 24 shows a range of model fitting parameters, including the LoF results for each model, indicated by the red boxes. Lack of fit can occur if important terms, such as interaction or squared terms, are missing from the model. It can also occur where fitting the model leaves several unusually large residuals. For Yieldp/x, the ANOVA p value for lack of fit is < 0.05, indicating that the model has significant lack of fit. For comparison, the DCW model does not show significant lack of fit, as p > 0.05.

Figure 24

As R2, Q2 and reproducibility are acceptable for the Ypx model (Figure 13), we will first investigate whether there are any unusually large residuals.
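The lack of fit test in the ANOVA table compares two sources of error: the scatter of replicates about their own mean (pure error) and whatever residual error is left once pure error is removed (lack of fit). The sketch below computes the F ratio for that comparison under the usual textbook definition. It is not taken from MODDE, the function and argument names are assumptions, and the numbers are placeholders. Turning the F ratio into a p value needs the F distribution, which is left to a statistics library or table.

```python
def lack_of_fit_f(observed, fitted, condition_ids, n_model_terms):
    """Return (F ratio, lack-of-fit df, pure-error df).

    condition_ids labels runs that share the same factor settings, so that
    replicates fall in the same group.
    """
    groups = {}
    for y, cid in zip(observed, condition_ids):
        groups.setdefault(cid, []).append(y)

    # Pure error: scatter of replicates about their own group mean.
    ss_pure = 0.0
    for ys in groups.values():
        mean = sum(ys) / len(ys)
        ss_pure += sum((y - mean) ** 2 for y in ys)
    df_pure = len(observed) - len(groups)

    # Lack of fit: residual error not explained by replicate scatter.
    ss_residual = sum((y - f) ** 2 for y, f in zip(observed, fitted))
    ss_lof = ss_residual - ss_pure
    df_lof = len(groups) - n_model_terms

    if df_pure <= 0 or df_lof <= 0 or ss_pure == 0:
        raise ValueError("not enough replicates or distinct conditions")
    return (ss_lof / df_lof) / (ss_pure / df_pure), df_lof, df_pure


# Hypothetical data: four conditions, the last one run twice, two model terms.
observed = [1.0, 2.0, 3.0, 4.1, 3.9]
fitted = [1.1, 2.0, 2.9, 4.0, 4.0]
conditions = ["a", "b", "c", "d", "d"]
f_ratio, df_lof, df_pure = lack_of_fit_f(observed, fitted, conditions, 2)
print(f_ratio, df_lof, df_pure)
```

Because the pure error comes only from the replicated center points, a single poorly fitted run can make the lack of fit term look large in comparison.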

Figure 25

To investigate the residuals in more detail, select the Residuals vs variables plot from the Analyze menu (Figure 25). Figure 26 shows the Ypx response residuals plotted against the factor DCW induction. While the Ypx value for experiment 8 was not identified as an outlier, the data point is located 1.5 standard deviations away from the model average and is anomalous. The larger than expected residual for experiment 8 is likely to be responsible for the model lack of fit. The anomalous value may be the result of a calculation error or an issue with the experiment, and would typically be investigated further. The authors did not report any issue with experiment 8, and this point will be excluded from the model. Select the Ypx value of experiment 8 in the worksheet, right-click and select Exclude. The single value will then be greyed out, showing that it is excluded from the model. The Ypx model will be recalculated automatically.

Figure 26

In the updated ANOVA table, the lack of fit probability is no longer significant, and the Summary of fit plot now shows that the model validity for Ypx has been improved by removing the Ypx result for experiment 8. There are now at least 4 high quality models that can be used to optimize and predict process performance using the Optimizer module.

Predictions

The Predict menu in MODDE 11.0 offers a wide range of options. The Optimizer module can be used to select setpoints that are predicted to give optimum responses. The criteria for either single or multiple responses can be adjusted, and the predictions and prediction quality can be evaluated. In this example we will use the Optimizer to suggest optimal factor setpoints based on a number of response optimization criteria, and will compare these predictions to those made by the authors. To get started, select the Optimizer button (Figure 27) from the Predict menu.

Figure 27

The Objective tab in the Optimizer tool is used to set the optimization criteria that identify optimum factor setpoints. Use the pulldowns to set the optimization criterion for both Volumetric Productivity and Specific Productivity to Maximize (Figure 28). Adjust the min values of specific and volumetric productivity to 2.0 and 0.1, respectively. Click Run Optimizer to start the optimization process.

Figure 28

Switch to the Setpoints tab shown in Figure 29. The left window shows a list of log(D) and DPMO (defects per million opportunities) values for each set of suggested setpoints. By default, MODDE selects the best setpoint conditions. In Figure 29, setpoint 3 was selected because its conditions (DCWInduction = 50 g/l and IPTGConc = 2.62 mM) show the lowest DPMO, suggesting that there is a very low risk of failure for the suggested setpoints.
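To make the idea of a multi-response objective concrete, the sketch below searches a grid of coded factor settings for the point that best satisfies "maximize Qp and Qv, subject to minimum values of 2.0 and 0.1". It uses a simple linear desirability ramp and a geometric mean. This is a stand-in technique, not MODDE's optimization algorithm or its log(D) calculation. The model coefficients and target values are hypothetical; only the minimum values and factor ranges come from the text.

```python
from math import sqrt


def predict(coeffs, a, b):
    """Evaluate a full quadratic model in coded units."""
    c0, ca, cb, cab, caa, cbb = coeffs
    return c0 + ca * a + cb * b + cab * a * b + caa * a * a + cbb * b * b


def desirability(value, minimum, target):
    """Linear ramp: 0 at or below the minimum, 1 at or above the target."""
    return min(1.0, max(0.0, (value - minimum) / (target - minimum)))


# Hypothetical coefficients and targets; minimum values as set in Figure 28.
MODELS = {
    "Qp": {"coeffs": (2.4, -0.3, 0.1, 0.2, -0.2, -0.1), "min": 2.0, "target": 3.0},
    "Qv": {"coeffs": (0.12, 0.01, 0.005, 0.01, -0.01, -0.005), "min": 0.1, "target": 0.15},
}


def grid_search(steps=41):
    """Return (score, a, b) for the best grid point in coded units."""
    best = None
    for i in range(steps):
        for j in range(steps):
            a = -1.0 + 2.0 * i / (steps - 1)
            b = -1.0 + 2.0 * j / (steps - 1)
            product_d = 1.0
            for m in MODELS.values():
                product_d *= desirability(predict(m["coeffs"], a, b), m["min"], m["target"])
            score = sqrt(product_d)  # geometric mean of the two desirabilities
            if best is None or score > best[0]:
                best = (score, a, b)
    return best


score, a, b = grid_search()
# Convert coded units back to the ranges in Table 1.
print(f"score={score:.3f}  DCW={60 + 10 * a:.1f} g/l  IPTG={2 + b:.2f} mM")
```

A geometric mean drops to zero as soon as any one response misses its minimum, which mirrors the way the minimum values in the Objective tab act as hard requirements.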

Figure 29

To view other suggested setpoints, switch to the Alternative setpoints tab (Figure 30). Although the optimization algorithm has suggested 20 alternative setpoints, many of these are repeats. In effect, only 2 setpoints are recommended: setpoint 3 (DCW induction = 50, IPTGConc = 2.6), predicting a final rhinf-β concentration = 2.23, and setpoint 2 (DCW induction = 69.9, IPTGConc = 1.7), with a final rhinf-β concentration = [value missing]. Tabandeh et al also recommended similar setpoints, which predicted similar final rhinf-β concentrations. While setpoint 2 does appear to predict a marginally increased final rhinf-β concentration, setpoint 3 predicts a higher specific productivity. This higher specific productivity would be expected to aid downstream purification of rhinf-β. The relative merits of each setpoint can be evaluated using the Design Space button shown in Figure 31, which visualises how robust each setpoint would be.

Figure 30

In the Optimizer menu, select the Design Space tool shown in Figure 31.

Figure 31

Select OK and Next in the series of dialog boxes shown in Figure 32. No changes are required in these dialogs.

Figure 32

In the dialog shown in Figure 33, change the plot axis settings for the DCWInduction low and high values so that the design space around both suggested setpoints can be visualised. Click Finish and a design space plot will be generated showing the probability of failure. In this plot (Figure 34), the green areas on the left- and right-hand sides correspond to setpoint 3 and setpoint 2, respectively.

Figure 33
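The probability of failure around a setpoint can be illustrated with a small Monte Carlo simulation: perturb the factor settings at random around the setpoint, predict the response each time, and count how often the prediction falls below its limit. The sketch below does only that. The response surface, the disturbance size and the limit are hypothetical, the names are assumptions, and MODDE's design space calculation is more involved than this, so treat the sketch as an analogy.

```python
import random


def predicted_response(a, b):
    """Hypothetical response surface in coded units, peaking at (-0.5, 0)."""
    return 2.5 - 0.5 * ((a + 0.5) ** 2 + b ** 2)


def simulated_dpmo(setpoint, lower_limit, factor_sd=0.1, n_sim=20000, seed=1):
    """Failures per million when factor settings vary randomly about a setpoint."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    failures = 0
    for _ in range(n_sim):
        a = rng.gauss(setpoint[0], factor_sd)
        b = rng.gauss(setpoint[1], factor_sd)
        if predicted_response(a, b) < lower_limit:
            failures += 1
    return 1e6 * failures / n_sim


robust = simulated_dpmo((-0.5, 0.0), lower_limit=2.0)  # near the top of the surface
risky = simulated_dpmo((0.4, 0.0), lower_limit=2.0)    # close to the limit
print(f"robust setpoint: {robust:.0f} DPMO, risky setpoint: {risky:.0f} DPMO")
```

A setpoint that only just meets its limit fails often once normal process variation is added, even when its nominal prediction looks acceptable.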

Figure 34

The design space plot in Figure 34 shows the risk of failure (as DPMO) as a function of DCWInduction and IPTG Concentration. The green areas on the left- and right-hand sides of the plot correspond to areas in the design space where the prediction would be more reliable. Setpoint 3 sits well within a green (low risk) region, while setpoint 2, indicated by the black box, sits within a red (higher risk) region. This shows that setpoint 3 is predicted to be a more robust setpoint than setpoint 2. Tabandeh et al reached similar conclusions, selecting a lower induction biomass and higher IPTG concentration to minimise the inhibitory effect of IPTG on cell growth and productivity. Their analysis did not assess the comparative robustness of the suggested setpoints.

Summary

In this bioprocess example, the optimum induction conditions for production of recombinant hinf-β were investigated. A three-level factorial experimental design was used to develop response surface models for eight response parameters. Rather than use the analysis wizard, which evaluates each response in a linear sequence, this example shows an alternative analysis route that allows all the responses to be analysed in parallel. The analysis flow is summarised in Table 3.

Step 1. Home > Design Wizard
    Create design.
Step 2. Execute experiments & enter response data
Step 3. Worksheet > Replicate plot analysis
    This step reviews the response data and replicate experiment quality.
Step 4. Analyze > Summary of fit analysis
    After fitting models for each response, the summary of fit plot provides an overview of how much variance is captured (R2) and how much variance is predicted (Q2). Model validity assesses whether the terms in the model are appropriate and whether there is significant lack of fit.
Step 5. Analyze > Coefficient plot
    The coefficients plot assesses which model terms are significant and indicates which model terms can be removed.
Step 6. Analyze > Interaction plot
    The interaction plots show how the main factors interact with each other and whether these interactions have an impact on the responses.
Step 7. Analyze > Observed vs Predicted plots
    The observed vs predicted plot shows how well the predictions from each response model compare to the experimental data.
Step 8. Analyze > ANOVA analysis
    Analysis of Variance (ANOVA) shows a range of diagnostic statistical tests to assess the quality of each response model.
Step 9. Predict > Optimizer
    The Optimizer tool is used to predict setpoints for factors based on optimization criteria set for single or multiple responses.
Step 10. Predict > Optimizer > Design space
    Where multiple optimum factor setpoints are identified by the Optimizer tool, the design space tool evaluates how stable the predictions will be around each selected setpoint.
Step 11. Verify predictions
    While MODDE does not provide tools for verifying predictions, this is an important step. Additional experiments should be carried out using predicted factor setpoints that were not part of the original experimental design. These external validation setpoints allow comparison between the observed responses and predictions within the design space.

Table 3. Analysis Flow summary

Errors in the dataset were identified using the MODDE plotting tools, and exclusion of one experiment and one data point resulted in improved model reliability. Predictions carried out using the Optimizer tool showed two potential factor setpoints. While inducing the culture at a higher induction cell density and lower IPTG concentration resulted in a marginally better product concentration in the harvest, induction at a lower cell density and higher IPTG concentration resulted in much increased specific productivity. Increased specific productivity would benefit downstream purification of the product. In addition, design space analysis showed that the lower induction DCW and higher IPTG concentration would result in a more reliable process outcome.

If you would like to discuss this example or other examples, please contact Ronan@rokbioconsulting.com