Maximizing the potential of manufacturing data to advance biopharmaceutical production

1 Maximizing the potential of manufacturing data to advance biopharmaceutical production
Anja Zgodic (1); Ashley Lee, Paul Stey, Andras Zsom, Sam Bell (2); Tom Mistretta, Patrick Gammell (1)
1. Amgen Process Development; 2. Brown University

2 Biologic medicines are becoming more highly engineered and more diverse
Manufacturing modes: batch/fed-batch, continuous, and next-gen manufacturing. Selection drivers: product quality, supply requirements, and financial considerations.
One size does not fit all: appropriate manufacturing technologies can be matched to modalities to meet the Quality Target Product Profile.

3 Therapeutic diversity requires a flexible manufacturing network
Traditional facility: large steel-based plants and platforms; low-productivity, labor-intensive processes.
Flexible modular platforms leveraging disposables.
Multiple manufacturing technology options and modular platforms create a competitive advantage.

4 Biopharmaceutical processes, composed of consecutive unit operations, generate abundant data
Process sequence: upstream cell culture → downstream purification → drug substance testing → filling → drug product testing → drug product packaging → release.
Data generated: > 500 QC entries, > 2,000 batch record entries, and > 500 million continuous data points.
Challenge: holistically monitor and control variance throughout the process.

5 Previous data integration approaches were highly manual and inefficient
Sources such as batch records, documents, LIMS, spreadsheets, lab notebooks, and other electronic systems were collated manually into reports, slides, and spreadsheets.
Having multiple source systems for information makes it challenging to fully capitalize on data and prior knowledge, despite significant resource investment.

6 Amgen has embarked on a journey to convert data to knowledge
[Roadmap diagram: Phase 1, Phase 2, Phase 3; Culture & People; Technology & Innovation; Governance; Strategic Partnerships; Early Wins; Focused Initiatives; Patients; Physicians and Payers; Manufacturing & Supply Chain; Drug Development.]
Amgen Proprietary, Internal Use Only

7 Infrastructure is required to transform process data into process knowledge
Data sources: raw materials; initial conditions/set points; process sensors (P, T, dO2, off-gas, NIR/FTIR); daily off-line samples (pH, CD, GC/MS, HPLC); on-line metabolite analyzer.
The Enterprise Data Lake combines many data sources, structured and unstructured, to maximize access to the full range of process data.

8 Data integration allows global process monitoring and rapid signal detection
Near-real-time process monitoring; immediate signal detection; batch-on-batch and multivariate analysis.
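Batch-on-batch signal detection can be illustrated with a simple control chart. This is a minimal sketch, assuming each batch has already been reduced to one summary value per parameter and that a Shewhart-style ±3σ rule is an acceptable alarm criterion; the slide does not say which rule is used, and the function names, lot names, and values below are hypothetical.

```python
from statistics import mean, stdev


def control_limits(history, k=3.0):
    """Return (lower, center, upper) k-sigma limits from historical batch values."""
    center = mean(history)
    sigma = stdev(history)  # sample standard deviation across past batches
    return center - k * sigma, center, center + k * sigma


def flag_batches(history, new_batches, k=3.0):
    """Return the names of new batches whose value falls outside the k-sigma limits."""
    lower, _, upper = control_limits(history, k)
    return [name for name, value in new_batches.items() if not lower <= value <= upper]


# Hypothetical per-batch summary values (e.g. mean pH over one unit operation)
history = [7.01, 6.98, 7.02, 7.00, 6.99, 7.03, 7.00, 6.97]
new_batches = {"lot_A": 7.01, "lot_B": 6.80}
print(flag_batches(history, new_batches))
```

A univariate rule like this flags one parameter at a time; the multivariate analysis mentioned on the slide would look at correlated parameters jointly.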

9 Data integration facilitates process understanding and opportunities to enhance productivity

10 Strategic partnerships are required to apply next-gen data analytics to next-gen data
Project concept: augment traditional hypothesis-driven data mining with exploratory methodologies.
Expected outcomes: identify factors we already know about (validation), as well as potential opportunities we do not; develop a methodology to interrogate multiple factors at a portfolio level.

11 Strategic partnerships are required to apply next-gen data analytics to next-gen data
Project: next-gen data analytics. Build an innovative model that identifies the process parameters and performance metrics that maximize yield at various steps of the drug substance manufacturing process, and predict drug substance yield at each step.
Project data: data from one of Amgen's drug substance products (15K mAb process): 56 batches, 50,267,007 data points.

12 Prediction of process outputs remains a challenge due to measurement frequency differences between inputs and outputs
Upstream cell culture process: input parameters are measured every second (continuous time series; columns Time, Lot, V1, V2, V3).
Drug substance testing: the performance parameter is measured once per lot (discrete measurements; columns Lot, Performance Parameter).

13 Prediction of process outputs remains a challenge due to measurement frequency differences between inputs and outputs
Challenges: How do we reconcile measurement frequency differences? How do we unite continuous time series data with discrete data? How do we summarize time series data to predict performance?
Solution: feature engineering.

14 Feature engineering of time series data can help resolve measurement frequency differences
[Diagram: per-second input parameters (Time, Lot, V1, V2, V3) are summarized per lot into mean, median, and standard deviation features, repeated for each unit procedure, and then joined with the per-lot outcome (yield).]
What it does: reconciles continuous and discrete data, and enables the reconciled data to be used in modeling.
How: takes input variable(s) and performs a mathematical operation to generate a new variable; each parameter's time series is summarized in many ways.
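The per-lot summarization described above can be sketched in plain Python. This is a minimal illustration, assuming the time series arrive as long-format (lot, time, parameter, value) rows; the function names and data are hypothetical. The mean, median, and standard deviation features follow the slide; the least-squares slope is added on the assumption that the "LS Coef" split in the next slide's tree refers to a fitted trend coefficient, which the deck does not confirm.

```python
from collections import defaultdict
from statistics import mean, median, pstdev


def ls_slope(times, values):
    """Least-squares slope of values versus time (an assumed reading of 'LS Coef')."""
    t_bar, v_bar = mean(times), mean(values)
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    return sxy / sxx if sxx else 0.0


def engineer_features(rows):
    """Collapse long-format (lot, time, parameter, value) rows into one feature dict per lot."""
    series = defaultdict(list)
    for lot, t, param, value in rows:
        series[(lot, param)].append((t, value))

    features = defaultdict(dict)
    for (lot, param), points in series.items():
        points.sort()  # order each parameter's readings by time stamp
        times = [t for t, _ in points]
        values = [v for _, v in points]
        features[lot][f"mean_{param}"] = mean(values)
        features[lot][f"median_{param}"] = median(values)
        features[lot][f"std_{param}"] = pstdev(values)
        features[lot][f"slope_{param}"] = ls_slope(times, values)
    return dict(features)


def join_with_outcome(features, outcomes):
    """Attach the once-per-lot outcome (e.g. step yield) so each lot becomes one modeling row."""
    return {lot: {**feats, "yield": outcomes[lot]}
            for lot, feats in features.items() if lot in outcomes}


# Hypothetical per-second pH readings for two lots and their measured yields
rows = [
    ("X", 0, "pH", 7.0), ("X", 1, "pH", 7.1), ("X", 2, "pH", 7.2),
    ("Y", 0, "pH", 6.9), ("Y", 1, "pH", 6.9), ("Y", 2, "pH", 6.9),
]
table = join_with_outcome(engineer_features(rows), {"X": 0.82, "Y": 0.75})
print(table["X"])
```

Each lot ends up as a single row, so the thousands of time-stamped readings and the single yield value now share the same grain and can be fed to a regression model.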

15 Feature engineering then facilitates predictive modeling of process outputs using machine learning approaches
[Diagram: example regression tree with splits such as Median pH >= 6.3, LS Coef >= 0, Variance O2 Flow >= 0.5, Variance pH >= 0, LS Coef >= -2, Median O2 Flow >= 0.9, Median pH >= 7, and Variance pH >= 1, with leaves giving yield predictions.]
Improved data structure to facilitate modeling.
Approach selected: gradient boosting applied to regression trees (xgboost).
Advantages: highly accurate. Drawbacks: risk of overfitting the model.
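To show what "gradient boosting applied to regression trees" does mechanically, here is a from-scratch sketch using depth-1 trees (stumps) under squared-error loss. It is not xgboost, which additionally uses second-order gradients, regularization, and deeper trees; it only illustrates the core idea that each new tree is fit to the residuals of the current ensemble. The feature choice, data, and function names are hypothetical.

```python
from statistics import mean


def fit_stump(X, residuals):
    """Fit a depth-1 regression tree: the split (feature j, threshold) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] < threshold]
            right = [r for row, r in zip(X, residuals) if row[j] >= threshold]
            if not left or not right:
                continue  # a split must send at least one lot each way
            lv, rv = mean(left), mean(right)
            sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or sse < best[0]:
                best = (sse, j, threshold, lv, rv)
    if best is None:  # every feature is constant: predict the mean residual everywhere
        m = mean(residuals)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, threshold, left_value, right_value = stump
    return right_value if row[j] >= threshold else left_value


def fit_boosted_stumps(X, y, n_rounds=100, learning_rate=0.1):
    """Gradient boosting under squared loss: each new stump is fit to the current residuals."""
    base = mean(y)
    preds = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is simply y - prediction
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, row) for p, row in zip(preds, X)]
    return base, learning_rate, stumps


def predict(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


# Hypothetical engineered features per lot: [median pH, variance O2 flow]; target is step yield
X = [[6.9, 0.2], [7.0, 0.4], [7.1, 0.3], [7.2, 0.6], [7.3, 0.5], [7.4, 0.1]]
y = [0.70, 0.71, 0.70, 0.85, 0.86, 0.85]
model = fit_boosted_stumps(X, y)
print(round(predict(model, [7.35, 0.3]), 3))
```

The overfitting risk noted on the slide shows up here too: with enough rounds the ensemble fits the training lots almost exactly, so `n_rounds` and `learning_rate` (and, in a real library, tree depth and regularization) are typically tuned against held-out lots.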

16 Feature engineering boosts the predictive power of the intermediate-step yield model
[Chart: predictive power (R²; higher is better) of the initial vs. final model.]
Features: basic vs. advanced feature engineering; 20 features were selected from dozens of candidates (initial model) vs. thousands (final model).
Time component: the final model took into account the impact of the previous unit operation's yield on the current unit operation's yield.
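The comparison above uses R², the coefficient of determination. A minimal sketch of the standard definition follows; the slide does not state whether R² was computed on training or held-out batches, and given the overfitting risk of boosted trees, held-out lots are the more informative check. On held-out data R² can be negative when the model does worse than simply predicting the mean. The data below are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)               # total variation around the mean
    return 1.0 - ss_res / ss_tot


# Hypothetical observed step yields and two sets of predictions
observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, observed))              # 1.0: perfect predictions
print(r_squared(observed, [2.5, 2.5, 2.5, 2.5]))  # 0.0: no better than predicting the mean
```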

17 Predictors of intermediate-step yields differ between earlier and later process stages
Main predictors of yield. Expected: in later steps, the previous step's yield is the most important predictor.
Findings: in early steps, oxygen- and filtration-feed-related predictors were important; in later steps, post-column UV, pump flow, pump current, pressure, feed flow, and pH were important.
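The "previous yield" predictor is a lagged feature: each unit operation's row carries the yield of the step before it. A minimal sketch, assuming step yields are stored per lot in a dictionary; the function name and yield values are hypothetical, and the step names are taken from slide 18.

```python
def add_previous_yield(step_yields, step_order):
    """Build one row per (lot, unit operation) carrying the preceding step's yield as a feature.

    step_yields: {lot: {step: yield}}; step_order: unit operations in process sequence.
    The first step has no predecessor, so its previous_yield is None.
    """
    rows = []
    for lot, yields in step_yields.items():
        for i, step in enumerate(step_order):
            previous = yields.get(step_order[i - 1]) if i > 0 else None
            rows.append({"lot": lot, "step": step,
                         "previous_yield": previous, "yield": yields.get(step)})
    return rows


# Hypothetical step yields for one lot
order = ["Bioreactor", "Centrifuge/Depth Filtration", "Chromatography 1", "Chromatography 2"]
lots = {"lot_X": {"Bioreactor": 0.95, "Centrifuge/Depth Filtration": 0.92,
                  "Chromatography 1": 0.85, "Chromatography 2": 0.80}}
for row in add_previous_yield(lots, order):
    print(row)
```

Because the bioreactor row has no predecessor, its model must rely on process inputs alone, which is consistent with the finding that early steps depend on oxygen- and feed-related predictors.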

18 Predictors of intermediate-step yields differ between earlier and later process stages
[Chart: main predictors of yield for each unit operation (Bioreactor, Centrifuge/Depth Filtration, Chromatography 1, Chromatography 2, UFDF, Viral Filtration, CEX Chromatography, UFDF2). Predictors shown include Oxygen Flow, Oxygen Flow Totalizer, Oxygen Record, Filtration Feed Flow, Filtration Feed Pump, Post-Column UV1, Post-Column UV2, Pump Current, Feed Retentate, Viral Filter Differential Pressure, Feed Flow Totalizer, pH, Product Pump Flow, and Previous Yield (which appears repeatedly).]

19 Insights show that early process steps should be optimized to maximize yield in later steps
Feature engineering of time series data facilitates modeling, and model prediction improves as the process sequence progresses. Relevant process inputs that impact downstream step yields were identified and can be optimized.
In the early stages of a process, biochemical factors are important; as the process sequence progresses, previous yield becomes the dominant predictor of current yield. Early process steps (bioreactor, centrifuge/depth filtration) should therefore be optimized.
The bioreactor model needs additional information on process dynamics to be predictive. Combining this approach with design of experiments or first-principles models may improve its applicability.

20 Continued data collection and linking will enable better machine learning and AI in biotechnology
AI/machine learning (ML) for biotech data: continue collecting data at any measurement frequency; this is valuable because AI methods can address data complexity and enable analysis.
Leverage AI/ML in upstream steps and at the design-of-experiments stage to maximize manufacturing performance at its origin.
AI can provide predictive power from drug discovery all the way to manufacturing performance.
Since initial yield is a critical predictor of future performance, engineering efforts should be deployed upstream and at the bioreactor step to maximize downstream performance.
Future direction: are these conclusions generalizable across multiple drug substance products?