An Informatics Architecture Model

COMMENTARY

Supporting Workflows, Empowering R&D: An Informatics Architecture Model

By MICHAEL H. ELLIOTT

Michael Elliott is founder, CEO, and chief analyst at Atrium Research & Consulting LLC. Since its founding in 2003, Atrium Research has provided best-in-class educational material and analysis to support informed decision making in scientific informatics. Elliott's tenure includes over two decades of experience developing, marketing, and selling LIMS at Perkin-Elmer Corporation (now Applied Biosystems) and Scientific Software, Inc. (now part of Agilent Technologies). Prior to starting Atrium Research, he served as senior VP for worldwide sales, marketing, product management, consulting, and support at SSI.

Editor's Note: This article is adapted from "Thinking Beyond ELN: A Look at the Informatics Architecture Model," which appeared in Scientific Computing in December.

Two things have driven the adoption of ELNs over the past decade. Initially, ELNs were implemented mainly to replace paper. More recently, particularly as ELNs have become more LIMS-like, organizations have deployed the systems to streamline and expedite specific scientific workflows. Today, however, these drivers have created a disconnect. Organizations have failed to balance their desire to increase efficiency in individual departments with organizational, macro-level requirements to expedite R&D information sharing and data analysis. A single system cannot provide this balance. What's required is an informatics architecture model that supports individual domain needs while parceling summary information appropriately to higher levels in an R&D enterprise.

Paper Is Dead! Long Live Paper!

While many organizations now tout the efficiency of paperless processes, a look under the hood reveals workflows still running on paper-based paradigms. The most common information deliverable in a scientific workflow is still a report, very often the same report that used to be created on paper.
To be sure, the ELN makes this report faster and easier to create, but the vocabulary and format of the report are specified by the department. The report helps the department codify its work; it does not educate individuals downstream about the meaning of the work or the actions it dictates, because there is little linkage between derived conclusions and the underlying primary and secondary data. This should not be surprising. In the early days of ELNs, systems were mainly deployed in early discovery research and medicinal chemistry, where the need to share information cross-departmentally was not as great.

MOLECULAR CONNECTION

Fig 1: An informatics architecture model designed to meet the needs of individual domains while distributing key summary information across R&D. From bottom to top, the levels are: (1) master data management (controlled vocabulary, chemical registration, biologics registration, metadata standards); (2) laboratory process execution (HTS, biology, chemistry, and analytical ELNs; analytical LIMS); (3) knowledge preservation (structured data warehouse, enterprise content management, long-term archive); and (4) knowledge utilization (portal, business intelligence, enterprise search), with integration services connecting the levels.

Just like a paper notebook, ELNs provided a way for chemists to document experiments. And they added value by enabling chemists to share experimental knowledge, reuse and clone experiments, and eliminate tedious manual tasks like performing stoichiometric calculations. The signed notebook page was the signal to post chemical product detail to a registration system and lock the experiment into a PDF for preservation. It was win-win: chemists streamlined their workflow, and organizations had a way to preserve intellectual property.

Vendor claims to the contrary, no ELN is architected to provide long-term records retention or management. Rather, ELNs primarily serve as transaction systems for experiment documentation and laboratory process execution. It is up to organizations to determine how to retain records and which records to retain. Combining this fundamental reality of ELN architecture with the departmentally focused output of these systems presents several problems for organizations that attempt to standardize on one system across R&D.
As ELNs penetrate laboratories further downstream (analytical, formulations, drug metabolism, etc.), departments must cooperate in a bidirectional flow of requests and data delivery. Information from all these efforts (and from upstream work) must be brought together in order to make project-level decisions. Even in today's paperless world, information supporting project-level decisions is usually correlated manually.

Consider the data management challenge associated with developing dosage forms for clinical trials. At least four different disciplines provide critical data to inform formulation development: process chemistry, which provides data associated with the active pharmaceutical ingredient; analytical testing, which determines characteristics of the formulation; pharmacokinetic studies, which determine how the formulation performs in vivo; and the formulation scientists charged with developing the final formulations. In today's electronic lab, each group will (or should) have its own discipline-centric ELN or LIMS. But even so, the formulator will still need to manually consolidate data from reports, presentations, and spreadsheets produced by each group. Moreover, these correlations by necessity occur outside any ELN, often in untracked, unversioned (yet still legally discoverable) documents like Excel spreadsheets and PowerPoint presentations. For all that investment in the paperless lab, organizations still find that the evidence supporting intellectual property claims may be locked in isolated, unsearchable paper-metaphor records.

SPRING

Enabling Lab Flexibility and Operational Information Sharing

The reference architecture we have developed separates essential functional capabilities into four discrete levels. The aim is to segment the needs of individual domains and application areas from those of R&D as a whole. The architecture is grounded in a functional laboratory perspective, but is designed to distribute higher-level information out of these functions to other departments and, ultimately, across R&D. Our informatics architecture model consists of four parts:

Knowledge utilization: Tools such as enterprise search, data mining, portals, and business intelligence are deployed to exploit the knowledge maintained in structured and unstructured repositories lower in the stack.

Knowledge preservation: Information across domains is gathered by project, molecular entity, formulation, or other grouping relevant to an organization's R&D goals. Organizations choose which information to retain and how to retain it: IP records, regulatory documents, standard operating procedures, training records, etc.

Laboratory process execution: Systems needed to manage the process from experiment design to execution, and from raw data collection to information creation, are implemented in an interconnected way to address workflow needs in particular domains. Technologies at this level include LIMS, CDS, SDMS, and ELN.

Master data management: This layer establishes a common electronic record organization and description to simplify the requirements of patent protection, knowledge preservation, utilization, and compliance.

Fig 1 depicts one example of the informatics architecture model. The master data management level provides metadata and controlled vocabulary standards, along with functions for chemical and biological registration. Distinct ELN and LIMS systems in level two address departmental requirements. A subset of structured data is posted to a warehouse, while documents (along with their metadata tags) are posted to the enterprise content management system. Enterprise search and business intelligence applications accessed through a portal merge content from the preservation repositories so that it can be acted upon.

Simply transitioning paper processes into an electronic environment may provide short-term efficiency gains, but the underlying paper-based routines will impede cross-departmental collaboration and hamper R&D decision making. When planning an informatics project, even for an individual department, organizations must consider the needs of information providers as well as information consumers. A coherent, integrated informatics architecture enables organizations to separate departmental requirements from the needs of the enterprise and maintain flexibility to meet overarching R&D demands. MC
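To make the hand-off between the levels concrete, the routing step from level 2 (laboratory process execution) to level 3 (knowledge preservation) can be sketched in a few lines of code, with the level 1 controlled vocabulary acting as a gatekeeper. This is a minimal illustration of the pattern only; every name, field, and vocabulary term below is hypothetical, not any vendor's API.

```python
# Sketch of the level-2 -> level-3 hand-off in the architecture model.
# All names and fields are hypothetical illustrations, not a vendor API.

CONTROLLED_VOCABULARY = {"assay", "formulation", "stability"}  # level 1: master data

def publish_experiment(experiment: dict) -> dict:
    """Split a completed level-2 experiment record into the two level-3 stores."""
    # Level 1 check: reject tags outside the controlled vocabulary, so every
    # preserved record is described in the same terms across departments.
    bad_tags = set(experiment["tags"]) - CONTROLLED_VOCABULARY
    if bad_tags:
        raise ValueError(f"uncontrolled tags: {sorted(bad_tags)}")

    # Structured results go to the warehouse; the signed document and its
    # metadata tags go to enterprise content management.
    warehouse_row = {"id": experiment["id"], "results": experiment["results"]}
    ecm_record = {"id": experiment["id"],
                  "document": experiment["document"],
                  "metadata": {"tags": experiment["tags"]}}
    return {"warehouse": warehouse_row, "ecm": ecm_record}

example = {"id": "EXP-001", "tags": ["assay"],
           "results": {"purity_pct": 99.2}, "document": "signed-report.pdf"}
published = publish_experiment(example)
```

The design point is the separation itself: level 4 search and business intelligence tools only need to understand the warehouse and ECM schemas, never the internals of each departmental ELN or LIMS.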

TECHNOLOGY REVIEW

Informatics Strategies for the Convergent Analytical Lab

In many labs today, the drive to replace paper has begun pitting two systems against each other. The functionality in LIMS, which entered labs three decades ago to manage laboratory workflows and the structured data produced by scientific instruments, has begun to converge with that of ELNs, which gained traction last decade as tools for recording the unstructured work associated with experimental planning and execution. It is tempting to see this convergence as an opportunity to invest in just one system to capture data across the whole of R&D. More accurately, though, this convergence is exposing the need for organizations to think holistically about enterprise R&D tasks and the tools they employ to expedite them.

"When we put together an initial laboratory informatics strategy several years ago, our focus was on implementing systems for specific functions that would give users efficiency gains while aiding them in meeting certain specific compliance demands," said Stan Piper, senior scientist at Pfizer. "Now, we have three or four distinct systems used for core analytical activities and talking (or not talking) to each other to varying degrees. Our focus now is on integrating them together."

The Evolution of Analytical Lab Informatics

Perhaps more than any other R&D discipline, analytical labs appreciate the critical role informatics can play in driving efficiency. Instrumentation dominates in the modern analytical lab, and interacting with sophisticated hardware requires ways to schedule experiments, calibrate equipment, track experimental progress, and collect and transform raw data into information scientists can interpret and act upon. Three decades ago, most instrumentation came with its own basic controller software.
But in a sophisticated lab conducting several types of analyses at once, disconnected instrumentation managed by one-off software packages slowed research and prevented scientists from getting an integrated picture of what was happening to a compound. Enter laboratory information management systems (LIMS), which were introduced specifically to interface with multiple laboratory instruments and handle the structured data all of them produced.

"LIMS was originally developed to manage and track samples, with limited results management," said Michael H. Elliott, CEO and chief analyst for Atrium Research, who has spent nearly 30 years in laboratory informatics. It took less than a decade for LIMS vendors to begin adding functionality to manage not just the samples themselves, but associated information and workflow execution. Mechanisms for collecting electronic signatures and performing calculations at key points in the workflow were introduced. Workflows were made more robust to keep pace with demands to increase the throughput of analytical work. Better reporting and visualization tools were added. "But as LIMS evolved toward a higher-order laboratory resource planning system, many of the bench-level operations were ignored," said Elliott.

Fig 1: A bioanalytical workflow illustrating the various points where LIMS, ELN, instrument data, archival, and other scientific data management systems converge. (Graphic courtesy of Atrium Research and Consulting LLC.)

Meanwhile, particularly as experimental throughput continued to increase and more experimental procedures were automated, scientists began to face more onerous documentation tasks. "The more analytical work is completed, the more documentation scientists are required to maintain, not just to describe and record what they've done, but to prove that they've done what they said they've done," said Paul Planje, director of sales at Vialis, a consultancy specializing in total solutions for lab data management. "Instrumentation logbooks, lab notebooks, reference sample information: in the modern lab, with so much instrumentation and heightened regulatory oversight, all of this must be documented and cross-referenced so that scientists can see and trace who has done what, where, with which equipment."

Initially, ELNs emerged mainly as paper replacements (see Commentary, p. 4), with most vendors (and their customers) touting them as ways to rid laboratories of unnecessary (and unsearchable) paper-based notebooks.
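The cross-referencing Planje describes, tracing who did what, where, and with which equipment, amounts to a simple linked record structure. The sketch below shows one way that linkage could look; the class, field names, and identifiers are invented for illustration and do not represent any particular system's schema.

```python
# Hypothetical sketch of the cross-referencing described above: each
# instrument run carries links to the analyst, sample, method, and the
# ELN entry that documents it, so audit questions stay answerable.

from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class InstrumentRun:
    instrument_id: str
    analyst: str
    sample_id: str
    method_id: str
    notebook_entry: str  # back-reference into the ELN
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat())

def runs_by_instrument(runs, instrument_id):
    """Audit query: what was done on this instrument, and by whom?"""
    return [r for r in runs if r.instrument_id == instrument_id]

log = [
    InstrumentRun("HPLC-02", "jdoe", "S-1001", "M-7", "NB-2024-0015"),
    InstrumentRun("HPLC-02", "asmith", "S-1002", "M-7", "NB-2024-0016"),
    InstrumentRun("LCMS-01", "jdoe", "S-1001", "M-9", "NB-2024-0017"),
]
hplc_history = runs_by_instrument(log, "HPLC-02")
```

On paper, assembling that same history means leafing through logbooks and notebooks by hand; once the links are explicit records, it is a one-line query.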
The emphasis was on providing a place to preserve unstructured information, such as write-ups describing experimental plans, methods developed for a particular analysis, or annotations concerning what went wrong, how to improve an experiment, or the analysis of final results. Like LIMS, though, ELNs have matured to do much more than their paper counterparts. In fact, Elliott explained that the evolution of the ELN looks much like LIMS evolution in reverse, with ELNs today encompassing many mechanisms for dealing with structured data, including calculation functions and reporting and visualization tools.

Convergence in the Analytical Lab

Elliott defines informatics convergence as the evolution of informatics technologies that occupy a similar environment toward a holistic solution architecture with a uniform interface, eliminating overlapping functionality and independent integrations. Traditionally, LIMS and ELN have played fairly distinct roles in the analytical lab, as illustrated in Fig 1, which shows one way systems might be configured to support a bioanalytical workflow. LIMS dominates most of the workflow because of its strengths in sample and results management. The ELN serves as the hub for method development and sample preparation. Other systems supplement the workflow, such as document management systems and LCMS, CDS, or other instrument data systems.

According to Elliott, historical precedent combined with the relative immaturity of ELNs has led many organizations to stretch LIMS beyond these boundaries. Pfizer's Piper, in fact, uses this same terminology ("stretching") to describe his company's experience with various analytical systems. Pfizer's Pharmaceutical Sciences groups knew what systems they needed: CDS, LIMS, lab data management, document management, and ELN. But organizational size, the systems' costs, and the individual market maturity of the products prevented Pfizer's groups from investing in all the systems at once. The company chose to invest first in CDS, then LIMS.
A lab data management archive sat alongside these systems to provide additional support for data capture among the standalone systems. An ELN was then added when the technology had matured to the point that it was worth the investment. "While each system served specific needs, the way we deployed them meant that each system was often stretched to handle tasks outside its original out-of-the-box configuration, and often the goal in this stretching was to give the systems more ELN-like capability," explained Piper.

For instance, as both the CDS and the LIMS were being implemented, Pfizer scientists still preserved a lot of data and knowledge on paper. "We might have the specification comparison performed in the LIMS, but the explanation of how those samples were prepared was still in the paper notebook. The CDS or the LIMS could be stretched to capture some of this information, but ultimately the contextual scientific decision making was still happening on paper rather than in our electronic systems."

Planje of Vialis pointed out that such situations are becoming more and more common in R&D organizations, and Vialis has built a business around helping customers find the right tool for the task. "We do think that the analytical functionality in Symyx Notebook 6 will actually begin to push some LIMS functionality to the background, at least that functionality scientists have been doing in LIMS because they haven't had another place to do it," Planje said. But Planje is quick to note that this doesn't mean the ELN is now positioned to completely replace LIMS. LIMS, Planje said, excels at collecting and recording detailed experimental data and tracking exactly what was done on an instrument. This data needs to be collected and managed. The addition of an integrated ELN lets organizations be judicious about selecting what information needs to be stored in the notebook while automating the process of getting that information transferred.
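The division of labor just described, where the LIMS executes and tracks the work while an integrated notebook automatically receives the as-run record, can be sketched as a simple round trip. The classes and method names below are invented for illustration; real systems would expose this through vendor APIs or Web services rather than in-process calls.

```python
# Illustrative round trip: the ELN sends a method to the LIMS for tracked
# execution, and the LIMS posts what was actually done back to the notebook.
# All classes and method names here are hypothetical.

class LIMS:
    def run(self, method: str, sample: str) -> dict:
        # Stand-in for instrument execution; returns the as-run record.
        return {"method": method, "sample": sample, "status": "complete"}

class Notebook:
    def __init__(self):
        self.entries = []

    def document(self, record: dict) -> None:
        # Automatic documentation: the scientist never retypes the details.
        self.entries.append(record)

def execute_and_document(lims: LIMS, eln: Notebook,
                         method: str, sample: str) -> dict:
    record = lims.run(method, sample)  # LIMS does the tracked execution
    eln.document(record)               # ELN receives exactly what was done
    return record

eln = Notebook()
result = execute_and_document(LIMS(), eln, method="assay-M7", sample="S-1001")
```

The point of the pattern is that each system does only what it is best at: the notebook entry is a byproduct of execution, not a separate documentation chore.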
"Scientists know they need certain details to validate experimental procedures, but they often don't care about, or need to be bothered with, those details," Planje explained. The addition of a well-integrated ELN means that scientists can select methods and send them to the LIMS, which runs the experiment and automatically uploads exactly what was done to a notebook running in the background. The work gets done and documented, using each system for the tasks for which it is best suited.

Managing Convergence: The Right Technology for the Right Task

Vialis has developed a niche market in helping its customers manage convergence. "We understand that the modern lab requires a lot of different applications working together," said Planje. "More importantly, deciding what you put in the lab means truly understanding the process and what exactly you need to document." Vialis takes a vendor-neutral approach to informatics strategy, working first with customers to identify exactly what their processes are, what they need to be doing as opposed to what they actually are doing, and how the existing process cost compares to the cost of the proposed process. Once return on investment (ROI) has been demonstrated, best-of-breed applications are selected to manage the various process components.

Some organizations, however, don't have the luxury of this type of advance planning, whether because of budgets, deployment strategies, or entrenched legacy systems. At Pfizer, for instance, moving to an integrated environment required stretching existing systems and then backing specific functions out of these systems once follow-on informatics were put in place. Piper described the integration process as involving two categories: one for process and one for technical. In Pfizer's case, the process category had to account for how systems had been stretched and how to move functionality into the appropriate system. "Now that we have Symyx Notebook, we need to migrate some of the things we'd stretched our LIMS and CDS to do into the ELN," Piper said. "It requires a lot of attention to how the experiment lifecycle should proceed. Where signatures should be captured, or in which system any specific calculation should be performed: these are all important considerations."

On the technical side, Piper noted that companies again must choose the implementation that works best in their environment. "We can't say enough about the agile development approach that Symyx has employed with Symyx Notebook," Piper said. "Paper prototyping and iterative piloting of functionality has made the development of the direct integrations we need, such as with Waters Empower, possible." Piper also pointed out that structuring many of the key integrations as Web services, or in other distributed ways, provides opportunities for more streamlined integrations. "It's nice that Symyx is looking into ways to minimize hardware through software design," said Piper.

Piper concluded that, whether due to the development process or something inherent in the environment itself, scientists have called Symyx Notebook one of the most intuitive applications they've used. "The users find that with the ELN, they can sit down and just start using it, and this has not historically been the case with applications providing such a large amount of functionality."

The Benefits of Well Managed Convergence

Of course, however organizations opt to coalesce functions and integrate the various lab informatics available to them, Planje pointed out that ROI is always paramount. While Vialis can offer financial models and proven calculations to demonstrate that integration in quality control labs can reduce documentation by half, the numbers are harder to run for development labs. Product development timelines are much more extended in this environment, so while documentation may be reduced, it is harder to measure how it impacts specific projects.

"What we definitely know is that an ELN, a LIMS, or a document management system will not bring about step change on its own," Planje said. "These systems may improve efficiency in one area, but it's when they are integrated and when data capture is automated that organizations get the most return. It's maybe that last 5% of the project, the optimization and integration, that brings the ROI."

The ELN definitely plays an important role and may potentially be one of the most important systems labs have in their arsenal, Planje concluded. "ELNs can document methods and collate data, but those parts are nothing when taken in the context of an overall project; it's peanuts," said Planje. "But if you put the ELN in the center and integrate everything through it to make a completely electronic process, that's when you get real benefits. With everything linked, scientists can see what's going on across experiments, get information not available to them before, and begin taking steps to mine data to make better decisions." MC

For further reading, visit elnanalytical and