OMOP Specifications for Implementation of Standard Vocabularies in Observational Data Analysis


OMOP Specifications for Implementation of Standard Vocabularies in Observational Data Analysis June 2010

License

© 2010 Foundation for the National Institutes of Health (FNIH).

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this document except in compliance with the License. You may obtain a copy of the License at http://omop.fnih.org/publiclicense. Unless required by applicable law or agreed to in writing, documentation and software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.

Any redistributions of this work or any derivative work or modification based on this work should be accompanied by the following source attribution: "This work is based on work by the Observational Medical Outcomes Partnership (OMOP) and used under license from the FNIH at http://omop.fnih.org/publiclicense." Any scientific publication that is based on this work should include a reference to http://omop.fnih.org.

Table of Contents

1. Abbreviations
2. Introduction
2.1. Background
2.2. Definitions of Terms
2.3. Dictionary Representation in the CDM
3. Vocabularies
3.1. Overview
3.2. Vocabulary Life Cycle
3.3. Drug Domain
3.3.1. Vocabularies
3.3.2. Implementation of RxNorm
3.3.3. Implementation of Ambiguous Drug Concepts
3.3.4. Implementation of NDF-RT
3.3.5. Implementation of ATC
3.3.6. Implementation of ETC
3.3.7. Implementation of FDB Indication and Contra-Indication
3.3.8. Implementation of OMOP Drugs of Interest
3.3.9. Levels
3.3.10. Relationships
3.3.11. Mapping
3.4. Condition Domain
3.4.1. Vocabularies
3.4.2. Implementation of SNOMED-CT for Conditions
3.4.3. Implementation of MedDRA and ICD-9-CM for Conditions
3.4.4. Implementation of Ambiguous Condition Concepts
3.4.5. Implementation of OMOP Health Outcomes of Interest
3.4.6. Implementation of MedDRA SMQs
3.4.7. Levels
3.4.8. Relationships
3.4.9. Mapping
3.5. Procedure Domain
3.5.1. Vocabularies
3.5.2. Implementation of HCPCS, CPT-4 and ICD-9-Procedure
3.5.3. Implementation of SNOMED-CT for Procedures
3.5.4. Levels
3.5.5. Relationships
3.5.6. Mapping
3.6. Demographic Domain
3.6.1. Vocabularies
3.6.2. Implementation, Levels, Relationships and Mapping
3.7. Observation Domain
3.7.1. Vocabularies
3.7.2. LOINC Implementation, Levels, Relationships and Mapping
3.7.3. Qualitative Lab Results
3.7.4. Implementation of UCUM Levels, Relationships and Mapping
3.7.5. Implementation of SNOMED-CT for Observations
3.7.6. Levels
3.7.7. Relationships
3.7.8. Mapping
3.8. Visit Domain
3.8.1. Vocabularies
3.8.2. Implementation of CMS Place of Service Codes
3.8.3. Implementation of OMOP Visits
3.8.4. Levels
3.8.5. Relationships
3.8.6. Mapping
3.9. Observation Period Domain
3.9.1. Vocabularies, Implementation, Levels, Relationships, Mapping
4. Appendices
4.1. Appendix A: Concepts, Vocabularies, Classes and Levels
4.2. Appendix B: Relationships

1. Abbreviations

Abbreviation | Term | Description
ATC | Anatomical Therapeutic Chemical Classification | Drug classification developed by the World Health Organization (WHO) Collaborating Centre for Drug Statistics Methodology
CCAE | MarketScan Commercial Claims and Encounters Database | Product of Thomson Reuters, large insurance claims database
CDC | Centers for Disease Control and Prevention | U.S. government institution
CMS | Centers for Medicare & Medicaid Services | U.S. government institution
CPT-4 | Current Procedural Terminology | Procedure terminology developed by the American Medical Association
CTV3 | Clinical Terms Version 3 | Drug terminology developed by the National Health Service Centre for Coding and Classification
DOI | Drug of Interest | Drug classes used in OMOP research
ETC | Enhanced Therapeutic Classification | Drug classification developed by First DataBank
FDA | Food and Drug Administration | U.S. government institution
FDB | First DataBank | Commercial provider of drug terminologies and classifications
GE | General Electric | Provider of an electronic health record database
GPI | Generic Product Identifier | Identifier in the Medi-Span drug terminology
HCPCS | Healthcare Common Procedure Coding System | Procedure terminology developed by the Centers for Medicare & Medicaid Services
HL7 | Health Level 7 | Standards for the electronic interchange of health care data
HOI | Health Outcomes of Interest | Outcomes used in OMOP research
ICD-9 | International Classification of Diseases, 9th Revision | Disease terminology developed by the World Health Organization (WHO)
ICD-9-CM | ICD-9 Clinical Modification | Modification of the ICD-9 by the Centers for Disease Control and Prevention (CDC) for officially assigning diagnosis codes in the U.S.
ICD-9-Procedure | ICD-9 Procedure | Modification of the ICD-9 by the CDC for officially assigning procedure codes in the U.S.
LOINC | Logical Observation Identifiers Names and Codes | Terminology for observations developed by the Regenstrief Institute
MDCD | MarketScan Medicaid | Product of Thomson Reuters, large Medicaid database from several U.S. states
MDCR | MarketScan Medicare Supplemental and Coordination of Benefits | Product of Thomson Reuters, large database of Medicare supplemental insurance paid for by employers
MedDRA | Medical Dictionary for Regulatory Activities | Condition terminology developed by the International Federation of Pharmaceutical Manufacturers and Associations (IFPMA)
MSLR | MarketScan Lab Database | Product of Thomson Reuters, large database with lab data
NDC | National Drug Code | Drug code list maintained by the FDA
NDDF | National Drug Data File | Drug terminology developed by First DataBank
NDF-RT | National Drug File Reference Terminology | Drug classification developed by the VA, NLM and Apelon
NLM | National Library of Medicine | U.S. government institution, medical library developing a number of medical terminologies
OXMIS | Oxford Medical Information System | Disease terminology developed in the United Kingdom
RxNorm | | Drug terminology developed by the National Library of Medicine
SNOMED CT | Systematized Nomenclature of Medicine - Clinical Terms | Comprehensive clinical terminology developed by the International Health Terminology Standards Development Organisation
UCUM | Unified Code for Units of Measure | Unit code system developed by the UCUM Organization
VA | U.S. Department of Veterans Affairs | U.S. government organization
VA-NDF | VA National Drug File | Drug terminology developed by the VA

2. Introduction

This document reflects both the requirements for and the implementation of the Standard Vocabulary as part of the OMOP Common Data Model. It does not discuss the different vocabulary standards that have emerged in the area or the rationale for choosing among them; that will be laid out in a future document.

The document is composed of six main sections, each representing a vocabulary domain:

- Drugs
- Conditions
- Procedures
- Demographics
- Observations
- Visits

For each section, the Standard Vocabulary, consisting of terminologies and classifications, is defined and its characteristics are discussed. Mappings from commonly used terminologies and classifications to the standard dictionaries are reviewed. The intended audience for this document is researchers and developers who want to use the Common Data Model for drug-outcome research.

2.1. Background

The Observational Medical Outcomes Partnership (OMOP) is a public-private partnership designed to protect human health by improving the monitoring of drugs for safety and effectiveness. The Partnership will conduct a two-year research initiative to determine whether it is feasible and useful to identify and evaluate safety issues of drugs on the market. The Partnership's methodological research will be conducted across multiple disparate observational databases (administrative claims and electronic health records), and the Partnership plans to collaborate with qualified organizations in a number of different ways:

- The Partnership is funding data provider organizations to participate in the initiative, either as a Research Core contributor providing de-identified patient-level data to OMOP's centralized IT research environment, or as a distributed partner conducting the analysis within its own organization and reporting aggregate summary results back to the Partnership.
- In addition, the Partnership will encourage other organizations with access to observational data to conduct supplementary analyses as part of an Extended Consortium.
To facilitate this research, a common structure and framework is needed for organizing and standardizing observational data, so that analyses can be applied consistently across disparate data sources. A Common Data Model and a method for standardizing its content (via a Standard Vocabulary) will ensure that methods can be applied systematically to produce meaningfully comparable results across sources. More generally, no single observational data source is likely to be sufficient to meet all expected drug safety analysis needs, so solutions will be required that can analyze combined (integrated) disparate sources. The Partnership will study the use of a Common Data Model and a Standard Vocabulary as potential enabling technologies for its own methodological research, with the intention of supporting broader pharmacovigilance activities in the future.

The Standard Vocabulary is more than the mechanism for transforming raw data into standardized data. It also plays a role in searching and querying the transformed data in a CDM database, in browsing and navigating the hierarchies of classes and abstractions inherent in the transformed data, and in interpreting the results returned by those operations. Indeed, the ability to generate actionable information from a CDM database efficiently and effectively depends, to a large degree, on the Standard Vocabulary and its relationship to the CDM.

The Standard Vocabulary contains all of the code sets, terminologies, vocabularies, nomenclatures, lexicons, thesauri, ontologies, taxonomies, classifications, abstractions, and other such data that are required for:

1. Creating the transformed (i.e., standardized) data from the raw data sets,
2. Searching and querying the transformed data, and browsing and navigating the hierarchies of classes and abstractions inherent in the transformed data, and
3. Interpreting the meanings of the data.

Researchers will be able to query a CDM database for classes of concepts (i.e., classifications) without needing to know the individual concepts that those classes subsume. For example, a query of the common data repository for a specific therapeutic class, such as antihypertensive drugs, will return all drugs that fall in that class. Similarly, a query of a CDM database for a particular medical condition will retrieve all individual diagnoses that make up that medical condition.

2.2. Definitions of Terms

For purposes of the OMOP Common Data Model, the following terms are used:

Term | Description
Standard Vocabulary | Contains all of the below in a set of tables
Vocabulary Domain | A semantic category defined for OMOP purposes that is needed for drug outcome research, like Drug, Condition, Procedure, etc.
Vocabulary | A combination of terminologies and classifications that belong to a Vocabulary Domain
Terminology | A controlled list of concepts, such as a list of conditions
Classification | A hierarchical system of concepts and concept relationships that defines semantically useful classes, like chemical structures for drugs
Concept | Basic unit of information defined in the vocabularies

2.3. Dictionary Representation in the CDM

Vocabulary information is represented in the CDM in the following tables:

Table | Description
CONCEPT | A list of all valid terminology concepts across domains and their attributes. Concepts are derived from existing standards as described below.
CONCEPT_SYNONYM | A table with synonyms for concepts that have more than one valid name or description.
CONCEPT_RELATIONSHIP | A list of relationships between concepts. Some of these relationships are generic (e.g. the Subsumes relationship), others are domain-specific.
SOURCE_TO_CONCEPT_MAP | A map between commonly used terminologies and the CDM Standard Vocabulary. For example, drugs are often recorded as NDC, while the Standard Vocabulary for drugs is RxNorm.
VOCABULARY_REF | A list of terminologies that make up the CDM Standard Vocabulary, see Table 1.
CONCEPT_ANCESTOR | A specialized table containing only hierarchical relationships between concepts, which may span several generations.

This document discusses the content of the tables. For a detailed discussion of the technical specifications of these database tables, see the OMOP CDM Specifications.
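The way CONCEPT and CONCEPT_ANCESTOR together support class-based queries can be sketched as follows. This is only an illustration under stated assumptions: the column names, the concept IDs and the class name are hypothetical, and the actual table layouts are those defined in the OMOP CDM Specifications.

```python
import sqlite3

# Hypothetical, simplified schema: the column names are illustrative assumptions,
# not the layout defined in the OMOP CDM Specifications.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (
    concept_id    INTEGER PRIMARY KEY,
    concept_name  TEXT NOT NULL,
    concept_level INTEGER NOT NULL
);
CREATE TABLE concept_ancestor (
    ancestor_concept_id   INTEGER NOT NULL,
    descendant_concept_id INTEGER NOT NULL
);
""")

# Made-up concept IDs and an illustrative class, for demonstration only.
conn.executemany("INSERT INTO concept VALUES (?, ?, ?)", [
    (1, "Antihypertensive agents (illustrative class)", 3),
    (2, "Lisinopril", 2),
    (3, "Lisinopril 10 MG Oral Tablet", 1),
    (4, "Aspirin", 2),
    (5, "Aspirin 500 MG Oral Tablet", 1),
])
# Ancestor rows may span several generations (class -> ingredient -> drug).
conn.executemany("INSERT INTO concept_ancestor VALUES (?, ?)", [
    (1, 2), (1, 3), (2, 3), (4, 5),
])


def drugs_in_class(connection, class_concept_id):
    """Return the names of all Level 1 drugs that descend from the given concept."""
    rows = connection.execute(
        """
        SELECT c.concept_name
        FROM concept_ancestor AS a
        JOIN concept AS c ON c.concept_id = a.descendant_concept_id
        WHERE a.ancestor_concept_id = ? AND c.concept_level = 1
        ORDER BY c.concept_name
        """,
        (class_concept_id,),
    ).fetchall()
    return [name for (name,) in rows]


print(drugs_in_class(conn, 1))
```

The caller only needs the class concept; the individual drugs are found through the ancestor table, which is the behavior the Standard Vocabulary is meant to provide.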

3. Vocabularies

3.1. Overview

OMOP defines Standard Vocabularies (terminologies and classification systems) based on their origin as an existing national or international standard. Several terminologies belong to a vocabulary domain. With the exception of SNOMED-CT, vocabularies belong to a single domain and are only used in their specific tables. Vocabularies that are used in external data sources but are not defined as standards are mapped to the standard in the SOURCE_TO_CONCEPT_MAP table.

Vocabulary Domain | Vocabulary Name | Vocabulary Type | Vocabulary Code | Standard Vocabulary | Used in CDM table
Condition, Observation, Procedure, Person Status | SNOMED-CT | Terminology, Classification | 01 | X | CONDITION_OCCURRENCE, CONDITION_ERA, OBSERVATION, OBSERVATION_PERIOD
Drug | RxNorm | Terminology, Classification | 08 | X | DRUG_EXPOSURE, DRUG_ERA
Drug | NDF-RT | Classification | 07 | X | DRUG_ERA
Drug | FDB indications | Terminology | 19 | X |
Drug | ETC | Classification | 20 | X |
Drug | ATC | Classification | 21 | X |
Drug | Multilex | Terminology | 22 | |
Drug | Multum | Terminology | 16 | |
Drug | GPI | Terminology | 10 | |
Drug | NDC | Terminology | 09 | |
Drug | VA-NDF | Terminology | 28 | |
Drug | OMOP Intermediate Drug | Terminology | 54 | X | DRUG_EXPOSURE, DRUG_ERA
Drug | Drugs of Interest | Classification | 30 | | OMOP_DRUG_ERA, DOI_ERA
Condition | ICD-9-CM | Terminology | 02 | |
Condition | MedDRA | Terminology, Classification | 15 | X |
Condition | OMOP Intermediate Condition | Terminology | 53 | X | CONDITION_OCCURRENCE, CONDITION_ERA
Non-condition, non-procedure | OMOP Intermediate | Terminology | | X | CONDITION, OBSERVATION
Condition | Read | Terminology | 17 | |
Condition | OXMIS | Terminology | 18 | |
Condition | Health Outcomes of Interest | Classification | 29 | | HOI_ERA, OMOP_CONDITION_ERA
Procedure | ICD-9-Procedure | Terminology | 03 | X | PROCEDURE_OCCURRENCE
Procedure | CPT-4 | Terminology | 04 | X | PROCEDURE_OCCURRENCE
Procedure | HCPCS | Terminology | 05 | X | PROCEDURE_OCCURRENCE
Procedure | OMOP Intermediate Procedure | Terminology | 57 | X | PROCEDURE_OCCURRENCE
Observation | LOINC | Terminology | 06 | X | OBSERVATION
Observation | UCUM | Terminology | 11 | X | OBSERVATION
Demographic | HL7 Administrative Sex | Terminology | 12 | X | PERSON
Demographic | CDC Race/Ethnicity | Terminology | 13 | X | PERSON
Demographic | Zip Code | Terminology | 25 | X | PERSON
Demographic | U.S. Census Region | Terminology | 26 | X | PERSON
Demographic | U.S. State or Territory | Terminology | 27 | X | PERSON
Visit | CMS Place of Service | Terminology | 14 | X | VISIT_OCCURRENCE
Visit | Visit | Terminology | 24 | X | VISIT_OCCURRENCE
Various | GE | Terminology | 51 | | OBSERVATION, PERSON, VISIT, OBSERVATION_PERIOD
Various | Thomson | Terminology | 52 | | OBSERVATION, PERSON, VISIT, OBSERVATION_PERIOD
Any | No matching concept | Any | 00 | | Any

Table 1. Standard Vocabularies and Vocabulary Domains for OMOP CDM. For a detailed list of vocabularies, classes and counts in the CONCEPT table see Appendix A.

The OMOP Intermediate vocabulary in the "Non-condition, non-procedure" domain is a catch-all for intermediate concepts mapped from OXMIS or Read that are not Clinical findings or Procedures (see below). Vocabulary code 00 contains one record: Concept 0, representing a code that cannot be matched. For technical reasons this concept code is used instead of a NULL. Vocabularies 51 and 52 contain all proprietary codes for General Electric Centricity and Thomson Reuters MarketScan data.

3.2. Vocabulary Life Cycle

All vocabularies have been incorporated into the OMOP Standard Vocabulary system in a release dated June 2009 or earlier. However, data change over time, either because the underlying domain changes (e.g. new drugs, drugs taken off the market, new disease definitions) or because the product is continuously improved. In either case, for the purpose of OMOP research there is currently no provision to accommodate updates, changes, versions, modifications, or making data obsolete. All data are fixed and final.

3.3. Drug Domain

3.3.1. VOCABULARIES

The standard drug vocabulary in the OMOP CDM consists of the following components (Figure 1):

- Drug reference terminology: RxNorm [1], maintained by the National Library of Medicine (NLM) (vocabulary code 08).
- Classifications for mechanism of action, physiological effect, chemical structure, and indication: National Drug File Reference Terminology (NDF-RT) [2], developed by a consortium of the NLM, the Department of Veterans Affairs (VA) and Apelon, Inc. (vocabulary code 07).
- Two classifications for therapeutic class: the Anatomical Therapeutic Chemical (ATC) [3] classification, maintained by the WHO Collaborating Centre for Drug Statistics Methodology (vocabulary code 21), and the Enhanced Therapeutic Classification (ETC) [4], maintained by First DataBank (FDB) (vocabulary code 20).
- Indications and contraindications: FDA-approved and off-label indications as well as contraindications are provided by First DataBank (FDB) [5] (vocabulary code 19).
- Drugs of Interest: ten special definitions that represent the drug classes OMOP research is focusing on (vocabulary code 30).

RxNorm is used to define individual drugs as well as their ingredients.
For these RxNorm-based concepts, a number of drug classes exist, some for the ingredients (NDF-RT-derived) and some for the clinical drugs (ATC and ETC). In addition, NDDF-based indications and contraindications are linked to the clinical drugs. All other drug nomenclatures that may be used in the various source data are mapped to RxNorm.

[1] RxNorm documentation, http://www.nlm.nih.gov/research/umls/rxnorm/docs/index.html
[2] Lincoln M, National Drug File Reference Terminology (NDF-RT), Department of Veterans Affairs, Salt Lake City, UT, February 2008.
[3] WHO Collaborating Centre for Drug Statistics Methodology, http://www.whocc.no/atc/structure_and_principles/
[4] NDDF PLUS Documentation, Nov 2009, First Databank Inc.
[5] Ibid.
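The mapping of source drug codes to the standard, together with the Concept 0 fallback described under Table 1, could look roughly like the sketch below. The vocabulary codes 09 (NDC) and 10 (GPI) come from Table 1; the source codes, target concept IDs and function name are hypothetical placeholders rather than actual SOURCE_TO_CONCEPT_MAP content.

```python
# Concept 0 stands for a code that cannot be matched; it is used instead of NULL.
NO_MATCHING_CONCEPT = 0

# Hypothetical extract of a source-to-concept map:
# (source vocabulary code, source code) -> standard concept ID.
# Vocabulary codes follow Table 1; source codes and concept IDs are made up.
SOURCE_TO_CONCEPT = {
    ("09", "00000-0000-01"): 1001,   # an NDC code mapped to a standard concept
    ("10", "00000000000001"): 1002,  # a GPI code mapped to a standard concept
}


def map_source_code(vocabulary_code, source_code):
    """Return the standard concept ID for a source code, or Concept 0 if unmapped."""
    return SOURCE_TO_CONCEPT.get((vocabulary_code, source_code), NO_MATCHING_CONCEPT)


print(map_source_code("09", "00000-0000-01"))  # mapped code
print(map_source_code("09", "99999-9999-99"))  # unmapped code falls back to 0
```

Returning a real concept rather than a missing value means downstream joins against the CONCEPT table never lose rows for unmapped codes.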

[Figure 1: OMOP Standardized Terminology and Classification of Drugs. The diagram shows four levels: top-level concepts (Level 4) and classifications (Level 3) from NDF-RT, ATC, ETC, and Indications and Contraindications (CI); ingredients (Level 2) and low-level drugs (Level 1) from RxNorm; and source codes (NDC, GPI, Multilex, VA-NDF, Multum, ICD-9-PCS*, CPT-4*) mapped to RxNorm. Legend: Existing, De Novo, Derived. * Procedure drugs.]

3.3.2. IMPLEMENTATION OF RXNORM

RxNorm is structured into elements that reflect the active ingredients, strengths, and dose form comprising each drug (Table 2). For each element, a separate RxNorm concept is defined.

Element | Definition | Examples
Ingredient | A compound or moiety that gives the drug its distinctive clinical properties. | Aspirin
Semantic Clinical Drug | Ingredient plus strength and dose form. | Aspirin 500 MG Oral Tablet
Semantic Branded Drug | Ingredient, strength, and dose form plus brand name. | Aspirin 500 MG Oral Tablet [Bayer Aspirin]
Brand Name Pack | Branded Drug Delivery Device. | {24 (Acetaminophen 500 MG / Diphenhydramine 25 MG Oral Tablet [Tylenol Extra Strength P.M.]) / 50 (Acetaminophen 500 MG Oral Tablet [Tylenol]) } Pack [Tylenol Extra Strength Day and Night Value Pack]
Pack | Drug Delivery Device. | {24 (Acetaminophen 500 MG / Diphenhydramine 25 MG Oral Tablet) / 50 (Acetaminophen 500 MG Oral Tablet) } Pack
Brand Name | A proprietary name for a family of products containing a specific active ingredient. | Bayer Aspirin
Precise Ingredient | A specified form of the ingredient that may or may not be clinically active. Most precise ingredients are salt or isomer forms. | Acetylsalicylate Sodium
Dose Form | The physical form of a drug intended for administration or consumption. | Oral Tablet
Semantic Clinical Drug Component | Ingredient plus strength; see section on Rules and Conventions, below, for units of measurement and for rules pertaining to the calculation of strengths. | Aspirin 500 MG
Semantic Branded Drug Component | Branded ingredient plus strength. | Aspirin 500 MG [Bayer Aspirin]
Semantic Clinical Drug Form | Ingredient plus dose form. | Aspirin Oral Tablet
Semantic Branded Drug Form | Branded ingredient plus dose form. | Aspirin Oral Tablet [Bayer Aspirin]

Table 2. Concept types and examples as defined by RxNorm.

All RxNorm elements are imported into the vocabulary CONCEPT table for the convenience of researchers navigating drug information and codes. However, only five elements are used in the Standard Vocabulary: Semantic Clinical Drug and Semantic Branded Drug as well as Delivery Devices (branded and generic packs) are the low-level drug concepts (Level 1). These low-level concepts report into Ingredients, which are implemented as parent concepts (Level 2). All other RxNorm elements are not part of the Standard Vocabulary, are not used for mapping and classification, and therefore do not have a class level designation (Level 0).

RxNorm defines a number of reciprocal relationships. For example, the RxNorm relationship has_ingredient defines the Ingredients for Clinical Drug Components, and ingredient_of defines the Clinical Drug Components for each Ingredient. In the Standard Vocabulary, relationships are captured in one direction only; in this example only RxNorm Has ingredient is maintained (see Table 3).
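The level assignment described above can be summarized in a small lookup. This is a sketch only: the element names follow Table 2, while the function and dictionary names are hypothetical.

```python
# Level designation of RxNorm elements in the Standard Vocabulary.
# Element names follow Table 2; everything not listed here is Level 0.
LEVEL_BY_ELEMENT = {
    "Semantic Clinical Drug": 1,
    "Semantic Branded Drug": 1,
    "Pack": 1,             # generic delivery device
    "Brand Name Pack": 1,  # branded delivery device
    "Ingredient": 2,
}


def concept_level(rxnorm_element):
    """Return 1 for low-level drugs, 2 for Ingredients, 0 for all other elements."""
    return LEVEL_BY_ELEMENT.get(rxnorm_element, 0)


for element in ("Semantic Clinical Drug", "Ingredient", "Dose Form", "Brand Name"):
    print(element, concept_level(element))
```

Level 0 elements such as Dose Form or Brand Name remain available in the CONCEPT table but play no part in mapping or classification.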

Relationship Type | Relationship description | Defines relationship between
002 | RXNORM Has precise ingredient | Brand Name and Ingredient Variant
003 | RXNORM Has trade name | Branded and Clinical Drug and Component
004 | RXNORM Has dose form | Dose Form and Clinical/Branded Drug/Form
005 | RXNORM Has form | Ingredient and Ingredient Variant
006 | RXNORM Has ingredient | Clinical/Branded Drug Component and Ingredient/Brand Name
007 | RXNORM Constitutes | Clinical Drug Component and Branded Drug/Pack
008 | RXNORM Contains | Clinical/Branded Pack and Clinical/Branded Drug
009 | RXNORM Reformulation of | Brand Name to Brand Name

Table 3. Subset of RxNorm relationships in the Standard Vocabulary. For a detailed list of relationships see Appendix B.

In addition, a new Relationship Type 103 OMOP Has Ingredient has been defined. It connects all Level 1 concepts (Clinical and Branded Drugs and Packs) with the Level 2 Ingredients. This relationship is therefore somewhat different from the original 006 RxNorm Has ingredient relationship, which connects Ingredients to Clinical Drug Components and Brand Names to Branded Drug Components. OMOP Has Ingredient is also the only relationship that is used for ancestry definition in the CONCEPT_ANCESTOR table, as none of the original RxNorm relationships are hierarchical. The resulting Standard Drug Terminology structure derived from RxNorm is shown in Figure 2.

[Figure 2. RxNorm implementation. The diagram shows Ingredient, Clinical Drug Component, Clinical Drug, Branded Drug, Clinical/Branded Pack, Form, Dose Form and Brand Name concepts, connected by the relationships Has Ingredient, Constitutes, Has Trade Name, Has Form, Has Dose Form and Reformulation of. Source codes and OMOP Intermediate concepts map to the drug concepts, and OMOP Has Ingredient is marked as the hierarchy defining ancestry.]

Structures in bold belong to the Standard Vocabulary: Level 1 (Clinical/Branded Drugs/Packs, Intermediate Concepts, see below) and Level 2 (Ingredients), as well as the OMOP Has Ingredient relationship between them. All the other RxNorm elements are loaded into the CONCEPT and CONCEPT_RELATIONSHIP tables, but are not part of the Standard Vocabulary.

In OMOP release v3.0, there are 52,115 Clinical and Branded Drugs and Packs and 3,869 Ingredients in the Standard Vocabulary.

3.3.3. IMPLEMENTATION OF AMBIGUOUS DRUG CONCEPTS

RxNorm has an implementation for drugs that consist of several active ingredients. However, not all combinations that are or have been available on the market are represented as distinct Clinical or Branded Drugs. In such cases, an Intermediate Drug Concept is introduced (vocabulary_concept_code = 54) as a placeholder. In addition, source codes often cannot be mapped unambiguously to a single RxNorm Clinical Drug, and the same kind of Intermediate Drug Concept is introduced. This Intermediate concept is Level 1 and is connected through the Intermediate Drug Concept To RxNorm relationship to the composite Clinical and Branded Drugs (mostly Level 1 as well, but possibly Ingredients at Level 2).

[Figure 3: a source code maps to an Intermediate Drug concept (Level 1), which is related to several alternative Clinical Drugs (Level 1).]

Figure 3. Implementation of Intermediate Drug Concepts for drugs with ambiguous content.

For example, the GPI code 65990002220320 stands for Oxycodone w/ Aspirin Tab Full Strength. It cannot be mapped to a single RxNorm drug. Instead, it is mapped to an equivalent Intermediate Concept 1238796, which in turn has a relationship to each of the alternative RxNorm concepts:

- Aspirin 325 MG / Oxycodone Hydrochloride 4.5 MG / oxycodone terephthalate 0.38 MG Oral Tablet
- Aspirin 325 MG / Oxycodone 4.88 MG Oral Tablet
- Aspirin 325 MG / Oxycodone 5 MG Oral Tablet

The relationship between these concepts is code 102 OMOP Intermediate Drug Concept To RxNorm. There are a total of 2121 Intermediate Drug Concepts in the CONCEPT table.
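As a minimal sketch of how the Intermediate Drug Concept in the example above could be resolved to its alternative RxNorm drugs, the following Python snippet loads toy CONCEPT and CONCEPT_RELATIONSHIP tables into an in-memory SQLite database and follows relationship 102. The column names (concept_id_1, concept_id_2, relationship_id), the direction in which the relationship is stored, the name given to concept 1238796 and the concept IDs 9000001 to 9000003 are assumptions made for illustration. Only the concept ID 1238796, the relationship code 102 and the three drug names come from the text.

```python
import sqlite3

# Toy schema: the column names are assumed, not taken from the specification.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE concept (
        concept_id    INTEGER PRIMARY KEY,
        concept_name  TEXT,
        concept_level INTEGER
    );
    CREATE TABLE concept_relationship (
        concept_id_1    INTEGER,
        concept_id_2    INTEGER,
        relationship_id INTEGER
    );
    """
)

# 1238796 is the Intermediate Concept from the text; its name here and the
# 900000x IDs are hypothetical placeholders.
conn.executemany(
    "INSERT INTO concept VALUES (?, ?, ?)",
    [
        (1238796, "Oxycodone w/ Aspirin Tab Full Strength", 1),
        (9000001, "Aspirin 325 MG / Oxycodone Hydrochloride 4.5 MG / "
                  "oxycodone terephthalate 0.38 MG Oral Tablet", 1),
        (9000002, "Aspirin 325 MG / Oxycodone 4.88 MG Oral Tablet", 1),
        (9000003, "Aspirin 325 MG / Oxycodone 5 MG Oral Tablet", 1),
    ],
)

# Relationship 102: OMOP Intermediate Drug Concept To RxNorm
# (assumed to be stored from the intermediate concept to the RxNorm concept).
conn.executemany(
    "INSERT INTO concept_relationship VALUES (?, ?, 102)",
    [(1238796, 9000001), (1238796, 9000002), (1238796, 9000003)],
)


def rxnorm_candidates(connection, intermediate_concept_id):
    """Return the names of all RxNorm drugs linked to an intermediate concept."""
    rows = connection.execute(
        """
        SELECT c.concept_name
        FROM concept_relationship AS r
        JOIN concept AS c ON c.concept_id = r.concept_id_2
        WHERE r.concept_id_1 = ?
          AND r.relationship_id = 102
        ORDER BY c.concept_id
        """,
        (intermediate_concept_id,),
    ).fetchall()
    return [name for (name,) in rows]


for name in rxnorm_candidates(conn, 1238796):
    print(name)
```

An analysis that needs a single drug per record would still have to choose among the returned candidates. The sketch only shows that the ambiguity is kept explicit in the data rather than resolved at mapping time.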

3.3.4. IMPLEMENTATION OF NDF-RT

NDF-RT is a classification system of drugs. Like RxNorm, it represents drugs defined by ingredient, strength and form. These are called VA_Product and are part of the VA National Drug File (VA_NDF). The next level above VA_Product is Ingredients. For each ingredient, relationships to higher-level drug classes are defined (Table 4).

Relationship Type | Relationship Description | Hierarchical | Defines for each ingredient
010 | Subsumes | X | Hierarchical relationship
011 | NDFRT Has DoseForm | | Dose forms, similar to RxNorm Dose Form
012 | NDFRT Induces | | Adverse side effect
013 | NDFRT May Diagnose | | Drugs used for diagnosis
014 | NDFRT Has PE | X | Physiological effect
015 | NDFRT CI PE | | Pathological effect
016 | NDFRT Has Ingredient | X | Pharmaceutical class
017 | NDFRT CI ChemClass | | Chemical class of side effect
018 | NDFRT Has MoA | X | Mechanism of action
019 | NDFRT CI MoA | | Mechanism of action of side effect
020 | NDFRT Has PK | | Pharmacokinetic effect (excretion, metabolization)
021 | NDFRT May Treat | | Indication
022 | NDFRT CI With | | Contraindication
023 | NDFRT May Prevent | | Indication
024 | NDFRT Has Active Metabolites | | Active metabolites
025 | NDFRT Site of Metabolism | | Site of metabolism
026 | NDFRT Effect May Be Inhibited By | | Drug-drug interaction
027 | NDFRT Has Chemical Structure | X | Chemical structure of pharmaceutical class

Table 4. NDF-RT relationships as reflected in the RELATIONSHIP_TYPE. Drug class hierarchies are defined on the basis of these relationships. The relationships defined as hierarchical are used for ancestry definition (see below). For a detailed list of relationships see Appendix B.

All NDF-RT concepts are loaded into the CONCEPT table, but not all of them belong to the Standard Vocabulary. VA_NDF codes duplicate RxNorm content and are therefore not part of the Standard Vocabulary (they are Level 0 and not included in ancestry definitions). However, VA_NDF codes are part of the mapping from source vocabularies to RxNorm (see below).

The mapping between NDF-RT and RxNorm ingredients is defined in NDF-RT through the RXN RELA relationship. All relationships that are defined for NDF-RT Ingredients are transferred to the corresponding RxNorm Ingredients, generating one common Drug Domain.

As a result of this transformation, NDF-RT classes become parent concepts (Level 3 concepts) of the RxNorm Ingredients (Level 2 concepts). The exception is the top-level ancestor classes, e.g. Physiological Effect, which are defined as Level 4 concepts. For example, Table 5 lists the NDF-RT-based ancestors of Aspirin, CONCEPT_ID 1112807:

Concept ID | Concept Name | Concept Level | Concept Class | Note
4325480 | Aspirin | 3 | Chemical Structure | Aspirin as chemical
4326615 | Benzoic Acids | 3 | Chemical Structure |
4351638 | Carboxylic Acids | 3 | Chemical Structure |
43505 | Hydroxy Acids | 3 | Chemical Structure |
4351563 | Hydroxybenzoic Acids | 3 | Chemical Structure |
4349709 | Phenols | 3 | Chemical Structure |
4349977 | Salicylic Acids | 3 | Chemical Structure |
4340569 | Chemical Ingredients | 4 | | Top concept
4257751 | Arthritis | 3 | Indication or contraindication | Various conditions that are declared as indications. The conditions have no direct relationship to ICD-9-CM, SNOMED or any other condition vocabulary.
4341935 | Arthritis, Rheumatoid | 3 | Indication or contraindication |
4266435 | Bacterial Infections | 3 | Indication or contraindication |
4341703 | Brain Ischemia | 3 | Indication or contraindication |
4341697 | Cardiovascular Diseases | 3 | Indication or contraindication |
4341020 | Cerebral Infarction | 3 | Indication or contraindication |
4342229 | Fever | 3 | Indication or contraindication |
4342290 | Gout | 3 | Indication or contraindication |
43404 | Gram-Positive Bacterial Infections | 3 | Indication or contraindication |
4342143 | Hypertension | 3 | Indication or contraindication |
4344648 | Ischemic Attack, Transient | 3 | Indication or contraindication |
4342462 | Joint Diseases | 3 | Indication or contraindication |
4347016 | Myocardial Infarction | 3 | Indication or contraindication |
4343250 | Nutritional and Metabolic Diseases | 3 | Indication or contraindication |
4344441 | Osteoarthritis | 3 | Indication or contraindication |
4266421 | Pain | 3 | Indication or contraindication |
4343368 | Pre-Eclampsia | 3 | Indication or contraindication |
4347835 | Rheumatic Diseases | 3 | Indication or contraindication |
4345158 | Streptococcal Infections | 3 | Indication or contraindication |
4341296 | Diseases, Manifestations or Physiologic States | 4 | | Top concept
4324016 | Cyclooxygenase Inhibitors | 3 | Mechanism of Action |
4324560 | Enzyme Inhibitors | 3 | Mechanism of Action |
4325133 | Cellular or Molecular Interactions | 4 | | Top concept
4319025 | Decreased Coagulation Activity | 3 | Physiologic Effect |
4333583 | Decreased Eicosanoid Production | 3 | Physiologic Effect |
4318286 | Decreased Immunologic Activity | 3 | Physiologic Effect |
4334140 | Decreased Platelet Activating Factor Production | 3 | Physiologic Effect |
4327834 | Decreased Platelet Aggregation | 3 | Physiologic Effect |
4334142 | Decreased Prostaglandin Production | 3 | Physiologic Effect |
4334295 | Decreased Thromboxane Production | 3 | Physiologic Effect |
4331211 | Physiological Effects | 4 | | Top concept

Table 5. Example of NDF-RT-derived relationships and drug classes for Aspirin in the Standard Vocabulary. Concept 1112807 "Aspirin" is the RxNorm Ingredient for Aspirin (Level 2), while concept 4325480 "Aspirin" is the lowest-level NDF-RT Pharmaceutical class (Level 3), which happens to have the same concept name.

Currently, each ingredient has on average 54.4 hierarchical classes of the various levels assigned to it, and each class concept has on average 10.2 ingredient members and 480 individual drug members.

3.3.5. IMPLEMENTATION OF ATC

Within ATC, drugs are divided into fourteen anatomical main groups (1st level), each with pharmacological/therapeutic subgroups (2nd level). The 3rd and 4th levels are chemical/pharmacological/therapeutic subgroups and the 5th level is the chemical substance. All ATC concepts are loaded into the Standard Vocabulary with a concept vocabulary code of 21 and are assigned a concept class of Anatomical Therapeutic Chemical Classification and a concept level of 3. The hierarchical relationships between the ATC concepts are captured using the Subsumes concept relationship (relationship type 010). RxNorm Clinical Drugs are tied to the ATC classification system using a separate hierarchical concept relationship (relationship type 131) that links an RxNorm Clinical Drug concept to one or more ATC concepts. As a result, ATC concepts become ancestors of RxNorm Clinical Drug concepts.

Relationship Type | Relationship Description | Hierarchical | Notes
010 | Subsumes | X | Hierarchical relationship among ATC concepts
131 | WHO ATC to RxNorm | X | Hierarchical relationship between RxNorm clinical drugs and low level ATC concepts

Table 6. List of ATC relationships as reflected in the RELATIONSHIP_TYPE. For a detailed list of relationships see Appendix B.
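The ATC level structure described above is visible in the codes themselves: the five levels correspond to code lengths of 1, 3, 4, 5 and 7 characters. The following Python sketch derives the chain of parent classes for an ATC code by truncating it to those lengths. This only illustrates the hierarchy that the Subsumes relationship encodes; it is an assumption that such rows could be derived this way, not a description of how the Standard Vocabulary is loaded. The function name atc_ancestors is hypothetical, and C10AA05 is the WHO ATC code for atorvastatin.

```python
# Code lengths of the five ATC levels:
# anatomical main group, therapeutic subgroup, two further subgroups,
# and chemical substance.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_ancestors(code):
    """Return the parent ATC codes of `code`, nearest parent first."""
    code = code.strip().upper()
    if len(code) not in ATC_LEVEL_LENGTHS:
        raise ValueError(f"not a valid ATC code length: {code!r}")
    # Keep every level prefix that is strictly shorter than the code itself.
    prefixes = [code[:n] for n in ATC_LEVEL_LENGTHS if n < len(code)]
    return prefixes[::-1]


print(atc_ancestors("C10AA05"))  # ['C10AA', 'C10A', 'C10', 'C']
```

Each adjacent pair in the returned list (for example C10A and C10AA) corresponds to one parent-child step in the ATC hierarchy.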

Note the contrast to the NDF-RT implementation: ATC classes are defined for Clinical Drugs, while NDF-RT classes are defined for Ingredients. As an example, the following are the ATC classification concepts for the RxNorm Clinical Drug concept ID 1545996, which is Atorvastatin 40 MG Oral Tablet:

Concept ID | Concept Name | Concept Level | Concept Code | Concept Class
216018 | HMG COA REDUCTASE INHIBITORS | 3 | C10AA | Anatomical Therapeutic Chemical Classification
21601854 | LIPID MODIFYING AGENTS, PLAIN | 3 | C10A | Anatomical Therapeutic Chemical Classification
21601853 | LIPID MODIFYING AGENTS | 3 | C10 | Anatomical Therapeutic Chemical Classification
21601237 | CARDIOVASCULAR SYSTEM | 3 | C | Anatomical Therapeutic Chemical Classification

Table 7. Example of ATC-derived relationships and drug classes for Atorvastatin 40 mg oral tablet in the Standard Vocabulary.

3.3.6. IMPLEMENTATION OF ETC

The ETC system is a therapeutic classification system. All ETC concepts are assigned a concept class of Enhanced Therapeutic Classification and a concept level of 3. The hierarchical relationships between the ETC concepts are captured using the Subsumes concept relationship (relationship type 010). RxNorm Clinical Drugs are tied to the ETC classification system using a separate hierarchical concept relationship. As a result, ETC concepts become ancestors of RxNorm Clinical Drug concepts.

Relationship Type | Relationship Description | Hierarchical | Notes
010 | Subsumes | X | Hierarchical relationship among ETC concepts
130 | FDB ETC to RxNorm | X | Hierarchical relationship between RxNorm clinical drugs and low level ETC concepts

Table 8. List of ETC relationships as reflected in the RELATIONSHIP_TYPE. For a detailed list of relationships see Appendix B.

As with ATC, ETC classes are defined for RxNorm Clinical Drugs and not for Ingredients, in contrast to NDF-RT classes. As an example, the following are the ETC classification concepts for the RxNorm Clinical Drug concept ID 1545996, which is Atorvastatin 40 MG Oral Tablet:

Concept ID | Concept Name | Concept Level | Concept Code | Concept Class
215023 | Cardiovascular Therapy Agents | 3 | 23 | Enhanced Therapeutic Classification
21500263 | Antihyperlipidemics | 3 | 263 | Enhanced Therapeutic Classification

Table 9. Example of ETC-derived relationships and drug classes for Atorvastatin 40 mg oral tablet in the Standard Vocabulary.

3.3.7. IMPLEMENTATION OF FDB INDICATION AND CONTRA-INDICATION

FDB developed indication and contra-indication concepts for all drugs from a variety of sources such as FDA MedWatch, journal articles, expert treatment guidelines (like AHFS Drug Information, The Medical Letter) and product package inserts [6]. All indication and contra-indication concepts are part of the Standard Terminology and are assigned a concept level of 3 and a concept class of Indication, with one or more of the following concept relationships:

Relationship Type | Relationship Description | Hierarchical | Notes
126 | FDB FDA approved drug indication | | According to label
127 | FDB off-label drug indication | | Off-label indications
128 | FDB indication to ICD9CM | | Link to ICD-9 concepts
129 | FDB drug contra-indication | | According to label

Table 10. List of Indication relationships as reflected in RELATIONSHIP_TYPE. Indications are linked to drug concepts and ICD-9-CM condition concepts.

RxNorm Clinical Drugs are linked to both indication (FDA-approved and off-label) and contra-indication concepts through concept relationship types 126, 127 or 129. For each Indication concept, a list of ICD-9-CM concepts is provided through the Indication is condition relationship (relationship type 128).
[6] NDDF PLUS Documentation, Nov 2009, First Databank Inc.

3.3.8. IMPLEMENTATION OF OMOP DRUGS OF INTEREST

Each Drug of Interest [7] is a class of drugs comprising the defined Clinical Drugs. DOI concepts have vocabulary code 30, concept_level 2 and relationships 134 "DOI Consists of" to the Clinical Drugs, and the CONCEPT_ANCESTOR table is populated accordingly.

3.3.9. LEVELS

The resulting combined Standard Vocabulary for drugs has 4 levels (Table 11). Clinical Drugs (branded and generic) and Intermediate Drug concepts for ambiguous drugs are both Level 1. Level 1 contains marketed drug products that can be administered to patients.

Level | Description
0 | Concepts not used for the Standard Vocabulary
1 | Clinical Drugs and Branded Drugs (ingredient-form-strength)
1 | Intermediate Drug concepts for ambiguous drugs
2 | Ingredients (but not branded ingredients)
3 | Drug classes, indications, contra-indications
4 | Top level drug class concepts

Table 11. Standard Drug Vocabulary levels.

3.3.10. RELATIONSHIPS

RxNorm, the Intermediate Drugs (see above) and the classification systems NDF-RT, ATC, ETC and indications form a combined drug vocabulary. All original RxNorm and classification relationships, as well as relationships derived by combining terminologies (e.g. RxNorm and NDF-RT), are stored in the CONCEPT_RELATIONSHIP table:

Relationship Type | Description | Hierarchical
002 | RXNORM Has precise ingredient |
003 | RXNORM Has tradename |
004 | RXNORM Has dose form |
005 | RXNORM Has form |
006 | RXNORM Has ingredient |
007 | RXNORM Constitutes |
008 | RXNORM Contains |
009 | RXNORM Reformulation of |
011 | NDFRT Has DoseForm |
012 | NDFRT Induces |
013 | NDFRT May Diagnose |
014 | NDFRT Has PE | X
015 | NDFRT CI PE |
016 | NDFRT Has Ingredient | X
017 | NDFRT CI ChemClass | X
018 | NDFRT Has MoA | X
019 | NDFRT CI MoA |
020 | NDFRT Has PK |
021 | NDFRT May Treat | X
022 | NDFRT CI With |
023 | NDFRT May Prevent | X
024 | NDFRT Has Active Metabolites |
025 | NDFRT Site of Metabolism |
026 | NDFRT Effect May Be Inhibited By |
027 | NDFRT Has Chemical Structure | X
028 | NDFRT RXN RELA | X
102 | Drug Concept To RxNorm | X
103 | OMOP Has Ingredient | X
126 | FDB FDA approved drug indication |
127 | FDB off-label drug indication |
128 | FDB Indication is Condition |
129 | FDB Drug contra-indication |
130 | FDB ETC to RxNorm | X
131 | WHO ATC to RxNorm | X

Table 12. Relationships in the Drug Domain. Each relationship description also contains its source. For a detailed list of relationships see Appendix B.

[7] Health Outcomes of Interest Definitions, available at http://omop.fnih.org/hoi
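To illustrate how the relationships marked as hierarchical in the table above can feed an ancestry table, the following Python sketch computes all ancestor-descendant pairs from a small list of relationship rows by following only hierarchical relationship types. It is a sketch under stated assumptions, not the OMOP load process: the row layout (parent, child, relationship type), the direction in which rows are stored, the function name build_ancestor_pairs and the concept ID 9000010 are hypothetical. The other concept IDs and the set of hierarchical types are taken from this document.

```python
from collections import defaultdict

# Relationship types marked "X" (hierarchical) in the Drug Domain table.
HIERARCHICAL_TYPES = {14, 16, 17, 18, 21, 23, 27, 28, 102, 103, 130, 131}

# Toy rows, assumed to be stored as (parent_concept, child_concept, type).
RELATIONSHIPS = [
    (4324016, 1112807, 18),    # Cyclooxygenase Inhibitors -> Aspirin (Has MoA)
    (1112807, 19059059, 103),  # Aspirin -> Aspirin 162 MG Oral Tablet
    (1112807, 9000010, 22),    # CI With: not hierarchical, must be ignored
]


def build_ancestor_pairs(relationships, hierarchical_types):
    """Return every (ancestor, descendant) pair reachable via hierarchical rows."""
    children = defaultdict(set)
    for parent, child, rel_type in relationships:
        if rel_type in hierarchical_types:
            children[parent].add(child)

    pairs = set()
    for ancestor in list(children):
        # Depth-first walk from each ancestor down to its lowest descendants.
        stack = list(children[ancestor])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            pairs.add((ancestor, node))
            stack.extend(children.get(node, ()))
    return pairs


for ancestor, descendant in sorted(build_ancestor_pairs(RELATIONSHIPS, HIERARCHICAL_TYPES)):
    print(ancestor, descendant)
```

Because the pair (4324016, 19059059) is materialized, all drugs below a class can be retrieved with a single lookup rather than by walking the relationships level by level.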

In contrast, the CONCEPT_ANCESTOR table contains only relationships that are defined as hierarchical and reach across the terminologies. For example, NDFRT PE (physiological effect) and NDFRT May Treat, but not NDFRT CI (contraindication), are used for ancestry. For each concept, the CONCEPT_ANCESTOR table contains all descendants down to the lowest concept. This way, all RxNorm-derived drugs (Levels 1 and 2) and Intermediate Drug Concepts that belong to an NDF-RT-derived class (Level 3, e.g. Cyclooxygenase Inhibitors) can be queried in one step, as opposed to iteratively collecting concepts, resulting in the following descendants (selection):

Concept | Description | Vocabulary code | Level | Source code | Concept class
4333316 | COX-2 Inhibitors | 7 | 3 | N0000008288 | Mechanism of Action
1112807 | Aspirin | 8 | 2 | 1191 | Ingredient
19059059 | Aspirin 162 MG Oral Tablet | 8 | 1 | 243686 | Clinical Drug or Pack
19139944 | AXORID 200mg+20mg modified release capsules | 54 | 1 | 10174 | Clinical Drug or Pack

Table 13. Descendants of ancestor NDF-RT class Cyclooxygenase Inhibitor. Classes can have lower-level classes, Ingredients, Clinical or Branded Drugs and Intermediate Drugs. In this example, Axorid is a Branded Drug not available in the U.S. and is therefore not part of the RxNorm-based Standard Vocabulary.

3.3.11. MAPPING

Mappings are provided in the SOURCE_TO_CONCEPT_MAP table for drug codes from the following sources (MAPPING_TYPE = DRUG):

- National Drug Codes (NDC), vocabulary code 09. NDCs are maintained by the FDA and by the individual manufacturers and are very widely used. As a consequence, however, NDC is not a well-controlled code base, either in format or in code lifecycle, which leads to relatively greater mapping problems than with the other source vocabularies.
- Medi-Span Generic Product Identifier (GPI) codes, vocabulary code 10.
- Multum Cerner MMDCs and Multum Drug IDs, vocabulary code 16.
- Department of Veterans Affairs VA ID drug identifiers from the VA National Drug File (VA-NDF), vocabulary code 28.
- First Databank UK Multilex iproductid, containing branded and generic drug products in the UK, vocabulary code 22.

In addition to drug codes, cross-references are available for procedure drugs (MAPPING_TYPE = PROCEDURE DRUG) from the following sources:

- American Medical Association (AMA) CPT-4 codes, vocabulary code 04.
- Centers for Medicare & Medicaid Services (CMS) HCPCS codes, vocabulary code 05.
- CMS ICD-9-Procedures codes, vocabulary code 03.

Only a few codes have enough specific information about the administered drug.
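A lookup against such a mapping table could be sketched as follows in Python. The row layout (source_code, source_vocabulary_code, mapping_type, target_concept_id) and the function names are assumptions for illustration and are not taken from the SOURCE_TO_CONCEPT_MAP definition. The single example row uses the GPI code and Intermediate Concept from section 3.3.3, and the second lookup uses a made-up NDC string that is deliberately absent from the toy data.

```python
from collections import defaultdict

# Toy rows: (source_code, source_vocabulary_code, mapping_type, target_concept_id).
# Vocabulary code 10 is GPI; the row reflects the GPI example in section 3.3.3.
MAP_ROWS = [
    ("65990002220320", 10, "DRUG", 1238796),
]


def build_index(rows):
    """Index mapping rows by (source code, vocabulary code, mapping type)."""
    index = defaultdict(list)
    for source_code, vocabulary_code, mapping_type, target_concept_id in rows:
        index[(source_code, vocabulary_code, mapping_type)].append(target_concept_id)
    return dict(index)


def map_source_code(index, source_code, vocabulary_code, mapping_type="DRUG"):
    """Return all target concept IDs for a source code, or [] if unmapped."""
    return index.get((source_code, vocabulary_code, mapping_type), [])


INDEX = build_index(MAP_ROWS)
print(map_source_code(INDEX, "65990002220320", 10))  # [1238796]
print(map_source_code(INDEX, "00000000000", 9))      # [] (not in the toy data)
```

Keying on the vocabulary code as well as the source code matters because code strings from different source vocabularies are not guaranteed to be distinct.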

Some mappings are publicly available; others have been derived through an intermediary mapping step (e.g. HCPCS codes through NDC codes to RxNorm codes). All mappings point to target concept IDs that are derived from RxNorm Branded Drugs (Level 1) if the information is available. For GPI, Multum, VA and Multilex, a Branded Drug often cannot be mapped. In these cases RxNorm Clinical Drug concepts (also Level 1) and then RxNorm Ingredients (Level 2) are used as secondary mappings instead. Table 14 shows the coverage of the mapping in general and in two reference databases:

Source code vocabulary | Total # of codes | Total # mapped | # used in GE | # mapped in GE | # used in CCAE | # mapped in CCAE
GPI | 19,194 | 19,194 | 16,029 | 11,025 | Not used |
NDC | 272,389 | 272,389 | Used but not mapped | | 88,869 | 53,521
Multum | Unknown | 9,472 | Not used | | |
VA-NDF | 19,902* | 11,165 | Not used | | |
Multilex | 32,400** | 27,760 | Not used | | |
HCPCS | 7,370 | 606 | 289 | 289 | 101 | 101
CPT-4 | 10,032 | 86 | 79 | 79 | 85 | 85
ICD-9-Proc | 4,592 | 17 | Not used | | 17 | 17

*Count includes non-drug products such as supplies, imaging agents etc.
**Count includes only product type drug and excludes appliances.

Table 14. Available mappings and utilization for drugs. GE: General Electric Centricity data, CCAE: Thomson-Reuters MarketScan Commercial Claims data.

3.4. Condition Domain

3.4.1. VOCABULARIES

The OMOP Standard Vocabulary for conditions consists of SNOMED-CT concepts and hierarchies (Figure 4).

[Figure 4: SNOMED-CT concepts at three levels: top-level classification (Level 3), higher-level classifications (Level 2) and low-level concepts (Level 1). Source codes from ICD-9-CM, SNOMED-CT, Read and Oxmis are mapped to Level 1; mappings are marked as Existing, De Novo or Derived.]

Figure 4. OMOP Standardized Vocabulary and Classification of Conditions based on SNOMED-CT.

The Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT) [8] is maintained by the International Health Terminology Standards Development Organisation (IHTSDO). SNOMED-CT covers most areas of clinical information, such as diseases, findings, procedures, microorganisms and pharmaceuticals. All condition concepts are taken from the Clinical Findings hierarchy.

For experimental purposes, two alternative Standard Vocabularies will be implemented on the basis of MedDRA and ICD-9-CM (Figure 5):

- Medical Dictionary for Regulatory Activities (MedDRA) [9] codes are distributed by the International Federation of Pharmaceutical Manufacturers and Associations (IFPMA) and stored in vocabulary code 15.
- ICD-9 Clinical Modification (ICD-9-CM) [10] diagnostic morbidity codes (Volumes 1 and 2) are maintained by the National Center for Health Statistics (NCHS) and implemented under vocabulary code 02.

OMOP also developed Health Outcomes of Interest [11], representing complex definitions of patient cohorts with a specific outcome. Standardized MedDRA Queries (SMQ) [12] are groupings of MedDRA Preferred Terms that are linked to a defined medical outcome, similar to OMOP Health Outcomes of Interest.

[Figure 5: MedDRA hierarchy with System organ class (Level 5), High-level group terms (Level 4), High-level terms (Level 3), Preferred terms (Level 2) and Low-level terms (Level 1); ICD-9-CM hierarchy with 3-digit code (Level 3), 4-digit code (Level 2) and 5-digit code (Level 1). ICD-9-CM source codes are mapped to the lowest levels; mappings are marked as Existing, De Novo or Derived.]

Figure 5. OMOP Standardized Vocabulary and Classification of Conditions based on MedDRA and ICD-9-CM.
[8] January 2010 International release, SNOMED CT Technical Reference Guide, available at http://www.ihtsdo.org/publications/implementing-snomed-ct/implementation-guidance/
[9] Version 12.0, March 2009, documentation available at http://www.meddramsso.com/subscriber_library_ptc.asp
[10] 6th edition, September 2009, documentation available at http://www.cdc.gov/nchs/icd/icd9cm.htm
[11] Health Outcomes of Interest Definitions, available at http://omop.fnih.org/hoi
[12] Version 12.0, March 2009, documentation available at http://www.meddramsso.com/subscriber_smq.asp

3.4.2. IMPLEMENTATION OF SNOMED-CT FOR CONDITIONS

All of SNOMED-CT is loaded into the vocabulary CONCEPT table for the convenience of the researcher, but only the Clinical finding hierarchy is used for conditions (see below for procedures and observations).

3.4.3. IMPLEMENTATION OF MEDDRA AND ICD-9-CM FOR CONDITIONS

As experimental alternatives to SNOMED-CT conditions, MedDRA and ICD-9-CM are loaded as condition concepts.

3.4.4. IMPLEMENTATION OF AMBIGUOUS CONDITION CONCEPTS

As with drugs, source codes often cannot be mapped unambiguously to a single SNOMED-CT Clinical finding. For these cases, Intermediate Condition concepts are introduced (Figure 6).

[Figure 6: a source code maps to an Intermediate Condition concept (Level 1), which is related to several alternative Clinical finding concepts (Condition, Level 1).]

Figure 6. Implementation of Intermediate Condition Concepts for source codes that cannot be unambiguously mapped.

The Intermediate Condition concepts have vocabulary code 53, and an OMOP Intermediate Condition Concept To SNOMED relationship (code 101) is defined between them and the composite SNOMED-CT-derived Clinical finding concepts. Intermediate concepts that result from ambiguous OXMIS and Read mapping, however, are in a different vocabulary code if they cannot be mapped to Clinical findings or Procedures. Their relationships are either code 101, or code 103 Procedure Concept To SNOMED as well as code 104 OMOP Intermediate Concept To SNOMED.

3.4.5. IMPLEMENTATION OF OMOP HEALTH OUTCOMES OF INTEREST

OMOP Health Outcomes of Interest include 36 special cohort definitions that represent outcomes relevant for OMOP research. Though their definitions are based on a variety of different tables, they are treated equivalently to conditions. However, no relationships or ancestry are defined for HOI concepts, which have vocabulary code 29.

3.4.6. IMPLEMENTATION OF MEDDRA SMQS

Standardized MedDRA Queries (SMQ) concepts carry the vocabulary code 31 and are linked to the corresponding MedDRA concepts through relationship 132 "SMQ Consists of" and the appropriate records in the CONCEPT_ANCESTOR table.

3.4.7. LEVELS

SNOMED-CT has no strict hierarchy between levels; any concept can be related to any other. It is therefore not possible to assign distinct concept levels. The rule adopted for the OMOP Standard Vocabulary is that all lowest concepts in SNOMED-CT, i.e. those without any descendant concepts, are designated Level 1, and all higher-level concepts are designated Level 2. Level 3 is the top-level Clinical finding concept.

MedDRA is a stratified hierarchical vocabulary with 5 levels: Lowest Level Terms (LLT, Level 1), Preferred Terms (PT, Level 2), High Level Terms (HLT, Level 3), High Level Group Terms (HLGT, Level 4) and System Organ Class (SOC, Level 5). MedDRA concepts are also linked to SNOMED clinical findings using lateral, non-hierarchical concept relationships.

3.4.8. RELATIONSHIPS

As for the other vocabulary domains, relationships between conditions are stored in the CONCEPT_RELATIONSHIP and CONCEPT_ANCESTOR tables. Relationships are defined within SNOMED-CT, MedDRA and ICD-9-CM, and between conditions and SMQs. SNOMED-derived relationships were imported, and SNOMED "IS-A" relationships were converted to OMOP "Subsumes" relationships:

Relationship Type | Relationship Description | Hierarchical
010 | Subsumes | X
029 | SNOMED Recipient category |
030 | SNOMED Procedure site |
031 | SNOMED Priority |
032 | SNOMED Pathological process |
033 | SNOMED Part of |
034 | SNOMED Severity |
035 | SNOMED Revision status |
036 | SNOMED Access |
037 | SNOMED Occurrence |
038 | SNOMED Method |
039 | SNOMED Laterality |
040 | SNOMED Interprets |
041 | SNOMED Indirect morphology |
042 | SNOMED Indirect device |
043 | SNOMED Has specimen |
044 | SNOMED Has interpretation |
045 | SNOMED Has intent |
046 | SNOMED Has focus |
047 | SNOMED Has definitional manifestation |
048 | SNOMED Has active ingredient |
049 | SNOMED Finding site |
050 | SNOMED Episodicity |
051 | SNOMED Direct substance |
052 | SNOMED Direct morphology |
053 | SNOMED Direct device |
054 | SNOMED Component |
0 | SNOMED Causative agent |
056 | SNOMED Associated morphology |
057 | SNOMED Associated finding |
058 | SNOMED Measurement Method |
059 | SNOMED Property |
060 | SNOMED Scale type |
061 | SNOMED Time aspect |
062 | SNOMED Specimen procedure |
063 | SNOMED Specimen source identity |
064 | SNOMED Specimen source morphology |
065 | SNOMED Specimen source topography |
066 | SNOMED Specimen substance |
067 | SNOMED Due to |
068 | SNOMED Subject relationship context |
069 | SNOMED Has dose form |
070 | SNOMED After |
071 | SNOMED Associated procedure |
072 | SNOMED Procedure site - Direct |
073 | SNOMED Procedure site - Indirect |
074 | SNOMED Procedure device |
075 | SNOMED Procedure morphology |
076 | SNOMED Finding context |
077 | SNOMED Procedure context |
078 | SNOMED Temporal context |
079 | SNOMED Associated with |
080 | SNOMED Surgical approach |
081 | SNOMED Using device |
082 | SNOMED Using energy |
083 | SNOMED Using substance |
084 | SNOMED Using access device |
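The level rule for SNOMED-CT conditions stated in section 3.4.7 (concepts without descendants are Level 1, all other concepts are Level 2, and the top Clinical finding concept is Level 3) can be sketched in Python over a list of Subsumes pairs. This is a minimal sketch rather than the actual load logic. The function name assign_levels, the (parent, child) row layout and the concept labels "Finding A", "Finding A1", "Finding A2" and "Finding B" are hypothetical placeholders.

```python
# Toy Subsumes rows, assumed to be stored as (parent, child).
SUBSUMES = [
    ("Clinical finding", "Finding A"),
    ("Finding A", "Finding A1"),
    ("Finding A", "Finding A2"),
    ("Clinical finding", "Finding B"),
]


def assign_levels(subsumes, top_concept):
    """Assign Level 3 to the top concept, Level 1 to leaves, Level 2 otherwise."""
    parents = {parent for parent, _ in subsumes}
    concepts = parents | {child for _, child in subsumes} | {top_concept}

    levels = {}
    for concept in concepts:
        if concept == top_concept:
            levels[concept] = 3
        elif concept not in parents:
            # A concept that never appears as a parent has no descendants.
            levels[concept] = 1
        else:
            levels[concept] = 2
    return levels


for concept, level in sorted(assign_levels(SUBSUMES, "Clinical finding").items()):
    print(level, concept)
```

Under this rule "Finding B" is Level 1 even though it sits directly below the top concept, which illustrates why the levels do not express depth in the SNOMED-CT hierarchy.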