January Amnon Shabo (Shvo), PhD IBM Research Lab in Haifa

1 January 2010
Amnon Shabo (Shvo), PhD
IBM Research Lab in Haifa
Co-chair, HL7 Clinical Genomics
Co-editor, HL7 Pedigree (Family History)
Co-editor, HL7 CDA
Co-editor, HL7 CCD

2 [Figure: per-site establishment year (values lost in extraction) and employee counts: 440, 1750, 210, 110, 40, 270, 80, 180]

3 Challenge & Objectives
An EC-FP7 funded project addressing challenge HEALTH : Molecular epidemiological studies in existing well-characterized European (and/or other) population cohorts
Objective: To define a comprehensive genetic epidemiology disease model of essential hypertension (EH) by integrating new technologies of high-throughput genotyping with sophisticated statistical-mathematical modeling and methods of genetic epidemiology
Scientific Coordinator: State University of Milano
Collaborators: State University of Milano; Katholieke Universiteit Leuven; Uniwersytet Jagiellonski Collegium Medicum; Sineurra; IMS Research; State Scientific Research Institute of Internal Medicine, Russian Academy of Medical Sciences Siberian Department; Imperial College London; UC San Diego; INSERM - College de France; Warwick Medical School; Prassis-SigmaTau Research Institute, Milano; STMicroelectronics; Losanna & Ginevra University; Pharm-Next; Softeco Sismat Spa, Genoa; Shanghai Institute of Hypertension; Charles University in Prague; Faculty of Medicine in Pilsen; State University of Padova; Medical University of Gdansk

4 Two IBM Haifa Research Groups: Healthcare & Life Sciences Group; Machine Learning Group
[Figure legend: IBM lead; IBM significant involvement]

5 Standardize [data / information / knowledge representations] and harmonize them using a common language (reference information model)
Develop constraining formats to generic standards (e.g., CDA): capturing similarities while preserving disparities
Develop a biomedical information infrastructure (BII)
  Core component: a general-purpose health data warehouse
  Semantic warehousing mitigates the tension between integration and computation
  Accommodate a variety of data formats: XML, tabulated, mass genomic (high throughput), etc.
  Data access mechanisms based on semantic computation (e.g., context computation)
    Super users can directly access the warehouse (e.g., XQuery)
    For research users: a data mart creation tool
    For healthcare: comply with emerging standards like IHE QED and common ones like XDS
Develop interfaces to health data analysis tools
Support both research and healthcare environments (e.g., various query mechanisms)
Use advanced technologies (e.g., DB2 pureXML, Spring Framework) and new development methodologies (e.g., Agile programming)

6 [Figure: data templates and their links]
CDA Template: Header (one instance per subject; subject id); Body (reference1 to GV, reference2 to GV, reference2 to PD; clinical & environmental observations)
Pedigree Template: subject id
GV Template: observed subject id; Genomic Observations; Phenotypes; Genotype-Phenotype Interpretive; encapsulation or referencing knowledge
Raw Genomic Data: subject id; HapMap / BSML / MAGE, or relational schemas optimized for persistency
Disease Model; EHR

7 [Figure: Clinical Genomics model; Entry Point: GeneticLocus]
Legend: Encapsulating Obj.; Bubbled-up Obj.
Labels: Expression; Individual Allele; Determinant; Polypeptide; Bio Sequence; Attributes; Related Allele; Expression Data; Polypeptide; genotype; phenotype; Sequence Variation (SNP, Mutation, Polymorphism, etc.); Variation Attributes; Clinical Phenotype

8 Bridging is the challenge
[Figure: encapsulation and bubbling-up between genomic data sources and clinical practices]
Genomic Data Sources: encapsulation by predefined & constrained bioinformatics schemas
HL7 CG Messages with mainly Encapsulating HL7 Objects
Bubbling-up is done continuously by specialized DS applications
Decision Support Applications: bubble up the most clinically-significant raw genomic data into specialized HL7 objects and link them with clinical data from the patient EHR
HL7 CG Messages with encapsulated data associated with HL7 clinical objects (phenotypes)
Knowledge (KBs, Ontologies, registries, reference DBs, Papers, etc.)
Clinical Practices; EHR System

9 Family History
[Figure: HL7 Pedigree (Family History) model diagram; the class and attribute text is not recoverable from the extraction. Legible callouts: Risk Assessment Results; Patient; Relative; Mother & Father IDs; Recursive Relation; Estimated age of subject; GeneticLocus / Loci CMETs; Relative HL7 Vocabulary = FAMMEMB; Clinical Data; Estimated age of subject at diagnosis]

10 Taken from a patient pedigree, the portion related to the patient's daughter (in collaboration with Partners HealthCare & other HL7 CG SIG members)
[Figure callouts: Bubble up to phenotype and beyond. Point back to the raw data of this relative, providing personal evidence.]

11 [Figure callouts: raw genomic data represented in bioinformatics markup; HL7 v3 XML]

12 e.g., an OMIM Entry:
Despite the dramatic responses to EGFR inhibitors in patients with non-small cell lung cancer, most patients ultimately have a relapse. Kobayashi et al. (2005) reported a patient with EGFR-mutant, Gefitinib-responsive, advanced non-small cell lung cancer who had a relapse after 2 years of complete remission during treatment with Gefitinib. The DNA sequence of the EGFR gene in his tumor biopsy specimen at relapse revealed the presence of a second mutation ({ }). Structural modeling and biochemical studies showed that this second mutation led to the Gefitinib resistance.

13 An OMIM entry is essentially unstructured
  Curated from published studies
  Indexed and hyperlinked to variants and studies
Text Analytics: analyze the text to create a more structured representation
For example:
  Free text: presence of a second mutation led to the resistance
  Structured data based on the CG spec derived from the RIM:

14 [Legend: RIM class name; CG spec clone name; code drawn from an internationally recognized terminology]
Observation SequenceVariation [EGFR Variant id ]
Relation: [explanation]
Observation ClinicalPhenotype [responsive]
Observation SequenceVariation [EGFR Variant id ]
Relation: [explanation]
Observation ClinicalPhenotype [resistant]
Relation: [subject]
Entity / Role ManufacturedMaterial [Gefitinib ]
Relation: [participation]
SubstanceAdministration DrugTherapy [Gefitinib intake details: dose, time, etc. ]
Other relevant relations: evidence, support, cause, manifestation
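The structured representation on this slide can be pictured as a small graph of typed objects. The following is a minimal sketch in Python, not the HL7 Clinical Genomics schema itself: the class names `RimObject` and `Relation`, the helper `phenotypes_explained_by`, and the placeholder variant labels are all assumptions made for illustration (the variant ids are blank in the source), and the direction of each relation is a guess from the slide layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RimObject:
    rim_class: str    # RIM class name, e.g. "Observation"
    clone_name: str   # CG spec clone name, e.g. "SequenceVariation"
    value: str        # the bracketed content shown on the slide
    relations: List["Relation"] = field(default_factory=list)


@dataclass
class Relation:
    kind: str         # e.g. "explanation", "subject", "participation"
    target: RimObject


def phenotypes_explained_by(variation: RimObject) -> List[str]:
    """Return the phenotype values linked to a variation by 'explanation'."""
    return [
        rel.target.value
        for rel in variation.relations
        if rel.kind == "explanation"
        and rel.target.clone_name == "ClinicalPhenotype"
    ]


# Placeholder labels: the actual EGFR variant ids are not given in the source.
first_mutation = RimObject("Observation", "SequenceVariation", "EGFR variant A")
second_mutation = RimObject("Observation", "SequenceVariation", "EGFR variant B")

responsive = RimObject("Observation", "ClinicalPhenotype", "responsive")
resistant = RimObject("Observation", "ClinicalPhenotype", "resistant")
drug_therapy = RimObject("SubstanceAdministration", "DrugTherapy", "Gefitinib intake details")
gefitinib = RimObject("Entity / Role", "ManufacturedMaterial", "Gefitinib")

first_mutation.relations.append(Relation("explanation", responsive))
second_mutation.relations.append(Relation("explanation", resistant))
resistant.relations.append(Relation("subject", drug_therapy))
drug_therapy.relations.append(Relation("participation", gefitinib))

print(phenotypes_explained_by(second_mutation))
```

Keeping the RIM class name and the clone name as separate fields mirrors the slide legend, where each box carries both.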

15 A data warehouse prototype (RIMon) was populated with standardized data transformed from proprietary sources (~10 disparate cohorts)
  Rich XML representation of clinical data for ~4000 subjects
  1M SNP data per subject in relational tables
Preliminary version of the BII is available on an external server
  Export feature to a data mart
  Enabling IBM DDQB atop the data mart for interactive query building and running
  Direct access to the data warehouse & mart through JDBC
Use advanced technologies & development methodologies
  DB2 pureXML
  Cruise Control
  Spring Framework
  Combined SQL + XQuery

16 [Figure: BII architecture]
Sources: proprietary or standard sources of data, information, and knowledge; EHR patient data (HL7 v3); RDF; non-XML formats (e.g., genomic; images; sensors) through Mass Data Extraction
Health Semantic Warehouse: HL7 RIM-based XML Storage; Semantic Computation Library (e.g., context computation); XQuery Exploration
Promote & Export: Data Marts (BII Mart - RDF)
RESEARCH: Analysis
HEALTHCARE: Analysis
New knowledge & information

18 [Figure: instance generation]
Components: Data Source; Adapter; Instance Generation Engine (Java API); Template Model (OWL Ontology; representing constraints); CTS (mapping local vocabularies)
Output: standard-based instances (e.g., CDA) that conform to the Template Model

19 [Figure: Data Producers, BII, Data Consumers]
Data producers: send v3 instances or RIM
Rimification of v3 domain instances: replacing each clone name with the most specialized RIM class name based on the classCode hierarchy, while preserving the clone name
Persisting: RIM-based XML persistency
Data consumers: Query & Send
Rimification of domain query; domainification of query results (XQuery)
Rimified query for data mining & CDS on longitudinal and cross-institutional EHR
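As a rough illustration of the rimification step, the sketch below renames each clone element to a RIM class name and keeps the original clone name in an attribute. It is not the RIMon implementation: the two lookup tables are tiny hand-written excerpts standing in for the classCode hierarchy, the attribute name `cloneName` is an assumption about the original spelling, and the sample document omits the HL7 namespace for brevity.

```python
import xml.etree.ElementTree as ET

# Assumed excerpts; a real system would derive these from the RIM and the
# classCode hierarchy rather than hard-code them.
RIM_CLASS_BY_CLASS_CODE = {"OBS": "Observation", "SBADM": "SubstanceAdministration"}
RIM_CLASS_BY_CLONE_NAME = {"entryRelationship": "ActRelationship"}

CLONE_ATTR = "cloneName"  # assumed attribute name for the preserved clone name


def rimify(element: ET.Element) -> None:
    """Rename clone elements in place to RIM class names, keeping the clone name."""
    for child in element:
        rimify(child)
    rim_class = (
        RIM_CLASS_BY_CLASS_CODE.get(element.get("classCode"))
        or RIM_CLASS_BY_CLONE_NAME.get(element.tag)
    )
    if rim_class is not None:
        element.set(CLONE_ATTR, element.tag)  # preserve the domain-specific name
        element.tag = rim_class
    # Elements with no mapping (e.g. data types such as <value>) stay unchanged.


domain_xml = (
    '<entryRelationship typeCode="COMP">'
    '<observation classCode="OBS" moodCode="EVN">'
    '<value unit="mmHg" value="143.7"/>'
    "</observation>"
    "</entryRelationship>"
)
root = ET.fromstring(domain_xml)
rimify(root)
print(ET.tostring(root, encoding="unicode"))
```

Because the clone name survives as an attribute, the reverse step (domainification of query results) can restore the original element names.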

20 Blood Pressure Observations:
<ActRelationship cloneName="entryRelationship" contextConductionInd="true" typeCode="COMP">
  <Observation classCode="OBS" cloneName="observation" moodCode="EVN">
    <code code=" " codeSystem=" " codeSystemName="SNOMED CT" displayName="systolic BP"/>
    <value unit="mmHg" value="143.7" xsi:type="PQ"/>
  </Observation>
</ActRelationship>
<ActRelationship cloneName="entryRelationship" contextConductionInd="true" typeCode="COMP">
  <Observation classCode="OBS" cloneName="observation" moodCode="EVN">
    <code code=" " codeSystem=" " codeSystemName="SNOMED CT" displayName="diastolic BP"/>
    <value unit="mmHg" value="97.7" xsi:type="PQ"/>
  </Observation>
</ActRelationship>
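One consequence of this rimified form is that a consumer can select observations by RIM class name alone, without knowing the domain-specific clone names. The sketch below shows this with Python's standard library. The wrapper element `entries`, the `xsi` namespace declaration, the camelCase attribute spelling, and the helper `readings` are assumptions added so the fragment parses on its own; the blank `code` and `codeSystem` values are left blank as in the source.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# The two observations from the slide, wrapped in a hypothetical root element.
fragment = f"""
<entries xmlns:xsi="{XSI}">
  <ActRelationship cloneName="entryRelationship" contextConductionInd="true" typeCode="COMP">
    <Observation classCode="OBS" cloneName="observation" moodCode="EVN">
      <code code="" codeSystem="" codeSystemName="SNOMED CT" displayName="systolic BP"/>
      <value unit="mmHg" value="143.7" xsi:type="PQ"/>
    </Observation>
  </ActRelationship>
  <ActRelationship cloneName="entryRelationship" contextConductionInd="true" typeCode="COMP">
    <Observation classCode="OBS" cloneName="observation" moodCode="EVN">
      <code code="" codeSystem="" codeSystemName="SNOMED CT" displayName="diastolic BP"/>
      <value unit="mmHg" value="97.7" xsi:type="PQ"/>
    </Observation>
  </ActRelationship>
</entries>
"""


def readings(root: ET.Element) -> dict:
    """Map each observation's display name to its (value, unit) pair."""
    result = {}
    for obs in root.iter("Observation"):  # select by RIM class name only
        code = obs.find("code")
        value = obs.find("value")
        if code is None or value is None:
            continue
        result[code.get("displayName")] = (float(value.get("value")), value.get("unit"))
    return result


bp = readings(ET.fromstring(fragment))
print(bp)
```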

21 XQueries
XQueries with embedded SQL
SQL/XML queries
Update and delete operations with XML columns
XQuery:
  declare default element namespace "urn:hl7-org:v3";
  for $doc in db2-fn:xmlcolumn('db2inst1.cda.document')/ClinicalDocument
  where [condition missing in the transcription]
  return $doc/recordTarget/patientRole/id
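For readers without a DB2 pureXML instance, the path expression in this query can be tried against in-memory documents. The sketch below swaps the database for Python's `xml.etree.ElementTree`, so it illustrates only the namespace handling and the `recordTarget/patientRole/id` path, not DB2 itself. The two sample documents, their `root` and `extension` values, and the helper `patient_ids` are made up for illustration, and the query's `where` condition is omitted because it is missing from the source.

```python
import xml.etree.ElementTree as ET

NS = {"hl7": "urn:hl7-org:v3"}  # same namespace the XQuery declares as default

# Hypothetical stand-ins for rows of the XML column; the ids are placeholders.
documents = [
    '<ClinicalDocument xmlns="urn:hl7-org:v3"><recordTarget><patientRole>'
    '<id root="2.999.1" extension="subject-001"/>'
    "</patientRole></recordTarget></ClinicalDocument>",
    '<ClinicalDocument xmlns="urn:hl7-org:v3"><recordTarget><patientRole>'
    '<id root="2.999.1" extension="subject-002"/>'
    "</patientRole></recordTarget></ClinicalDocument>",
]


def patient_ids(xml_docs):
    """Collect (root, extension) for every recordTarget/patientRole/id."""
    ids = []
    for text in xml_docs:
        doc = ET.fromstring(text)
        if doc.tag != "{urn:hl7-org:v3}ClinicalDocument":
            continue  # mirror the /ClinicalDocument step of the query
        for id_elem in doc.findall("hl7:recordTarget/hl7:patientRole/hl7:id", NS):
            ids.append((id_elem.get("root"), id_elem.get("extension")))
    return ids


found = patient_ids(documents)
print(found)
```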

22 Large-scale efforts
  FDA Sentinel
  Public health monitoring
Longitudinal & cross-institutional EHR projects
  Undirected aggregation of medical records summarized into the EHR
Pharma
  Bridge to patients' EHR
Disease registries

23 Thank you for your attention
Questions?