CONCEPT DRIFT IN ONTOLOGY MAPPING AND SEMANTIC ANNOTATION ADAPTATION


1 CONCEPT DRIFT IN ONTOLOGY MAPPING AND SEMANTIC ANNOTATION ADAPTATION
Cédric PRUSKI
Drift-a-LOD 2016, November 20th, Bologna, Italy

2 MOTIVATION
Source and target KOS (K_S, K_T) evolve. Outdated mappings and annotations may trigger undesirable results in biomedical systems.
- Example (figure): a mapping "Malignant neoplasm = malignancy" links data annotated with "malignancy"; once it is outdated, the "malignancy" data become inaccessible.
- It is crucial to keep mappings and annotations valid.
- The large size and complexity of KOS prevent a fully manual maintenance.

3 PROBLEM STATEMENT
- What is the impact of concept drift (or ontology evolution) on ontology mappings and semantic annotations, both quantitatively and qualitatively?
- How can we formally characterize concept drift?
  - Basic changes: addition/deletion of concepts
  - Complex changes: split, merge, move of concepts
- Can we reuse the information that characterizes concept drift to adapt ontology mappings and semantic annotations, and thus avoid re-aligning or re-annotating whole datasets?

4 AGENDA
1. Concept drift for mapping adaptation
   a. DynaMO research project
   b. Change patterns
2. Concept drift for semantic annotation maintenance
   a. ELISA research project
   b. Background knowledge
3. Discussion
   a. Concept drift for LOD

5 THE CASE OF MAPPING ADAPTATION

6 ONTOLOGY MAPPING ADAPTATION: definition and problem
Mapping adaptation is the adaptation of existing mappings according to the modifications that affect KOS elements at evolution time: a mapping M_V1 = (s, t, r) between a source element s and a target element t with relation r becomes M_V2 = (s, t, r') in the next version.
Hypothesis: there is a correlation between the way KOS elements evolve and the way mappings are adapted.
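A minimal Python sketch of the definition above, assuming a mapping is just an immutable (source, target, relation) triple. The class name, the `adapt` function and the relation labels are hypothetical, not the DynaMO data model. Adapting a mapping yields a new triple for the next version and leaves the old one untouched, so M_V1 and M_V2 can both be kept for comparison.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Mapping:
    """A mapping (s, t, r): source element, target element, semantic relation."""
    source: str
    target: str
    relation: str


def adapt(m: Mapping, *, source: Optional[str] = None,
          target: Optional[str] = None, relation: Optional[str] = None) -> Mapping:
    """Derive the mapping for the next KOS version; parts left as None are kept."""
    return replace(
        m,
        source=m.source if source is None else source,
        target=m.target if target is None else target,
        relation=m.relation if relation is None else relation,
    )


m_v1 = Mapping("s", "t", "equivalent")
m_v2 = adapt(m_v1, relation="less_specific")  # only r changes: (s, t, r')
print(m_v1)
print(m_v2)
```

Keeping the triple frozen makes each version a value rather than a mutable record, which suits comparing mapping releases over time.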

7 UNDERSTANDING MAPPING EVOLUTION
Goal: identify potential interdependencies between the changes affecting KOS entities and the evolution of mappings. How does concept drift impact mappings?
Approach: empirically examine official, real-world mappings over time, using the evolution of SNOMED CT and ICD9CM as a case study.
- SNOMED CT releases: Jan/10, Jul/10, Jan/11, Jul/11
- Mapping releases: M_ST1 (Jan/10), M_ST2 (Jul/10), M_ST3 (Jan/11), M_ST4 (Jul/11)
- ~ mappings analyzed
- ICD9CM versions: 2009, ...
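A hedged sketch of how two consecutive mapping releases (for instance M_ST1 and M_ST2) could be compared to observe mapping evolution, assuming mappings are matched on their (source, target) pair. The identifiers s1 to t4 and the relation labels are placeholders, not real SNOMED CT or ICD9CM codes.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    source: str    # concept in the source KOS (e.g. SNOMED CT)
    target: str    # concept in the target KOS (e.g. ICD9CM)
    relation: str  # semantic relation of the mapping


def diff_mappings(old: set, new: set) -> dict:
    """Compare two releases of a mapping set, keyed on the (source, target) pair."""
    old_pairs = {(m.source, m.target) for m in old}
    new_pairs = {(m.source, m.target) for m in new}
    return {
        "unchanged": old & new,
        "removed": {m for m in old if (m.source, m.target) not in new_pairs},
        "added": {m for m in new if (m.source, m.target) not in old_pairs},
        # same (source, target) pair as before, but a different relation
        "relation_changed": {m for m in new
                             if (m.source, m.target) in old_pairs and m not in old},
    }


old = {Mapping("s1", "t1", "eq"), Mapping("s2", "t2", "eq"), Mapping("s3", "t3", "eq")}
new = {Mapping("s1", "t1", "eq"), Mapping("s2", "t2", "narrower"), Mapping("s4", "t4", "eq")}
changes = diff_mappings(old, new)
for kind, items in changes.items():
    print(kind, sorted(items))
```

Counting each category across M_ST1 to M_ST4 gives the kind of quantitative picture the case study relies on; correlating them with KOS changes is a separate step.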

8 KEY FINDINGS
Figure (before and after evolution): an ICD9CM concept with the attributes "Concretion of intestine", "Enterolith" and "Fecal impaction", and the SNOMED CT concept "Impaction of intestine". After evolution, one SNOMED CT concept changed and a new concept was added in the is-a hierarchy, which involves "Typhlolithiasis (disorder)", "Concretion of intestine (disorder)", "Enterolith (disorder)" and "Fecal impaction of colon"; the observed modifications concern "Fecal impaction" and "Fecal impaction of colon".
- Mapping adaptation is based on the evolution of the relevant concept attributes.
- How can these attributes be identified? Through similarity.
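The slide names similarity as the way to identify relevant attributes. As a stand-in (not necessarily the measure used in the study), the sketch below ranks the attributes of the ICD9CM concept by Python's `difflib.SequenceMatcher` ratio against the label of the SNOMED CT concept; the function names and the 0.6 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio as a stand-in)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def relevant_attributes(attributes, mapped_label, threshold=0.6):
    """Attributes most similar to the label of the mapped concept, best first."""
    scored = sorted(((similarity(a, mapped_label), a) for a in attributes), reverse=True)
    return [a for score, a in scored if score >= threshold]


icd_attributes = ["Concretion of intestine", "Enterolith", "Fecal impaction"]
print(relevant_attributes(icd_attributes, "Impaction of intestine"))
```

A purely character-based ratio misses synonyms such as "Enterolith", which is one reason the later slides turn to background knowledge.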

9 CHARACTERIZATION OF CHANGES: lexical change patterns
Lexical change patterns describe how the attribute values a_1, ..., a_n of a source concept c_s evolve between version j (c_s^0) and version j+1 (c_s^1), relative to the concept's context CONTEXT = SUP ∪ SUB ∪ SIB: its super concepts (attribute values a_sup1, ..., a_supn), sub concepts (a_sub1, ..., a_subn) and siblings (a_sib1 at time j; a_sib1, a_sib2 at time j+1). Four patterns are distinguished:
- Total Copy (TC)
- Total Transfer (TT)
- Partial Copy (PC)
- Partial Transfer (PT)
Example labels in the figure include "inflammatory bowel diseases", "bronzed diabetes" and "behavioral problem", with qualifiers such as "specified", "unspecified" and "mental".
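The four patterns can be sketched under our own simplified reading (an assumption, not the formal definitions of the cited work): a copy keeps the value in c_s^1 while a transfer removes it, and the change is total when all tokens of the value reappear in an attribute of a context concept, partial when only some do. Token overlap stands in for the actual lexical matching, and the function names and example values are hypothetical.

```python
def tokens(text: str) -> set:
    """Lower-cased word tokens of a label."""
    return set(text.lower().split())


def lexical_pattern(value, source_after, context_value):
    """Classify how attribute `value` of c_s^0 reappears in `context_value`,
    an attribute of a concept in CONTEXT = SUP ∪ SUB ∪ SIB at time j+1.

    Returns "TC", "TT", "PC", "PT", or None when no token is shared.
    """
    v, w = tokens(value), tokens(context_value)
    if not v & w:
        return None
    total = v <= w                  # every token of the value reappears
    kept = value in source_after    # value is still an attribute of c_s^1
    if total:
        return "TC" if kept else "TT"
    return "PC" if kept else "PT"


print(lexical_pattern("kappa chain disease", ["heavy chain disease"],
                      "kappa light chain disease"))
print(lexical_pattern("bronzed diabetes", ["bronzed diabetes"], "diabetes mellitus"))
```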

10 CHARACTERIZATION OF CHANGES: semantic change patterns
Semantic change patterns describe the semantic relationship between an attribute value of c_s^0 at time j and a value of c_s^1 or of its context (SUP ∪ SUB ∪ SIB) at time j+1:
- Equivalent (EQV)
- Partial Match (PTM)
- More Specific (MSP)
- Less Specific (LSP)
Example label pairs in the figure (time j → time j+1): "Kappa chain disease" → "Kappa light chain disease"; "Focal atelectasis" → "Helical atelectasis"; "familial hyperchylomicronemia" → "familial chylomicronemia"; "Diabetes type I" → "Diabetes type 1".
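A rough token-set approximation of the four semantic patterns, assuming that adding qualifiers to a label makes it more specific and removing them makes it less specific; the `ROMAN` normalization table and the function names are hypothetical helpers. It is an illustration, not the method of the cited work, and it returns None for lexically disjoint labels such as "Cancer" and "Malignant neoplasm", the case that slide 20 addresses with external knowledge.

```python
import re

# Hypothetical normalization so that "type I" and "type 1" compare equal.
ROMAN = {"i": "1", "ii": "2", "iii": "3"}


def norm_tokens(label: str) -> frozenset:
    words = re.findall(r"[a-z0-9]+", label.lower())
    return frozenset(ROMAN.get(w, w) for w in words)


def semantic_pattern(before: str, after: str):
    """Token-set stand-in for EQV / MSP / LSP / PTM between two labels."""
    a, b = norm_tokens(before), norm_tokens(after)
    if a == b:
        return "EQV"
    if a < b:        # new label adds qualifiers -> more specific
        return "MSP"
    if b < a:        # new label drops qualifiers -> less specific
        return "LSP"
    if a & b:
        return "PTM"
    return None      # no lexical overlap at all


for old, new in [("Diabetes type I", "Diabetes type 1"),
                 ("Kappa chain disease", "Kappa light chain disease"),
                 ("Focal atelectasis", "Helical atelectasis"),
                 ("Cancer", "Malignant neoplasm")]:
    print(f"{old!r} -> {new!r}: {semantic_pattern(old, new)}")
```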

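11 LINKING CP AND MAINTENANCE ACTIONS
Heuristics: a concept of KOS K_S is affected by KOS changes, while the mapped concept c_t of KOS K_T (attributes a_1, ..., a_k, semantic type) is unchanged. The change patterns (CP) recognized on the relevant attributes determine the maintenance action.
Example (figure): at time j, c_s^0 "Kappa chain disease" (attributes a_s1, ..., a_sn) is mapped to c_t. At time j+1, a semantic CP together with a lexical CP (Total Transfer) is observed between c_s^1 and a candidate concept c_cand1, "Kappa light chain disease", found in the context (SUP ∪ SUB ∪ SIB) of c_s^1. Resulting action: MoveM(m_st, c_cand1), i.e., the mapping m_st is moved to c_cand1.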
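A sketch of the heuristic in this example, under the assumption that a Total Transfer plus any semantic CP towards a candidate concept triggers MoveM, and that every other combination keeps the mapping as it is. Only MoveM comes from the slide; `choose_action`, its default behavior and the placeholder target "c_t" are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Mapping:
    source: str    # concept of K_S
    target: str    # concept of K_T (unchanged in this scenario)
    relation: str


def move_mapping(m: Mapping, candidate: str) -> Mapping:
    """MoveM(m_st, c_cand): re-anchor the mapping on a candidate source concept."""
    return replace(m, source=candidate)


def choose_action(m, lexical_cp, semantic_cp, candidate):
    """Hypothetical heuristic: if the relevant attribute value was totally
    transferred (TT) to a candidate concept and a semantic CP links the two,
    the mapping follows the value; otherwise it is kept unchanged."""
    if lexical_cp == "TT" and semantic_cp is not None:
        return move_mapping(m, candidate)
    return m


m_st = Mapping("Kappa chain disease", "c_t", "equivalent")
print(choose_action(m_st, "TT", "MSP", "Kappa light chain disease"))
```

The rationale is that a Total Transfer means the attribute value that justified the mapping has left c_s, so the mapping should follow it to the concept that now carries it.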

12 CONCEPT DRIFT FOR MAPPING ADAPTATION: lessons learned
- Concept drift has a major impact on ontology mappings, but some concept changes do not affect mappings.
- The drift of attribute values governs the mapping adaptation process.
- In most cases, concept drift results in local changes: changes in super concepts, sub concepts and siblings.
- Considering ontology versions alone is not enough to characterize concept drift: external background knowledge is needed to better determine the semantic relationship between versions of a concept (cf. semantic annotation adaptation).
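Because most drift is local, candidate concepts for adaptation can be sought in CONTEXT = SUP ∪ SUB ∪ SIB. A minimal sketch, assuming the hierarchy is given as (child, parent) is-a pairs; the function names and the placeholder concepts A to D are hypothetical.

```python
from collections import defaultdict


def context(concept, is_a):
    """CONTEXT = SUP ∪ SUB ∪ SIB of `concept`, given (child, parent) is-a pairs."""
    parents, children = defaultdict(set), defaultdict(set)
    for child, parent in is_a:
        parents[child].add(parent)
        children[parent].add(child)
    sup = set(parents[concept])
    sub = set(children[concept])
    sib = {s for p in sup for s in children[p]} - {concept}
    return sup | sub | sib


def local_changes(concept, changed, is_a):
    """Changed concepts that lie in the concept itself or its immediate context."""
    return set(changed) & (context(concept, is_a) | {concept})


is_a = [("B", "A"), ("C", "A"), ("D", "B")]
print(sorted(context("B", is_a)))
print(local_changes("B", {"C", "X"}, is_a))
```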

13 THE CASE OF SEMANTIC ANNOTATION ADAPTATION
elisa project.lu

14 SEMANTIC ANNOTATION ADAPTATION: problem

15 METHODOLOGY: impact of concept drift on semantic annotations

16-19 RESULTS

20 CONCEPT DRIFT FOR ANNOTATIONS: use of external knowledge sources
- A concept may have labels before and after evolution that are disjoint from a syntactic or lexical point of view. Example: "Cancer" → "Malignant neoplasm".
- In such cases, lexical and semantic change patterns cannot be applied; external knowledge sources are required to characterize the evolution of the concept.
- We propose a method that exploits BioPortal (its ontologies and mappings) to overcome this limitation.
- The method is able to find the semantic relationship between two versions of the same concept: equivalent, less specific, more specific, unrelated, or partially matched.

21 USE OF EXTERNAL KNOWLEDGE SOURCE: example
Concept versions: "Pituitary dwarfism" (MeSH) → "Pituitary dwarfism II" (MeSH).
1. Search in ontologies (direct method): "Pituitary dwarfism" is found in SNOMED CT, ICD9CM, MEDDRA, NCIT, DOID, RCD, HP, DERMLEX, NATPRO, CRISP, SOPHARM, BDO and SNMI; "Pituitary dwarfism II" is found in OMIM and NDFRT. There are no common ontologies.
2. Use mappings (indirect method): 15 mappings are available (OMIM ontology). "Pituitary dwarfism II" (OMIM) is mapped_to "Laron-type isolated somatotropin defect" (SNOMED CT).
3. SNOMED CT is the common ontology: "Laron-type isolated somatotropin defect" and "Pituitary dwarfism" have the same super concept ("short stature disorder"), so they are siblings.
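The direct and indirect methods of this example can be sketched over toy in-memory dictionaries that stand in for BioPortal's ontology search and mapping services. No BioPortal API is called; the data encode only the hierarchy stated on the slide, and the ontology keys and function names are hypothetical. The sketch reports "siblings" as the slide does, rather than forcing it into one of the five categories listed on slide 20.

```python
def containing(label, ontologies):
    """Names of the ontologies in which `label` occurs."""
    return {name for name, onto in ontologies.items() if label in onto}


def compare(old, new, onto):
    """Relationship of `new` to `old` inside one ontology (label -> super concepts)."""
    if old == new:
        return "equivalent"
    if old in onto.get(new, set()):
        return "more specific"   # new is a sub concept of old
    if new in onto.get(old, set()):
        return "less specific"   # new is a super concept of old
    if onto.get(old, set()) & onto.get(new, set()):
        return "siblings"        # same super concept
    return "unrelated"


def relationship(old, new, ontologies, mappings):
    """Direct method first (a common ontology), then the indirect one (mappings)."""
    common = sorted(containing(old, ontologies) & containing(new, ontologies))
    if common:
        return compare(old, new, ontologies[common[0]]), common[0]
    for mapped, name in mappings.get(new, []):
        onto = ontologies.get(name, {})
        if old in onto and mapped in onto:
            return compare(old, mapped, onto), name
    return "unknown", None


ontologies = {
    "SNOMEDCT": {
        "Pituitary dwarfism": {"Short stature disorder"},
        "Laron-type isolated somatotropin defect": {"Short stature disorder"},
        "Short stature disorder": set(),
    },
    "OMIM": {"Pituitary dwarfism II": set()},
}
mappings = {"Pituitary dwarfism II": [("Laron-type isolated somatotropin defect", "SNOMEDCT")]}
print(relationship("Pituitary dwarfism", "Pituitary dwarfism II", ontologies, mappings))
```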

22 CONCEPT DRIFT IN ANNOTATION ADAPTATION: lessons learned (so far)
- Ontology regions do not evolve in the same way: unstable regions must be handled with care, and they are interesting for predicting concept drift.
- Concept drift has a different impact on different annotation tools (GATE, NCBO Annotator).
- Background knowledge gives promising results for characterizing concept drift: BioPortal ontologies are used; RDF datasets and Web data are under investigation.
- Will machine learning help in understanding concept drift? Open questions: which features are relevant, and which ML techniques should be used?
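One hypothetical way to flag unstable regions: measure, per region, the fraction of its concepts that changed between two versions. The region assignment, the measure, the 0.5 threshold and the function names are illustrative assumptions.

```python
from collections import Counter


def region_instability(region_of, changed):
    """Fraction of each region's concepts that changed between two versions."""
    totals = Counter(region_of.values())
    hits = Counter(region_of[c] for c in changed if c in region_of)
    return {region: hits[region] / n for region, n in totals.items()}


def unstable_regions(region_of, changed, threshold=0.5):
    """Regions whose change ratio reaches the (arbitrary) threshold."""
    scores = region_instability(region_of, changed)
    return sorted(r for r, s in scores.items() if s >= threshold)


region_of = {"a": "R1", "b": "R1", "c": "R1", "d": "R2", "e": "R2"}
changed = {"a", "d", "e"}
print(region_instability(region_of, changed))
print(unstable_regions(region_of, changed))
```

Per-region scores of this kind could also serve as candidate features if machine learning is used to predict drift.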

23 DISCUSSION: concept drift for LOD
- Linked Open Data requires vocabularies for semantic interoperability purposes.
- LOD for characterizing concept drift: the quality of LOD is problematic, and some datasets rely on outdated vocabularies.
- Concept drift impacting LOD: vocabularies such as FOAF and DC are not as dynamic as domain ontologies, and there is no control over the datasets that use controlled vocabularies. How can changes observed in a vocabulary be propagated to RDF datasets?
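A minimal sketch of one possible answer to the propagation question, assuming the vocabulary publishes an old-to-new replacement table and the dataset is available as (subject, predicate, object) triples. The `ex:` and `data:` prefixes and the function names are hypothetical. Replacement chains are followed, and a cycle stops the lookup.

```python
def resolve(term, replaced_by):
    """Follow replacement chains (old -> new -> newer), stopping on cycles."""
    seen = set()
    while term in replaced_by and term not in seen:
        seen.add(term)
        term = replaced_by[term]
    return term


def propagate(triples, replaced_by):
    """Rewrite subject, predicate and object of each (s, p, o) triple."""
    return [tuple(resolve(t, replaced_by) for t in triple) for triple in triples]


replaced_by = {"ex:Cancer": "ex:MalignantNeoplasm"}
triples = [("data:p1", "ex:hasDiagnosis", "ex:Cancer")]
print(propagate(triples, replaced_by))
```

On real datasets the same rewrite would more likely be expressed as a SPARQL 1.1 Update or with an RDF library such as rdflib; since there is no control over third-party datasets, it can only be applied by whoever maintains each dataset.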

24 COLLABORATORS
Silvio Cardoso, Dr. Marcos Da Silveira, Dr. Duy Dinh, Dr. Julio Dos Reis, Dr. Anika Gross, Prof. Erhard Rahm, Prof. Chantal Reynaud-Delaître, and all the others

25 REFERENCES
- M. Da Silveira, J. C. Dos Reis, C. Pruski. Management of Dynamic Biomedical Terminologies: Current Status and Future Challenges. IMIA Yearbook of Medical Informatics, 10(1), 2015.
- J. C. Dos Reis, D. Dinh, M. Da Silveira, C. Pruski, C. Reynaud-Delaître. Recognizing lexical and semantic change patterns in evolving life science ontologies to inform mapping adaptation. Artificial Intelligence in Medicine, 63(3).
- J. C. Dos Reis, C. Pruski, M. Da Silveira, C. Reynaud-Delaître. Understanding semantic mapping evolution by observing changes in biomedical ontologies. Journal of Biomedical Informatics, 47, 71-82.
- S. D. Cardoso, C. Pruski, M. Da Silveira, Y.-C. Lin, A. Gross, E. Rahm, C. Reynaud-Delaître. Leveraging the Impact of Ontology Evolution on Semantic Annotations. Knowledge Engineering and Knowledge Management: 20th International Conference (EKAW 2016), Bologna, Italy, November 19-23, 2016.
- J. C. Dos Reis, C. Pruski, M. Da Silveira, C. Reynaud-Delaître. Characterizing Semantic Mappings Adaptation via Biomedical KOS Evolution: A Case Study Investigating SNOMED CT and ICD. AMIA 2013 Annual Symposium, Washington DC (USA), 2013.
- J. C. Dos Reis, D. Dinh, C. Pruski, M. Da Silveira, C. Reynaud-Delaître. Mapping Adaptation Actions for the Automatic Reconciliation of Dynamic Ontologies. ACM International Conference on Information and Knowledge Management (CIKM 2013), San Francisco, CA (USA), 2013.
- J. C. Dos Reis, D. Dinh, C. Pruski, M. Da Silveira, C. Reynaud-Delaître. The influence of similarity between concepts in evolving biomedical ontologies for mapping adaptation. European Medical Informatics Conference (MIE), 31/08-03/09, Istanbul, Turkey, 2014.
- J. C. Dos Reis, D. Dinh, C. Pruski, M. Da Silveira, C. Reynaud-Delaître. Identifying change patterns of concept attributes in ontology evolution. Proc. of the 11th ESWC, Anissaras, Crete (Greece).
- C. Pruski, J. C. Dos Reis, M. Da Silveira. Capturing the relationship between evolving biomedical concepts via background knowledge. 9th International SWAT4LS Conference, Amsterdam.