Acquisition and Understanding of Process Knowledge Using Problem Solving Methods

1 Acquisition and Understanding of Process Knowledge Using Problem Solving Methods
Jose Manuel Gómez Pérez
Facultad de Informática, Universidad Politécnica de Madrid
Campus de Montegancedo s/n, Boadilla del Monte, Madrid
PhD thesis, 14/07/2009

2 Outline
- Introduction and motivation
- Open research problems and work hypotheses
- Acquisition of process knowledge by SMEs
- Provenance analysis of process executions by SMEs
- Evaluation
- Conclusions and future research problems

3 Knowledge Acquisition: towards SME empowerment
[Figure: evolution from knowledge acquisition (KA) by Knowledge Engineers (KEs) to KA by Subject Matter Experts (SMEs). Labels: the Knowledge Acquisition Bottleneck; the Knowledge Level; the Role Differentiation Principle; knowledge modeling; knowledge programming; Problem Solving Methods; KRR languages (OCML, KARL); DARPA's KSE; ontologies and ontology editors; KA frameworks (DARPA's HPKB and RKF); knowledge formulation, KB edition, and KB maintenance by SMEs; systems SHAKEN, KRAKEN, DISCIPL-RKF, and CHIMAERA; collaborative knowledge creation (semantic wikis).]

4 Knowledge types
Knowledge types identified in the Halo project KR analysis phase for the domains of Chemistry, Biology, and Physics: RUL (inference), CMP (comparison), DAT (data structures), TRANS (translation), CLS (classification), TAB (tables), PROC (procedural), NF (non-functional), TIME (temporal), FACT (factual knowledge), PCS (processes), EXP (experiments), SPACE (spatial), GRA (diagrammatic), MAT (mathematics), CAUS (cause-effect), US (underspecified), PWR (part-whole).
Processes are special knowledge types that:
- relate to the sequence of operations and involved resources leading to the production of some outcome;
- encapsulate preconditions, results, contents, actors, and causes.
Process knowledge is complex: it builds on top of other, simpler knowledge types, such as facts and rules.
Example process questions:
- What is released/added/increased upon binding of two amino acids?
- A piece of solid calcium is heated in oxygen gas....
- Find the correct RNA sequence for a given DNA sequence.

5 Why processes are important
On average, processes appear in 37% of the analyzed material across the domains of Biology, Chemistry, and Physics:
- the most important knowledge type in Chemistry (53%);
- the second in Biology (35%);
- the fourth in Physics (22%).
Source: the Halo project KR analysis phase for the domains of Chemistry, Biology, and Physics.
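
A quick arithmetic check: the 37% figure matches the unweighted mean of the three per-domain values. The slide does not say how the average was computed, so treating it as an unweighted mean is an assumption.

```python
# Per-domain share of processes reported on the slide (percent).
process_share = {"Chemistry": 53, "Biology": 35, "Physics": 22}

# Unweighted mean across domains (assumption: each domain counts equally).
average = sum(process_share.values()) / len(process_share)
print(f"{average:.1f}%")  # 36.7%, i.e. about 37%
```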

6 Work objectives
- Objective 1: to enable SMEs to formulate processes without KEs.
- Objective 2: to support SMEs in understanding process executions.
What: processes (PCS). How: PSMs, which provide reusable guidelines to formulate process knowledge, support reasoning, and describe the main rationale behind a process. Whom: SMEs.

7 PSM perspectives
- Task-method decomposition: hierarchically defines how tasks decompose into simpler (sub)tasks, describes tasks at several levels of detail, and provides alternative ways to achieve a task.
- Interaction (knowledge flow): the PSM establishes and controls the sequence of actions required to perform a task and defines the knowledge required at each task step.
- Black-box perspective: knowledge transformation within the PSM.
[Figure legend: Task, Method, Role.]
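
The task-method decomposition perspective can be pictured as a small data structure: tasks with input and output roles, and alternative methods (PSMs) that decompose a task into subtasks listed in knowledge-flow order. The sketch below is only an illustration; the class and function names are hypothetical, and the decompose & combine instance borrows role names from the PSM example in this deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Role:
    name: str  # a piece of knowledge consumed or produced by a task


@dataclass
class Method:
    name: str
    # Subtasks in the order given by the knowledge flow (interaction perspective).
    subtasks: List["Task"] = field(default_factory=list)


@dataclass
class Task:
    name: str
    inputs: List[Role] = field(default_factory=list)
    outputs: List[Role] = field(default_factory=list)
    # Alternative methods (PSMs) that can achieve this task.
    methods: List[Method] = field(default_factory=list)


def describe(task: Task, depth: int = 0, max_depth: Optional[int] = None) -> List[str]:
    """Render the hierarchy using the first method of each task, down to max_depth."""
    lines = ["  " * depth + task.name]
    if task.methods and (max_depth is None or depth < max_depth):
        for sub in task.methods[0].subtasks:
            lines.extend(describe(sub, depth + 1, max_depth))
    return lines


# Hypothetical instance: a precipitation task achieved by decompose & combine.
decompose = Task("decompose", [Role("Recombination set")], [Role("Constituents set")])
combine = Task("combine", [Role("Constituents set")], [Role("Combination"), Role("Byproduct")])
precipitation = Task(
    "precipitation",
    [Role("Recombination set")],
    [Role("Combination"), Role("Byproduct")],
    [Method("decompose & combine", [decompose, combine])],
)

print("\n".join(describe(precipitation)))               # full detail
print("\n".join(describe(precipitation, max_depth=0)))  # top level only
```

Limiting `max_depth` corresponds to describing the same task at a coarser level of detail.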

8 Provenance analysis of process executions?

9 In summary
This thesis proposes the use of PSMs as a novel approach for supporting SMEs both in the formulation of process knowledge and in the provenance analysis of process executions. It also explores to what extent it is possible to build tools that take KEs out of the formulation and analysis loop. Ultimately, it aims to show that it is possible to engage users in:
- generating computer-readable content represented in formal languages;
- applying knowledge representation and reasoning techniques to analyze the outcomes of automated, knowledge-intensive processes.

11 Open research problems and work hypotheses: Objective 1
Objective 1: to provide SMEs with the means to formulate process knowledge in their domains of expertise without the intervention of KEs.
- H1: Empowering SMEs can increase KB quality and reduce costs.
- H2: The complexity of process knowledge requires providing SMEs with specific means to acquire and reason with processes.
- H3: PSMs can reduce the complexity of acquiring process knowledge by SMEs.
- H4: The proposed methods and tools abstract SMEs from the underlying representation.
- H5: The proposed methods and tools produce sound and complete executable process models.
- H6: The proposed methods and tools are flexible and reusable across domains.

12 Open research problems and work hypotheses: Objective 2
Objective 2: to support SMEs in analyzing and understanding process executions.
- H7: The analytical capabilities of PSMs can provide SMEs with meaningful interpretations of process executions.
- H8: The proposed method identifies the main rationale behind processes by detecting occurrences of PSMs in their execution logs.
- H9: The proposed method can use the hierarchical structure of PSMs to describe process executions at different levels of detail.

14 Acquisition of process knowledge by SMEs
Objective 1: to provide SMEs with the means to formulate process knowledge in their domains of expertise without the intervention of KEs.
Four main contributions:
- C1: a process metamodel, which provides the terminology necessary to express process entities in scientific domains and the relations between them;
- C2: a PSM library, which provides high-level, reusable abstractions for process representation, together with a method for its development;
- C3: a graphical process modeling and reasoning environment, which applies the previous contributions to enable SMEs to formulate process knowledge;
- C4: a method for the automatic synthesis of executable process models from SME-authored process diagrams, supported by an underlying representation and reasoning formalism.

15 Contribution 1: The process metamodel
- Resources (roles): containers of domain concepts that can play a particular role.
- Actions: inspired by activities in EO and TOVE.
- Relations.
- Forks.
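
To make the metamodel's vocabulary concrete, here is a Python sketch with one class per entity type: resources that play roles, actions, precedence relations, and forks that select the next action from a run-time condition. The class names, the `enough_energy` condition, and the encoding of the long jump example are assumptions for illustration, not the thesis's own representation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Resource:
    role: str           # the role played, e.g. "tool" or "resource"
    concepts: Set[str]  # domain concepts that can play this role


@dataclass
class Action:
    name: str
    inputs: List[Resource] = field(default_factory=list)
    outputs: List[Resource] = field(default_factory=list)


@dataclass
class Fork:
    condition: str             # name of a condition evaluated at run time
    branches: Dict[bool, str]  # condition value -> next action


@dataclass
class Process:
    actions: Dict[str, Action] = field(default_factory=dict)
    relations: List[Tuple[str, str]] = field(default_factory=list)  # (from, to) precedence
    forks: Dict[str, Fork] = field(default_factory=dict)            # keyed by preceding action

    def next_actions(self, action: str, facts: Dict[str, bool]) -> List[str]:
        """Follow a fork if the action has one, otherwise its plain precedence relations."""
        if action in self.forks:
            fork = self.forks[action]
            return [fork.branches[facts.get(fork.condition, False)]]
        return [dst for src, dst in self.relations if src == action]


# Hypothetical encoding of the long jump example used later in the deck.
energy = Resource("resource", {"energy"})
mitochondrion = Resource("tool", {"mitochondrion"})
jump = Process(
    actions={
        "accumulateenergy": Action("accumulateenergy", [mitochondrion, energy], [energy]),
        "musclecontraction": Action("musclecontraction", [mitochondrion, energy]),
    },
    forks={"accumulateenergy": Fork("enough_energy",
                                    {True: "musclecontraction", False: "accumulateenergy"})},
)
print(jump.next_actions("accumulateenergy", {"enough_energy": False}))  # ['accumulateenergy']
print(jump.next_actions("accumulateenergy", {"enough_energy": True}))   # ['musclecontraction']
```

The `False` branch loops back to the same action, which is the kind of fork-plus-loop structure that later makes negation relevant for evaluation.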

16 Contribution 2: Building a PSM library for the acquisition of process knowledge
Method: identification, then decomposition and abstraction, leading to a reusable PSM library. It extends the work done in the Halo analysis phase by the Omniscience and Ontoprise teams.
- 755 AP questions analyzed
- More than 100 domain-specific processes
- Four main process categories: Join, Split, Modify, Locate

17 Contribution 2: A PSM example (decompose & combine)
Domain example: Crystallization occurs when certain pairs of oppositely charged ions attract each other so strongly that they form an insoluble ionic solid. This process coexists with dissolution processes in precipitation reactions. The addition of a colorless solution of potassium iodide (KI) to a colorless solution of lead nitrate [Pb(NO3)2] produces a yellow precipitate of lead iodide (PbI2) that slowly settles to the bottom of the beaker.
PSM description:
  name: decompose & combine
  goal: member(Recombination set, Element) and member(Constituents set, Piece) and part-of(Piece, Element) and part-of(Piece, Combination) and properties(Element, ep) and properties(Combination, cp) and not equal(ep, cp)
  actions: decompose, combine
  input action: decompose
  output action: combine
  input roles: Recombination set, Decomposer, Combinator
  output roles: Combination, Byproduct
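
The goal of decompose & combine can be read operationally on the lead iodide example. The sketch below encodes the two solutions as sets of ions, runs the decompose and combine actions, and evaluates one reading of the goal expression (existential over Element and Piece). The ion-level encoding and the property dictionaries are simplifications introduced here, not part of the PSM library.

```python
from typing import Dict, FrozenSet, Set

# Recombination set: each element (solution) described by its constituent ions.
recombination_set: Dict[str, Set[str]] = {
    "KI": {"K+", "I-"},
    "Pb(NO3)2": {"Pb2+", "NO3-"},
}

# Simplified, hypothetical property descriptions taken from the example text.
properties = {
    "KI": {"state": "aqueous", "color": "colorless"},
    "Pb(NO3)2": {"state": "aqueous", "color": "colorless"},
    "PbI2": {"state": "solid", "color": "yellow"},
}


def decompose(elements: Dict[str, Set[str]]) -> Set[str]:
    """decompose: the constituents set is the union of the pieces of all elements."""
    return set().union(*elements.values())


def combine(constituents: Set[str], selected: Set[str]) -> FrozenSet[str]:
    """combine: build a combination from pieces available in the constituents set."""
    missing = selected - constituents
    if missing:
        raise ValueError(f"pieces not available: {sorted(missing)}")
    return frozenset(selected)


def goal_holds(elements, combination, combination_pieces, props) -> bool:
    """Existential reading of the goal: some Piece is part of an Element and of the
    Combination, and the properties of that Element and the Combination differ."""
    return any(
        piece in combination_pieces and props[element] != props[combination]
        for element, pieces in elements.items()
        for piece in pieces
    )


constituents = decompose(recombination_set)
lead_iodide = combine(constituents, {"Pb2+", "I-"})
byproduct = constituents - lead_iodide  # K+ and NO3- stay in solution
print(sorted(byproduct))                                                # ['K+', 'NO3-']
print(goal_holds(recombination_set, "PbI2", lead_iodide, properties))  # True
```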

18 Contribution 3: The graphical process modeling environment
Main elements of the environment (annotated screenshot):
- Process metamodel
- Domain process to which this process diagram is bound
- Consistency maintenance (knowledge base and process data flow)
- PSM library (e.g. decompose & combine)
- Domain-level reasoning and control flow evaluation

19 Contribution 4: The process representation and reasoning formalism
The formalism bridges the gap between the knowledge level and the operational level, focusing on three main aspects: the process frame, data flow, and control flow.
[Figure: example process diagram with an input action, a conditional precedence (control flow), and an output action, for the following example.]
In a long-distance jump competition, an athlete can jump only after his mitochondria have accumulated enough energy for his muscles to contract.

20 Contribution 4: Addressing the frame problem
Two states (pre and post) per process action:
- Pre state: the portion of the process frame in the scope of an action.
- Post state: the pre state of the action, updated after its execution.
- Actions read from their pre state and write into their post state.
[Figure: pre state and post state of the action Dissolve.]
At modeling time, we automatically synthesize the process rules that manage the process frame during execution:
- Setup rules build the pre state of the input actions of the process.
- Precedence rules describe which actions can be connected with each other by means of their outputs and inputs.
- Transition rules describe the transition between the pre and post states of an action.
Explicit manipulation of the process frame allows runtime management of data and control flow.
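
The pre/post-state discipline can be mimicked outside F-logic. In this Python sketch, each action owns a pre state and a post state, and `setup`, `precedence`, and `transition` mirror the three rule kinds described above. It shows why nothing changes implicitly: an action only writes its own post state. The class, method names, and knowledge base values are hypothetical; the thesis synthesizes F-logic rules, not Python code.

```python
from copy import deepcopy
from typing import Callable, Dict, List

State = Dict[str, object]


class ProcessFrame:
    """One pre state and one post state per action (a hypothetical Python analogue)."""

    def __init__(self) -> None:
        self.pre: Dict[str, State] = {}
        self.post: Dict[str, State] = {}

    def setup(self, action: str, kb: State, keys: List[str]) -> None:
        # Setup rule: build the pre state of an input action from the knowledge base.
        self.pre[action] = {k: kb[k] for k in keys}

    def precedence(self, src: str, dst: str, keys: List[str]) -> None:
        # Precedence rule: outputs in the post state of src feed the pre state of dst.
        self.pre.setdefault(dst, {}).update({k: self.post[src][k] for k in keys})

    def transition(self, action: str, effect: Callable[[State], State]) -> None:
        # Transition rule: the action reads a copy of its pre state and writes its
        # post state; the pre state itself is never modified.
        self.post[action] = effect(deepcopy(self.pre[action]))


kb = {"mitochondrion": "m1", "energy": 100}  # hypothetical knowledge base facts
frame = ProcessFrame()
frame.setup("accumulateenergy", kb, ["mitochondrion", "energy"])
frame.transition("accumulateenergy", lambda s: s)  # simplified: energy made available
frame.precedence("accumulateenergy", "musclecontraction", ["energy"])
frame.transition("musclecontraction", lambda s: {**s, "energy": 0, "jump": True})

print(frame.pre["musclecontraction"])   # {'energy': 100}
print(frame.post["musclecontraction"])  # {'energy': 0, 'jump': True}
```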

21 Contribution 4: The code synthesis mechanism
Action-centric algorithm: each action results in a set of process rules in F-logic.
Rules generated per action type:
                      input actions   intermediate actions   output actions
  setup rules              x                  -                    -
  transition rules         x                  x                    x
  precedence rules         -                  x                    x
Setup rule (input action accumulateenergy):
  FORALL m, e, v
    m:mitochondrion@prestate(accumulateenergy) AND m:tool@prestate(accumulateenergy) AND
    e:energy@prestate(accumulateenergy) AND e:resource@prestate(accumulateenergy) AND
    e[hasenergyvalue -> v]@prestate(accumulateenergy)
    <- m:mitochondrion AND e:energy AND e[hasenergyvalue -> v].
Transition rule (action musclecontraction, partially recovered):
  FORALL m, e, j
    j:jump@poststate(musclecontraction) AND j:OUTPUT@poststate(musclecontraction) AND
    musclecontraction[provides -> j]@poststate(musclecontraction)
    <- ... AND m:tool@prestate(musclecontraction) AND
    m[is_used_by -> musclecontraction]@prestate(musclecontraction) AND
    e:energy@prestate(musclecontraction) AND e:resource@prestate(musclecontraction) AND
    e[is_consumed_by -> ...
Precedence rule (from accumulateenergy to musclecontraction):
  FORALL e, v
    e:energy@prestate(musclecontraction) AND e[hasenergyvalue -> v]@prestate(musclecontraction)
    <- e:energy@poststate(accumulateenergy) AND e[hasenergyvalue -> v]@poststate(accumulateenergy).
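
The action-centric scheme can be summarized as a mapping from an action's position in the diagram to the rule kinds it generates, plus a textual template per rule kind. The Python sketch below does this, with a template for precedence rules only that emits text in the same form as the precedence rule on this slide. The function names and the single-attribute template are hypothetical simplifications of the actual synthesis.

```python
from typing import Dict, List, Set, Tuple


def rule_kinds(actions: List[str], edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Rule kinds per action, following the table above: setup rules only for input
    actions, transition rules for all actions, and precedence rules for actions fed
    by a predecessor (intermediate and output actions)."""
    fed = {dst for _, dst in edges}
    return {
        action: {"transition", "precedence" if action in fed else "setup"}
        for action in actions
    }


def precedence_rule(src: str, dst: str, cls: str, attr: str) -> str:
    """Emit a precedence rule (one class, one attribute) in the slide's textual form."""
    return (
        f"FORALL e, v e:{cls}@prestate({dst}) AND e[{attr} -> v]@prestate({dst}) "
        f"<- e:{cls}@poststate({src}) AND e[{attr} -> v]@poststate({src})."
    )


actions = ["accumulateenergy", "musclecontraction"]
edges = [("accumulateenergy", "musclecontraction")]
print(rule_kinds(actions, edges))
print(precedence_rule("accumulateenergy", "musclecontraction", "energy", "hasenergyvalue"))
```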

22 Contribution 4: Domain-level reasoning within processes
Domain knowledge in the example:
- The minimum amount of energy needed to jump is 5 calories.
- The length of the jump is directly proportional to the amount of energy accumulated.
Check rule (partially recovered):
  FORALL anenergy, v
    ... <- ... AND anenergy:energy[hasenergyvalue -> v]... AND greater(v, 5).
Update rule (partially recovered):
  FORALL length, anenergy, v
    ajump(out(haslength, length)):jump@update(musclecontraction) AND
    ajump(out(haslength, length))[haslength -> length]@update(musclecontraction)
    <- ... AND anenergy:energy[hasenergyvalue -> v]... AND multiply(length, 2, v).
Transition rule including the update (partially recovered):
  FORALL j, m, e, length
    j:jump@poststate(musclecontraction) AND j:OUTPUT@poststate(musclecontraction) AND
    musclecontraction[provides -> j]@poststate(musclecontraction) AND
    j[haslength -> length]@poststate(musclecontraction)
    <- ... AND m:tool@prestate(musclecontraction) AND
    m[is_used_by -> musclecontraction]@prestate(musclecontraction) AND
    e:energy@prestate(musclecontraction) AND e:resource@prestate(musclecontraction) AND
    e[is_consumed_by -> ... AND
    j:jump@update(musclecontraction) AND j[haslength -> length]@update(musclecontraction).
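
Read procedurally, the check rule gates the transition of muscle contraction, and the update rule computes the values written into its post state. The Python sketch below makes explicit assumptions: the proportionality constant `k` is a free, hypothetical parameter (the argument convention of `multiply(length, 2, v)` is not spelled out on the slide), and states are plain dictionaries.

```python
from typing import Dict, Optional

MIN_ENERGY = 5  # calories: minimum energy needed to jump (from the slide)


def check(state: Dict[str, float]) -> bool:
    # Check rule: mirrors greater(v, 5), i.e. strictly more than the minimum.
    return state["energy"] > MIN_ENERGY


def update(state: Dict[str, float], k: float) -> Dict[str, float]:
    # Update rule: jump length proportional to the accumulated energy.
    # k is a hypothetical proportionality constant, not taken from the thesis.
    return {"jump_length": k * state["energy"]}


def muscle_contraction(pre: Dict[str, float], k: float) -> Optional[Dict[str, float]]:
    # Transition: a post state is produced only if the check succeeds; it combines
    # the pre state with the values computed by the update rule.
    if not check(pre):
        return None
    return {**pre, **update(pre, k)}


print(muscle_contraction({"energy": 5}, k=0.5))    # None: 5 is not greater than 5
print(muscle_contraction({"energy": 100}, k=0.5))  # {'energy': 100, 'jump_length': 50.0}
```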

23 Contribution 4: Sample question
In a long-distance jump competition, an athlete can jump only after his mitochondria have accumulated enough energy for his muscles to contract. What is the minimum amount of energy a long jump athlete needs to consume in order to jump more than 8 m?
a. 100 cal
b. 50 cal
c. 250 cal
d. 1 cal
Option a, stated as a fact, and the corresponding query:
  energy1:energy[hasvalue -> 100].
  FORALL j, oa, v <- longjump:PROCESS@processmodule AND longjump[OUTPUT_ACTION ->> oa]@processmodule AND j:jump[hasvalue -> v]@poststate(oa) AND greater(v, 8).

24 Contribution 4: Properties of the process models
Sound and complete, based on F-logic's proof theory plus an additional proof for the process formalism:
- A process action is sound iff its post state can be deduced from its pre state.
- A process action is complete iff it allows deducing all the possible clauses of its post state from the clauses in its pre state.
- A process model is sound and complete iff all its actions are sound and complete.
Optimized:
- Attribute and concept names are ground, e.g. person(peter) instead of instanceof(person, peter), which allows OntoBroker to index tuples by class and attribute name.
- Process rules are generally stratified. This is critical in the presence of negation (forks and loops), since it avoids the costly well-founded evaluation mode.
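
Stratification can be checked with the classic iterative algorithm: assign each predicate a stratum so that a rule's head is at least as high as its positive body predicates and strictly higher than its negated ones. If a stratum ever exceeds the number of predicates, there is recursion through negation and well-founded evaluation would be needed. The sketch uses a hypothetical propositional encoding of a fork; it is not OntoBroker's own check.

```python
from typing import Dict, List, Optional, Tuple

# A rule: (head predicate, [(body predicate, is_negated), ...])
Rule = Tuple[str, List[Tuple[str, bool]]]


def stratify(rules: List[Rule]) -> Optional[Dict[str, int]]:
    """Return a stratum per predicate, or None if the rules are not stratifiable."""
    preds = {head for head, _ in rules} | {p for _, body in rules for p, _ in body}
    stratum = {p: 0 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            for pred, negated in body:
                needed = stratum[pred] + 1 if negated else stratum[pred]
                if stratum[head] < needed:
                    if needed > len(preds):
                        return None  # recursion through negation
                    stratum[head] = needed
                    changed = True
    return stratum


# Hypothetical fork: jump if there is enough energy, otherwise accumulate again.
fork = [
    ("jump", [("enough_energy", False)]),
    ("accumulate_again", [("enough_energy", True)]),
]
print(stratify(fork))  # strata: enough_energy 0, jump 0, accumulate_again 1

# Mutual recursion through negation cannot be stratified.
print(stratify([("p", [("q", True)]), ("q", [("p", True)])]))  # None
```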

26 Provenance analysis of process executions by SMEs
Objective 2: to support SMEs in analyzing and understanding process executions.
Two main contributions:
- C5: a method and algorithm that use PSMs as high-level, reusable process abstractions and as a visualization paradigm to identify and explain the reasoning strategies and rationale of executed processes;
- C6: an architecture and integrated environment for the analysis of process executions at the knowledge level.

27 Contribution 5: Towards knowledge provenance
PSMs as semantic overlays on top of existing process documentation:
- Provenance from a knowledge perspective: how provenance relates to the execution of a process.
- Simpler process analysis, by proposing decompositions into simpler subprocesses.
- Visualization of provenance at different levels of detail.
Task: what is going to be achieved by executing a process. PSM: how it is achieved.
Supporting SMEs in two main ways:
- validation of process executions;
- identification of reasoning patterns in process executions.
Source of the example: myGrid.

28 Contribution 5: The twig join function
Based on XML pattern matching algorithms on directed acyclic graphs (Bruno et al., 2002), twig_join detects the occurrence of a pattern in an XML DAG. Given:
- P, a process;
- T, a task potentially describing P;
- M, a PSM providing a strategy on how to achieve T;
- i(T), the set of input roles of T;
- o(T), the set of output roles of T;
- D, the DAG resulting from documenting the execution of P;
twig_join(D, i(T), o(T)) checks whether a twig exists for M that connects i(T) with o(T) in D. In this case, the PSM M is the pattern to be identified in the process documentation DAG D.
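
A much simplified version of the check can be written with plain reachability instead of an XML twig-join algorithm. In this Python sketch, bridges map domain concepts in the documentation DAG D to PSM roles, and the check succeeds when every output role is played by some node whose ancestors play all the input roles. Edges are assumed to point from used to produced data; node names, labels, and bridges are hypothetical. This is not the holistic twig join of Bruno et al.

```python
from typing import Dict, List, Set


def ancestors(dag: Dict[str, List[str]], node: str) -> Set[str]:
    """All nodes from which `node` can be reached (edges point from used to produced data)."""
    parents: Dict[str, Set[str]] = {}
    for src, dsts in dag.items():
        for dst in dsts:
            parents.setdefault(dst, set()).add(src)
    seen: Set[str] = set()
    stack = [node]
    while stack:
        for parent in parents.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def twig_join(dag, labels, bridges, inputs: Set[str], outputs: Set[str]) -> bool:
    """Simplified check: every output role of the PSM is played by some node of D
    whose ancestors play all the input roles."""
    role = {node: bridges.get(label) for node, label in labels.items()}
    for out_role in outputs:
        if not any(
            r == out_role and inputs <= {role.get(a) for a in ancestors(dag, n)}
            for n, r in role.items()
        ):
            return False
    return True


# Hypothetical documentation DAG D for the lead iodide example.
dag = {"ki": ["mix"], "pbno3": ["mix"], "mix": ["pbi2", "kno3"]}
labels = {"ki": "KISolution", "pbno3": "LeadNitrateSolution", "mix": "IonMixture",
          "pbi2": "LeadIodide", "kno3": "PotassiumNitrate"}
# Bridges from domain concepts to the PSM roles of decompose & combine.
bridges = {"KISolution": "Recombination set", "LeadNitrateSolution": "Recombination set",
           "LeadIodide": "Combination", "PotassiumNitrate": "Byproduct"}

print(twig_join(dag, labels, bridges, {"Recombination set"}, {"Combination", "Byproduct"}))  # True
```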

29 Contribution 5: A twig join example
[Figure: domain entities mapped to PSM entities through bridges (mapping), resulting in a twig join.]

30 Contribution 5: The matching algorithm
- twig_join is applied recursively at each task-method decomposition level: twig_join(ti, D), then decompose(ti).
- Each task is decomposed by one or several PSMs (task-method decomposition view).
- The knowledge flow (interaction view) defines the sequence of evaluation, e.g. twig_join(t11, D), twig_join(t12, D), twig_join(t13, D), twig_join(t14, D).
- Backtracking is possible at the PSM and role levels.
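
The recursion and backtracking described above can be sketched in a few lines of Python. Here `twig_join` is passed in as a function (in the usage example, a stub backed by a set of tasks whose twig was found in D), subtasks are evaluated in knowledge-flow order, and a failing subtask makes the algorithm backtrack to the next alternative PSM. The data layout and the PSM names `psm_a` and `psm_b` are hypothetical; role-level backtracking is not modeled.

```python
from typing import Callable, Dict, List, Optional, Tuple

# task -> alternative PSMs, each given as (PSM name, subtasks in knowledge-flow order)
Methods = Dict[str, List[Tuple[str, List[str]]]]
# (task, PSM used to explain it or None, matches of its subtasks)
Match = Tuple[str, Optional[str], list]


def match(task: str, methods: Methods, twig_join: Callable[[str], bool]) -> Optional[Match]:
    """Apply twig_join at each decomposition level, backtracking over alternative PSMs."""
    if not twig_join(task):
        return None  # the pattern for this task does not occur in D
    for psm, subtasks in methods.get(task, []):
        children = []
        for sub in subtasks:  # evaluation order given by the knowledge flow
            sub_match = match(sub, methods, twig_join)
            if sub_match is None:
                break  # this PSM does not explain the execution: try the next one
            children.append(sub_match)
        else:
            return (task, psm, children)  # all subtasks matched under this PSM
    return (task, None, [])  # matched at this level only, without further decomposition


# Hypothetical decomposition: t1 can be achieved by psm_a (t11, t12) or psm_b (t13, t14).
methods = {"t1": [("psm_a", ["t11", "t12"]), ("psm_b", ["t13", "t14"])]}
detected = {"t1", "t13", "t14"}  # tasks whose twig was found in D (stub for twig_join)
print(match("t1", methods, detected.__contains__))
# ('t1', 'psm_b', [('t13', None, []), ('t14', None, [])])
```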

31 Contribution 6: A Knowledge-Oriented Provenance Environment
[Figure: architecture with PSM-ontology bridges, matching detection, matching visualization, and provenance query components.]

33 Objective 1: Evaluation settings
SMEs: 2 Chemistry SMEs, 2 Biology SMEs, and 2 Physics SMEs (Judith, Lennart, Christianne, Martina, Markus, Andreas).
Syllabus:
- Chemistry: stoichiometry, solutions and equilibrium (Brown & Lemay, pages 75-83, ...)
- Biology: cell and DNA structure and processes (Campbell and Reece, pages ...)
- Physics: kinematics and dynamics (Serway and Faughn, chapters 2, 3, and 4)
Two main evaluation dimensions: usability and utility.

34 Evaluation results: utilization of the PSM library (Objective 1)
Hypotheses addressed: H1 (SME empowerment can increase KB quality and reduce costs), H3 (PSMs can reduce the complexity of process KA), and H6 (the proposed methods and tools are flexible and reusable).
Number of processes modeled per SME:
  SME1 (Physics)      0
  SME2 (Biology)      2
  SME3 (Biology)      6
  SME4 (Chemistry)    0
  SME5 (Chemistry)    3
  SME6 (Physics)      0
  Total              11
Processes modeled and PSMs used:
  SME                 Process                               PSM
  SME2 (Biology)      Transition from G2 phase to mitosis   n.a.
  SME2 (Biology)      Mitosis                               n.a.
  SME3 (Biology)      Mitosis                               decompose & combine
  SME3 (Biology)      Carbohydrate metabolism               consume, transform
  SME3 (Biology)      Cellular respiration                  decompose, consume
  SME3 (Biology)      Detoxification                        transform
  SME3 (Biology)      Photosynthesis                        form by combination
  SME3 (Biology)      Ribosome protein synthesis            situate & combine
  SME5 (Chemistry)    Complete ionic equation               form by combination
  SME5 (Chemistry)    Molecular equation                    decompose & combine
  SME5 (Chemistry)    Net ionic equation                    form by combination
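
The totals can be recomputed from the per-SME listing. Nine of the eleven processes were modeled with a PSM, about 82%; this coincides with the 82% quoted in the conclusions, although the slides do not state that this is how that figure was obtained. The process-to-PSM pairing follows the order in which the slide lists them.

```python
# Processes modeled per SME and the PSM used for each (None = no PSM), as listed on the slide.
processes = {
    "SME1 (Physics)": [],
    "SME2 (Biology)": [("Transition from G2 phase to mitosis", None), ("Mitosis", None)],
    "SME3 (Biology)": [
        ("Mitosis", "decompose & combine"),
        ("Carbohydrate metabolism", "consume, transform"),
        ("Cellular respiration", "decompose, consume"),
        ("Detoxification", "transform"),
        ("Photosynthesis", "form by combination"),
        ("Ribosome protein synthesis", "situate & combine"),
    ],
    "SME4 (Chemistry)": [],
    "SME5 (Chemistry)": [
        ("Complete ionic equation", "form by combination"),
        ("Molecular equation", "decompose & combine"),
        ("Net ionic equation", "form by combination"),
    ],
    "SME6 (Physics)": [],
}

total = sum(len(models) for models in processes.values())
with_psm = sum(1 for models in processes.values() for _, psm in models if psm)
print(total, with_psm, f"{with_psm / total:.0%}")  # 11 9 82%
```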

35 Evaluation results: performance of process models (Objective 1)
Hypothesis addressed: H5 (the proposed methods and tools produce sound and complete executable process models).
Configurations:
- C0: well-founded evaluation on; concept/attribute names ground off
- C1: well-founded evaluation on; concept/attribute names ground on
- C2: well-founded evaluation off; concept/attribute names ground on
Query times per configuration, with the ratio with respect to configuration C0 (<1: faster; =1: same time as C0; >1: slower). Cells lost in extraction are shown as "...".
  Query      C0     ratio   C1     ratio   C2     ratio
  SME3-q0    31     1,00    0      0,00    ...    ...,52
  SME3-q1    63     1,00    ...    ...     ...    ...,25
  SME3-q2    31     1,00    ...    ...     ...    ...,52
  SME3-q3    47     1,00    ...    ...     ...    ...,34
  SME3-q4    15     1,00    0      0,00    0      0,00
  SME3-q5    32     1,00    ...    ...,50  0      0,00
  SME3-q6    ...    1,00    ...    ...     ...    ...,15
  SME3-q7    63     1,00    ...    ...     ...    ...,49
  SME3-q8    47     1,00    ...    ...     ...    ...,34
  SME3-q9    62     1,00    ...    ...     ...    ...,26
  SME3-q10   ...    1,00    ...    ...     ...    ...,00
  Average    79,7   1,00    59,5   0,75    56,4   0,71
  Median     47     1,00    ...    ...     ...    ...,34
  Min        15     1,00    0      0,00    0      0,00
  Max        203    1,00    ...    ...     ...    ...,15
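
The ratio columns are execution times divided by the C0 time. Applied to the averages, this reproduces the reported 0,75 and 0,71, i.e. roughly 25% and 29% less time than C0. Whether the Average row was computed as a ratio of averages or as an average of ratios is not stated; the sketch only shows that the former is consistent with the table.

```python
# Average query times per configuration, from the Average row of the table.
avg_time = {"C0": 79.7, "C1": 59.5, "C2": 56.4}

# Ratio with respect to C0: <1 faster, =1 same time, >1 slower.
ratio = {conf: round(t / avg_time["C0"], 2) for conf, t in avg_time.items()}
print(ratio)  # {'C0': 1.0, 'C1': 0.75, 'C2': 0.71}
```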

36 Evaluation results: utility and usability (Objective 1)
- Physics SMEs did not use processes.
- Processes were not so important for Chemistry SMEs.
- SME2 (Biology): "It makes the representation of biological models easier."
- SME3 (Biology): "The modeling of processes is very useful. It must be possible to ask questions about the various states of a process. And asking questions with T&D worked okay."
System Usability Scale (SUS): SMEs answered a questionnaire about the system, yielding a score between 0 and 100. Average obtained: 64,5.
Hypotheses addressed: H2 (due to its complexity, SMEs require specific means for process KA) and H4 (the proposed methods and tools abstract SMEs from the underlying KRR formalism).
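
For reference, the standard SUS questionnaire has ten items rated from 1 to 5: odd-numbered items contribute (score - 1), even-numbered items (5 - score), and the sum is multiplied by 2.5 to obtain a 0-100 score. The slide does not detail the questionnaire, so this sketch assumes the standard ten-item form; the 64,5 average would then be the mean of the participants' scores.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Standard SUS scoring for 10 items answered on a 1-5 scale."""
    if len(responses) != 10 or not all(1 <= r <= 5 for r in responses):
        raise ValueError("SUS expects 10 responses between 1 and 5")
    total = 0
    for i, r in enumerate(responses):
        # Index 0, 2, ... are items 1, 3, ... (positively worded);
        # index 1, 3, ... are items 2, 4, ... (negatively worded).
        total += (r - 1) if i % 2 == 0 else (5 - r)
    return total * 2.5


print(sus_score([3] * 10))    # 50.0: neutral answers everywhere
print(sus_score([5, 1] * 5))  # 100.0: best possible answers
```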

37 Objective 2: Evaluation settings (Provenance Challenge)
[Figure: the Brain Atlas workflow, the Brain Atlas provenance data flow, and the Catalogation PSM.]

38 Evaluation results (Objective 2)
- H7: PSMs can provide SMEs with explanations of process executions.
- H8: The proposed method identifies the main rationale behind processes by detecting PSM occurrences.
- H9: PSMs describe process executions at different levels of detail.
The evaluation focuses on precision and recall metrics, measured at three layered contexts: method, task, and decomposition level. Results are classified as perfect match, partial match, or no match.
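
Precision and recall compare the PSM occurrences detected by the matching algorithm with a reference set (here assumed to be provided by an expert); they can be computed separately for each context (method, task, or decomposition level). The sets below are hypothetical and only illustrate the metrics, not the thesis's results.

```python
from typing import Set, Tuple


def precision_recall(detected: Set[str], expected: Set[str]) -> Tuple[float, float]:
    """Precision = correct detections / all detections; recall = correct detections / all expected."""
    correct = len(detected & expected)
    precision = correct / len(detected) if detected else 0.0
    recall = correct / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical task-level matches against a reference set.
detected = {"t1", "t11", "t13"}
expected = {"t1", "t11", "t12", "t14"}
print(precision_recall(detected, expected))  # (0.6666666666666666, 0.5)
```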

40 Conclusions
Objective 1: to enable SMEs to acquire processes without KEs.
- The evidence is qualitative rather than statistical proof (only 6 SMEs).
- However, evidence was found that it is possible to engage users in acquiring process knowledge without the intervention of KEs.
- The SMEs using the PSM library (SME3 and SME5) produced more process models, and of better quality (82%), than the rest (SME2).
- The method used to create the PSM library has also shown evidence of being reusable in other domains.
Objective 2: to support SMEs in understanding process executions.
- Semantic overlays (e.g. PSMs) on top of process documentation provide the abstractions required to analyze provenance from a knowledge perspective.
- Provenance analysis by SMEs benefits from a hierarchical structure in such overlays.
- The matching algorithm has not been applied to large PSM libraries and provenance logs.

41 The ubiquity of processes: business, biology, healthcare, climate prediction, ecology, chemistry.

42 Future research problems
The Web is driving a new computing paradigm through the involvement of users forming online communities. Additionally, the focus is shifting from data to processes. The solutions proposed live in the Semantic Web in the small; the challenge is to move to the Web in the large.
Open problems:
- Process representation and reasoning
- Performance, coverage, scale
- Collaboration in user communities
- Process validation, trust maintenance
- More expressivity (events, qualitative reasoning)
- Distributed reasoning algorithms
- Sharing and reuse of processes
- Process reliability and validation
- Incomplete, inconsistent, contradictory knowledge bases
- Uncertainty, nonmonotonicity
- Conciliation of partial results
- Heuristics (assumptions, defaults)
- Caching
- Comparing and recommending processes
- Process-specific query mechanisms
- Trust
