Process Mining for the Real World


1 Learning Hybrid Process Models from Events: Process Mining for the Real World. Smart Data Analytics (SDA) research group, University of Bonn. Wil van der Aalst

4 Data science: industry, mobility, business, science, health, government, society

5 Wil van der Aalst Launched in 2013

6 Data Science as the enabler
infrastructure (volume and velocity):
o instrumentation
o big data infrastructures and distributed systems
o databases and data management
o programming
o security
o ...
analysis (extracting knowledge):
o statistics
o data/process mining
o machine learning/artificial intelligence
o operations research
o algorithms
o visualization
o ...
effect (people, organizations, society):
o ethics & privacy
o IT law
o human technology interaction
o operations management
o business models
o entrepreneurship
o ...

8 Uptake of process mining
o Start of process mining at TU/e (1999)
o Alpha miner & Heuristic miner ( )
o First version of ProM (2004). Evolved from 29 plug-ins in 2004 to 274 plug-ins in 2009 (ProM 5.2) to over 1500 plug-ins today.
o Conformance checking, other perspectives, prediction, etc. ( )
o Process mining book, courses, uptake of commercial tools, etc. ( )
Chart: publications on process mining (2016/17 incomplete), based on Scopus data

9 Process science: optimization; operations management & research; stochastics; process mining; formal methods & concurrency theory; process automation & workflow management; business process improvement; business process management

10 process mining as the missing link

11 Process model discovered using ILP miner: 12,666 cases (orders), 80,609 events

12 Taxonomy: Not just CF discovery!
Task: process discovery, conformance checking, extending process models, decision making
Perspective: Control Flow (CF) only, CF + time, CF + data, CF + resources
Type: offline, online

13 Taxonomy: Not just CF discovery! Highlighted combination: offline control-flow discovery (Task: process discovery; Perspective: Control Flow (CF) only; Type: offline)

14 process model discovered using the inductive miner (showing only the most frequent paths)

15 zooming in

16 using conformance checking to see all deviations

17 bottleneck analysis: enriching the model with performance information

18 animating the event log showing real cases

19 seamless abstraction: one log many views

20 Data-driven & process-centric: event data (data-driven) combined with process models (process-centric)

21 Answering two types of questions
Performance questions:
o Why are these cases late?
o Where are the real bottlenecks?
o Which resources are overloaded?
Conformance questions:
o How often is the four-eyes principle violated?
o Which activities are often skipped?
o Which resources cause deviations?

22 Process mining results may hurt and trigger resistance, but this only supports the need for it.

23 Process Mining Software

24 1500+ plug-ins available, covering the whole process mining spectrum; >130k downloads

26 Interaction with industry: challenges, ideas, data; new techniques and approaches

27 Job done? Not really

29 Boxes and arrows: What do they mean?

30 An example log and proper models: 1000 cases and 5000 events; Inductive miner, ILP miner, Alpha miner, etc.

31 Model with 3 concurrent activities. Too difficult?

32 Concurrency & Semantics: Much easier? Semantics?

33 Concurrency & Semantics: Do frequencies make sense?

34 Concurrency & Semantics: Do frequencies make sense? Callouts: "do not add up", "were performed in any order".

35 Concurrency & Semantics: Do times make sense?

37 Concurrency & Semantics: Correct numbers

38 Concurrency & Semantics: Correct times (in hours)

40 faking confidence (even with fitness = 1.0)

41 Need to have a PhD in Petri nets?

42 How to combine the best of both worlds?
o Informal when needed & fast and scalable: commercial tools, heuristic miner, fuzzy miner, etc.
o Precise (having formal semantics) whenever possible & useful: basic inductive miner, ILP miner, other region-based approaches, etc.

43 Joint work with Riccardo De Masellis, Chiara Di Francescomarino, Chiara Ghidini

44 People do not hate Petri nets (BPMN, etc.): they hate to be precise (when )! Vagueness can be a feature! Semi-structuredness can be deliberate!

45 Wil van der Aalst, Riccardo De Masellis, Chiara Di Francescomarino, Chiara Ghidini: Learning Hybrid Process Models From Events: Process Discovery Without Faking Confidence. BPM July 2016

46 Hybrid Petri nets have three types of arcs
o a to b, sure and precise: strong causality; logic can be captured*; determined based on thresholds
o a to b, sure but not precise: determined based on thresholds; logic cannot be captured*
o a ? b, unsure: weak causality
* = easily / of a certain quality / within the representational bias

47 Hybrid Petri nets have three types of arcs
o Type I: formal (firm statements about the inclusion or exclusion of traces); strong causality ("sure")
o Type II: informal (annotations that are deliberately vague); strong causality ("sure")
o Type III: informal (annotations that are deliberately vague); weak causality ("unsure", marked "?")

48 Phase 0: Get data

49 Phase 1: Learn a Causal Graph. Legend: strong causal relation R_S; weak causal relation R_W (arcs marked "?"). Activities: start, place order, send invoice, send reminder, cancel order, confirm payment, pay, prepare delivery, make delivery, end. Different approaches possible (business as usual).

50 Phase 2: Learn a Hybrid Petri Net. Figure: hybrid Petri net with places p1 to p9 over the activities start, place order, send invoice, send reminder, cancel order, confirm payment, pay, prepare delivery, make delivery, end.

51 Phase 2: Learn a Hybrid Petri Net. Same figure with arc legend: a to b sure and precise; a to b sure but not precise; a ? b unsure.

54 Phase 2: Learn a Hybrid Petri Net. How? Generate candidate places. A candidate place is characterized by two sets of activities, A_input and A_output, such that sure arcs connect each activity in A_input to each activity in A_output.

56 Phase 2: Learn a Hybrid Petri Net. Determine the quality of each candidate place p = (A_input, A_output). Ideally: start empty, finish empty, not negative (compare ILP miner).

57 Phase 2: Learn a Hybrid Petri Net. Three possible scoring functions:
a) Fraction of cases perfectly fitting
b) Fraction of relevant cases perfectly fitting
c) Global score (extremely efficient)
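
Scoring functions (a) and (b) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the ProM implementation: a trace is taken to fit a candidate place when replay starts with an empty place, never needs a missing token, and ends with an empty place; a case is taken to be "relevant" when it contains at least one activity from A_input or A_output. The function names and the toy log are hypothetical, and the global score (c) is not covered.

```python
def fits(trace, a_in, a_out):
    """Replay one trace on a single place (assumed fitting criterion)."""
    tokens = 0  # the place starts empty
    for activity in trace:
        if activity in a_out:
            tokens -= 1  # an output activity consumes a token
            if tokens < 0:
                return False  # token missing: place would go negative
        if activity in a_in:
            tokens += 1  # an input activity produces a token
    return tokens == 0  # the place must finish empty


def fraction_fitting(log, a_in, a_out):
    """(a) Fraction of all cases that fit the place perfectly."""
    return sum(fits(t, a_in, a_out) for t in log) / len(log)


def fraction_relevant_fitting(log, a_in, a_out):
    """(b) Same fraction, restricted to cases touching the place."""
    involved = set(a_in) | set(a_out)
    relevant = [t for t in log if involved & set(t)]
    if not relevant:
        return 0.0
    return sum(fits(t, a_in, a_out) for t in relevant) / len(relevant)


# Hypothetical toy log: co = cancel order, cp = confirm payment,
# md = make delivery, e = end.
log = [("cp", "md", "e"), ("co", "e"), ("cp", "e"), ("md",)]

score_a = fraction_fitting(log, {"cp"}, {"e"})
score_b = fraction_relevant_fitting(log, {"cp"}, {"e"})
score_wide = fraction_fitting(log, {"co", "cp"}, {"e"})
```

A candidate would then be kept when its score reaches the place-quality threshold; in this toy log the wider place ({co,cp},{e}) fits every case, while ({cp},{e}) fails on the cancelled order.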

58 Generate candidate places and evaluate
Activities: co = cancel order, cp = confirm payment, md = make delivery, e = end
Candidate places: ({co},{e}), ({cp},{e}), ({md},{e}), ({co,cp},{e}), ({co,md},{e}), ({cp,md},{e}), ({co,cp,md},{e})
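
The enumeration on this slide can be reproduced with a brute-force sketch. It assumes a candidate place requires a sure arc for every pair in A_input x A_output, and it simply tries all non-empty subsets, which is exponential and only meant for small examples. The function names are hypothetical and this is not the ProM plug-in's code.

```python
from itertools import combinations


def nonempty_subsets(items):
    """Yield every non-empty subset of items as a frozenset."""
    ordered = sorted(items)
    for size in range(1, len(ordered) + 1):
        for combo in combinations(ordered, size):
            yield frozenset(combo)


def candidate_places(sure_arcs):
    """Enumerate (A_input, A_output) pairs fully connected by sure arcs."""
    sources = {a for a, _ in sure_arcs}
    targets = {b for _, b in sure_arcs}
    candidates = []
    for a_in in nonempty_subsets(sources):
        for a_out in nonempty_subsets(targets):
            # Assumption: every input activity needs a sure arc
            # to every output activity.
            if all((a, b) in sure_arcs for a in a_in for b in a_out):
                candidates.append((a_in, a_out))
    return candidates


# Sure arcs into "end" from the slide: co, cp and md each lead to e.
sure_arcs = {("co", "e"), ("cp", "e"), ("md", "e")}
places = candidate_places(sure_arcs)

for a_in, a_out in places:
    print(sorted(a_in), sorted(a_out))
```

For the three sure arcs into e, this yields the seven candidates listed on the slide; each would then be scored and kept or dropped.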

60 ProM Implementation: sliders; fuzzy causal graph (figure)

61 ProM Implementation: sliders; legend: sure arc, unsure arc; fuzzy causal graph and fuzzy Petri net (figures)

62 Parameters
o Threshold for activity frequency (t_freq)
o Parameters used to compute strength of relations taking into account concurrency and loops (c and w)
o Thresholds for strong and weak causalities (t_RS and t_RW with t_RS > t_RW)
o Threshold for place quality (t_replay)
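
How the two causality thresholds split relations into strong and weak can be sketched as follows. The strength values are assumed to be precomputed numbers per activity pair; the actual strength formula (using c and w) is not reproduced here, and the function name, the example values, and the use of >= comparisons are illustrative assumptions.

```python
def classify_relations(strength, t_rs, t_rw):
    """Split activity pairs into strong (R_S) and weak (R_W) relations."""
    if not t_rs > t_rw:
        raise ValueError("t_RS must be greater than t_RW")
    strong, weak = set(), set()
    for pair, value in strength.items():
        if value >= t_rs:
            strong.add(pair)  # becomes a sure arc
        elif value >= t_rw:
            weak.add(pair)  # becomes an unsure arc ("?")
        # anything below t_RW is dropped from the causal graph
    return strong, weak


# Hypothetical strengths, invented for illustration only.
strength = {
    ("place order", "send invoice"): 0.9,
    ("send invoice", "send reminder"): 0.5,
    ("pay", "cancel order"): 0.1,
}
strong, weak = classify_relations(strength, t_rs=0.8, t_rw=0.4)
```

Raising t_RS moves relations from the strong set to the weak set, which leaves fewer sure arcs and therefore fewer candidate places to evaluate.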

63 Evaluation (see paper and technical report for more details). Behaves as expected, e.g., when t_replay goes up, fitness goes up and precision goes down. arxiv.org/abs/

64 Performance: good, but room for improvement
o Smartly pruning the set of candidate places (avoid conflicting or less informative places)
o Lazy place evaluation
o Distribution/decomposition using e.g. Spark (see joint work with Long Cheng and Boudewijn van Dongen in a slightly different setting)

65 Evaluation is not easy

66 But, the need is obvious

70 Learning Hybrid Process Models from Events: Process Discovery Without Faking Confidence

71 Looking for talented PhDs and Postdocs!!