Optimized Execution of Business Processes on Crowdsourcing Platforms

8th International Conference on Collaborative Computing: Networking, Applications and Worksharing, CollaborateCom 2012
Pittsburgh, PA, United States, October 14-17, 2012

Roman Khazankin, Benjamin Satzger, and Schahram Dustdar
Distributed Systems Group, Vienna University of Technology
Argentinierstrasse 8/184-1, A-1040 Vienna, Austria
{lastname}@dsg.tuwien.ac.at
http://dsg.tuwien.ac.at/

Abstract-Crowdsourcing in enterprises is a promising approach for organizing a flexible workforce. Recent developments show that the idea is gaining additional momentum. However, an obstacle to widespread adoption is the lack of an integrated way to execute business processes based on a crowdsourcing platform. The main difference compared to traditional approaches in business process execution is that tasks or activities cannot be directly assigned but are posted to the crowdsourcing platform, while people can choose deliberately which tasks to book and work on. In this paper we propose a framework for adaptive execution of business processes on top of a crowdsourcing platform. Based on historical data gathered by the platform, we mine the booking behavior of people based on the nature and incentive of the crowdsourced tasks. Using the learned behavior model, we derive an incentive management approach based on mathematical optimization that executes business processes in a cost-optimal way considering their deadlines. We evaluate our approach through simulations to demonstrate its feasibility and effectiveness. The experiments verify our assumptions regarding the necessary ingredients of the approach and show the advantage of taking the booking behavior into account compared to the case when it is partially or fully neglected.

Index Terms-Human-centric BPM, Crowdsourcing, Incentive Management, Adaptive Process Execution

I. INTRODUCTION

Today's fast changing business environments require companies to be highly flexible in order to stay competitive.
Short, unpredictable business cycles and fluctuations, rapidly emerging technologies and trends, globalization, and the global interconnectedness provided by the Internet have increased the world's economic clock speed and the competition among companies. Recent developments in IT promise to help companies cope with the new requirements they are facing. Social networks can be leveraged to support people in loosely coupled and open team structures to collaborate efficiently. Web-based crowdsourcing, on the other hand, allows tasks to be broadcast to a large network of people, the crowd, via an open call. Crowd members voluntarily agree to solve crowdsourced problems, motivated by incentives such as money or social recognition. Crowdsourcing has the potential to give companies flexible access to a talent pool of almost unlimited size. In fact, according to an internal strategy document that leaked out in early 2012, IBM plans to employ a radically new business model [1]. It involves having the company run by a small number of core workers. A dedicated Web-based platform is used to attract specialists and to create a virtual "cloud" of contributors. Similar to cloud computing, where computing power is provided on demand, IBM's people cloud would make it possible to leverage a flexible on-demand workforce.

Today's crowdsourcing systems are still relatively simple and only suitable for non-critical, atomic tasks requiring minor effort. In particular, Amazon offers a task-based crowdsourcing marketplace called Amazon Mechanical Turk (AMT) [2]. Requesters are invited to issue human-intelligence tasks (HITs) requiring a certain qualification to AMT. The registered customers mostly post tasks that require minor effort but human capabilities (e.g., transcription, classification, or categorization tasks [3]). We foresee that in the future companies will increasingly use crowdsourcing to address a flexible workforce. However, it is still an open issue how to carry out business processes leveraging crowdsourcing.
978-1-936968-36-7 © 2012 ICST, DOI 10.4108/icst.collaboratecom.2012.25434

Most task-based crowdsourcing platforms simply give requesters the possibility to publish simple task descriptions into a database that all workers have access to. A task description in AMT, for instance, consists of a title, textual description, expiration date, time allotted, keywords, required qualifications, and monetary reward. Business processes or workflows, on the other hand, describe a logical structure between tasks that crowdsourcing platforms cannot handle. The main problem, however, is that people book tasks voluntarily in crowdsourcing, which means the only way to influence booking and execution times of single tasks is to either change incentives or modify other aspects of a task, e.g., define a later deadline.

The contribution of this paper is an enterprise crowdsourcing approach that executes business processes on top of a crowdsourcing platform. For each single task in the business process we reason about the optimal values for incentive and time allotted when crowdsourcing it. The goal is to carry out the business process with minimal investment before the deadline. During execution of the business process we constantly monitor the progress and adjust these values for tasks that have not been booked by a worker yet. Our approach for calculating optimal values is based on mining historical data about task executions (e.g., which influence higher rewards have on the booking time), analyzing the current status of the business process, and quadratic programming, which is

a mathematical optimization formulation that can be solved efficiently. We evaluate our approach through simulations for different process sizes and structures. The experiments show the effectiveness of the approach and demonstrate its adaptivity to the poorly predictable crowdsourcing environment.

The paper is organized as follows. In Section II we introduce a motivating scenario for our work. Section III introduces the approach in detail. Results of conducted experiments are presented in Section IV. Related work is discussed in Section V. Finally, Section VI concludes the paper.

II. MOTIVATING SCENARIO

We consider a scenario of a large software company that plans to install an enterprise-internal crowdsourcing platform for software development tasks. The platform allows employees to book tasks related to software development. Figure 1 schematically shows how tasks are presented to the employees, which is similar to most task-oriented crowdsourcing platforms like Amazon MTurk.

Type         Description   Ready to start   Effort   Time allotted   Reward
JavaScript   Click here    now              21h      5d              $2,9     [Book]
UI Design    Click here    Apr 1, 2pm       12h      7d              $81      [Book]
.NET         Click here    Apr 3, 9am       3h       14d             $11      [Book]
...          ...           ...              ...      ...             ...

Fig. 1. Schematic UI for the enterprise-internal crowdsourcing platform of a large software company.

Crowdsourcing Platform Logs

Type        Effort   Time allotted   Reward   Booking time
.Net        24h      14d             $52      1d
.Net        12h      4d              $325     5h
.Net        36h      5d              $57      6d
.Net        36h      5d              $85      5h
UI Design   12h      8d              $45      3h
...         ...      ...             ...      ...

Fig. 2. The crowdsourcing platform maintains a database containing information related to the processing of each task.

Each task is described by a type and a textual description. The third column gives an estimate of when the employee will be able to start working on the task. Since a task may require input from other tasks, the actual start date and time can deviate from the announced one. Effort provides information about the time effort necessary to finish the task.
Time allotted defines the time frame in which the task is supposed to be processed. It starts when the task is ready to start or when the task is booked, whichever is later. The reward tells the employee how much s/he will get for successfully processing the task on time. Instead of money, rewards could also consist of more abstract reward points. When an employee books a task, s/he is responsible for delivering the results within the allotted time. Tasks can be booked before they are ready to start. As long as tasks are not booked, the system may modify allotted time and reward.

The crowdsourcing platform generates and stores log information for each processed task, as illustrated in Figure 2. We assume that at least information regarding the task type, time effort, time allotted, the reward for which the employee actually booked the task, and the time it took from publishing to booking is stored for each task processed via the platform. Usually, paying more for a task reduces its booking time; also, a high allotted time should make tasks more attractive compared to tasks with a tight deadline.

However, the software company has problems mapping its business workflows to the crowdsourcing platform. Figure 3 shows a workflow describing a business process the company wants to execute. The aim of the process is to integrate a new plugin into an existing software product. The plugin consists of two features that together make up the functionality of the plugin. Each feature consists of three tasks: the actual implementation and the writing of test cases, which can be done in parallel, and the testing of the implementation using the test cases.

Fig. 3. An exemplary business workflow describing the development of a new plugin for a software product. The plugin consists of two features that need to be developed, integrated, and deployed into the software product. Each implementation step is followed by testing. There is a deadline for the completion of the whole plugin.
After both features have been implemented and tested, an integration and deployment step is necessary to ensure proper installation into the software product. Three different testing tasks ensure the high quality of the plugin; all three are based on the integration test case.

The introduced business process is simple yet helps to understand the challenges addressed by this paper. The question is how to crowdsource the tasks of the workflow using the crowdsourcing platform, i.e., how to set the values for the crowdsourced tasks. Type, description, and estimated effort of each single task typically are already available; the approximate ready-to-start times can be computed by the crowdsourcing platform once it has scheduled all predecessor tasks. This is relatively straightforward yet not trivial, since it involves computation based on constant monitoring and recalculations to cover deviations from the schedule, e.g., delayed or early finished tasks. However, there is no obvious solution for determining the values for time allotted and reward. Time allotted should be assigned in a way that ensures adherence to the deadline. Rewards should consider the usual "market prices" of the respective tasks, but should sometimes be increased to strengthen the competition among employees and ensure that tasks critical to the success of the workflow are booked in time. Also, if the allotted time is short relative to the effort needed to process the task, this should be reflected in the reward.

In this paper we introduce a novel algorithmic approach to crowdsource tasks that belong to a business workflow. We try to map tasks to the crowdsourcing platform such that the deadline is met and the aggregated rewards are minimized. In the next section we present our approach in detail.

III. APPROACH

The goal of our solution is to ensure the timely execution of business processes that contain crowdsourced tasks while minimizing the expenses associated with crowdsourcing rewards. We assume that the crowdsourcing platform allows allotted time and reward to be specified for each task (as described in Sec. II). The expenses can be reduced by setting lower rewards for tasks. However, if the reward is too small, a task might stay unbooked for too long, if it is booked at all. Such situations can significantly affect the execution of the process and become a reason for missed deadlines. The allotted time also affects expenses: it is less likely that an employee decides to take an urgent task for a regular reward. Hence, there can be more employees interested in a non-urgent task at a lower reward, because some of them might have less experience in this type of task and would like to improve, but need more time allotted. Obviously, the time allotted also directly influences the process completion time.

The main idea of our approach is to find the most beneficial trade-off between rewards, allotted times, and expected booking times for crowdsourced tasks in the process. Specifically, we address the following questions:

How to estimate booking time? The time it takes someone to book a task after it is announced in the platform, or the booking time, can be influenced by various factors. However, we are convinced that it is driven mostly by the strength of the competition among employees. Therefore, tasks whose combination of time allotted and reward satisfies the demands of more people are generally booked earlier, and vice versa.
Undoubtedly, even if an employee is satisfied with the time allotted and the reward offered, s/he can still refuse to book a task because of being too busy, not being interested in this particular type of task, or just not being in the mood. Nevertheless, if the crowd is large enough, the trend should remain. Our approach is to determine this trend using platform logs (see Sec. II).

How to optimize allotted times and rewards? The optimization should consider the dependency between booking times, rewards, allotted times, and the structure of the process. Also, things may not go as expected (e.g., a worker delays a critical task or completes it significantly earlier), so either it becomes necessary to get some tasks done faster to meet the deadline, or an opportunity to cut more costs emerges. The optimization therefore should perform adaptively and consider the process state as well.

When should tasks be published in the crowdsourcing platform? If a task is booked by an employee, the platform undertakes a commitment and can no longer demand that the employee perform faster, nor change the reward for this task. However, as mentioned above, adaptive behavior can be advantageous, so it can be more beneficial to publish tasks later. But this should not be done too late and should be aligned with the process execution state. Our approach is to publish a task when the sum of its optimized booking and allotted times and the expected execution time of subsequent activities in the process is almost as long as the time left before the deadline. In Section IV we show experimentally that this publishing time performs best among the alternatives we tested.

We thus map the described functionality to components and propose a framework for deadline-driven reward optimization for processes containing crowdsourced tasks (see Fig. 4). The estimator collects the statistical data from the platform logs and estimates the functional dependency for each type of task (1).
The optimizer retrieves the structure and state of processes (2), the booking state of already published tasks, and the functional dependencies, and determines optimal values for booking time, reward, and time allotted. These values are further used by the publisher, which announces tasks to the platform at the appropriate time and updates them if needed.

[Figure 4: components Process Engine, Optimizer (minimize $f(x) = \tfrac{1}{2} x^T Q x + c^T x$ subject to $Ax \le b$), and Estimator.]

Fig. 4. The architecture of the framework.

The subsections below provide a detailed description of the corresponding components of the framework. Estimation should be performed for all task types before rewards can be optimized. After some time, it can be re-executed to conform to the crowd's changing characteristics. The optimizer and publisher are activated periodically, thus realizing the adaptive behavior.

A. Estimator

As described in Sec. II, each log entry corresponds to a processed task and includes task type, weight (e.g., in hours of effort), allotted time, booking time, and reward. We refer to these values as T, w, t, bt, and r. The estimation for each

task type is done independently. Therefore, for a particular type of task, an entry can be represented as <w, t, bt, r>. Assuming that reward and time allotted depend linearly on the weight, and booking time depends on the combination of reward and time allotted, we can consider a mapping $bt' = f(t', r')$ populated with log entries as follows: $t' = t/w$; $r' = r/w$; $bt' = bt$, which reflects tasks with weight one.

Further, we need to estimate a function $ub(t, r)$ which is an upper bound for the mapping $f$ for each $(t, r)$. Even with weak competition, a task can be booked fast due to a coincidence. Therefore, for stable prediction in the context of deadline fulfillment, we are interested in the maximum time that it takes a task with specified reward and time allotted to get booked. The particular methods for upper bound estimation can vary according to the real setup.

Finally, using the discovered upper bound, we need to approximate the function $g(t, bt)$ that reflects the reward that needs to be set for a specified time allotted and expected booking time. This function will be used in the objective function for optimization. The dataset for approximating $g$ is obtained as a set of tuples $<r_i, t_i, ub(t_i, r_i)>$, one for each log entry. We argue that $g$ should be approximated with a 2nd degree polynomial, because (i) polynomial-based optimization is well studied [4], (ii) many optimization frameworks support quadratic programming [5], and (iii) the 2nd degree is a good trade-off between optimization complexity and fitting accuracy for the problem. In our experiments, the difference in accuracy between 2nd and 5th degree polynomial approximation was less than 5%, whereas it was more than 2% between 1st and 2nd degree. The estimation should also determine minimum and maximum values for all arguments. The approximated function should therefore have the following form:

$g(t, bt) = a_1 t^2 + a_2 \, t \cdot bt + a_3 \, bt^2 + a_4 \, bt + a_5$

An example of an approximated function is illustrated in Sec. IV.

B. Optimizer

The goal of the optimization is to fulfill the process deadline while trying to minimize the offered rewards. Therefore, based on the state and structure of the process and the estimated dependencies, we formulate a quadratic programming problem. The constraints ensure that the process can be finished before the deadline considering all the booking and allotted times, and the optimization objective is the sum of the rewards that will be paid for tasks contained in the process.

We formally represent a process as a directed acyclic graph, where nodes represent tasks and edges represent control flow. The graph has two special nodes that represent the beginning and the end of the process, namely "in" and "out". Thus, for each node in the graph, there exists a path from in to out which contains this node. A task can be started when all its incoming edges are adjacent to already finished tasks. Besides crowdsourced tasks, a process can contain simple activities, whose execution times are regarded as constants. For simplicity, the in and out nodes can be considered dummy nodes, i.e., simple activities with execution time 0. Such a representation is more general than, e.g., a combination of flow and sequence activities in BPEL. In this work we do not consider constructs such as conditions or loops for the sake of simplicity, although the approach can be extended to support such elements.

Each task has a property that indicates its status, which can be either unavailable, published, ready, started, or finished. The process engine can thus launch only the tasks that are ready. Tasks are marked as published when published in the crowdsourcing platform, and change their status to ready when booked.

For each not yet booked crowdsourced task, decision variables for allotted time and for booking time are included in the optimization. The variables are restricted using the minimum and maximum values provided by the estimator.
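The estimator step of Section III-A can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes NumPy and an ordinary least-squares fit, the helper names `normalize`, `fit_g`, and `g` are hypothetical, and the samples are assumed to already carry the upper-bound booking time $ub(t, r)$ in place of the raw booking time. The polynomial uses exactly the five terms given in the text.

```python
import numpy as np

def normalize(entries):
    """Scale log entries <w, t, bt, r> to weight one: t' = t/w, bt' = bt, r' = r/w."""
    return [(t / w, bt, r / w) for (w, t, bt, r) in entries]

def fit_g(samples):
    """Least-squares fit of g(t, bt) = a1*t^2 + a2*t*bt + a3*bt^2 + a4*bt + a5.

    samples: iterable of (t, bt, r), where bt is assumed to be the
    upper-bound booking time for that (t, r) combination.
    """
    t, bt, r = (np.asarray(col, dtype=float) for col in zip(*samples))
    # One column per polynomial term, in the order a1..a5.
    design = np.column_stack([t**2, t * bt, bt**2, bt, np.ones_like(t)])
    coeffs, *_ = np.linalg.lstsq(design, r, rcond=None)
    return coeffs

def g(coeffs, t, bt):
    """Evaluate the fitted reward function for weight-one tasks."""
    a1, a2, a3, a4, a5 = coeffs
    return a1 * t**2 + a2 * t * bt + a3 * bt**2 + a4 * bt + a5
```

The fitted coefficients would then feed the optimizer's objective, with the reward of a task of weight w obtained by scaling g by w.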
Two categories of constraints are included in the optimization model:

1) Constraints covering all the processing and allotted times throughout all possible process execution paths from the current state. The combination of these constraints ensures that the slowest branch will complete before the deadline.

2) Constraints covering booking time for unavailable and published tasks and all possible subsequent process execution paths. These constraints ensure that booking times will not endanger the deadline fulfillment.

If a task is already booked or represents a simple activity, then its execution time is fixed. If it has already been started, then its completion time can be estimated for the current situation. In both of these cases, the execution time is regarded as a constant from the perspective of the optimization. The exact procedure for building the constraints is described in Algorithm 1.

An example of the algorithm's functionality is shown in Fig. 5. A simple activity has already started and has been processed for five time units, as illustrated by the time line. Since the expected processing time for the activity is 20 time units, the optimizer assumes that the activity finishes in 15 time units. As there is only one started activity, the optimizer adds two constraints of the first category, because there exist two paths from this activity to the out activity. As all the other tasks are either unavailable or published, constraints of the second category should be created for them. Therefore, two constraints for booking time bt2 are generated, covering both successor paths. For the tasks with allotted times t3 and t4, only one constraint each is generated, because each has a single path to out.

The optimization objective is composed of a sum of rewards using the scaled values of the estimated dependency functions:

$\min \sum_{s \in S} g_{type_s}(t_s / w_s, bt_s) \cdot w_s$

where $S$ is the set of all tasks in the process, $g_{type_s}$ represents the dependency function for the type of task $s$, $t_s$ and $bt_s$ are decision variables, and $w_s$ is the weight of task $s$.
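To make the scaling in the objective concrete, the sketch below evaluates the sum of rewards for fixed values of the decision variables. It is only an illustration under assumptions: the function name `total_reward`, the dictionary layout of a task, and the linear placeholder model for g are all hypothetical and are not taken from the paper; a real setup would plug in the fitted quadratic and hand the expression to a QP solver.

```python
def total_reward(tasks, g_by_type):
    """Sum of g_type(t_s / w_s, bt_s) * w_s over all tasks s."""
    return sum(
        g_by_type[task["type"]](task["t"] / task["w"], task["bt"]) * task["w"]
        for task in tasks
    )

# Hypothetical per-weight reward model for one task type (placeholder numbers).
g_by_type = {"dotnet": lambda t, bt: 100.0 - 2.0 * t - 3.0 * bt}

# Two tasks of the same type with different weights and candidate values
# for allotted time t and booking time bt.
tasks = [
    {"type": "dotnet", "w": 2.0, "t": 20.0, "bt": 5.0},
    {"type": "dotnet", "w": 1.0, "t": 10.0, "bt": 10.0},
]
```

Note how the allotted time is divided by the weight before g is evaluated, while the booking time is passed unscaled, mirroring the normalization used by the estimator.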
If the optimization problem turns out to be infeasible, the optimizer should extend the deadline and try again until a feasible solution is found. This ensures that even if the deadline cannot be met, the best possible solution will be provided.

Algorithm 1 Algorithm for creating optimization constraints

    input: timeToDeadline, processGraph
    call:  createConstraints([], processGraph.outNode)

    createConstraints(list path, task t) {
        add t to the beginning of path;
        foreach (incoming edge e of t) {
            get adjacent node t' which is source of e;
            if (status of t' is not finished)
                createConstraints(path, t');
        }
        if (there were no incoming edges with adjacent not finished nodes)
            addConstraint(path, 1st type);
        else
            addConstraint(path, 2nd type);
        remove t from path;
    }

    addTermsToExpression(list path, constraint expression expr) {
        foreach (t in path)
            if (status of t is unavailable or published)
                add decision variable for allotted time of t to expr;
            else
                add expected execution time left for t as a constant to expr;
    }

    addConstraint(list path, constraint type Type) {
        t = first task in path;
        if (Type is "2nd type" and status of t is not either unavailable or published)
            return;
        create constraint expression expr;
        if (t is unavailable)
            add decision variable for booking time of t to expr;
        if (t is published)
            add predicted time for t to be booked as constant to expr;
        addTermsToExpression(path, expr);
        add optimization constraint [expr < timeToDeadline];
    }

    15 + t2 + t3 < 1
    15 + t2 + t4 < 1
    bt2 + t2 + t3 < 1
    bt2 + t2 + t4 < 1
    bt3 + t3 < 1
    bt4 + t4 < 1

Fig. 5. Example for the generation of optimization constraints for the quadratic programming problem formulation.

C. Publisher

A task is published in the platform when the sum of its expected booking time, allotted time, and expected execution time of subsequent activities in the process is almost as long as the current time to deadline.
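The constraint generation of Algorithm 1 can be sketched in Python as below. This is one reading of the pseudocode, not the authors' code: the `Task` class, the `crowdsourced` flag, and the variable naming (`t_<name>` for allotted time, `bt_<name>` for booking time) are assumptions introduced for illustration. In particular, the sketch assumes that only crowdsourced tasks carry decision variables, so simple activities such as the dummy out node always contribute constants. Each constraint is returned as a pair (variables, constant), meaning "sum of variables + constant < timeToDeadline". The example at the end rebuilds the situation of Fig. 5.

```python
from dataclasses import dataclass, field

OPEN = ("unavailable", "published")  # statuses of tasks that are not booked yet

@dataclass
class Task:
    name: str
    status: str                      # unavailable, published, ready, started, finished
    crowdsourced: bool = True        # assumption: simple activities have no variables
    time_left: float = 0.0           # expected execution time left (constant case)
    predicted_booking: float = 0.0   # predicted booking time of a published task
    preds: list = field(default_factory=list)

def is_open(task):
    return task.crowdsourced and task.status in OPEN

def build_constraints(out_node):
    constraints, path = [], []

    def add_constraint(second_type):
        first = path[0]
        if second_type and not is_open(first):
            return
        variables, constant = [], 0.0
        if is_open(first):
            if first.status == "unavailable":
                variables.append("bt_" + first.name)   # booking time is a variable
            else:
                constant += first.predicted_booking    # already published
        for task in path:
            if is_open(task):
                variables.append("t_" + task.name)     # allotted time is a variable
            else:
                constant += task.time_left
        constraints.append((tuple(variables), constant))

    def visit(task):
        path.insert(0, task)
        pending = [p for p in task.preds if p.status != "finished"]
        for pred in pending:
            visit(pred)
        add_constraint(second_type=bool(pending))
        path.pop(0)

    visit(out_node)
    return constraints

# The situation of Fig. 5: one started simple activity with 15 time units left,
# followed by task 2, which is followed by tasks 3 and 4 in parallel.
a1 = Task("a1", "started", crowdsourced=False, time_left=15)
t2 = Task("2", "unavailable", preds=[a1])
t3 = Task("3", "unavailable", preds=[t2])
t4 = Task("4", "unavailable", preds=[t2])
out = Task("out", "unavailable", crowdsourced=False, preds=[t3, t4])
fig5 = build_constraints(out)
```

Under these assumptions the sketch yields the six left-hand sides listed in Fig. 5: two first-type constraints starting at the running activity and four second-type constraints starting at the unbooked tasks.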
In other words, the time to deadline at which a task should be published can be determined by (i) finding the longest path from the task node to the out node in the process graph, where the weights of edges are set to the determined optimal values for time allotted of their source nodes, and (ii) adding the booking time of the task to this value. A task can also be updated after being published if it has not been booked yet. In practice, the time-to-deadline value which is provided to the optimizer and publisher can be lower than the real value in order to keep a small fraction of time reserved for handling unexpected events.

IV. EVALUATION

As we mention in Section V, to the best of our knowledge there are no similar approaches. Therefore, we were not able to make a comparative evaluation. Because the distinguishing feature of our approach is the consideration of competition in crowdsourcing and of booking time, we compare the effectiveness of the optimization with cases when booking time is partially or fully neglected. We also empirically validate the choice of task publishing time and evaluate the overall performance overhead of the optimization component.

To evaluate our approach, we examined a prototype implementation of the framework (accessible online, see footnote 1) in a simulated environment. We used MATLAB surface fitting for functional approximation and the GUROBI [6] framework for solving the quadratic optimization problem. We used a discrete time model, so time was measured in arbitrary integer units.

A. Simulation setup

Workers. The size of the simulated crowd was assumed to be 1 workers. Platform logs were generated assuming that, at any point, only about 5% of them are willing to use the crowdsourcing platform (at various times those can be different workers). For every task type, each worker was assigned two values: the least time allotted that s/he needs to finish a task of this type with weight 1, and the minimum acceptable reward. These values were generated using the normal distribution.
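The publishing rule can be sketched as a longest-path computation over the process graph. This is a minimal illustration under assumptions: the function names, the dictionary-based graph encoding, and the example numbers are hypothetical, and node durations stand for the optimized allotted time of a crowdsourced task or the expected execution time of a simple activity. The text says a task is published when the sum is "almost" the time to deadline; the sketch uses a plain threshold comparison.

```python
from functools import lru_cache

def publish_threshold(task, successors, duration, booking_time):
    """Time to deadline at which `task` should be published:
    its booking time plus the longest path from the task to the out node."""
    @lru_cache(maxsize=None)
    def longest(node):
        # Longest remaining path starting at `node`, including the node itself.
        tail = max((longest(nxt) for nxt in successors.get(node, ())), default=0.0)
        return duration[node] + tail

    return booking_time[task] + longest(task)

def should_publish(task, time_to_deadline, successors, duration, booking_time):
    return time_to_deadline <= publish_threshold(task, successors, duration, booking_time)

# Hypothetical process fragment: t2 is followed by t3 and t4 in parallel.
successors = {"t2": ("t3", "t4"), "t3": ("out",), "t4": ("out",)}
duration = {"t2": 10.0, "t3": 20.0, "t4": 5.0, "out": 0.0}
booking_time = {"t2": 4.0}
```

In this fragment the slower branch through t3 determines the threshold, so t2 would be published once 34 time units remain before the deadline.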
Then, for a random sampling of time allotted and reward pairs, 2 log entries were created. To pick a sensible booking time for a log entry, the competition value was calculated as the number of workers from the crowd whose least time allotted and minimum acceptable reward were less than those of the log entry. Then, assuming that only 5% of the potential competitors would actually compete for the task, the booking time estimation was guided by the probability that at least one of the competing workers books the task before time n, assuming that the time of booking the task by one worker follows the normal distribution. The actual value was determined by stepwise increasing n and comparing a uniformly distributed random value with this probability. Once the random value is less than the probability, n is the booking time. Such a method covered coincidental fast bookings while generally exhibiting the trend associated with competition. This method was also used to simulate actual booking while performing experiments, i.e., the crowd simulator was not informed about the booking time chosen.

1 http://www.infosys.tuwien.ac.at/prototype/incentivemgmt

Tasks. Three types of crowdsourced tasks were simulated. Each task was described by average reward and allotted time, booking time by one worker, and corresponding deviations. The types of tasks and their generation parameters are shown in Table I. The estimated dependencies between booking time, allotted time, and reward for tasks of Type 1 are illustrated in Fig. 6.

TABLE I
TASK GENERATION PARAMETERS

Property                       Type 1        Type 2        Type 3
                               Avg    Dev    Avg    Dev    Avg    Dev
Reward                         1      15     5      7      8      1
Time allotted                  2      3      15     3      13     2
Booking time for one worker    3      9      2      8.5    15     5

[Figure 6: surface plot with axes Booking time, Time allotted, and Reward.]

Fig. 6. Estimated quadratic polynomial that describes the dependencies between booking time, allotted time, and rewards for tasks of Type 1.

Processes. The simulator randomly generated processes with different sizes. We simulated small processes (5-10 tasks) and big processes (10-30 tasks), including all types of crowdsourced tasks and simple activities. We believe that bigger numbers are not realistic in a real setup, because usually business logic is clustered into concise compositions that are then managed on a higher level. Weights for tasks were selected randomly from the range [0.5, 5].

B. Experiments

We ran the prototype in different simulation settings by randomizing process structures and by emulating the inaccuracy of task execution and booking times (the results from different random generation seeds were averaged). The actual execution time for tasks was set using the normal distribution with deviations of 0.1 and 0.2 of the supposed execution time.
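The booking-time sampling described in the simulation setup can be sketched as follows. This is an interpretation under stated assumptions, not the simulator's code: the function names are hypothetical, a single uniform value `u` is drawn once and compared against the growing probability (the text does not say whether the value is redrawn at each step), and the mean and deviation in any usage are placeholders rather than the values of Table I.

```python
import math

def normal_cdf(x, mean, dev):
    """Probability that a normally distributed booking time is at most x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (dev * math.sqrt(2.0))))

def competitors(workers, time_allotted, reward):
    """Number of workers whose least time allotted and minimum acceptable
    reward are both below what the task offers.
    workers: iterable of (least_time_allotted, minimum_reward) pairs."""
    return sum(
        1 for least_time, min_reward in workers
        if least_time < time_allotted and min_reward < reward
    )

def sample_booking_time(n_competing, mean, dev, u, max_steps=10_000):
    """Smallest integer time n at which the uniform value u falls below the
    probability that at least one competing worker has booked the task."""
    for n in range(1, max_steps + 1):
        p_one = normal_cdf(n, mean, dev)
        p_any = 1.0 - (1.0 - p_one) ** n_competing
        if u < p_any:
            return n
    return None  # nobody books within max_steps (e.g., no competitors)
```

With more competing workers the probability that at least one of them books early grows, so the same uniform value yields an earlier booking time, which is the competition trend the estimator is meant to recover from the logs.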
We considered both cases, when the deadline could and could not be met, thus exploring how different parameters of the approach impact both critical and non-critical situations. We compare the results based on average reward, total time penalty (the time penalty is the delay of a process with regard to the deadline in cases where the deadline was missed), and the number of missed deadlines, as these indicators fully reflect the goal of the approach.

Publishing time. In our solution a task is published when the difference between the time left before the deadline and the sum of the task's allotted time and the expected execution time of subsequent activities in the process (let us refer to this value as the booking buffer) is equal to its decided booking time. In order to check that this is the best choice, we performed experiments where tasks were published at earlier and later times. The results are shown in Figure 7. We used booking buffer values equal to 0.2, 0.5, 1, 2, 3, 4, 5, and 6 multiplied by the task's decided booking time, and, finally, we tested the case when the tasks were published in the platform immediately after a process was started (OnStart mark in the figure).

It can be clearly seen that a booking buffer equal to the expected (decided) booking time produces the best results. When it is less than this value, the tasks are not booked on time, so the optimizer has to compensate by setting higher rewards, and, regardless of that, more deadlines are missed because of these delays. When the booking buffer is greater than the decided booking time, there is less room for maneuvering to handle uncertainty in execution and booking times, because tasks are booked earlier and their parameters cannot be changed anymore. This results in more missed deadlines and bigger penalties. This behavior then gradually changes in the opposite direction, which can be explained by the fact that the real booking time can be longer than the estimated one, and therefore the impact of this inaccuracy is reduced for bigger values of the booking buffer.
However, the number of missed deadlines remains at least 25% greater than in the extreme case when all the tasks are published when the process is started; rewards and time penalties in this case are almost the same.

Booking time. Consideration of booking time is a key feature of our approach. To analyze the effectiveness of this feature, we compare the full-featured optimization to the case where booking time constraints (the second type of constraints, see Section III) are completely removed from the optimization problem (tasks are still published at the appropriate time according to the expected booking time, which is inferred from the decided rewards and allotted times), and to the case where the average booking time is chosen for each task. The results for small and big processes are shown in Figures 9 and 10 respectively, depicting the same indicators as in the previous set of experiments.

As the results show, choosing an average booking time always results in approximately 13% more expenses for all process sizes, and produces more or almost as many penalties and deadline misses as the full-featured optimization does.

This happens because some tasks do not need to be booked fast, and the full-featured optimization would pick less competitive and therefore less expensive values for rewards.

Disabled booking time constraints do not affect the indicators for big processes. This can be explained by the fact that there is almost always enough time for booking the tasks in long processes. Only the first tasks of the process can be delayed, but this does not affect the overall performance. Also, booking times are always chosen to be maximal in this case because they are not constrained and this reduces the paid rewards. However, for smaller processes, the consideration of booking time is crucial, as it can more strongly affect the relatively short process execution time. The unconstrained case produces 14% more deadline misses and penalties.

One can argue that the full-featured optimization should perform at least as well as the two other approaches. In perfect conditions this assertion would hold. However, on the one hand, the booking time is estimated as an upper bound; therefore, the booking can often take less time than predicted. On the other hand, when the competition is too low, it can have the opposite effect: a task which is assumed to be booked and executed earlier and costs more can eventually be delayed more than a task which costs less and was expected to be booked later. Therefore, estimating realistic booking times is one of the key requirements of this approach. The accuracy of booking time in our experiments was 92%, which resulted in at most 2% more missed deadlines and penalties.

[Figure 7: series Reward, Penalty time, and Number of missed deadlines; x-axis: booking buffer (the assumed booking time multiplied by the value on this axis), up to OnStart.]

Fig. 7. Publishing time. This figure shows the effect when the booking buffer is varied and tasks are published not as suggested by our approach (1 on the x-axis).

[Figure 8: x-axis: number of constraints.]

Fig. 8. Performance overhead. This figure shows the dependency of the performance overhead on the number of constraints in the optimization problem for one run.

[Figure 9: bars for Full-featured optimization, Optimization with average booking time, and Optimization without booking time constraints; groups Reward, Time penalty, Missed deadlines.]

Fig. 9. Effect of neglecting the booking time in optimization for small processes (5-10 tasks).

[Figure 10: bars for Full-featured optimization, Optimization with average booking time, and Optimization without booking time constraints; groups Reward, Time penalty, Missed deadlines.]

Fig. 10. Effect of neglecting the booking time in optimization for big processes (10-30 tasks).

Performance overhead. The overhead of optimization depends on the number of variables and the number of constraints. However, the number of variables for our problem

is proportional to the number of tasks involved, whih was less than or equal to 3, and the variane within this limit did not affet the overhead. However, the number of onstraints is proportional to the total amount of all possible paths that go through in and out ativities (see Setion III), and this number depends on the proess struture and sales from one to tens of thousands. Figure 8 depits the overhead dependeny on the number of onstraints for one optimization run 2. For small proesses (up to ten tasks), the worst ase is when there is a sequene of five onstruts eah with two ativities in parallel, the number onstraints is less than 2 5 * 2 = 64, whih implies that an optimization run for a small proess always takes less than.1 seond. For bigger proesses, the worst ase is 2 15 * 2 = 65535, so an optimization run an take up to 35 seonds in this ase. In both ases, the overhead is aeptable for performing periodial adaptations in proesses with human tasks whih an span from several minutes to hours or days. C. Disussion The results learly show that booking time should be onsidered when publishing tasks to ahieve the best adaptable behavior, beause the best results are ahieved when the booking buffer equals to the estimated expeted booking time. Booking time onstraints for optimization are not always favorable, however. They should not be used for bigger proesses (more than ten tasks in out setting), but beome important for smaller proesses. Suh smaller proesses an emerge in, e.g., agile software development environments, where work is organized into short yles with small sets of tasks. V. RELATED WORK In this work we ombine rowdsouring and business proess exeution. Major industry players have been working towards standardized protools and languages for interfaing with people in a servie-oriented way, whih may be used as tehnial foundation for implementing our ideas in real businesses. 
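The worst-case constraint counts quoted in the performance overhead analysis of Section IV can be reproduced with a short calculation. The following Python sketch is only an illustration under stated assumptions: the function names `count_paths` and `constraint_bound` are hypothetical, and the factor of two constraints per path is read off the arithmetic in the text rather than taken from the definitions in Section III. It enumerates the paths through a sequence of parallel constructs, where each path passes through exactly one of the two parallel activities in every construct.

```python
from itertools import product


def count_paths(num_constructs: int, branches: int = 2) -> int:
    """Count start-to-end paths through a sequence of parallel constructs.

    A path picks exactly one branch in each construct, so the paths are
    enumerated explicitly instead of being computed in closed form.
    """
    return sum(1 for _ in product(range(branches), repeat=num_constructs))


def constraint_bound(num_constructs: int, branches: int = 2,
                     constraints_per_path: int = 2) -> int:
    """Worst-case number of constraints for such a process.

    Assumption: every path contributes `constraints_per_path` constraints;
    the default of 2 mirrors the factor used in the text.
    """
    return constraints_per_path * count_paths(num_constructs, branches)


if __name__ == "__main__":
    # Small process: 5 constructs with 2 parallel activities each (10 tasks).
    print(constraint_bound(5))   # 64
    # Big process: 15 constructs with 2 parallel activities each (30 tasks).
    print(constraint_bound(15))  # 65536
```

The count doubles with every additional parallel construct, which is why the number of constraints, and not the number of variables, dominates the overhead for the larger processes.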
Specifications such as WS-HumanTask [7] and BPEL4People [8] have been defined to address the lack of human interactions in service-oriented businesses [9]. Although some prospective features and possible extensions for dynamic resource management were outlined for these standards [10], [11], they have been designed to model interactions in closed enterprise environments where people have predefined, mostly static, roles and responsibilities.

The area of QoS-aware composition of Web services has many similarities to the topics addressed in this work. Web service compositions create value-added services by composing existing ones. Here the question arises which services to choose for participation in a composite service, given that there are many available Web services providing equivalent functionality. Liangzhao Zeng et al. [12] propose a QoS-aware middleware for selecting Web services that maximize user satisfaction, modeled as utility functions. They define multiple quality criteria, i.e., execution price, execution duration, reputation, successful execution rate, and availability. The authors propose service selection based on local optimization and global planning, considering the aforementioned quality criteria. In the local optimization case, service selection is done for each task individually, while the global planning also considers the interrelations between services. Integer programming is used to solve the global planning problem. Canfora et al. [13] argue that genetic algorithms, while being slower than integer programming, represent an alternative, more scalable option. Another work [14] focuses on the optimization of large-scale QoS-aware compositions at runtime, with QoS specifications based on constraint hierarchies. Multiple well-known metaheuristic optimization approaches are applied to solve the optimization problems.

Footnote 2: Hardware used: Intel Core 2 Quad 2.4 GHz with 6 GB of RAM.
The major difference between Web service composition and crowdsourcing of business processes is that Web services which are to execute a functionality can be directly chosen, while humans in the crowd are self-determined and act autonomously.

The recent trend towards collective intelligence and crowdsourcing can be observed by looking at the success of various Web-based platforms that have attracted a huge number of users. Well-known representatives of crowdsourcing platforms include Yahoo! Answers [15] (YA) and the aforementioned AMT [2]. The difference between these platforms lies in how the labor of the crowd is used. YA, for instance, is mainly based on interactions between members. Questions are asked and answered by humans, and the platform therefore lacks the ability to automatically control the execution of tasks. In contrast, AMT offers access to the largest number of crowdsourcing workers. With its notion of HITs, which can be created using a Web service-based interface, it is closely related to our aim of mediating the capabilities of crowds to service-oriented business environments. According to one of the latest analyses of AMT [3], HIT topics include, first of all, transcription, classification, and categorization tasks for documents and images. Furthermore, there are also tasks for collecting data, image tagging, and feedback or advice on different subjects.

Related work in the area of crowdsourcing also includes experiments about behavioral aspects [16], the impact of incentives on the performance of crowdsourcing [17], and the impact of dynamic learning environments on the quality of crowdsourced tasks [18]. These studies partially confirm our assumptions (e.g., that higher incentives generate more interest and produce results faster). However, none of these works focuses on internal enterprise crowdsourcing, and none of them discusses the controlled execution of business processes using such platforms.
While this paper focuses on how to take a workflow and optimally convert the subtasks into crowdsourcing tasks, there is also research about how to map crowdsourcing tasks to suitable workers. One possibility is to use auctioning mechanisms for implementing such a mapping [19], [20]. All workers that meet the minimum requirements for a particular task are invited to submit a bid to an auction created for assigning the task. The winning bid is determined by a combination of the workers' suitability for the task and the bids' prices. For improved reliability and training of workers, a single task may be crowdsourced multiple times. Another possibility is to use scheduling [21] and specifically assign the most suitable workers to pending tasks, while allowing workers to be flexible and to choose the periods when they are available for work.

VI. CONCLUSIONS AND FUTURE WORK

In this paper we presented an approach that allows business processes to be executed adaptively on top of an internal competition-based crowdsourcing platform. The main feature that distinguishes our approach from other workflow and process optimization methods is the consideration of the time that it takes a crowd to book a task. We proposed a method for estimating the functional dependency of booking time by using statistical data, presented an algorithm for constructing an optimization problem, and empirically determined the optimal task publishing technique. The results show that our model is effective for adapting the properties of tasks in a crowdsourcing platform to adhere to process deadlines and to minimize the rewards. We discovered that booking time should be considered when publishing tasks to achieve the best adaptable behavior, and that taking booking time into account in optimization can reduce deadline misses by up to 14%. The approach can also be used to predict feasibility and expenses for a specified deadline by running a simulation like ours, thereby making the tradeoff between processing time and associated costs explicitly observable.

It is fair to note that the approach assumes booking time to be comparable to the execution time of tasks. So, for example, if there is strong overall competition in a crowdsourcing platform and tasks are taken not long after being published, then booking time might have a weaker effect on the process.
It may also turn out that booking time depends on the weight of a task. Such a dependency can be mined from logs and can be handled by our model as well. In a real scenario, the results will heavily depend on the accuracy of the approximation of the booking time dependency. Also, we do not consider "unbooking" or the possibility of renegotiating the reward or allotted time with a worker when the task is already booked. Such extensions might be an interesting direction for future work. It would also be interesting to compare our approach to auction-based crowdsourcing (e.g., [19]) and to extend it to support custom penalty functions.

REFERENCES

[1] Spiegel Online (in German), http://www.spiegel.de/wirtschaft/unternehmen/0,1518,813388,00.html, last access Sep. 2012.
[2] Amazon Mechanical Turk, http://www.mturk.com, last access Sep. 2012.
[3] P. G. Ipeirotis, "Analyzing the Amazon Mechanical Turk Marketplace," SSRN eLibrary, vol. 17, no. 2, pp. 16-21, 2010.
[4] Z. Li, "Polynomial optimization problems, approximation algorithms and applications," Ph.D. dissertation, The Chinese University of Hong Kong, 2011.
[5] J. Nocedal and S. Wright, Numerical Optimization, 2nd ed., ser. Springer Series in Operations Research and Financial Engineering. Springer, 2006.
[6] Gurobi Optimization, Inc., "Gurobi optimizer reference manual," 2012. [Online]. Available: http://www.gurobi.com
[7] A. Agrawal, M. Amend, M. Das, M. Ford, C. Keller, M. Kloppmann, D. König, F. Leymann, R. Müller, G. Pfau et al., "Web services human task (WS-HumanTask), version 1.0," available at http://incubator.apache.org/hise/ws-humantask_v1.pdf, 2007.
[8] M. Kloppmann, D. Koenig, F. Leymann, G. Pfau, A. Rickayzen, C. von Riegen, P. Schmidt, and I. Trickovic, "WS-BPEL extension for people - BPEL4People," Joint white paper, IBM and SAP, 2005.
[9] F. Leymann, "Workflow-based coordination and cooperation in a service world," in Cooperative Information Systems (CoopIS '06), ser. LNCS. Springer, 2006, pp. 2-16.
[10] N. Russell and W. M. Aalst, "Work distribution and resource management in BPEL4People: Capabilities and opportunities," in International Conference on Advanced Information Systems Engineering (CAiSE '08). Berlin, Heidelberg: Springer, 2008, pp. 94-108.
[11] D. Schall, B. Satzger, and H. Psaier, "Crowdsourcing tasks to social networks in BPEL4People," World Wide Web, Springer, 2012, 10.1007/s11280-012-0180-6.
[12] L. Zeng, B. Benatallah, A. Ngu, M. Dumas, J. Kalagnanam, and H. Chang, "QoS-aware middleware for web services composition," Software Engineering, IEEE Transactions on, vol. 30, no. 5, pp. 311-327, May 2004.
[13] G. Canfora, M. Di Penta, R. Esposito, and M. L. Villani, "An approach for QoS-aware service composition based on genetic algorithms," in Genetic and Evolutionary Computation (GECCO '05). New York, NY, USA: ACM, 2005, pp. 1069-1075.
[14] F. Rosenberg, M. B. Müller, P. Leitner, A. Michlmayr, A. Bouguettaya, and S. Dustdar, "Metaheuristic Optimization of Large-Scale QoS-aware Service Compositions," in International Conference on Services Computing (SCC '10). Washington, DC, USA: IEEE Computer Society, 2010, pp. 97-104.
[15] Yahoo! Answers, http://answers.yahoo.com/, last access Sep. 2012.
[16] G. Paolacci, J. Chandler, and P. Ipeirotis, "Running experiments on Amazon Mechanical Turk," Judgment and Decision Making, vol. 5, no. 5, pp. 411-419, 2010.
[17] W. Mason and D. J. Watts, "Financial incentives and the "performance of crowds"," in SIGKDD Workshop on Human Computation (HCOMP '09). New York, NY, USA: ACM, 2009, pp. 77-85.
[18] J. Le, A. Edmonds, V. Hester, and L. Biewald, "Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution," in SIGIR Workshop on Crowdsourcing for Search Evaluation, 2010, pp. 21-26.
[19] B. Satzger, H. Psaier, D. Schall, and S. Dustdar, "Stimulating Skill Evolution in Market-Based Crowdsourcing," in 9th International Conference on Business Process Management (BPM '11), ser. LNCS. Springer, 2011, pp. 66-82.
[20] B. Satzger, H. Psaier, D. Schall, and S. Dustdar, "Auction-based crowdsourcing supporting skill management," Information Systems, 2012, 10.1016/j.is.2012.09.003.
[21] R. Khazankin, H. Psaier, D. Schall, and S. Dustdar, "QoS-based task scheduling in crowdsourcing environments," in International Conference on Service-Oriented Computing (ICSOC '11), ser. LNCS. Springer, 2011, vol. 7084, pp. 297-311.

ACKNOWLEDGEMENT

This work received funding from the EU FP7 program under the agreement 257483 (Indenica) and from the Vienna Science and Technology Fund (WWTF), project ICT08-032.