Learning Based Admission Control. Jaideep Dhok MS by Research (CSE) Search and Information Extraction Lab IIIT Hyderabad

1 Learning Based Admission Control and Task Assignment for MapReduce Jaideep Dhok MS by Research (CSE) Search and Information Extraction Lab IIIT Hyderabad

2 Outline Brief overview of MapReduce MapReduce as a Service Admission Control Need Our Approach Simulation setup, Results Task Assignment Background Our approach Evaluation and results Conclusions and Future Directions Saturday, July 03,

3 Big Data User generated content 281 Exabytes of data online in 2009, from 5 Exabytes in 2002 Blogs, Pictures, Social Networking Logs, click stream data, session details Scientific data LHC experiments generate 27 TB data every day

4 MapReduce Functional programming used in distributed processing Proposed (in its current form) by Google in 2004, now de facto standard for processing large data sets. Two functions map (per record), reduce (aggregate) Widely used for data intensive processing Data mining Log analysis User behavior study
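
A minimal single-process sketch of the map, group-by, reduce pattern, using word count as the example. The names `map_fn`, `reduce_fn` and `run_mapreduce` are illustrative only and are not part of Hadoop's or Google's API; a real framework distributes these steps across a cluster.

```python
from collections import defaultdict

def map_fn(record):
    """Map: emit (word, 1) for every word in one input record."""
    for word in record.split():
        yield word.lower(), 1

def reduce_fn(key, values):
    """Reduce: aggregate all values emitted for one key."""
    return key, sum(values)

def run_mapreduce(records, mapper, reducer):
    """Run map, group by key, then reduce, all in one process."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(k, vs) for k, vs in groups.items())

counts = run_mapreduce(["big data", "Big clusters"], map_fn, reduce_fn)
```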

5 Mechanism Map-GroupBy-Reduce Computation and storage are co-located Centralized architecture Heartbeat mechanism Task (re)launch Data replication Speculative execution, data locality

6 Hadoop Open source MapReduce implementation Most popular Deployed on large clusters

7 Resource Management in MapReduce Admission Control Task Assignment Data Local Execution Speculative Execution Job Scheduling

8 This Thesis Admission Control Should we accept a job for execution in the cluster? Task Assignment Which task to choose for running on a given node? Interdependency: important to solve both problems

9 Background Grids Computational, Data, Service Grid Resource Management Resource Brokering Admission Control Global/Local Scheduling Monitoring Grid Information Services

10 Admission Control

11 Admission Control Deciding if and which request to accept from a set of incoming requests Critical in achieving better QoS Also, important to prevent over committing

12 MR as a Service Web services interface for MR jobs Users search jobs through repositories Select one that matches their criteria Launch it on clusters managed by service provider Service providers rent infrastructure from IaaS provider

13 Related Work Three stage utility functions Millennium Risk Reward Positive Opportunity Aggregate Utility Functions In Hadoop

14 Our Approach Based on Expected Utility Hypothesis from decision theory Accept one job at a time Use pattern classifier to classify incoming jobs Two classes Utility functions for prioritizing
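
One way to read this slide is sketched below: each candidate job gets a success probability from the classifier, jobs falling in the rejected class are skipped, and the job with the highest expected utility (probability times utility) is admitted. The function name, the 0.5 threshold and the dictionary-based jobs are assumptions made for illustration; the slides do not specify these details.

```python
def select_by_expected_utility(candidates, p_success, utility, threshold=0.5):
    """Return the candidate with the highest expected utility, or None."""
    best_job, best_value = None, float("-inf")
    for job in candidates:
        p = p_success(job)
        if p < threshold:
            continue  # assumed rule: treat this job as the "reject" class
        expected = p * utility(job)
        if expected > best_value:
            best_job, best_value = job, expected
    return best_job

# Hypothetical candidates: probability of success and utility if accepted.
jobs = [
    {"name": "A", "p": 0.9, "utility": 10.0},
    {"name": "B", "p": 0.6, "utility": 20.0},
    {"name": "C", "p": 0.3, "utility": 100.0},
]
chosen = select_by_expected_utility(
    jobs, lambda j: j["p"], lambda j: j["utility"]
)
```

Job C has the largest raw utility but is skipped because its success probability is below the threshold; B wins over A on expected utility.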

15 Utility Functions Three stage Two deadlines Decay parameters Provision for service provider penalty
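
A hedged sketch of what a three-stage utility function with two deadlines could look like. The slide gives only the ingredients (three stages, two deadlines, decay parameters, a provider penalty), so the linear decay, the parameter names and the `penalty_floor` bound below are all assumptions, not the thesis's actual formula.

```python
def three_stage_utility(t, max_value, soft_deadline, hard_deadline,
                        decay1, decay2, penalty_floor):
    """Utility of finishing a job at time t (all shapes assumed linear).

    Stage 1: full value up to the first deadline.
    Stage 2: value decays at rate decay1 until the second deadline.
    Stage 3: value decays at rate decay2 and may go negative, which
             models a penalty paid by the provider, bounded below
             by penalty_floor.
    """
    if t <= soft_deadline:
        return max_value
    if t <= hard_deadline:
        return max_value - decay1 * (t - soft_deadline)
    value_at_hard = max_value - decay1 * (hard_deadline - soft_deadline)
    return max(penalty_floor, value_at_hard - decay2 * (t - hard_deadline))

# Hypothetical parameters for illustration.
params = dict(max_value=100.0, soft_deadline=10.0, hard_deadline=20.0,
              decay1=2.0, decay2=10.0, penalty_floor=-50.0)
on_time = three_stage_utility(5.0, **params)
late = three_stage_utility(15.0, **params)
very_late = three_stage_utility(40.0, **params)
```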

16 Feature Vector Given input to the classifier Contains job specific and cluster specific parameters Includes variables that might affect admission decision

17 Feature Variables Cluster Specific 1. Used map slots 2. Used reduce slots 3. Pending maps 4. Pending reduces 5. Finishing jobs 6. Map time average 7. Reduce time average Job Specific 1. Number of maps 2. Number of reduces 3. Mean map task time 4. Mean reduce task time
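
The eleven variables listed above can be collected into a single record, as in the sketch below. The class name, field names, field types and the example values are assumptions chosen for illustration; only the list of variables comes from the slide.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class AdmissionFeatures:
    # Cluster-specific variables
    used_map_slots: int
    used_reduce_slots: int
    pending_maps: int
    pending_reduces: int
    finishing_jobs: int
    map_time_average: float
    reduce_time_average: float
    # Job-specific variables
    num_maps: int
    num_reduces: int
    mean_map_task_time: float
    mean_reduce_task_time: float

    def as_vector(self):
        """Flatten the record into a tuple in declaration order."""
        return astuple(self)

# Hypothetical snapshot of a cluster and one incoming job.
example = AdmissionFeatures(40, 10, 120, 30, 2, 35.0, 90.0,
                            200, 20, 30.0, 85.0)
vector = example.as_vector()
```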

18 Bayesian Classifier Naive Bayes Assumption Works well in practice Use past events to predict future outcomes Application of Bayes theorem while computing probabilities Incremental Learning efficient w.r.t. Memory usage Simple to implement Supervision rules send feedback to classifier
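
A minimal sketch of an incrementally trained Naive Bayes classifier, to illustrate why it is memory efficient: it stores only counts and is updated one sample at a time. It assumes discrete feature values (continuous features such as task times would first need to be binned) and Laplace smoothing; neither choice is stated in the slides, and the class and method names are hypothetical.

```python
import math
from collections import Counter, defaultdict

class IncrementalNaiveBayes:
    def __init__(self, classes=("good", "bad")):
        self.classes = tuple(classes)
        self.class_counts = Counter()      # samples seen per class
        self.feature_counts = Counter()    # keyed by (class, index, value)
        self.values_seen = defaultdict(set)

    def update(self, features, label):
        """Learn from one labelled sample; only counters change."""
        self.class_counts[label] += 1
        for i, value in enumerate(features):
            self.feature_counts[(label, i, value)] += 1
            self.values_seen[i].add(value)

    def predict_proba(self, features):
        """Return P(class | features) under the naive independence assumption."""
        total = sum(self.class_counts.values())
        log_scores = {}
        for c in self.classes:
            # Laplace-smoothed prior, then one smoothed likelihood per feature.
            score = math.log((self.class_counts[c] + 1) / (total + len(self.classes)))
            for i, value in enumerate(features):
                n_values = len(self.values_seen.get(i, set()) | {value})
                count = self.feature_counts[(c, i, value)]
                score += math.log((count + 1) / (self.class_counts[c] + n_values))
            log_scores[c] = score
        top = max(log_scores.values())
        weights = {c: math.exp(s - top) for c, s in log_scores.items()}
        norm = sum(weights.values())
        return {c: w / norm for c, w in weights.items()}

clf = IncrementalNaiveBayes()
for _ in range(3):
    clf.update(("low",), "good")
    clf.update(("high",), "bad")
probs = clf.predict_proba(("low",))
```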

19 Classifier Supervision Success failure criteria Service provider sets the rule In this paper, load management is considered as a rule Rule constraints Consistency Preferably divide candidates into two linearly separable classes Maintain Cluster Load: Load = Ratio of runnable tasks to available processors
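
The load-based supervision rule can be sketched as follows. The load definition follows the slide; the function names, the label strings and the idea of comparing against a configurable `load_cap` are assumptions for illustration.

```python
def cluster_load(runnable_tasks, available_processors):
    """Load = ratio of runnable tasks to available processors."""
    if available_processors <= 0:
        raise ValueError("available_processors must be positive")
    return runnable_tasks / available_processors

def supervise(runnable_tasks, available_processors, load_cap):
    """Label the last admission decision for classifier feedback.

    Assumed rule: the decision was a success if the cluster load
    stayed at or below the provider's cap, otherwise a failure.
    """
    load = cluster_load(runnable_tasks, available_processors)
    return "success" if load <= load_cap else "failure"

label_ok = supervise(runnable_tasks=8, available_processors=8, load_cap=1.0)
label_over = supervise(runnable_tasks=12, available_processors=8, load_cap=1.0)
```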

20 Evaluation Success/Failure criteria: Load management Simulation Settings Tasks occupy time chosen from normal distribution For maps, only one distribution For reduces, three distributions, one each for shuffle, sort and reduce phase Jobs submitted according to exponential distribution
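
The simulation settings can be sketched with Python's standard `random` module. Two assumptions are made here that the slide does not state: "submitted according to exponential distribution" is read as exponentially distributed inter-arrival times, and negative samples from the normal distribution are clamped to zero. All means, deviations and rates below are made-up values.

```python
import random

def sample_map_time(rng, mean, std):
    """Map task duration: a single normal distribution."""
    return max(0.0, rng.gauss(mean, std))

def sample_reduce_time(rng, phases):
    """Reduce task duration: one normal distribution per phase."""
    return sum(max(0.0, rng.gauss(mean, std)) for mean, std in phases.values())

def arrival_times(rng, rate, n):
    """Submission times of n jobs with exponential inter-arrival gaps."""
    now, times = 0.0, []
    for _ in range(n):
        now += rng.expovariate(rate)
        times.append(now)
    return times

rng = random.Random(42)  # fixed seed so a run can be repeated
phases = {"shuffle": (10.0, 2.0), "sort": (5.0, 1.0), "reduce": (20.0, 4.0)}
map_time = sample_map_time(rng, mean=30.0, std=5.0)
reduce_time = sample_reduce_time(rng, phases)
submissions = arrival_times(rng, rate=0.5, n=5)
```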

21 Baseline Myopic Immediately select job that has maximum utility Random Randomly select one job from the candidate jobs
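
Both baselines are simple enough to sketch directly. The function names and the dictionary-based job representation are illustrative assumptions.

```python
import random

def myopic_select(candidates, utility):
    """Myopic baseline: pick the job with the maximum utility right now."""
    return max(candidates, key=utility) if candidates else None

def random_select(candidates, rng):
    """Random baseline: pick any candidate job uniformly at random."""
    return rng.choice(candidates) if candidates else None

# Hypothetical candidate jobs.
jobs = [{"name": "A", "utility": 3.0}, {"name": "B", "utility": 7.0}]
myopic_choice = myopic_select(jobs, lambda j: j["utility"])
random_choice = random_select(jobs, random.Random(0))
```

Neither baseline looks at the cluster state, which is the difference from the classifier-based approach.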

22 Algorithm Accuracy

23 Comparison with baseline Algorithm Random Myopic Our algorithm 0.97 Achieved Load Average

24 Meeting Deadlines

25 Performance with Load Cap

26 Runtime with Load Cap

27 Runtime Distribution Comparison

28 Effect of Job Arrival Rate

29 Effect of Utility Functions

30 Task Assignment for Hadoop

31 Related Work Independent task scheduling Popular heuristic algorithms proposed in literature Opportunistic Load Balancing OLB MET Minimum Execution Time MCT Minimum Completion Time Min-Min Max-Min Duplex Suffrage and X-Suffrage Work Queue Replication

32 Related Work Computational Intelligence based methods Genetic Algorithms Simulated Annealing Genetic Simulated Annealing Swarm Intelligence ACO Ant Colony Optimization PSO Particle Swarm Optimization

33 Learning Based Methods Stochastic Learning Automata Decision Trees Bayesian Decision Networks Dynamic Bayesian Networks

34 Existing Hadoop Schedulers Native Hadoop Scheduler Heartbeat mechanism FAIR Scheduler Capacity Scheduler Dynamic Priority Scheduler

35 Learning Scheduler

36 Features of Learning Scheduler Flexible task assignment based on state of resources Consider job profile while allocating Tries to avoid overloading task trackers Allow users to control assignment by specifying priority functions Incremental learning

37 Goals Maintaining specified level of load Freeing user from some configuration details Reducing job runtime is not a direct objective

38 Using classifier Use a pattern classifier to classify candidate jobs Two classes: good and bad Good tasks don't overload task trackers Overload: A limit set on system load average by the admin

39 Feature Vector Job features CPU, memory, network and disk usage of a job Static node properties Number of processors, maximum physical and virtual memory, CPU Frequency Dynamic node properties State of resources: As obtained from TaskTrackerStatus.ResourceStatus Number of running map tasks Number of running reduce tasks

40 Job Selection From the candidates labelled as good, select one with maximum priority Create a task of the selected job Currently can assign only one task at a time

41 After Assignment If in the next heartbeat, status reports that TT is overloaded, consider last assignment as incorrect => The task labelled as good was actually bad Update (train) classifier
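
The selection and feedback steps on the last two slides can be sketched as one loop: choose the highest-priority candidate labelled good, then use the load reported in the next heartbeat to correct the classifier. `RecordingClassifier` is a stand-in that only records training samples, not the real classifier, and all names here are hypothetical. Reinforcing the "good" label when no overload is reported is an assumption; the slide only describes the correction on overload.

```python
class RecordingClassifier:
    """Stand-in classifier: labels everything good and records feedback."""
    def __init__(self):
        self.samples = []

    def predict(self, features):
        return "good"

    def update(self, features, label):
        self.samples.append((features, label))

def choose_task(candidates, classifier, priority):
    """Among candidates labelled good, pick the one with maximum priority."""
    good = [c for c in candidates if classifier.predict(c["features"]) == "good"]
    return max(good, key=priority) if good else None

def on_next_heartbeat(last_assignment, load_average, overload_limit, classifier):
    """Train the classifier on the observed outcome of the last assignment."""
    if last_assignment is None:
        return
    # Overload means the task labelled good was actually bad.
    label = "bad" if load_average > overload_limit else "good"
    classifier.update(last_assignment["features"], label)

clf = RecordingClassifier()
candidates = [
    {"job": "wordcount", "features": (2, 1), "priority": 1.0},
    {"job": "urlget", "features": (1, 5), "priority": 3.0},
]
assigned = choose_task(candidates, clf, lambda c: c["priority"])
on_next_heartbeat(assigned, load_average=6.2, overload_limit=4.0, classifier=clf)
```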

42 Overload Rule Theoretically any rule which is a linear combination of resources can be used For convergence, a rule should be consistent Load average was chosen because: It is a better indicator of contention at a node Usually CPU or Disk I/O heavy jobs have predictable effect on load Load averages are more effective in achieving load balancing

43 Priority (Utility) Functions Policy enforcement Maps before reduce FIFO: U(J) = J.age Capacity: U(J) = exp(G - C), where G = guaranteed % of allocations and C = actual % allocated Avoid starvation If priority of all jobs is equal, scheduler will always assign task that has the maximum likelihood of being labelled good.
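
The two priority functions on this slide translate directly into code. Expressing the guaranteed and actual allocations as fractions between 0 and 1 is an assumption; the slide only says they are percentages. The function names are illustrative.

```python
import math

def fifo_priority(job_age):
    """FIFO policy: U(J) = J.age, so older jobs get higher priority."""
    return job_age

def capacity_priority(guaranteed, actual):
    """Capacity policy: U(J) = exp(G - C).

    guaranteed (G) and actual (C) are allocation shares, assumed here
    to be fractions in [0, 1]. A job below its guarantee scores above 1,
    a job above its guarantee scores below 1.
    """
    return math.exp(guaranteed - actual)

under_served = capacity_priority(guaranteed=0.5, actual=0.2)
over_served = capacity_priority(guaranteed=0.2, actual=0.5)
```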

44 Job Profile Users submit 'hints' about job performance Estimate job's resource consumption on a scale of 10, 10 being the highest. This data is passed at job submission time through job parameters: learnsched.jobstat.map - 1:2:3:4 learnsched.jobstat.reduce
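
A sketch of how a colon-separated hint such as `1:2:3:4` might be parsed. The mapping of the four fields to CPU, memory, network and disk (in that order) is an assumption based on the job features listed on the Feature Vector slide, and the accepted range of 0 to 10 is likewise assumed; the slides do not spell out either. `parse_job_hint` is a hypothetical helper, not part of the scheduler.

```python
# Assumed field order; not confirmed by the slides.
RESOURCES = ("cpu", "memory", "network", "disk")

def parse_job_hint(value):
    """Parse a hint string like '1:2:3:4' into per-resource scores."""
    parts = value.strip().split(":")
    if len(parts) != len(RESOURCES):
        raise ValueError(f"expected {len(RESOURCES)} fields, got {len(parts)}")
    scores = [int(p) for p in parts]
    if any(s < 0 or s > 10 for s in scores):
        raise ValueError("each score must be between 0 and 10")
    return dict(zip(RESOURCES, scores))

map_profile = parse_job_hint("1:2:3:4")
```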

45 Classifier Details Naive Bayes Classifier Easy to implement Can learn incrementally (one sample at a time) Known to work well in a number of cases Assumes feature variables are conditionally independent of each other

46 Classifier Convergence Difficult to achieve 100% accuracy Naive Bayes Assumption Communication delay Age of resource information Accuracy of resource information

47 Advantages Over fixed slot based approach Flexible assignment Especially useful for tasks that consume less resources Does not assign too many tasks of 'heavy' jobs Overload limit can be increased or decreased and number of assigned tasks changes accordingly Currently needs JT restart.

48 Evaluation Evaluation work load TextWriter WordCount WordCount+ 10ms delay URLGet URLToDisk CPU Activity

49 Learning Behavior

50 Load Management

51 Classifier Accuracy

52 Limitations Load average is a damped average -> Rises and falls slowly Might cause over assignment when rising from low load to high load: can be fixed with a limit on number of new tasks Some period of under utilization until load average falls below user limit. Updated only every 5 seconds. Load average is not that effective for network intensive tasks Can assign only one task at a time Learning is slow for tasks with unpredictable behavior
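
The damping effect can be illustrated with an exponentially damped average updated every 5 seconds, which is how the Linux 1-minute load average behaves in outline. This is a simplified model for illustration, not the kernel's fixed-point implementation, and the function name and parameters are assumptions.

```python
import math

def simulate_load_average(runnable_counts, interval=5.0, window=60.0, start=0.0):
    """Return the damped load average after each periodic update.

    Each update blends the previous average with the current number of
    runnable tasks, using decay = exp(-interval / window).
    """
    decay = math.exp(-interval / window)
    load, history = start, []
    for n in runnable_counts:
        load = load * decay + n * (1.0 - decay)
        history.append(load)
    return history

# 8 runnable tasks appear at once on an idle node; sample for one minute.
history = simulate_load_average([8] * 12)
```

Even after a full minute the reported average is still well below 8, so a scheduler that reads only this value can keep assigning tasks to a node that is already saturated.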

53 Conclusions Feedback informed classifiers can be used effectively Better QoS than naive approaches Less runtime => happy users => more revenue for the service provider

54 Future directions Admission control for HDFS, resource brokering in the cloud and interactive SaaS Extending the task assignment algorithm to global criteria like bandwidth control, power aware scheduling Overload rules that combine several resources: CPU, network, load, memory etc.

55 Acknowledgements Chid and Preeti from Yahoo! India R&D for support Reddyraja Annareddy from Pramati

56 Related Publications Learning Based Opportunistic Admission Control Algorithm for MapReduce as a Service, Jaideep Dhok, Nitesh Maheshwari and Vasudeva Varma, In the proceedings of 3rd India Software Engineering Conference, Mysore Feb *Won best student paper award* Using Pattern Classification for Task Assignment in MapReduce, Jaideep Dhok and Vasudeva Varma

57 Questions?

58 Thank You