Learning Dynamics Models with Terrain Classification


1 Efficient Learning of Dynamics Models Using Terrain Classification
Bethany Leffler, Chris Mansley, Michael Littman
Department of Computer Science, Rutgers University
International Workshop on Evolutionary and Reinforcement Learning for Autonomous Robot Systems, 2008

2-4 Robotic Motivation of the Navigation Task

5-9 Navigation: Traditional vs. Model-Based RL

Traditional navigation:
- Dynamics of the agent are known or learned
- Planning is done with respect to the model
- Assumes that the dynamics fundamentally do not change

Model-based RL:
- Dynamics of the agent are learned
- Planning is done with respect to the model
- Assumes that the dynamics might be different at every single state

10 Exploration vs. Exploitation

11-13 Environment Model Matching

14 Additional Assumption: Dynamics Indicator
- There exists a function that predicts when the dynamics change
- This function is typically just a single feature from the state space

15 Relocatable Action Model (RAM) [Sherstov and Stone, 2005]
- S : states
- A : actions
- R : S → ℝ : reward function
- C : clusters or groups of states
- κ : S → C : clustering function
- O : outcomes
- t : C × A → Pr(O) : the relocatable action model
- η : S × O → S : next-state function
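To make the notation concrete, here is a minimal Python sketch of the RAM components and how they combine into a transition model. The concrete types and the toy κ, t, and η below are illustrative assumptions; only the signatures above come from the slide.

```python
# A minimal sketch of the RAM components above. The concrete types and the
# toy kappa/t/eta are illustrative assumptions; only the signatures come
# from the slide.
from typing import Dict, Tuple

State = Tuple[int, int]      # e.g. a grid cell (x, y)
Action = str
Cluster = str                # e.g. "rock" or "carpet"
Outcome = Tuple[int, int]    # a relative displacement

def kappa(s: State) -> Cluster:
    """Clustering function kappa : S -> C (here: a single terrain feature)."""
    x, y = s
    return "rock" if x > 5 else "carpet"

# RAM t : C x A -> Pr(O), one outcome distribution per (cluster, action).
t: Dict[Tuple[Cluster, Action], Dict[Outcome, float]] = {
    ("carpet", "forward"): {(1, 0): 0.9, (0, 0): 0.1},
    ("rock", "forward"):   {(1, 0): 0.6, (0, 1): 0.2, (0, 0): 0.2},
}

def eta(s: State, o: Outcome) -> State:
    """Next-state function eta : S x O -> S (apply the relative outcome)."""
    return (s[0] + o[0], s[1] + o[1])

def transition(s: State, a: Action) -> Dict[State, float]:
    """P(s' | s, a) induced by the RAM: outcomes are shared within a cluster."""
    dist: Dict[State, float] = {}
    for outcome, prob in t[(kappa(s), a)].items():
        nxt = eta(s, outcome)
        dist[nxt] = dist.get(nxt, 0.0) + prob
    return dist
```

The key point is that t is indexed by cluster rather than by state, so experience gathered on one patch of terrain transfers to every other state in the same cluster.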

16-19 RAM-Rmax [Leffler et al., 2007]: Overview
- State space
- Classify states
- Learn dynamics
- Generalize (a sketch of the resulting learning-and-planning loop follows this list)
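A hedged sketch of the loop implied by this overview: keep outcome counts per (cluster, action) pair, treat pairs observed fewer than m times optimistically (the Rmax idea), and plan over the learned model with value iteration. The parameter names (m, gamma, v_max, n_iters) and the simple tabular planner are assumptions, not details from the slides.

```python
# A hedged sketch of the RAM-Rmax idea: learn outcome statistics per
# (cluster, action) pair, stay optimistic about pairs that have not been
# tried enough, and plan with the resulting model. Parameter names and the
# planner below are assumptions; the algorithm is from Leffler et al. (2007).
from collections import Counter, defaultdict

# Experience is recorded as it is observed:
counts = defaultdict(Counter)
# counts[(kappa(s), a)][observed_outcome] += 1   # after each real transition

def ram_rmax_plan(states, actions, kappa, eta, reward, counts,
                  m=5, gamma=0.95, v_max=0.0, n_iters=100):
    """Value iteration over the RAM model learned so far.

    counts[(cluster, action)] is a Counter of observed outcomes for that pair.
    """
    V = {s: 0.0 for s in states}
    for _ in range(n_iters):
        V_new = {}
        for s in states:
            best = float("-inf")
            for a in actions:
                obs = counts[(kappa(s), a)]
                n = sum(obs.values())
                if n < m:
                    # Unknown (cluster, action) pair: an optimistic value
                    # drives exploration toward it (the Rmax idea).
                    q = v_max
                else:
                    # Known pair: the outcome distribution generalizes to
                    # every state in the same cluster via eta.
                    q = reward(s) + gamma * sum(
                        (c / n) * V.get(eta(s, o), 0.0)
                        for o, c in obs.items())
                best = max(best, q)
            V_new[s] = best
        V = V_new
    return V
```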

20 Task Description: Navigate to the Goal
- Reaching the goal or falling out ends the episode (see the sketch below)
- States:
- Actions: 3
- Step cost: 0.1
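A minimal sketch of the episode structure described on this slide. Only the three actions, the 0.1 step cost, and the two terminating conditions come from the slide; the action names, goal test, and boundary test are assumptions.

```python
# Assumed action names for the slide's three actions; only the count is given.
ACTIONS = ("forward", "turn_left", "turn_right")
STEP_COST = 0.1   # per-step cost from the slide

def step_outcome(state, goal_states, valid_states):
    """Return (reward, done) for one step of the navigation task."""
    if state in goal_states:
        return -STEP_COST, True      # reaching the goal ends the episode
    if state not in valid_states:
        return -STEP_COST, True      # falling out of the area also ends it
    return -STEP_COST, False         # otherwise pay the step cost and continue
```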

21 System Architecture
Camera → Terrain Classification → Localization → RAM-Rmax → Action
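A hedged sketch of the control loop suggested by this architecture. The component interfaces (capture, classify, localize, choose_action, execute) are hypothetical names; only the Camera → Terrain Classification → Localization → RAM-Rmax → Action flow comes from the slide.

```python
# One pass through the pipeline on the slide; interface names are assumptions.
def control_step(camera, terrain_classifier, localizer, agent, robot):
    frame = camera.capture()                           # Camera
    terrain_map = terrain_classifier.classify(frame)   # Terrain Classification
    state = localizer.localize(frame)                  # Localization
    action = agent.choose_action(state, terrain_map)   # RAM-Rmax
    robot.execute(action)                              # Action
```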

22-23 Terrain Classification

24 Results: Cumulative Reward
[Plot: average cumulative reward per episode for RAM-Rmax and Fitted Q, each with and without terrain classification.]

25 Results: Success Rates
- In the last ten episodes, RAM-Rmax with the cluster information succeeded in reaching the goal 96% of the time.
- With only one cluster, it reached the goal just 34% of the time.
- Fitted Q-learning was unable to reach the goal within 20 episodes, with or without the cluster information.

26-28 Results
- RAM-Rmax with classification quickly learned a good policy
- RAM-Rmax without classification quickly learned a sub-optimal policy
- Adding the classification information to Fitted Q did not produce a policy

29-33 Conclusions
- Used a framework that allows us to add prior information in a principled way
- Showed that this framework reduces exploration in natural environments
- Empirically demonstrated that the addition of a single extra cluster can radically improve performance
- This approach is more powerful than simply adding an extra feature to function-approximation methods
- Further reduced the dependency on hand tuning from the previous work, resulting in a more automated system

34 Current Work: Continuous Domains [Brunskill et al., 2008]
- Instead of representing the model as a set of discrete statistics, learn a Gaussian
- Use the continuous-offset or RAM model with fitted value iteration to solve (a sketch of the offset model follows)
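A hedged sketch of the continuous-offset idea: per (cluster, action) pair, fit a Gaussian to the observed offsets s' − s instead of keeping discrete outcome counts. The use of numpy and the maximum-likelihood fit are assumptions; CORL itself is described in Brunskill et al. (2008).

```python
# Fit a Gaussian offset model per (cluster, action); an illustrative sketch,
# not the CORL implementation.
import numpy as np
from collections import defaultdict

offsets = defaultdict(list)   # (cluster, action) -> list of observed offsets

def record_transition(cluster, action, s, s_next):
    """Store the continuous offset s' - s for this (cluster, action) pair."""
    offsets[(cluster, action)].append(np.asarray(s_next) - np.asarray(s))

def gaussian_offset_model(cluster, action):
    """Mean and covariance of the offset distribution for one (cluster, action)."""
    data = np.stack(offsets[(cluster, action)])
    mean = data.mean(axis=0)
    cov = np.cov(data, rowvar=False)
    return mean, cov
```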

35 Future Work: Feature Selection [Li et al., 2008]
- Which features are good dynamics indicators? We can learn this.
- This enables us to incorporate additional sensors, either alone or in combination.

36 References

Brunskill, E., Leffler, B. R., Li, L., Littman, M. L., and Roy, N. (2008). CORL: A continuous-state offset-dynamics reinforcement learner. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI-08).

Leffler, B. R., Littman, M. L., and Edmunds, T. (2007). Efficient reinforcement learning with relocatable action models. In Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-07), Menlo Park, CA, USA. The AAAI Press.

Li, L., Littman, M. L., and Walsh, T. J. (2008). Knows what it knows: A framework for self-aware learning. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML-08).

Sherstov, A. A. and Stone, P. (2005). Improving action selection in MDPs via knowledge transfer. In Proceedings of the 20th Conference on Artificial Intelligence (AAAI-05), Pittsburgh, PA, USA.