Learning Dynamics Models with Terrain Classification
Efficient Learning of Dynamics Models Using Terrain Classification
Bethany Leffler, Chris Mansley, Michael Littman
Department of Computer Science, Rutgers University
International Workshop on Evolutionary and Reinforcement Learning for Autonomous Robot Systems, 2008
Robotic Motivation of Navigation Task
Navigation: Traditional vs. Model-Based RL

Traditional:
- Dynamics of the agent are known or learned
- Planning is done with respect to the model
- Assumes that the dynamics fundamentally do not change

Model-Based RL:
- Dynamics of the agent are learned
- Planning is done with respect to the model
- Assumes that the dynamics might be different at every single state
Exploration vs. Exploitation
Environment Model Matching
Additional Assumptions: Dynamics Indicator
- There exists a function that predicts when the dynamics change
- This function is typically just a single feature from the state space
Relocatable Action Model (RAM) [Sherstov and Stone, 2005]
- S : states
- A : actions
- R : S → ℝ, reward function
- C : clusters, or groups of states
- κ : S → C, clustering function
- O : outcomes
- t : C × A → Pr(O), the RAM
- η : S × O → S, next-state function
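As an illustrative sketch (the grid-world instantiation, function names, and example values here are my own, not from the slides), the RAM components compose as follows: the outcome distribution is indexed by cluster rather than by state, and a sampled outcome is turned back into a concrete next state by η.

```python
import random

# Hypothetical grid-world instantiation of the RAM components.
# States are (x, y) cells; outcomes are relative moves (dx, dy).

def kappa(state, terrain_map):
    """Clustering function kappa: S -> C; here a terrain-type lookup."""
    return terrain_map[state]

def eta(state, outcome):
    """Next-state function eta: S x O -> S; here a relative offset."""
    x, y = state
    dx, dy = outcome
    return (x + dx, y + dy)

def step(state, action, t, terrain_map):
    """Sample a next state: draw an outcome from t(kappa(s), a), apply eta."""
    cluster = kappa(state, terrain_map)
    outcomes, probs = zip(*t[(cluster, action)].items())
    outcome = random.choices(outcomes, weights=probs)[0]
    return eta(state, outcome)

# Example: on 'carpet', action 'forward' usually moves one cell up.
terrain_map = {(0, 0): 'carpet', (0, 1): 'carpet'}
t = {('carpet', 'forward'): {(0, 1): 0.9, (0, 0): 0.1}}
next_state = step((0, 0), 'forward', t, terrain_map)
```

The key point the sketch makes concrete: t depends only on the cluster, so every state with the same terrain label shares one outcome distribution.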
RAM-Rmax [Leffler et al., 2007]: Overview
- State space
- Classify states
- Learn dynamics
- Generalize
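A minimal sketch of how the classify/learn/generalize steps fit together (this is my own simplification, not the authors' implementation): outcome counts are pooled per (cluster, action) pair, and pairs with too few samples are treated as unknown, which an Rmax-style planner handles optimistically.

```python
from collections import defaultdict

class RamRmaxModel:
    """Simplified RAM-Rmax-style model: pools outcome counts per
    (cluster, action) pair; under-sampled pairs count as 'unknown'
    and would be treated optimistically (Rmax) by the planner."""

    def __init__(self, m=5):
        self.m = m  # visits required before a pair counts as known
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, cluster, action, outcome):
        self.counts[(cluster, action)][outcome] += 1

    def known(self, cluster, action):
        return sum(self.counts[(cluster, action)].values()) >= self.m

    def outcome_probs(self, cluster, action):
        """Empirical Pr(O | cluster, action), valid once the pair is known."""
        c = self.counts[(cluster, action)]
        total = sum(c.values())
        return {o: n / total for o, n in c.items()}

model = RamRmaxModel(m=3)
for _ in range(3):
    model.update('carpet', 'forward', (0, 1))
```

Because every state that κ maps to 'carpet' shares this learned model, one terrain type need only be explored once, rather than once per state.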
Task Description: Navigate to Goal
- Reaching the goal or falling out ends the episode
- States:
- Actions: 3
- Step cost: 0.1
System Architecture: Camera → Terrain Classification → Localization → RAM-Rmax → Action
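The pipeline on this slide can be read as a simple function composition. The sketch below is hypothetical glue code (all function names and the thresholding classifier are illustrative assumptions, not from the paper): a camera image is classified into terrain clusters, the robot is localized, and the RAM-Rmax policy is queried for that state's cluster.

```python
# Hypothetical glue code for the slide's pipeline:
# camera image -> terrain classification -> localization -> RAM-Rmax -> action.

def classify_terrain(image):
    """Label each image cell with a terrain cluster (toy intensity threshold)."""
    return {(x, y): ('rock' if px > 128 else 'carpet')
            for (x, y), px in image.items()}

def localize(image):
    """Estimate the robot's grid cell from the overhead image (placeholder)."""
    return min(image)  # stand-in for a real localization routine

def select_action(state, cluster, policy):
    """Query a RAM-Rmax-style policy keyed by (state, cluster)."""
    return policy.get((state, cluster), 'explore')

def control_step(image, policy):
    terrain_map = classify_terrain(image)
    state = localize(image)
    return select_action(state, terrain_map[state], policy)

image = {(0, 0): 40, (0, 1): 200}
action = control_step(image, {((0, 0), 'carpet'): 'forward'})
```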
Terrain Classification
Results: Cumulative Reward
[Plot: average cumulative reward per episode for RAM-Rmax with classification, RAM-Rmax without classification, Fitted Q with classification, and Fitted Q without classification]
Results: Success Rates
- In the last ten episodes, RAM-Rmax with the cluster information succeeded in reaching the goal 96% of the time.
- With one cluster, it only reached the goal 34% of the time.
- Fitted Q-Learning was unable to reach the goal, with or without cluster information, in 20 episodes.
Results
- RAM-Rmax with classification quickly learned a good policy
- RAM-Rmax without classification quickly learned a sub-optimal policy
- Adding information to Fitted Q did not produce a policy
Conclusions
- Used a framework that allows us to add prior information in a principled way
- Showed that this framework reduces exploration in natural environments
- Empirically demonstrated that the addition of a single extra cluster can radically improve performance
- More powerful than the simple addition of an extra feature to function-approximation methods
- Further reduced the dependency on hand tuning from the previous work, resulting in a more automated system
Future Work: Continuous Domains [Brunskill et al., 2008]
- Instead of representing the model as a set of discrete statistics, learn a Gaussian
- Use the continuous offset or RAM model with Fitted Value Iteration to solve
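To make the Gaussian idea concrete, here is a hedged one-dimensional sketch (my own simplification of the CORL-style offset model; the function name and example displacements are assumptions): instead of tabulating discrete outcome counts, fit the mean and variance of the observed state offset s' − s for one (cluster, action) pair.

```python
import statistics

def fit_offset_gaussian(offsets):
    """Fit mean and variance of observed offsets for one (cluster, action)
    pair; in the continuous setting this replaces the discrete outcome table."""
    mu = statistics.fmean(offsets)
    var = statistics.pvariance(offsets)
    return mu, var

# Observed forward displacements (meters) on one terrain type:
mu, var = fit_offset_gaussian([0.10, 0.12, 0.08, 0.10])
```

The learned Gaussians can then serve as the transition model inside a continuous-state planner such as Fitted Value Iteration.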
Future Work: Feature Selection [Li et al., 2008]
- Which features are good dynamics indicators? We can learn this.
- This enables us to incorporate additional sensors, either alone or in combination.
References

Brunskill, E., Leffler, B. R., Li, L., Littman, M. L., and Roy, N. (2008). CORL: A continuous-state offset-dynamics reinforcement learner. In Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI-08).

Leffler, B. R., Littman, M. L., and Edmunds, T. (2007). Efficient reinforcement learning with relocatable action models. In Proceedings of the 22nd Conference on Artificial Intelligence (AAAI-07), Menlo Park, CA, USA. The AAAI Press.

Li, L., Littman, M. L., and Walsh, T. J. (2008). Knows what it knows: A framework for self-aware learning. In Proceedings of the Twenty-Fifth International Conference on Machine Learning (ICML-08).

Sherstov, A. A. and Stone, P. (2005). Improving action selection in MDPs via knowledge transfer. In Proceedings of the 20th Conference on Artificial Intelligence (AAAI-05), Pittsburgh, PA, USA.