Decentralized Decision Support and Coordination of Autonomous Vehicles Based on Online Data Mining Techniques


1 Decentralized Decision Support and Coordination of Autonomous Vehicles Based on Online Data Mining Techniques
Maksims Fiosins, Jana Görmer (TU Clausthal, Germany, Institute for Informatics, Business Information Technology Unit)
Jan Fabian Ehmke (TU Braunschweig, Germany, Decision Support Group)

2 0.1. PLANETS - Introduction
Plan and decide in networks of autonomous actors in city traffic
Definition of tasks, functions and interfaces of the involved layers
Illustration by applications of dynamic traffic control
Project partners and topics:
- Institute of Transportation and Urban Engineering (Friedrich, TU BS): Dynamic Traffic Management, application scenarios
- Decision Support Group (Mattfeld, TU BS): (Online) Data Mining
- Institute of Computer Science (Müller, TUC): Multiagent Networks
- Institute of Communications Technology (Fidler, U H): Car2X Communication
[Figure: communication simulation and traffic simulation mapped onto the OSI stack, from the application layer down to the physical layer]

3 PLANETS system architecture (logical)
Information models connect the components, carrying agent results and planning data
Traffic management functions:
- Traffic light center (superordinate coordination of the traffic lights): computation of traffic light frame plans, green wave
- Route influencing: event-driven (e.g. congestion, accident) recalculation of routes; targeted rerouting of traffic participants to improve the traffic situation
- Traffic light optimization: local optimization of each traffic light, taking the frame plans, counts, platoons and the traffic situation into account
- Platoon formation: short- and long-term formation of vehicle platoons (based on traffic management information)
TUBS IVS, traffic management information: provision of all traffic management information (e.g. remaining-time displays, platoon information, routes, ...)
TUBS WINFO, (real-time) data mining: traffic information and scenario data pass a 1st-level aggregation (select attributes; filter data; add data; aggregate data; ETL/preprocessing) and a 2nd-level aggregation (select attributes; cluster analysis) into a data warehouse
- Results: efficient information models; typical and average numbers of vehicles; data for traffic state calculation; structure and parameters of daily courses; group information; traffic information
[Figure: share of traffic quality levels A-F in the daily course (average Monday)]
LUH IKT / TUC WINFO, multiagent environment (model of the real world/simulation): agent communication; categorizing agents (vehicles, infrastructure); communication results; plans/strategies; forming groups; decision making; communication within the platform (CORBA); interface to external applications

4 0.2. PLANETS Considered scenarios
Target scenario:
- Street network (routing)
Partial scenarios:
- Single intersection
- Consecutive intersections
Hierarchical approach:
- Combination of the simple scenarios into the complex one

5 Outline
0. PLANETS Project
1. Introduction
1.1. Problem formulation
1.2. State of the art
1.3. Applied methods
2. Modelling
2.1. Traffic participants as autonomous agents
2.2. Agent's information model / Online Data Mining
2.3. Agent local planning
2.4. Agent learning
3. Application example
4. Conclusions and future research

6 1.1. Introduction Problem formulation
We consider the cooperative behaviour of vehicles in a city traffic environment
Vehicles are autonomous and rational (they make decisions according to their own goals, in their personal interest)
The city traffic environment is centrally controlled (traffic lights are regulated according to the traffic state)
Information about the traffic state is collected from detectors and processed by (online) data mining
A communication infrastructure allows information to be transmitted between vehicles (C2C) and between vehicles and infrastructure (C2I)
The problem is to improve the system performance ("level of service") of the controlled area

7 1.2. Introduction State of the art
Classical methods for city traffic regulation (TRANSYT, SYNCRO, SCOOT, SCATS etc.):
- concentrate on traffic light regulation
- are mostly based on centralized control
- do not take the individual properties of vehicles/drivers into account
At present, traffic researchers mostly concentrate on traffic control elements, not on vehicle/driver behavior (Homburger 1996, Bazzan 2009)
Studies of individual driver behavior mostly concentrate on mesoscopic models for travel demand planning (Nagel 2009) or on adaptive cruise control for autonomous driving (Naranjo 2007, Kolodko 2006)

8 1.3. Introduction Applied methods
We propose a combination of decentralized behavior of individual participants with centralized control to improve the system performance
Traffic participants are represented as rational autonomous agents
The information required for the decision making process is provided by (online) data mining methods
The (cooperative) decision making of traffic participants is based on Markov decision processes; learning is performed by reinforcement learning methods
The learning results are sent back to the traffic lights for future regulation

9 2.1. Modelling Traffic participants as autonomous agents
The traffic system is a type of social system
Each agent plans its decisions in order to maximize its own reward (level 2)
In multiagent systems, the agents must cooperate and coordinate their behavior (level 3)
In this talk, we concentrate mainly on level 2 (local planning)

10 2.2. Modelling The agent's information model
The decision making process is based on the agent's information model
The information model includes:
- knowledge about the system structure, i.e. a digital roadmap (how the system is expected to look)
- the current system state, i.e. the belief and view of the agent (speed, position, ...)
- procedures for knowledge interchange
The expected system state is provided by data mining, using huge amounts of centrally collected data
The current system state is provided and updated by online data mining:
- process information online and estimate parameters (traffic state)
- compare the current system state with the expected system state
- produce forecasts
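
A minimal sketch of such an information model (illustrative only; the class name, dictionary layout and tolerance parameter are assumptions, not part of the PLANETS design):

```python
from dataclasses import dataclass, field

@dataclass
class InformationModel:
    """Minimal agent information model: static knowledge plus a
    current belief that is updated from online measurements."""
    roadmap: dict                                         # static structure, e.g. {edge: length}
    expected_state: dict = field(default_factory=dict)    # from offline data mining
    current_state: dict = field(default_factory=dict)     # from online data mining

    def update(self, measurements: dict):
        """Fold new online measurements into the current belief."""
        self.current_state.update(measurements)

    def deviations(self, tolerance=0.2):
        """Parameters whose observed value deviates from the expectation
        by more than the relative tolerance (candidates for model change)."""
        return {k: v for k, v in self.current_state.items()
                if k in self.expected_state
                and abs(v - self.expected_state[k]) > tolerance * abs(self.expected_state[k])}
```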

11 2.2. Modelling Online Data Mining (1)
Data mining: recognition and provision of the traffic state
Online data mining: online parameterization of information models
[Figure: cycle of information gathering, sampling and provision of information models; dynamic parameterization of information models driven by variances, default measures and information quality; data mining feeds model adaptation]

12 2.2. Modelling Online Data Mining (2) (Tsymbal et al. 2004)
Define and recognize concept shift and concept drift ("change detection") by analyzing online data streams (e.g. with a sliding window)
Problem: system noise
Concept shift: sudden changes, e.g. a fair leads to a sudden increase of traffic flows
Concept drift: continuous change, e.g. a new road leads to continuously changing route selection (changing traffic flows)
Forecasting of traffic states based on online data streams
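
To make the sliding-window idea concrete, here is a minimal sketch of change detection over two adjacent windows of an online stream (not from the slides; the window size, the z-score test and the threshold are illustrative assumptions):

```python
import statistics

def detect_shift(stream, window=30, threshold=3.0):
    """Flag a possible concept shift when the mean of the most recent
    window deviates from the mean of the preceding window by more than
    `threshold` standard errors (a simple z-test heuristic; real traffic
    data would need a noise-robust test)."""
    alarms = []
    for t in range(2 * window, len(stream) + 1):
        old = stream[t - 2 * window : t - window]    # reference window
        new = stream[t - window : t]                 # current window
        sd = statistics.stdev(old) or 1e-9           # guard against zero variance
        z = abs(statistics.mean(new) - statistics.mean(old)) / (sd / window ** 0.5)
        if z > threshold:
            alarms.append(t)                         # shift suspected at time t
    return alarms
```

A slow drift, by contrast, keeps the two window means close at every step, so it has to be detected over longer horizons, e.g. by trend tests on the window statistics.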

13 2.3. Modelling Agent local planning: Markov decision process (1)
We model the local decision making of the agents as a Markov decision process (MDP)
An MDP includes:
- the state space $S = \{s_1, s_2, \dots, s_n\}$
- the action space $A = \{a_1, a_2, \dots, a_k\}$
- the transition probabilities $T(s, a, s') = P(s_{t+1} = s' \mid s_t = s, a_t = a)$
- the reward $R(s, a)$ for performing action $a$ in state $s$
The goal is to maximize the total expected reward of the individual agent
The agent's behavior is defined by a policy $\pi(s)$, which gives a (recommended) action for each state $s$

14 2.3. Modelling Agent local planning: Markov decision process (2)
If the transition probabilities are known, the policy can be calculated by solving the system of equations:
$$\pi(s) = \arg\max_a \Big( R(s,a) + \gamma \sum_{s'} T(s,a,s')\, V(s') \Big),$$
$$V(s) = \max_a \Big( R(s,a) + \gamma \sum_{s'} T(s,a,s')\, V(s') \Big)$$
If the transition probabilities are unknown, reinforcement learning is usually used
[Figure: a vehicle approaching the intersection at times t and t+1; the state comprises speed, position and time to the end of the green phase; the policy recommends an action in each state, e.g. slow down at time t and speed up at time t+1]
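
Since the slide gives the full Bellman system, a compact value-iteration sketch may help; this is a generic textbook implementation, not the authors' code, and the array shapes for T and R are assumptions:

```python
import numpy as np

def value_iteration(T, R, gamma=0.95, eps=1e-6):
    """Solve a finite MDP by value iteration.
    T: (n_states, n_actions, n_states) transition probabilities
    R: (n_states, n_actions) immediate rewards
    Returns the value function V and the greedy policy."""
    n_states, _, _ = T.shape
    V = np.zeros(n_states)
    while True:
        # Q(s,a) = R(s,a) + gamma * sum_s' T(s,a,s') * V(s')
        Q = R + gamma * (T @ V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < eps:
            return V_new, Q.argmax(axis=1)
        V = V_new
```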

15 2.4. Learning Reinforcement learning
Reinforcement learning algorithms can be classified as:
- Monte Carlo (MC) methods: learning based on the reward of the whole episode
- Temporal Difference (TD) methods: step-by-step learning
Different reward structures are necessary: TD needs an immediate reward, whereas MC may use the whole-episode reward
[Figure: in TD learning the policy is updated after each step; in MC learning it is updated only at the end of each episode]

16 2.4. Learning details of algorithms
Let $Q(s,a)$ be the state-action value function, which represents the expected reward when starting from state $s$ with action $a$. TD learning updates this function as:
$$Q(s,a) \leftarrow Q(s,a) + \alpha \big[ r + \gamma\, Q(s',a') - Q(s,a) \big]$$
where:
- $Q(s',a')$ is the value function of the next step
- $r$ is the current reward
- $\alpha$ and $\gamma$ are learning parameters
MC learning updates this function as:
$$Q(s,a) \leftarrow \operatorname{avg}\big( returns(s,a) \big)$$
where $returns(s,a)$ accumulates the final rewards of the episodes that include the pair $(s,a)$
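
The two update rules translate almost line by line into code; the following tabular sketch (SARSA-style TD and every-visit MC averaging) uses assumed names and default parameters:

```python
from collections import defaultdict

Q = defaultdict(float)        # tabular state-action values, Q[(s, a)]
returns = defaultdict(list)   # episode returns per (s, a) pair, for MC

def td_update(s, a, r, s_next, a_next, alpha=0.1, gamma=0.95):
    """One temporal-difference (SARSA-style) step:
    Q(s,a) <- Q(s,a) + alpha * [r + gamma * Q(s',a') - Q(s,a)]"""
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])

def mc_update(episode):
    """Monte Carlo update: average the total episode reward over all
    (state, action) pairs visited in the episode.
    episode: list of (s, a, r) triples."""
    G = sum(r for _, _, r in episode)   # whole-episode reward
    for s, a, _ in episode:
        returns[(s, a)].append(G)
        Q[(s, a)] = sum(returns[(s, a)]) / len(returns[(s, a)])
```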

17 3. Application example Single regulated intersection
Consider a single regulated intersection with one lane in each direction
There is a fixed speed limit which cannot be exceeded
The traffic light has a fixed green signal time for each direction
The times of the four next green phases are transmitted to the vehicles by the traffic light
Idealized knowledge sharing: experience is available to all vehicles
The goal is to decrease the fuel consumption by decreasing the number of stops of the vehicles
This example is realized in NetLogo

18 3. Application example Agent's information model
The state of the vehicle agent is described by the vector $\langle d_i, v_i, p_i \rangle$:
- $d_i$ is the distance to the intersection
- $v_i$ is the current speed
- $p_i$ is the relative position of the vehicle in its green window:
$$p_i = \frac{gw_i^{end} - it_i}{gw_i^{end} - gw_i^{start}}$$
where $[gw_i^{start}; gw_i^{end}]$ is the green phase interval into which the vehicle falls and $it_i$ is the estimated arrival time at the intersection
The possible actions of the vehicle agent: speed up; slow down; do nothing
For the traffic density forecasting, a simple AR(2) autoregression model is used:
$$d_{t+1} = a_1 d_t + a_2 d_{t-1} + \varepsilon_t$$
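
The green-window component of the state can be computed directly from the transmitted phase times; a small sketch (the helper name and the time representation are assumptions):

```python
def relative_green_position(gw_start, gw_end, eta):
    """Relative position p_i of a vehicle in its green window.
    gw_start, gw_end: start/end time of the green phase the vehicle
    is expected to reach; eta: estimated arrival time at the intersection.
    p = 1 means arrival right at the start of green, p = 0 right at its
    end, p < 0 means the vehicle would miss the green phase."""
    return (gw_end - eta) / (gw_end - gw_start)

# Example: green phase from t=40 s to t=70 s, estimated arrival at t=55 s
p = relative_green_position(40.0, 70.0, 55.0)   # -> 0.5, middle of the window
```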

19 3. Application example Rewards
The reward structure for TD learning:
- positive values for $p_i \in [0.3; 1]$; negative otherwise
- positive values for $v_i \in [30; 50]$; negative otherwise
The reward structure for MC learning:
- a small negative reward for each time step
- a big negative reward for each stop
System performance: fuel consumption (Ferreira 1985):
$$F = 0.06\,D + 1.2\,T_{del} + TS$$
where:
- $D$ is the total distance
- $T_{del}$ is the delay time of the vehicle
- $TS$ is the total number of stops
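
For reference, the performance measure transcribed directly as code (coefficients as on the slide; the absolute units depend on the calibration of Ferreira's original model):

```python
def fuel_consumption(distance, delay_time, total_stops):
    """Fuel consumption indicator after Ferreira (1985), as used here:
    F = 0.06 * D + 1.2 * T_del + TS.
    The weighting penalizes delay time and, implicitly, every stop."""
    return 0.06 * distance + 1.2 * delay_time + total_stops
```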

20 3. Application example System performance
[Figure: smoothed system performance indicator over simulation time; green line: TD learning; blue line: MC learning; red line: without learning]
The fuel consumption (and the number of stops) is reduced by about 10%

21 3. Application example State-action value functions $Q(s,a)$
[Figure: learned $Q(s,a)$ under TD learning and MC learning, plotted for $v_i = 50$, $p_i = 0.5$ and for $v_i = 50$, $p_i = 1.5$; red line: slow down; black line: do nothing]

22 4. Conclusion and future research
The applied approach makes it possible to increase the system performance by taking the number of stops into account (and thereby reduce the fuel consumption)
Other scenarios should be considered:
- Consecutive intersections
- Road network (routing)
Other agents should be included:
- Traffic lights, variable speed limits
The planning process should be improved (level 3):
- Cooperative planning
- Group-level planning
Online data mining methods should be improved:
- Concept shift/concept drift detection for detecting model change
- Multivariate time series for network state forecasting
- Decentralized online clustering


24 Appendix 1 details of learning algorithms (to 2.4)
The policy $\pi(s)$ can be obtained by maximizing $Q(s,a)$ over all actions:
- greedy approach:
$$\pi(s) = a^* = \arg\max_a Q(s,a)$$
- $\varepsilon$-greedy approach:
$$\pi(s) = \begin{cases} a^* & \text{with prob. } 1 - \varepsilon + \frac{\varepsilon}{|A|} \\ a \neq a^* & \text{with prob. } \frac{\varepsilon}{|A|} \end{cases}$$
where $|A|$ is the number of possible actions
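
A minimal sketch of ε-greedy action selection matching these formulas (function and variable names are assumptions):

```python
import random

def epsilon_greedy(Q, state, actions, epsilon=0.1):
    """Pick the greedy action with probability 1 - epsilon, otherwise a
    uniformly random action; the greedy action thus keeps an extra
    epsilon/|A| chance, exactly as in the formula above."""
    if random.random() < epsilon:
        return random.choice(actions)                   # explore
    return max(actions, key=lambda a: Q[(state, a)])    # exploit
```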

25 Appendix 2 Details of traffic state forecasting (to 3.)
For the traffic density forecasting, a simple AR(2) autoregression model is used:
$$d_{t+1} = a_1 d_t + a_2 d_{t-1} + \varepsilon_t$$
where $d_t$ represents the traffic density at time $t$
The autoregression coefficients are estimated using the Yule-Walker equations:
$$a_1 = \frac{\rho_1 (1 - \rho_2)}{1 - \rho_1^2}, \qquad a_2 = \frac{\rho_2 - \rho_1^2}{1 - \rho_1^2}$$
where $\rho_k$ is the autocorrelation coefficient at lag $k$
The forecasted density $d$ is used in the Greenshields equation (a variant of the fundamental traffic equation) to update the forecasted speed:
$$v = v_f \left( 1 - \frac{d}{d_j} \right)$$
where $v_f$ is the free-flow speed and $d_j$ is the jam density
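
The whole forecasting chain fits in a few lines; this sketch (not the authors' code) estimates the autocorrelations from the raw density series and, following the slide's model, uses no intercept term:

```python
def autocorr(x, k):
    """Sample autocorrelation of the series x at lag k."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    return sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / var

def forecast_speed(densities, v_free, d_jam):
    """Fit AR(2) via the Yule-Walker equations, forecast the next
    density, then map it to speed with the Greenshields relation."""
    r1, r2 = autocorr(densities, 1), autocorr(densities, 2)
    a1 = r1 * (1 - r2) / (1 - r1 ** 2)
    a2 = (r2 - r1 ** 2) / (1 - r1 ** 2)
    d_next = a1 * densities[-1] + a2 * densities[-2]   # AR(2) forecast
    return v_free * (1 - d_next / d_jam)               # Greenshields equation
```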