BEHAVIOR BASED CONTROL AND FUZZY Q-LEARNING FOR AUTONOMOUS MOBILE ROBOT NAVIGATION


Khairul Anam 1,3, Son Kuswadi 2, Rusdhianto Effendi 3
1 Department of Electrical Engineering, Faculty of Engineering, University of Jember, Jl. Slamet Riyadi 62 Jember, email: kh_anam_sp@elect-eng.its.ac.id
2 EEPIS-ITS, Kampus Keputih ITS Sukolilo, email: sonk@eepis-its.edu
3 Department of Electrical Engineering, Faculty of Industrial Engineering, ITS

ABSTRACT

This paper presents the collaboration of behavior-based control and fuzzy Q-learning for mobile robot navigation systems. Many fuzzy Q-learning algorithms have been proposed to yield an individual behavior such as obstacle avoidance or target finding. However, for complicated tasks it is necessary to combine all behaviors in one control schema using behavior-based control. Based on this fact, this paper proposes a control schema that incorporates fuzzy Q-learning into a behavior-based schema to handle complicated tasks in the navigation system of an autonomous mobile robot. In the proposed schema, two behaviors are learned by fuzzy Q-learning; the other behaviors are constructed in the design step. All behaviors are coordinated by hierarchical hybrid coordination nodes. Simulation results demonstrate that the robot with the proposed schema is able to learn the right policy, to avoid obstacles and to find the target. However, fuzzy Q-learning failed to give the right policy for the robot to avoid collision in corner locations.

Keywords: behavior based control, fuzzy Q-learning

1 INTRODUCTION

An autonomous mobile robot navigation system is one of the active areas of robotics research. To implement such a robot system, it is important for the system to react properly in an unknown environment by learning its actions through experience. For this purpose, reinforcement learning methods have been receiving increased attention for use in autonomous robot systems. One method that has been widely used is Q-learning. However, since Q-learning deals with discrete actions and states, an enormous number of states may be necessary for an autonomous robot to learn an appropriate action in a continuous environment. Therefore, Q-learning cannot be applied directly to such a case because of the curse of dimensionality.

To overcome this problem, variations of the Q-learning algorithm have been developed. Different authors have proposed using the generalization ability of statistical methods (Hamming distance, statistical clustering) [1], or of feedforward neural networks, to store the Q-values [1-3]. Another approach consists in extending Q-learning into fuzzy environments [4,5] and is called fuzzy Q-learning. In this approach, prior knowledge can be embedded into the fuzzy rules, which can reduce training significantly. Therefore, this approach is used in this paper.

Fuzzy Q-learning (FQL) has been used in various fields of research, such as robot navigation [2,3], control systems [6], robot soccer [7], games [8], and so on [9]. In mobile robot navigation, FQL has been used to generate tasks for navigation purposes like obstacle avoidance [10] and wall following [11]. However, most of these works were implemented on a single task and a simple problem. For more complicated problems, it is necessary to design a control schema that involves more than one FQL to conduct the complicated tasks simultaneously. This paper focuses on the collaboration between FQLs and behavior-based control in autonomous mobile robot navigation.

The rest of the paper is organized as follows. Section 2 describes the theory and the design of the control schema. Simulation results are described in section 3 and conclusions are given in section 4.
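To give a rough sense of the curse of dimensionality mentioned above, the short sketch below counts the entries of a plain tabular Q-learning table when the robot's continuous sensor readings are naively discretized. It is a hypothetical illustration only, not part of the original paper; the bin count is an assumption chosen for the example.

# Minimal sketch (assumed values): size of a tabular Q-table when three IR
# sensors with readings 0..1024 are each discretized into 16 bins, compared
# with the small fuzzy rule base that fuzzy Q-learning would use instead.
BINS_PER_SENSOR = 16      # assumed discretization resolution
NUM_SENSORS = 3           # three IR range finders, as on the robot in this paper
NUM_ACTIONS = 5           # turn-right, little turn-right, forward, little turn-left, turn-left

num_states = BINS_PER_SENSOR ** NUM_SENSORS        # 16^3 = 4096 discrete states
q_table_entries = num_states * NUM_ACTIONS         # 20480 Q-values to learn by visits
print(f"tabular Q-learning: {num_states} states, {q_table_entries} Q-values")

# A fuzzy rule base with, say, 3 labels per input needs only 3^3 = 27 rules of
# 5 q-values each (135 parameters) and generalizes between neighbouring rules.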
2 THEORY AND DESIGN

2.1 Fuzzy Q-learning

Fuzzy Q-learning methods may be considered as an extension of the original version of Q-learning. Q-learning [12] is a reinforcement learning method in which the learner incrementally builds a Q-value function that attempts to estimate the discounted future rewards for taking actions from given states. The Q-value function is updated by the following equation:

\hat{Q}(s_t, a_t) = Q(s_t, a_t) + \alpha [ r_{t+1} + \gamma V(s_{t+1}) - Q(s_t, a_t) ]   (1)

where r is the scalar reinforcement signal, \alpha is the learning rate, and \gamma is a discount factor.

In order to deal with a large continuous state space, generalization must be incorporated in the state representation. The generalization ability of a fuzzy inference system (FIS) can be used to facilitate generalization in the state space and to generate continuous actions [10]. Each fuzzy rule R_i is a local representation over a region defined in the input space, and it memorizes the parameter vector q associated with each of the possible discrete actions. These q-values are then used to select actions so as to maximize the discounted sum of rewards obtained while achieving the task. The rules have the form [4]:

If x is S_i then action = a[i,1] with q[i,1]
                      or a[i,2] with q[i,2]
                      or a[i,3] with q[i,3]
                      ...
                      or a[i,J] with q[i,J]

where the states S_i are fuzzy labels, x is the input vector (x_1, ..., x_n), a[i,j] is a possible action, q[i,j] is the q-value corresponding to action a[i,j], and J is the number of possible actions. The learning robot has to find the best conclusion for each rule, i.e. the action with the best value.

In order to explore the set of possible actions and acquire experience through the reinforcement signals, the local actions are selected using an exploration-exploitation strategy based on the state-action quality, i.e. the q-values. Here, the simple \epsilon-greedy method is used for action selection: a greedy action is chosen with probability 1 - \epsilon, and a random action is used with probability \epsilon. The exploration probability is set by \epsilon = 2 / (10 + T), where T is the number of the trial. The exploration probability is intended to control the necessary trade-off between exploration and control, and is gradually eliminated after each trial [10].

Let a[i,i°] be the action selected in rule i using the action-selection mechanism mentioned before, and let i* be such that q[i,i*] = max_{j} q[i,j]. The inferred action a is:

a(x) = \sum_{i=1}^{N} \alpha_i(x) a[i,i°] / \sum_{i=1}^{N} \alpha_i(x)   (2)

where \alpha_i(x) is the firing strength of rule i. The actual Q-value of the inferred action a is:

Q(x,a) = \sum_{i=1}^{N} \alpha_i(x) q[i,i°] / \sum_{i=1}^{N} \alpha_i(x)   (3)

and the value of the state x is:

V(x) = \sum_{i=1}^{N} \alpha_i(x) q[i,i*] / \sum_{i=1}^{N} \alpha_i(x)   (4)

If x is a state, a is the action applied to the system, y the new state and r the reinforcement signal, then Q(x,a) can be updated using equations (1) and (3). The difference between the old and the new Q(x,a) can be thought of as an error signal,

\Delta Q = r + \gamma V(y) - Q(x,a),

that can be used to update the action q-values. By ordinary gradient descent, we obtain:

\Delta q[i,i°] = \epsilon \, \Delta Q \, \alpha_i(x) / \sum_{i=1}^{N} \alpha_i(x)   (5)

where \epsilon is a learning rate. To speed up learning, Q-learning is combined with the temporal difference TD(\lambda) method [4], which yields the eligibility e[i,j] of an action j in rule i:

e[i,j] = \lambda \gamma e[i,j] + \alpha_i(x) / \sum_{i=1}^{N} \alpha_i(x)   if j = i°
e[i,j] = \lambda \gamma e[i,j]                                             elsewhere   (6)

Therefore, the updating equation (5) becomes:

\Delta q[i,j] = \epsilon \, \Delta Q \, e[i,j]   (7)

The fuzzy Q-learning algorithm explained above is summarized as follows:
1. Observe the state x.
2. For each rule i, choose the actual consequence using \epsilon-greedy selection.
3. Compute the global action a(x) and its corresponding Q-value Q(x,a).
4. Apply the action a(x). Let y be the new state.
5. Receive the reinforcement r.
6. Update the q-values.
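A compact Python sketch of the fuzzy Q-learning cycle in equations (2)-(7) is given below. It is an illustrative reading only, not the authors' implementation: the rule firing strengths are a placeholder, and the action values, rule count and helper names are assumptions (the learning parameters are the ones listed in section 3).

import numpy as np

# Illustrative fuzzy Q-learning sketch following equations (2)-(7).
ACTIONS = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])   # assumed steering commands
N_RULES, N_ACTIONS = 27, len(ACTIONS)

q = np.zeros((N_RULES, N_ACTIONS))     # q[i, j], one q-value per rule and action
e = np.zeros_like(q)                   # eligibility traces e[i, j], eq. (6)
gamma, lam, lr = 0.9, 0.3, 0.0001      # gamma, lambda and learning rate from section 3

def firing_strengths(x):
    """Placeholder for the FIS: returns normalized firing strengths alpha_i(x)."""
    alpha = np.abs(np.sin(np.arange(1, N_RULES + 1) * (1.0 + float(np.sum(x))))) + 1e-6
    return alpha / alpha.sum()

def select_actions(alpha, trial):
    eps = 2.0 / (10.0 + trial)                     # exploration schedule of section 2.1
    greedy = q.argmax(axis=1)
    explore = np.random.random(N_RULES) < eps
    chosen = np.where(explore, np.random.randint(N_ACTIONS, size=N_RULES), greedy)
    a_x = (alpha * ACTIONS[chosen]).sum() / alpha.sum()                     # eq. (2)
    Q_xa = (alpha * q[np.arange(N_RULES), chosen]).sum() / alpha.sum()      # eq. (3)
    return chosen, a_x, Q_xa

def update(alpha_x, chosen, Q_xa, r, alpha_y):
    V_y = (alpha_y * q.max(axis=1)).sum() / alpha_y.sum()       # eq. (4)
    dQ = r + gamma * V_y - Q_xa                                 # error signal Delta Q
    e[:] = lam * gamma * e                                      # eq. (6): decay all traces
    e[np.arange(N_RULES), chosen] += alpha_x / alpha_x.sum()    # eq. (6): chosen actions
    q[:] += lr * dQ * e                                         # eq. (7)

In the setting of this paper, x would be the IR range-finder readings (or the light-sensor readings), and r the reinforcement signal defined for each behavior in section 2.4.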

2.2 Behavior Based Control

This paper considers a hierarchical control structure (Fig. 1) consisting of two layers: a high-level controller and a low-level controller.

Figure 1. Behavior-based control schema (behaviors and FQL-behaviors in the high-level controller feed a hybrid coordinator, whose output drives the low-level control of the mobile robot)

The high-level controller is a behavior-based layer that consists of a set of behaviors and a coordinator. This paper uses the hybrid coordinator proposed by Carreras [13]. The hybrid coordinator takes advantage of both the competitive and the cooperative approaches. It allows the coordination of a large number of behaviors without the need for a complex design or tuning phase; the addition of a new behavior only implies the assignment of its priority with reference to the other behaviors. The hybrid coordinator uses the priority and the behavior activation level to calculate the output of the layer, which is the desired control action input to the low-level control system. Therefore, the response of each behavior is composed of the activation level and the control action, as illustrated in Fig. 2 [13].

Figure 2. Behavior normalization [13]

Before entering the coordinator, each behavior is normalized as described in figure 2. In figure 2, S_i is the i-th behavior and r_i is the result of normalizing behavior i, which consists of the expected control action v_i and the activation level (degree of the behavior) a_i, 0 <= a_i <= 1. The behavior coordinator uses the behavior responses r_i to compose the control action of the entire system. This process is executed at each sampling time of the high-level controller.

The coordination system is composed of a set of n nodes. Each node has two inputs and one output. The inputs are a dominant input and a non-dominant input. The response connected to the dominant input has a higher priority than the response connected to the non-dominant input. The node output consists of an expected control action v and an activation level a. When the dominant behavior is fully activated, i.e. a_d = 1, the node output is the same as the dominant behavior; in this case the node behaves like competitive coordination. When the dominant behavior is partly activated, i.e. 0 < a_d < 1, the node output is a combination of the two behaviors, the dominant and the non-dominant one. When a_d = 0, the node output behaves like the non-dominant behavior. A set of nodes constructs a hierarchy called Hierarchical Hybrid Coordination Nodes (HHCN).

Figure 3. Mathematical formulation of the node output [13]:
a = a_d + a_nd (1 - a_d)^k,  k = 1, 2, 3, ...;  if (a > 1) then a = 1
v = v_d a_d / a + v_nd a_nd (1 - a_d)^k / a;  if (|v| > 1) then v = v / |v|

The low-level controller is constructed from a conventional controller, i.e. a PI controller. Its input is derived from the output of the high-level controller, namely the velocity setting that must be accomplished by the motors. This controller is responsible for controlling the motor speed so that the actual motor speed is the same as, or almost the same as, the velocity setting from the high-level controller.
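As an illustration of the node rule in Figure 3, the following sketch combines a dominant and a non-dominant behavior response. It is only an illustrative reading of the formulas above; the class name, field names and example values are assumptions rather than part of the original design.

from dataclasses import dataclass

@dataclass
class Response:
    v: float   # expected control action, assumed normalized to [-1, 1]
    a: float   # activation level, 0 <= a <= 1

def hybrid_node(dom: Response, nondom: Response, k: int = 1) -> Response:
    """Hybrid coordination node output following the formulas of Figure 3 [13]."""
    a = dom.a + nondom.a * (1.0 - dom.a) ** k
    a = min(a, 1.0)                         # if (a > 1) then a = 1
    if a == 0.0:                            # neither behavior is active
        return Response(0.0, 0.0)
    v = dom.v * dom.a / a + nondom.v * nondom.a * (1.0 - dom.a) ** k / a
    if abs(v) > 1.0:                        # if (|v| > 1) then v = v / |v|
        v = v / abs(v)
    return Response(v, a)

# Example: a partly activated dominant response combined with a fully active one.
print(hybrid_node(Response(v=0.8, a=0.6), Response(v=-0.3, a=1.0)))

With a_d = 1 the output reduces to the dominant response (competitive coordination), and with a_d = 0 it reduces to the non-dominant response, matching the behavior described above.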
2.3 Robot Design and Environment Model

To test the proposed schema, a cluttered environment was created as described in figure 5. Figure 5 is considered a cluttered environment for several reasons. First, there are many objects with various shapes and positions. Second, the position of the target is hidden; this condition makes it difficult for the robot to find the target directly. The area of the environment is 1.6 m x 1.6 m. Figure 4 shows the robot that was used in testing the proposed schema. The robot has three range finder sensors, two light sensors and two touch sensors (bumpers).

Figure 4. Robot design

The environment model used in this paper is shown in figure 5.

Figure 5. Environment model for simulation purposes

2.4 FQL and BBC for Robot Control

This paper presents the collaboration between fuzzy Q-learning and behavior-based control. Most authors have developed fuzzy Q-learning to generate a behavior that is constructed by learning continuously to maximize the discounted future reward. However, most of them focus only on generating a behavior for a simple environment, as shown by Deng [10] and Er [11]. For a complex environment, it is necessary to incorporate FQL into a behavior-based schema. Therefore, this paper proposes a behavior-based schema that uses hybrid coordination nodes [13] to coordinate behaviors, whether generated by FQL or designed in the design step. The proposed schema is adapted from [13] and described in figure 6.

Figure 6. Fuzzy Q-learning in behavior-based control (behaviors: stop, obstacle avoidance-FQL, searching target-FQL, wandering; hybrid coordinator; low-level controller)

In figure 6, the high-level controller consists of four behaviors and one HHCN. The four behaviors are stop, obstacle avoidance-FQL, searching target-FQL, and wandering. The stop behavior has the highest priority and the wandering behavior has the lowest priority. Each behavior is developed separately and there is no relation between behaviors. The output of the high-level controller is the speed setting for the low-level controller and the robot heading.

The wandering behavior has the task of exploring the robot's environment to detect the existence of the target. Its activation parameter, a_tm, is 1 over time. Its output is a speed setting that varies every few seconds.

The obstacle avoidance-FQL behavior is one of the behaviors generated by fuzzy Q-learning. Its task is to avoid every object encountered and detected by the range finder sensors. Its input is the distance data between the robot and the object from the three IR range finder sensors. The output of a range finder sensor is an integer value from 0 to 1024. A zero value means that the object is far from the robot; conversely, a value of 1024 means that the robot has collided with the object. The action set consists of five actions: {turn-right, little turn-right, move-forward, little turn-left, turn-left}. The reinforcement function is directly derived from the task definition, which is to keep a wide clearance from the obstacles. The reinforcement signal r penalizes the robot whenever it collides with or approaches an obstacle. If the robot collides, a bumper is active, or the sensor reading is more than 1000, the robot is penalized by a fixed value, i.e. -1. If the sensor reading is more than a certain threshold, d_k = 300, the penalty value is 0. Otherwise, the robot is rewarded by 1. The component of the reinforcement that teaches the robot to keep away from obstacles is:

r = -1   if collision or d_s > 1000
r =  0   if d_s > d_k
r =  1   otherwise   (8)

where d_s is the sensor reading corresponding to the shortest distance provided by any of the IR sensors while performing the action. The value of the activation parameter is proportional to the distance between the sensors and the obstacle.
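The rule in equation (8) can be read as the following small sketch; it is only an illustration of the reinforcement stated above, and the constant and function names are assumptions.

COLLISION_READING = 1000   # sensor reading treated as a (near-)collision, from the text
D_K = 300                  # clearance threshold d_k, from the text

def obstacle_avoidance_reward(ir_readings, bumper_pressed):
    """Reinforcement of eq. (8); IR readings grow from 0 (far) to 1024 (contact)."""
    d_s = max(ir_readings)            # reading of the closest detected obstacle
    if bumper_pressed or d_s > COLLISION_READING:
        return -1                     # collision or near-collision
    if d_s > D_K:
        return 0                      # obstacle inside the clearance threshold
    return 1                          # wide clearance: reward the robot

# Example with readings from the three IR range finders:
print(obstacle_avoidance_reward([120, 450, 80], bumper_pressed=False))   # prints 0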

The searching target-FQL behavior has the task of finding and going to the target. The goal is to follow a moving light source, which is displaced manually. The two light sensors are used to measure the ambient light on different sides of the robot. The sensor values range from 0 to 1024. The action set consists of five actions: {turn-right, little turn-right, move-forward, little turn-left, turn-left}. The robot is rewarded when it is facing toward the light source, and receives punishment in the other cases:

r = -1   if d_s < 300
r =  0   if d_s < 800
r =  1   otherwise   (9)

where d_s is the largest value provided by any of the light sensors while performing the action.

The stop behavior will be fully active when any light sensor value is more than 1000. Its goal is to stop the robot when it reaches the light source within a certain distance.

3 SIMULATION RESULTS

To test the performance of the proposed control structure, eight experiments were conducted. The main goal is that the robot has to find and get the target without any collision with the objects it encounters, and to reach the target as quickly as possible in the cluttered environment of figure 5. From the task definition, there are three performance indicators. The first is the robot's ability to get the target. The second is the robot's ability to avoid collision with the obstacles, and the last is the time needed by the robot to reach the target. The parameter values used in this paper are: \alpha = 0.0001, \lambda = 0.3, \gamma = 0.9.

Figure 7. Reward accumulation of FQL-obstacle avoidance

Figure 7 shows the simulation result of the reward accumulation of FQL-obstacle avoidance for the eight trials. In all trials the robot succeeded in reaching the target, but the time spent to reach it differs. One trial spent more time than the others; in that trial the robot collided with more obstacles than in the others.

Figure 8. Local reward of FQL-obstacle avoidance

The local reward in figure 8 gives more information about the performance of FQL-obstacle avoidance. The robot received many rewards and few penalties.

Figure 9. Reward accumulation of FQL-target searching

The performance of FQL-target searching can be analyzed from figures 9 and 10. The reward accumulation tends to go to -1. In this condition the robot was trying to find the target while the target was still outside the scope of the robot; therefore, in this step, the robot was penalized by -1. After exploring the environment, the robot succeeded in detecting the existence of the target.

Figure 9. Local reward of FQL-target searching

Another test that was performed to measure the performance of the FQL is to test the learning ability of the robot to get the target from different starting points. There are three different starting points. The simulation result is shown in figure 10.

Figure 10. Robot trajectory from different starting point testing

The trajectory in figure 10 shows that the robot was able to reach and get the target even though it started from different points, and it was able to avoid almost all of the obstacles it encountered. It also shows some points where the robot collided with the walls or obstacles (red circles).

Figure 11 is a test of FQL-target searching. There is only one target, but the position of the target was moved to another place after the robot got the target. Three different target positions were tested and figure 11 shows the simulation result.

Figure 11. Robot trajectory from different target position testing

In the first attempt, the robot had to get the first target position. After getting the target, the target was moved to the second position. Then the target was moved to the third position after the robot got the second one. Finally, it got the last position. The trajectory shows that the robot was able to track the target position wherever the target is. However, the robot was not able to avoid collisions with some walls or obstacles (red circles). From figure 11, the corner positions are the most difficult positions for the robot to pass without collision. They confuse the robot in deciding which action should be chosen from the local discrete actions defined in fuzzy Q-learning. If the robot chooses the turn-left action, it will collide with the wall on its left side; if it chooses the turn-right action, it will collide with the wall on its right side. Therefore, the robot is forced to collide with the wall.

4 CONCLUSION

This paper proposes a control schema for the navigation system of an autonomous mobile robot in a complicated environment by incorporating fuzzy Q-learning into behavior-based control. Two behaviors were generated by fuzzy Q-learning by learning the environment continuously. Simulation results demonstrate that the robot with the proposed schema is able to learn the right policy, to avoid obstacles and to find the target. However, fuzzy Q-learning failed to give the right policy for the robot to avoid collision in the corner locations.

REFERENCES

[1] C. Touzet, "Neural Reinforcement Learning for Behaviour Synthesis", Robotics and Autonomous Systems, Special issue on Learning Robot: the New Wave, N. Sharkey Guest Editor, 1997.
[2] Yang, G.S., Chen, E.R., Wan, C. (2004), "Mobile Robot Navigation Using Neural Q-Learning", Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, China, Vol. 1, pp. 48-52.
[3] Huang, B.Q., Cao, G.Y., Guo, M. (2005), "Reinforcement Learning Neural Network to the Problem of Autonomous Mobile Robot Obstacle Avoidance", IEEE Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, Vol. 1, pp. 85-89.
[4] Jouffe, L., "Fuzzy Inference System Learning by Reinforcement Methods", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 28, No. 3, August 1998.
[5] Glorennec, P.Y., Jouffe, L., "Fuzzy Q-learning", Proceedings of the Sixth IEEE International Conference on Fuzzy Systems, Vol. 2, No. 1, 1997, pp. 659-662.
[6] Charles W. Anderson, Douglas C. Hittle, Alon D. Katz, and R. Matt Kretchmar, "Synthesis of Reinforcement Learning, Neural Networks, and PI Control Applied to a Simulated Heating Coil", Elsevier: Artificial Intelligence in Engineering, Volume 11, Number 4, October 1997, pp. 421-429.
[7] Tomoharu Nakashima, Masayo Udo, and Hisao Ishibuchi, "Implementation of Fuzzy Q-Learning for a Soccer Agent", The IEEE International Conference on Fuzzy Systems, 2003.
[8] Ishibuchi, H., Nakashima, T., Miyamoto, H., Chi-Hyon Oh, "Fuzzy Q-Learning for a Multi-Player Non-Cooperative Repeated Game", Proceedings of the Sixth IEEE International Conference on Fuzzy Systems, Volume 3, 1997, pp. 1573-1579.
[9] Ho-Sub Seo, So-Joeng Youn, Kyung-Whan Oh, "A Fuzzy Reinforcement Function for the Intelligent Agent to Process Vague Goals", 19th International Conference of the North American Fuzzy Information Processing Society (NAFIPS), 2000, pp. 29-33.
[10] C. Deng, M. J. Er and J. Xu, "Dynamic Fuzzy Q-Learning and Control of Mobile Robots", 8th International Conference on Control, Automation, Robotics and Vision, Kunming, China, 6-9 December 2004.
[11] Meng Joo Er and Chang Deng, "Online Tuning of Fuzzy Inference Systems Using Dynamic Fuzzy Q-Learning", IEEE Transactions on Systems, Man, and Cybernetics, Vol. 34, No. 3, June 2004.
[12] Watkins, C., Dayan, P. (1992), "Q-learning, Technical Note", Machine Learning, Vol. 8, pp. 279-292.
[13] Carreras, M., Yuh, J., Batlle, J., Ridao, P., "A Behavior-Based Scheme Using Reinforcement Learning for Autonomous Underwater Vehicles", IEEE Journal of Oceanic Engineering, Vol. 30, No. 2, April 2005.