Interactive Human Intention Reading by Learning Hierarchical Behavior Knowledge Networks for Human-Robot Interaction


Ji-Hyeong Han, Seung-Hwan Choi, and Jong-Hwan Kim

For efficient interaction between humans and robots, robots should be able to understand the meaning and intention of human behaviors as well as recognize them. This paper proposes an interactive human intention reading method in which a robot develops its own knowledge about the human intention for an object. A robot needs to understand different human behavior structures for different objects. To this end, this paper proposes a hierarchical behavior knowledge network that consists of behavior nodes and directional edges between them. In addition, a human intention reading algorithm that incorporates reinforcement learning is proposed to interactively learn the hierarchical behavior knowledge networks based on context information and human feedback through human behaviors. The effectiveness of the proposed method is demonstrated through play-based experiments between a human and a virtual teddy bear robot with two virtual objects. Experiments with multiple participants are also conducted.

Keywords: Human intention reading, Developmental knowledge about human intention, Hierarchical behavior knowledge network, Human robot interaction, Reinforcement learning.

Manuscript received Feb. 5, 2016; revised Aug. 9, 2016; accepted Aug. 5, 2016. This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 4RAAA555).

Ji-Hyeong Han (jihyeong.han@etri.re.kr) is with the SW & Content Research Laboratory, ETRI, Daejeon, Rep. of Korea. Seung-Hwan Choi (cyclopsh@gmail.com) is with the UX Center, Samsung Electronics, Seoul, Rep. of Korea. Jong-Hwan Kim (corresponding author, johkim@rit.kaist.ac.kr) is with the School of Electrical Engineering, KAIST, Daejeon, Rep. of Korea.

I. Introduction

In the near future, humans and robots will live and work together in human homes and offices because of the fast development of robot and artificial intelligence technologies. To prepare for this future, natural and rational human robot interaction (HRI) is needed; therefore, research dealing with HRI problems has been widely carried out. This paper focuses on how a robot infers human intentions when they work or play together with an object, using context information and human behaviors. Thus, human intention in this paper is defined as the behavior a human wants a robot to do using an object.

To determine the defined human intention from human behaviors, a robot should learn about the structures of these behaviors. Human behaviors are hierarchically structured, as opposed to sequentially structured, in order to achieve a certain goal; this is a typical feature of human behaviors [1]. Such hierarchical behavior structures differ when a human acts using different objects; therefore, a robot needs to learn the hierarchical behavior structures for different objects to read the human intention for an object. To achieve this goal, we propose a hierarchical behavior knowledge network (HBKN) that enables a robot to learn hierarchical behavior structures for different objects. The proposed HBKN represents behavior hierarchies using directional edges between behavior nodes and adding transition probabilities to the directional edges. In addition, a human intention reading algorithm that incorporates passive reinforcement learning based on adaptive dynamic programming is proposed to learn the HBKN and infer human intention.

Observed human arm behaviors are used as inputs to the proposed algorithm; therefore, we also develop a human arm behavior recognition procedure. The most popular approach to human behavior recognition uses computer vision methods [2], [3]; thus, we also use a robot vision system to recognize human behavior.

The HRI experiments, which consist of play with two virtual objects, that is, a ball and a toy car, were carried out using a teddy bear robot simulator to demonstrate the effectiveness of the proposed method. In addition, experiments with multiple human participants who were not involved in developing the system were performed to determine whether the proposed method would work well with the general public.

This paper is organized as follows. Section II explains the related work for research background. Section III briefly describes the overall data flow, and Section IV presents the details of the core parts, that is, the HBKN and the human intention reading algorithm. In Section V, the experimental environment is presented and the experimental results obtained for three different scenarios with multiple participants are discussed. Finally, the concluding remarks follow in Section VI.

II. Related Work

There have been several studies to infer human intentions or goals. These studies can be classified according to their approaches as follows.

Simulation theory is one of the dominant human mind reading theories; it suggests that humans use their own mental mechanisms to predict the mental processes of others, as a simulation does [4]. Thus, there have been several studies in which a robot infers a human intention in the same manner as a human by applying simulation theory. Gray and others presented action parsing and goal inference algorithms by considering the robot itself as a simulator [5], and Breazeal and others developed robot embodied cognition including mindreading skills [6]. Jansen and others developed a computational model of imitation in which an artificial agent infers the intention of a demonstrator [7]. Hiatt and others presented an effective method to deal with human variability during HRI by employing the theory of mind and ACT-R [8].

Building probabilistic models of human intention, including Bayesian inference models, is one of the most popular ways to infer human intention. Schrempf and others developed a general model for user intention estimation using hybrid dynamic Bayesian networks [9] and proposed a method in which a robot selects a task based on the estimated user intention [10]. Kelley and others developed an intention recognition framework based on Bayesian reasoning using the context information of an object [11]. Wang and others developed an intention-driven dynamics model, which is a probabilistic model of the process of intentional movements [12].

The artificial neural networks approach, which mimics the human brain, is another promising way to infer human intention. Bicho and others devised a control architecture based on a close perception-action linkage using dynamic neural fields [13] and a dynamic neural field-based model [14].

Learning a model using machine learning methodology based on data labeled with human intention is an intuitive and effective approach to human intention reading. Strabala and others presented a key discriminative feature learning method using SVM methodology for predicting the human intention to hand over an object [15]. In addition to traditional machine learning methods, there have been initial trials applying deep learning to the human intention reading problem. Kelley and others proposed a deep learning architecture for predicting human intention with respect to objects and presented its initial results [16]. Yu and Lee developed a deep dynamic neural model for human intent recognition based on human motions [17].

III. Overall Data Flow for Interactive Human Intention Reading
Figure 1 shows the overall data flow of the proposed method. The object information and human arm behavior information are sensed by a sensing module such as an RGB-D camera. The percepts are then passed to the attention module. A robot needs to focus its attention on a certain object and/or human in the environment in order to interact with them, because there can be several objects and/or humans. Hence, the interest factor (IF) for each perceived object, which includes humans, is defined in the attention module as follows:

IF_{O_i}(t) = 1 when O_i is moving, and IF_{O_i}(t) = exp(-k_i(t)/τ) when O_i is not moving,   (1)

where O_i is the i-th perceived object (either an object or a human) and τ is a time constant. Furthermore, k_i(t) is initially zero, increments by one at each time step while O_i is not moving, and returns to zero when O_i moves. When the object or human moves, the corresponding IF immediately becomes one.

Fig. 1. Overall data flow (sensing, attention, human arm behavior recognition, robot arm behavior set, human intention reading with the HBKNs of the perceived objects, behavior selection, and actuation).
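To make the attention mechanism concrete, the following minimal Python sketch implements the interest-factor update in (1). The class and variable names (AttentionModule, counters, focus) are illustrative choices and not part of the original system.

import math

class AttentionModule:
    """Tracks an interest factor (IF) per perceived object, as in (1)."""

    def __init__(self, tau=5.0):
        self.tau = tau          # time constant of the exponential decay
        self.counters = {}      # k_i(t): steps since object i last moved

    def update(self, moving_flags):
        """moving_flags maps object id -> True if that object moved this step.

        Returns the interest factors and the id with the highest IF.
        """
        interest = {}
        for obj_id, moving in moving_flags.items():
            if moving:
                # A moving object (or human) immediately gets full attention.
                self.counters[obj_id] = 0
                interest[obj_id] = 1.0
            else:
                # A stationary object decays exponentially with constant tau.
                self.counters[obj_id] = self.counters.get(obj_id, 0) + 1
                interest[obj_id] = math.exp(-self.counters[obj_id] / self.tau)
        focus = max(interest, key=interest.get) if interest else None
        return interest, focus

# Example: the ball moves for one step, then stays still while the human moves.
attention = AttentionModule(tau=5.0)
print(attention.update({"ball": True, "human": False}))
print(attention.update({"ball": False, "human": True}))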

The robot gives its attention to the object or human that has the highest IF. When the object or human stops, its IF decays according to (1) with time constant τ.

The perceived human arm behavior information and the information about the object that is the focus of the robot's attention are transferred to the human arm behavior recognition module and the human intention reading module, respectively. The human arm behavior recognition module recognizes the observed arm behavior using the robot's arm behavior set and a dynamic time warping (DTW) algorithm. The robot behavior with the minimum final DTW distance (FDTW), that is, the most similar one, is recognized as the human behavior [18]. The robot arm behavior set module provides the possible robot arm behaviors, such as waving both hands, waving one hand, pointing, touching, pushing (Pu), grasping (Gr), releasing (Re), and throwing (Th). The recognition result is transferred to the human intention reading module. In the human intention reading module, the robot simultaneously develops the HBKN of the object and infers the human intention for the object using the proposed human intention reading algorithm. The details of the HBKN and the human intention reading algorithm are explained in Section IV. Once the human intention is identified by the proposed algorithm, the behavior selection module selects the best behavior considering the inferred intention, and it is performed using the actuator module.

IV. HBKN and Human Intention Reading Algorithm

In this section, the HBKN and the human intention reading algorithm are described in more detail. The robot develops its own knowledge by learning the HBKNs of objects through interaction with a human.

1. HBKN

Because different objects have different hierarchical behavior structures for the achievement of a goal, and the human intention for different objects might be different, each object needs its own developed HBKN. A robot can then infer the human intention for an object by using the HBKN, which is defined as follows.

Definition 1: An HBKN consists of behavior nodes and directional edges that represent the hierarchical relations between behavior nodes.
(i) n(perceived objects) = n(HBKN), that is, one HBKN is maintained per perceived object.
(ii) An HBKN consists of several classes (Class i), and each class consists of behavior nodes B_j^i, where B_j^i denotes the j-th behavior in the i-th class.
(iii) Each behavior node has a utility HBKN.U(B_j^i).
(iv) Each directional edge between behavior nodes has a transition frequency HBKN.TF(B_j^i, B_l^k) and a transition probability HBKN.TP(B_j^i, B_l^k).

Fig. 2. General description of an HBKN and an example of an initial HBKN, where B_j^i denotes the j-th behavior in Class i and the different dashed edges represent different transition probabilities.

Figure 2 shows the general HBKN structure and an example of an initial HBKN. As shown in Fig. 2, an HBKN consists of several classes, each class consists of several behavior nodes, and there are directional edges with transition probabilities between the behavior nodes. In Fig. 2, Class 1, Class 2, and Class 3 respectively comprise behaviors such as waving and pointing that can be done without an object; behaviors such as Gr, Pu, and touching that can be done with an object; and behaviors such as Th and Re that can be done after the robot obtains an object. After the behaviors in Class 3 are complete, the robot no longer holds the object.
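As an illustration of Definition 1, the following is a minimal Python sketch of an HBKN data structure. The class name, field names, behavior labels, and the particular initial edges are illustrative assumptions rather than the authors' implementation.

from collections import defaultdict

class HBKN:
    """One hierarchical behavior knowledge network per perceived object."""

    def __init__(self, behaviors, initial_edges):
        # behaviors: iterable of behavior node names
        # initial_edges: (b, b_next) pairs assumed to exist in the initial HBKN
        self.U = {b: 0.0 for b in behaviors}   # utilities HBKN.U, initially zero
        self.R = {b: 0.0 for b in behaviors}   # rewards HBKN.R, initially zero
        self.TF = defaultdict(int)             # transition frequencies HBKN.TF
        self.TP = {}                           # transition probabilities HBKN.TP
        for edge in initial_edges:
            self.TF[edge] = 1                  # initial TF is one where an edge exists
        self.update_tp()

    def update_tp(self):
        """TP(b, b') = TF(b, b') / sum of TFs from b to all connected nodes."""
        totals = defaultdict(int)
        for (b, _), f in self.TF.items():
            totals[b] += f
        self.TP = {(b, b2): f / totals[b] for (b, b2), f in self.TF.items() if totals[b]}

# Hypothetical initial HBKN for a ball: a class-1 gesture feeds the with-object
# behaviors, and only grasping leads to the after-obtaining behaviors.
ball = HBKN(
    behaviors=["Point", "Touch", "Push", "Grasp", "Release", "Throw"],
    initial_edges=[("Point", "Touch"), ("Point", "Push"), ("Point", "Grasp"),
                   ("Grasp", "Release"), ("Grasp", "Throw")],
)
print(ball.TP[("Point", "Touch")])  # 1/3, since Point has three successors here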
2. Human Intention Reading Algorithm

The proposed human intention reading algorithm incorporates passive reinforcement learning with adaptive dynamic programming so that the robot infers human intentions by learning HBKNs interactively through human behaviors and feedback. To apply the reinforcement learning scheme, the behavior nodes in an HBKN are treated as states, and each behavior node has a utility and a reward. The utilities HBKN.U(B_j^i) and rewards HBKN.R(B_j^i) of the behavior nodes are initially zero. In addition, the directional edges in the HBKN are treated as the transition model, and each edge has a transition probability that is calculated from the transition frequencies.

The transition frequency HBKN.TF(B_j^i, B_l^k) is the number of transitions from B_j^i to B_l^k. The initial transition frequency value is one if an edge exists between the behavior nodes, and zero otherwise. Using the transition frequencies, the transition probability is calculated as HBKN.TP(B_j^i, B_l^k) = HBKN.TF(B_j^i, B_l^k) / Σ_B HBKN.TF(B_j^i, B), where the sum runs over all behavior nodes connected from B_j^i. In Fig. 2, the different styles of dashed edges between behavior nodes represent the different transition probabilities.

Figure 3 shows the flow charts of the proposed human intention reading algorithm. The pseudo code of each function in the main algorithm is provided in Algorithms 1 to 5. The main algorithm can be divided into two parts: when the robot simultaneously learns the HBKN for an object and infers the human intention (Fig. 3(a)), and when the robot infers the human intention using the learned HBKNs (Fig. 3(b)).

Fig. 3. Main human intention reading algorithms: (a) the simultaneous HBKN learning and human intention reading algorithm and (b) the human intention reading algorithm with learned HBKNs.

The simultaneous HBKN learning and human intention reading algorithm (Fig. 3(a)) starts when the robot focuses its attention on a perceived object O. Because there is only one perceived object, the object is automatically identified as the human intention object (HIO). The algorithm then checks the HBKNs. If a previously learned HBKN(HIO) exists, it loads HBKN(HIO); otherwise, it creates the initial HBKN(HIO). The algorithm runs repeatedly until the human intention (HI) is inferred. First, it perceives a human action (HA) such as a human arm behavior or feedback. If the HA is a recognized human behavior or positive feedback for the robot's behavior, it updates the transition probability of each edge and then updates the utilities of the behavior nodes. Otherwise, it just updates the utilities of the behavior nodes. The utilities of all the behavior nodes are updated in the same manner using the Bellman equation [19]:

U(B) = R(B) + η Σ_{B'} TP(B, B') U(B'),   (2)

where U(B) is the utility of behavior node B, R(B) is the reward for B, TP(B, B') is the transition probability from B to B', and η is the discount factor. (The pseudo code for updating the transition probability and the behavior node utility is provided in Algorithms 1 and 2, respectively.)
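Continuing the HBKN sketch above, the following hedged Python fragment shows one way the Bellman update (2) could be swept over all behavior nodes in an adaptive-dynamic-programming style; the fixed number of sweeps simply stands in for a convergence check and is not taken from the paper.

def update_utilities(hbkn, eta=0.9, sweeps=50):
    """Utility update using the Bellman equation (2):
    U(B) = R(B) + eta * sum_B' TP(B, B') * U(B')."""
    for _ in range(sweeps):                      # iterate until values settle
        new_u = {}
        for b in hbkn.U:
            expected = sum(p * hbkn.U[b2] for (b1, b2), p in hbkn.TP.items() if b1 == b)
            new_u[b] = hbkn.R[b] + eta * expected
        hbkn.U = new_u
    return hbkn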
The algorithm then searches for a candidate behavior knowledge link (CBKL) and the current status (curStatus) in the updated HBKN(HIO). (The pseudo code for searching the HBKN is provided in Algorithms 3 and 4.) The curStatus information indicates that the robot is in one of three states: 1) it has inferred the human intention, 2) it needs to perform trial behaviors to obtain human feedback, or 3) it is confused. In the first state, the behavior node with the maximum utility among all the behavior nodes in HBKN(HIO) is inferred as the HI. As the robot has inferred the human intention, the algorithm terminates. The second state indicates that there is more than one possible next behavior node (NB); therefore, the robot needs to perform a trial behavior for each possible NB and must wait for positive or negative human feedback. The received human feedback is considered as an HA, and the algorithm is repeated until it meets the termination condition, that is, until it has inferred the human intention. In the last state, the robot is confused because it has not obtained sufficient information from the current human behavior or feedback to infer the human intention. Then, the uncertainty level (UL) increases. The UL makes the robot ask for more information and is defined as follows:

UL = c k(t)^2 when the robot is confused, and UL = 0 otherwise (with UL capped at one),   (3)

where c is a positive constant and k(t) is initially zero, increases by one at each time step while the robot is confused, and returns to zero when the confusion is resolved. Therefore, when the robot is confused, the UL increases proportionally to the square of the time step. When UL exceeds a predefined threshold value, the robot asks the human for more information. After receiving new information, such as a human behavior or feedback, UL is reset to zero, and the algorithm is repeated until it meets the termination condition.

The human intention reading algorithm with learned HBKNs (Fig. 3(b)) starts when the robot focuses its attention on the perceived objects (O = {O_1, O_2, ..., O_n}). Because the robot has already learned the HBKNs of the perceived objects, the algorithm loads these HBKNs and receives the HA. The algorithm runs repeatedly until the HI is inferred along with the identified HIO. (The pseudo code for inferring the HIO and HI from the HBKNs is provided in Algorithm 5.) If the HIO is inferred, then the behavior node that has the maximum utility among all the behavior nodes in HBKN(HIO) is inferred as the HI, and the algorithm terminates. If the function still returns that the robot is confused, then UL increases and the algorithm repeats until it meets the termination condition.

Function UPDATE_TRANSITION_PROBABILITY: Algorithm 1 shows the pseudo code of this function. It first updates the transition frequencies HBKN.TF of the relevant edges based on the activated behavior node B, and then updates the transition probabilities HBKN.TP of all edges. The function identifies the lower class behavior nodes maxUB that have the maximum utility in the class below B. There are two cases based on the number of maxUBs. In the first case, there is only one maxUB, and this case has two subcases. If exactly one lower class behavior node B_q that already has a TF to B exists and it differs from maxUB, then B_q needs to be performed before B; thus, the TFs from maxUB to B_q and from B_q to B are increased by one. Otherwise, the TF from maxUB to B is increased by one. In the other case, there is more than one maxUB. In this case, the function identifies the behavior nodes maxTFB that have the maximum TF to B among the maxUBs, and the TFs from all the maxTFBs to B are increased by one. Finally, the function updates the transition probabilities of all the edges, HBKN.TP(B', B''), by dividing HBKN.TF(B', B'') by the sum of the TFs from B' to all the connected behavior nodes. After updating the transition probabilities, the function returns the updated HBKN.

Algorithm 1. Transition probability update function
Input: network (HBKN), activated behavior node (B)
Output: updated transition probabilities of the HBKN (HBKN.TP)
function UPDATE_TRANSITION_PROBABILITY(HBKN, B)
    maxUB <- arg max over the class below B of HBKN.U(.)
    if n(maxUB) = 1 then
        if n({B_q : HBKN.TF(B_q, B) > 0}) = 1 and B_q != maxUB then
            HBKN.TF(maxUB, B_q)++, HBKN.TF(B_q, B)++
        else
            HBKN.TF(maxUB, B)++
        end if
    else if n(maxUB) > 1 then
        maxTFB <- arg max over maxUB of HBKN.TF(maxUB, B)
        HBKN.TF(maxTFB, B)++
    end if
    for all edges (B', B''): HBKN.TP(B', B'') <- HBKN.TF(B', B'') / Σ_B''' HBKN.TF(B', B''')
    return HBKN
end function
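As an illustrative example of this update (with made-up numbers), suppose a class-1 node initially has a transition frequency of 1 to each of three class-2 nodes, so each transition probability is 1/3 ≈ 0.33. If UPDATE_TRANSITION_PROBABILITY then increments the frequency of one of those edges to 2, the probabilities become 2/4 = 0.5 for that edge and 1/4 = 0.25 for each of the other two.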
Algorithm 2. Behavior node utility update function
Input: network (HBKN), current human behavior or feedback (HA)
Output: updated utilities of the HBKN (HBKN.U)
function UPDATE_BEHAVIORNODE_UTILITY(HBKN, HA)
    if (HA = B) or (HA = PosFeed(B)) then
        HBKN.R(B) <- positive reward
    else if HA = NegFeed(B) then
        HBKN.R(B) <- negative reward
    else if HA = RightFeed(B) then
        HBKN.R(B) <- right-intent reward
    end if
    for all B: HBKN.U(B) <- HBKN.R(B) + η Σ_B' HBKN.TP(B, B') HBKN.U(B')
    return HBKN
end function

Function UPDATE_BEHAVIORNODE_UTILITY: Algorithm 2 shows the pseudo code of this function, which updates the utilities HBKN.U(B) of all the behavior nodes in the HBKN based on the received human behavior or feedback HA. The reward of each behavior node, HBKN.R(B), is determined based on the HA. If the HA is a recognized behavior or positive feedback for behavior node B, then HBKN.R(B) receives a positive reward. If the HA is negative feedback for B, then HBKN.R(B) receives a negative reward. If the HA is the right-intent feedback for B, then HBKN.R(B) receives the largest reward. Note that the three reward values have to satisfy inequality conditions so that the right-intent reward is the largest and the negative-feedback reward is the smallest. The function then updates the utilities of all the behavior nodes using (2).
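Continuing the Python sketches above, the following fragment mirrors this reward assignment; the numeric reward constants are placeholders chosen only to respect the ordering described in the text, not the values used in the paper.

# Illustrative reward constants; the paper's actual values are not reproduced here.
R_POSITIVE, R_NEGATIVE, R_RIGHT_INTENT = 1.0, -1.0, 5.0

def apply_human_action(hbkn, ha, eta=0.9):
    """Assign a reward to the behavior node named by the human action HA and
    refresh all utilities, in the spirit of UPDATE_BEHAVIORNODE_UTILITY."""
    kind, node = ha                    # e.g. ("behavior", "Grasp") or ("neg", "Push")
    if kind in ("behavior", "pos"):    # recognized behavior or positive feedback
        hbkn.R[node] = R_POSITIVE
    elif kind == "neg":                # negative feedback for a trial behavior
        hbkn.R[node] = R_NEGATIVE
    elif kind == "right":              # right-intent feedback
        hbkn.R[node] = R_RIGHT_INTENT
    return update_utilities(hbkn, eta)

# Example: the human confirms that grasping was the intended behavior.
apply_human_action(ball, ("right", "Grasp"))
print(max(ball.U, key=ball.U.get))     # the node inferred as the human intention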

Algorithm 3. HBKN search function
Input: network (HBKN), previous candidate behavior knowledge link (prevCBKL), current human behavior or feedback (HA)
Output: updated CBKL, current state of the human intention reading process (intent inferred, confused, or trial behaviors)
function SEARCH_HBKN(HBKN, prevCBKL, HA)
    CBKL <- FIND_CBKL(HBKN)
    if HA = RightFeed(B) then
        return CBKL, Human_Intent
    else if (HA = B) or (HA = PosFeed(B)) then
        if prevCBKL = CBKL then
            return CBKL, Confused
        else
            possNB <- {B' : HBKN.TP(B, B') > 0}
            NB <- arg max over possNB of HBKN.U(possNB) x HBKN.TP(B, possNB)
            if NB = NULL then
                return CBKL, Confused
            else if n(NB) = 1 then
                SEARCH_HBKN(HBKN, CBKL, NB)
            else if n(NB) > 1 then
                return CBKL, TB(NB)
            end if
        end if
    end if
end function

Algorithm 4. CBKL finding function
Input: network (HBKN)
Output: candidate behavior knowledge link (CBKL)
function FIND_CBKL(HBKN)
    i <- 1, j <- 1, CBKL <- NULL
    while CBKL[1] = NULL do
        if n({firstB <- arg max over B in Class i of HBKN.U(B)}) = 1 then
            CBKL[1] <- firstB
        end if
        i <- i + 1
    end while
    while n({nextB <- arg max over B of HBKN.TP(CBKL[j], B)}) = 1 do
        j <- j + 1, CBKL[j] <- nextB
    end while
    return CBKL
end function

Algorithm 5. Inferring intention from HBKNs function
Input: HBKN(O) for the perceived objects O, current human behavior or feedback (HA)
Output: result of inferring the human intended object, current state of the process (intent inferred or confused)
function INFER_INTENT_FROM_HBKN(HBKN(O), HA)
    objectNo <- n(O), objCBKL(O) <- NULL
    for i = 1 to objectNo do
        objCBKL(O_i) <- FIND_CBKL(HBKN(O_i))
    end for
    if (HA in objCBKL(O_i)) and (HA not in objCBKL(O_j) for all j != i) then
        return O_i, Human_Intent
    else
        return NULL, Confused
    end if
end function

Function SEARCH_HBKN: Algorithm 3 shows the pseudo code of this function, which searches for the candidate behavior knowledge link CBKL and the current status in the updated HBKN. First, it calls the CBKL finding function and obtains the updated CBKL for the currently updated HBKN. The details of the CBKL search are explained with function FIND_CBKL and Algorithm 4. If the HA is RightFeed(B), that is, the right-intent feedback for B, then the function returns the current status Human_Intent because the robot has correctly inferred the human intention. In contrast, if the HA is a recognized human behavior B or PosFeed(B), the positive feedback for B, then the function starts to search the HBKN. If the CBKL returned by function FIND_CBKL and the previous CBKL are the same, the candidate behavior knowledge link has not been updated; in this case, the function returns the current status Confused because the robot needs more information. Otherwise, it identifies the next behavior nodes NB that have the maximum value of the product of the utility HBKN.U(possNB) and HBKN.TP(B, possNB), the transition probability from B to the possNBs that have nonzero transition probabilities. There are three cases based on the number of NBs. In the first case, there is no NB. This means that there are no more possible next behavior nodes in the HBKN and the human has not given the right-intent feedback; therefore, the function returns the current status Confused. In the second case, there is only one NB; therefore, the NB becomes the next behavior node and the function calls itself recursively. In the last case, there is more than one NB; in this case, the robot needs to know which one is the human intended next behavior, and therefore it performs a trial behavior for all the NBs to get human feedback.

Function FIND_CBKL: Algorithm 4 shows the pseudo code of this function, which finds the CBKL, that is, the list of behavior nodes in order of hierarchy based on the currently learned HBKN. First, the function searches the HBKN from the lowest to the highest class until it finds a class in which only one behavior node has the maximum utility. If it finds a behavior node that satisfies the above condition, it stops searching and saves that node as the first element of the CBKL.
After the first element is found, the CBKL is sequentially filled with the behavior nodes that have the maximum transition probability from the last element of the CBKL. This repeats until there is either no such behavior node or more than one such behavior node, and the function returns the updated CBKL.

Function INFER_INTENT_FROM_HBKN: Algorithm 5 shows the pseudo code of this function, which infers the HIO and the HI for that object from the learned HBKNs when the robot's attention is on more than one object. The function obtains the candidate behavior hierarchies for all objects (objCBKL) by calling the CBKL finding function. If the HA is in the objCBKL of only one object, then that object is inferred as the HIO, and the function returns the HIO and the current status Human_Intent. Otherwise, if the HA appears in the objCBKLs of multiple objects or of none, the function returns no object and the current status Confused because the robot needs more information.
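To tie Algorithms 4 and 5 together, the following simplified Python sketch builds a CBKL per object and uses it to disambiguate the intended object; it assumes the HBKN sketch introduced earlier, and the function names and the classes argument are illustrative.

def find_cbkl(hbkn, classes):
    """Simplified FIND_CBKL: start from the unique highest-utility node in the
    lowest class that has one, then follow unique maximum-TP successors."""
    cbkl = []
    for cls in classes:                                   # lowest to highest class
        best = max(cls, key=lambda b: hbkn.U[b])
        if [hbkn.U[b] for b in cls].count(hbkn.U[best]) == 1:
            cbkl.append(best)
            break
    while cbkl:
        succ = {b2: p for (b1, b2), p in hbkn.TP.items() if b1 == cbkl[-1]}
        if not succ:
            break                                         # no successor: stop
        best = max(succ, key=succ.get)
        if list(succ.values()).count(succ[best]) != 1:
            break                                         # tie: stop extending
        cbkl.append(best)                                 # edges go upward, so this terminates
    return cbkl

def infer_intended_object(hbkns, classes, ha_node):
    """Simplified INFER_INTENT_FROM_HBKN: the HA identifies the object whose
    CBKL (and only whose CBKL) contains that behavior node."""
    matches = [name for name, h in hbkns.items() if ha_node in find_cbkl(h, classes)]
    return (matches[0], "Human_Intent") if len(matches) == 1 else (None, "Confused")

# Hypothetical usage, with a second learned network car_hbkn:
# classes = [["Point"], ["Touch", "Push", "Grasp"], ["Release", "Throw"]]
# obj, status = infer_intended_object({"ball": ball, "car": car_hbkn}, classes, "Throw")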

V. Experiments

To show the effectiveness of the proposed method, experiments were carried out with a human and a virtual teddy bear robot playing with two virtual objects, a ball and a toy car. In addition, the same experiments were carried out with multiple participants who were neither involved in developing the proposed method nor given any prior knowledge about the experimental setup.

1. Experimental Environment

In the experiments, a vision system was employed for perceiving the human arm behaviors and feedback. The start and end of human arm behaviors and the human feedback were defined by color patches to simplify the experimental setup. The parameters in (1), (2), and (3) were set to τ = 5 and η = 0.9, with c a small positive constant and the UL threshold set to 0.7, and the reward values used in the behavior node utility update function were chosen to satisfy the inequality condition described in Section IV.

2. Human Intention Reading Results

A. Experiments 1 and 2: Human and Robot Playing with a Ball or a Toy Car

Three experiments were performed to show the effectiveness of the proposed method. In experiments 1 and 2, the robot developmentally learned the HBKNs of a ball and of a toy car, respectively, through interaction with a human. In experiment 3, the robot was placed in a confusing situation with both objects and solved it using the HBKNs learned in the previous experiments.

In experiment 1, a human and the robot first played together with a ball. Figure 4 shows key snapshots of the experimental results, and Fig. 5 shows the incrementally developed HBKN of the ball along with the updated transition probabilities between behavior nodes. Figure 6 shows the updated utility values of the behavior nodes of the ball HBKN, and Fig. 7 shows the UL and IF of the robot during the experiment.

Fig. 4. Key snapshots of experiment 1.

Fig. 5. HBKN(ball).TP and the developed HBKN structure for the ball at each step of experiment 1. The gray cells represent the transition probabilities from lower level behavior nodes to higher level behavior nodes: (a) transitions for the snapshots up to Fig. 4(d), (b) for Figs. 4(e)–(n), and (c) for Figs. 4(o)–(r).

First, a virtual ball appeared in front of the virtual teddy bear robot (Fig. 4) and the IF of the ball increased, which caused the robot to pay attention to the ball. Because it was the first time the robot had perceived the ball, the robot created a new initial HBKN of the ball (Fig. 5), and the utility values of all behavior nodes were zero (Fig. 6).

The human participant then performed a behavior to show her intention, that is, "give me the ball" (Fig. 4), and the IF of the human increased. The IF of the human changed continually according to her behaviors, as shown in Fig. 7. The robot recognized the human arm behavior correctly (Fig. 4(c)) because its FDTW value was the lowest. The recognized behavior node received a positive reward, and the robot learned the HBKN

by updating that node's utility to the same value (Fig. 6(c)). There were three possible next behavior nodes from the recognized node, and they had the same HBKN.TP and HBKN.U values. Therefore, the robot performed trial behaviors for the touching, Pu, and Gr nodes to obtain human feedback (Fig. 4(d) and the following snapshots). Because she gave positive feedback for the trial behavior of touching and negative feedback for the other trial behaviors, the touching node received a positive reward while the Pu and Gr nodes received negative rewards. The robot learned the HBKN by updating the utilities and transition probabilities using the proposed algorithm (Fig. 5(b); Figs. 6(e) and (g)). Note that the transition probability from the recognized class-1 node to the touching node was updated to 0.5, because that class-1 node had the biggest utility of the behavior nodes in the class below; accordingly, the transition probabilities from the class-1 node to Pu and to Gr each became 0.25. Because there were no more possible next behavior nodes from the touching node, UL increased (Fig. 7) and the robot asked for more information when it exceeded the threshold (Fig. 4).

The human performed a behavior to show the same intention (Fig. 4) and the robot recognized it correctly again (Fig. 4(l)). The recognized node received a positive reward again and the utility values of the HBKN were updated (Fig. 6(l)). Because the CBKL was not updated from the previous one, UL still remained at one and the robot asked for more information again (Fig. 4(m)).

The human then performed the throwing behavior to show her intention, that is, "throw the ball to me" (Fig. 4(n)), and the robot recognized it correctly (Fig. 4(o)). The throwing node received a positive reward, and the robot learned the HBKN (Fig. 5(c) and Fig. 6(o)). Note that a new transition from the touching node to Gr, with transition probability one, was created, and HBKN.TP(Gr, Th) was updated to 0.67, because the current behavior node Th had only one previously connected lower class behavior node, that is, Gr, while the touching node had the maximum utility of all the behavior nodes in the class below. Because HBKN.TP(Gr, Th) became 0.67, the robot inferred the human intention as throwing the ball to her and waited for her feedback for confirmation (Figs. 4(p) and (q)). She gave the right-intent feedback (Fig. 4(r)). The throwing node then received the right-intent reward, and the robot learned the final HBKN of the ball (Figs. 5(c) and 6(r)).

Note that a human and the robot also played together with a toy car in the same manner as with the ball. Figure 8 shows the final HBKN of the toy car along with the updated transition probabilities between behavior nodes. As shown in Fig. 8, the HBKN of the toy car had different behavior hierarchies compared with the HBKN of the ball.

Fig. 6. Utility values of the behavior nodes HBKN(ball).U at each step of experiment 1.

Fig. 7. Uncertainty level and interest factor during experiment 1. The letters indicate the subfigures of Fig. 4.

Fig. 8. HBKN(toy car).TP and the developed final HBKN for the toy car. The gray cells represent the transition probabilities from lower level behavior nodes to higher level behavior nodes.

B. Experiment 3: Human and Robot Playing with a Ball and a Toy Car

In this experiment, a human and the robot played together with both the ball and the toy car. Figure 9 shows key snapshots of the experiment, and Fig. 10 shows the UL and IF of the robot during the experiment. First, both the ball and the toy car appeared in front of the robot (Fig. 9). As the robot had already interacted with both of them, it did not need to create new HBKNs, so it loaded the previously developed HBKNs of the ball and the toy car. Because the IFs of both objects increased (Fig. 10), the robot was confused regarding which object it should pay attention to, and it needed more information. Therefore, UL increased (Fig. 10)

and the robot asked for more information when it exceeded the threshold (Fig. 9). The robot tried to infer the HIO and the human intention using the proposed algorithm. The CBKL obtained from the ball HBKN consisted of four nodes ending with Gr and the throwing node, and the CBKL obtained from the toy car HBKN consisted of two nodes ending with Pu. There were two cases according to the human feedback. In the first case, the human participant performed the throwing behavior, the robot recognized it correctly (Figs. 9(c) and (d)), and UL decreased because the robot obtained the new information (Fig. 10). Because the throwing behavior existed only in the ball CBKL, the robot inferred that the HIO was the ball rather than the toy car and that the HI was throwing the ball (Fig. 9(e)). In the other case, the human participant performed behavior Pu, the robot recognized it correctly (Figs. 9(c') and (d')), and UL decreased because the robot obtained the new information (Fig. 10). Because behavior Pu existed only in the toy car CBKL, the robot inferred that the HIO was the toy car rather than the ball and that the human intention was pushing the toy car (Fig. 9(e')).

Fig. 9. Key snapshots of experiment 3.

Fig. 10. Uncertainty level and interest factor during experiment 3. The letters indicate the subfigures of Fig. 9.

3. Human Intention Reading Results for Multiple Participants

Experiments with multiple human participants were conducted to determine whether the proposed algorithm would work well with people who were not involved in developing the system. Each participant played with the robot using the two objects, a ball and a toy car, and we asked the participants to think of different intentions for them before the experiments were conducted. The robot inferred each participant's intention by learning the participant's own HBKNs of the ball and the toy car using the proposed algorithm.

Table 1 shows the human intention reading results for the multiple participants. Tables 1(a) and 1(b) show the participants' intentions, their behavior sequences for interaction with the robot, the final utilities of the behavior nodes from the learned HBKNs, and the intention reading results when playing with the ball and the toy car, respectively. As shown in the tables, the robot inferred the participants' intentions correctly in all cases except one. The incorrect case occurred when participant 2 performed a behavior (Pu) that differed from his intention (to grasp the toy car). In this case, the robot had learned the participant's personalized HBKN for the toy car based on the recognized Pu behavior. Using the learned HBKN, the robot inferred the human intention as pushing the toy car because the participant gave it incorrect input. After the robot pushed the toy car, the object left the robot, and the experiment ended for the first trial. The participant realized why the robot did not infer his intention correctly, and the robot inferred his intention correctly in the second trial.

After each participant played with the robot using both objects, the third experiment (the confusing situation with two objects) was conducted. At this point, the robot had learned each participant's HBKNs for the ball and the toy car from the previous experiments. Table 1(c) shows each participant's intention, interaction behavior sequence, and intention reading result. By using the proposed algorithm and the learned HBKNs, the robot inferred all of the participants' intended objects and intentions correctly.

VI. Conclusion

This paper proposed an interactive human intention reading method based on developmental knowledge. In this system, the robot develops its own knowledge by learning an HBKN through interactions with a human.
The HBKN was proposed to learn the hierarchical behavior structures for different objects, and the robot can infer the human intention for an object by using the learned HBKNs. The proposed human intention reading algorithm incorporates passive reinforcement learning with adaptive dynamic programming so that it learns HBKNs interactively using context information and human feedback through human behaviors. The effectiveness of the proposed method was demonstrated through play-based experiments between a human subject and a virtual teddy bear robot with two virtual objects, a ball and a toy car.

Table 1. Human intention reading results for multiple participants: (a) ball object, (b) toy car object, and (c) both objects.

(a) Ball object
No. | Intention | Interaction behavior sequence | Final U | Result
P1 | Push | Pu, HI(Pu) | | Right
P2 | Throw | ·, HI(Th) | | Right
P3 | Push | ·, ·, Pu, Pu, HI(Pu) | | Right
P4 | Throw | ·, HI(Th) | | Right
P5 | Throw | ·, Gr, ·, HI(Th) | | Right
P6 | Release | ·, Gr, ·, HI(Re) | | Right
P7 | Grasp | ·, Neg(Pu), Pos(Gr), Gr, HI(Gr) | | Right
P8 | Throw | Gr, ·, HI(Th) | | Right
P9 | Throw | ·, HI(Th) | | Right
P10 | Throw | ·, Neg(·), Neg(Pu), Neg(Gr), ·, ·, ·, ·, HI(Th) | | Right

(b) Toy car object
No. | Intention | Interaction behavior sequence | Final U | Result
P1 | Throw | Gr, ·, HI(Th) | | Right
P2 | Grasp | 1st: ·, Pu | | Wrong
 | | 2nd: ·, Gr, HI(Gr) | | Right
P3 | Release | ·, ·, HI(Re) | | Right
P4 | Push | Gr, Pu, HI(Pu) | | Right
P5 | Push | Gr, Pu, HI(Pu) | | Right
P6 | Push | ·, Neg(Pu), Neg(Gr), ·, Pu, HI(Pu) | | Right
P7 | Release | ·, Neg(Pu), Pos(Gr), ·, HI(Re) | | Right
P8 | Push | ·, Neg(Pu), Pos(Gr), ·, Pu, HI(Pu) | | Right
P9 | Push | ·, Neg(·), Neg(Pu), Pos(Gr), Pu, HI(Pu) | | Right
P10 | Push | Pu, HI(Pu) | | Right

(c) Both objects
No. | Intention | Interaction behavior sequence | Result
P1 | Push the ball | ·, ·, Pu | Right
P2 | Throw the ball | · | Right
P3 | Push the ball | Pu | Right
P4 | Throw the ball | · | Right
P5 | Throw the ball | · | Right
P6 | Push the car | ·, Pu | Right
P7 | Release the car | ·, ·, Gr, · | Right
P8 | Throw the ball | · | Right
P9 | Push the car | ·, Pu | Right
P10 | Throw the ball | ·, · | Right

Three different scenarios demonstrated that the robot could correctly infer human intentions using the proposed method, even in confusing situations. In addition, experiments with multiple participants who were not involved in developing the proposed method were conducted, and the robot could infer all the participants' intentions successfully, except in the case in which a participant gave an incorrect behavior as an input.

References

[1] M.M. Botvinick, "Hierarchical Models of Behavior and Prefrontal Function," Trends in Cognitive Sciences, vol. 12, no. 5, May 2008.
[2] L. Zhang and D. Zhang, "Visual Understanding via Multi-feature Shared Learning with Global Consistency," IEEE Trans. Multimedia, vol. 18, no. 2, Feb. 2016.
[3] L. Zhang et al., "LSDT: Latent Sparse Domain Transfer Learning for Visual Adaptation," IEEE Trans. Image Process., vol. 25, no. 3, Mar. 2016.
[4] P. Carruthers and P.K. Smith, Theories of Theories of Mind, New York, USA: Cambridge University Press, 1996.
[5] J. Gray et al., "Action Parsing and Goal Inference Using Self as Simulator," IEEE Int. Workshop Robot and Human Interactive Commun., Nashville, TN, USA, Aug. 13–15, 2005.
[6] C. Breazeal et al., "An Embodied Cognition Approach to Mindreading Skills for Socially Intelligent Robots," Int. J. Robot. Res., vol. 28, no. 5, May 2009.
[7] B. Jansen and T. Belpaeme, "A Computational Model of Intention Reading in Imitation," Robot. Auton. Syst., vol. 54, May 2006.
[8] L.M. Hiatt et al., "Accommodating Human Variability in Human-Robot Teams through Theory of Mind," Proc. Int. Joint Conf. Artificial Intelligence, Barcelona, Spain, July 2011.
[9] O.C. Schrempf et al., "A Novel Approach to Proactive Human-Robot Cooperation," Proc. IEEE Int. Workshop Robot and Human Interactive Commun., Nashville, TN, USA, Aug. 13–15, 2005.
[10] A.J. Schmid et al., "Proactive Robot Task Selection Given a Human Intention Estimate," Proc. IEEE Int. Symp. Robot and Human Interactive Commun., Jeju, Rep. of Korea, Aug. 26–29, 2007.
[11] R. Kelley et al., "Context-Based Bayesian Intent Recognition," IEEE Trans. Auton. Mental Develop., vol. 4, no. 3, Sept. 2012.
[12] Z. Wang et al., "Probabilistic Movement Modeling for Intention Inference in Human-Robot Interaction," Int. J. Robot. Res., vol. 32, no. 7, 2013.
[13] E. Bicho et al., "Integrating Verbal and Nonverbal Communication in a Dynamic Neural Field Architecture for Human-Robot Interaction," Frontiers in Neurorobotics, vol. 4, no. 5, 2010.
[14] E. Bicho et al., "Neuro-Cognitive Mechanisms of Decision Making in Joint Action: A Human-Robot Interaction Study," Human Movement Sci., vol. 30, no. 5, Oct. 2011.
[15] K. Strabala et al., "Learning the Communication of Intent Prior to Physical Collaboration," Proc. IEEE RO-MAN, Paris, France, Sept. 9–13, 2012.
[16] R. Kelley et al., "Deep Networks for Predicting Human Intent with Respect to Objects," Proc. Annu. ACM/IEEE Int. Conf. Human-Robot Interaction, Boston, MA, USA, Mar. 5–8, 2012.
[17] Z. Yu and M. Lee, "Human Motion Based Intent Recognition Using a Deep Dynamic Neural Model," Robot. Auton. Syst., vol. 71, Sept. 2015.
[18] J.-H. Han and J.-H. Kim, "Consideration about the Application of Dynamic Time Warping to Human Hands Behavior Recognition for Human-Robot Interaction," in Robot Intelligence Technology and Applications, Switzerland: Springer International Publishing, 2013.
[19] R.E. Bellman, Dynamic Programming, Princeton, NJ, USA: Princeton University Press, 1957.

Ji-Hyeong Han received the BS and PhD degrees in electrical engineering from the Korea Advanced Institute of Science and Technology, Daejeon, Rep. of Korea, in 2008 and 2015, respectively. Since 2015, she has been with ETRI, Daejeon, where she is currently a researcher. Her research interests include human intent recognition, human robot interaction, intelligent robotics, and smart factory.

Seung-Hwan Choi received the BS degree in computer science and the PhD degree in electrical engineering from KAIST in 2005 and 2013, respectively. Since 2013, he has been with the UX Center, Corporate Design Center, Samsung Electronics, Seoul, Rep. of Korea, where he is currently a Senior Designer.

Jong-Hwan Kim received the PhD degree in electronics engineering from Seoul National University, Rep. of Korea, in 1987.
Since 1988, he has been with the School of Electrical Engineering, KAIST, where he leads the Robot Intelligence Technology Laboratory as Professor. Dr. Kim is the Director of the KoYoung-KAIST AI Joint Research Center. His research interests include InT (Intelligence Technology), intelligent interactive technology, ubiquitous and genetic robots, and humanoid robots. He has authored 5 books and 5 edited books, along with journal special issues, and numerous refereed papers in technical journals and conference proceedings. He has delivered many invited talks on computational intelligence and robotics, including keynote speeches at international conferences. He was an Associate Editor of the IEEE Transactions on Evolutionary Computation and the IEEE Computational Intelligence Magazine.