Evaluating effects of workload on trust in automation, attention allocation and dual-task performance


Proceedings of the Human Factors and Ergonomics Society 2017 Annual Meeting

Evaluating effects of workload on trust in automation, attention allocation and dual-task performance

Meng Yuan Zhang and X. Jessie Yang
Department of Industrial & Operations Engineering, University of Michigan, Ann Arbor

This study aimed to examine how workload and automation aid type affected operators' trust in automation, attention allocation and dual-task performance. With a simulated surveillance task, participants monitored the picture streaming from an unmanned ground vehicle (UGV) while planning the paths of two unmanned aerial vehicles (UAVs). The analysis of experimental results indicated that workload affected operators' attention allocation and dual-task performance, but not their trust in automation. As workload increased, attention allocation on the automated task decreased and attention allocation on the concurrent task increased. Moreover, increasing workload led to longer response times for the automated task. For the concurrent task, higher workload harmed task performance accuracy but resulted in shorter response times. Copyright 2017 by Human Factors and Ergonomics Society.

INTRODUCTION

An important feature of the future combat system (FCS) is soldiers' supervisory control of multiple unmanned vehicles (Barnes & Evans III, 2010). In the past, these unmanned vehicles were often perceived as passive agents receiving orders from soldiers through teleoperation. Yet, as the automation capabilities of the robots grow, there is an increasing possibility that they might function as full-fledged team members. One major design challenge of this complex human-robot partnership is soldiers' trust in the automated robots, for two main reasons. First, the automation is oftentimes imperfect, especially in environments of high uncertainty and ambiguity.
For example, an interview with Hunter/Shadow unmanned aerial vehicle (UAV) pilots showed that the automation failure rate was estimated to be around 10 percent (Wickens & Dixon, 2002). Second, the calibration between the automation's true ability and human operators' trust in automation is often imperfect (Sheridan & Parasuraman, 2005; Wickens, Levinthal, & Rice, 2010). Soldiers could over-trust or under-trust an automated system, resulting in suboptimal decision making and task performance.

Existing research has mainly addressed trust in and dependence on automation independently, but has failed to examine the relationship between the two. A few studies, however, implied the existence of a complex relationship between trust and dependence. Researchers from the U.S. Army Research Laboratory conducted several experiments to examine the performance of robot operators at a simulated military crew-station in either a single-tasking or a multitasking environment and reported a significant complacency effect when the operators' workload was high: they depended on the automated system more than they should have (Chen, 2010; Chen & Terrence, 2009). Although trust was not measured in these studies, it is likely that the operators' trust in the automated system did not increase with increasing workload. Rather, operators chose to depend on the automated system deliberately because they did not have enough cognitive resources to perform the task manually. In a series of studies, Wickens and colleagues (2002; 2006; 2010) investigated the issues of unreliable automation in supervisory control of unmanned aerial vehicles (UAVs). Participants navigated a simulated UAV through a series of mission legs, performing a manual search task and an automation-assisted system failure detection task. The results showed significant effects of automation reliability when the manual task was complex, but no effects when the manual task was easy.
The authors concluded that workload influenced automation compliance and task performance. The above-mentioned studies suggest a complex relationship between trust, dependence and task performance; however, this relationship has not been examined thoroughly. The present study, therefore, aimed to examine how workload, under different types of automation aid, affected trust in automation, task performance and attention allocation through a simulated surveillance task. Attention allocation between the two tasks could be considered a measure of dependence, as greater dependence on an automation aid would lead to a decrease in attention to the corresponding automated task. Two hypotheses were tested:

H1: Workload has a significant effect on attention allocation and task performance, but not on trust in automation.

H2: Automation aid type has a significant effect on attention allocation, task performance and trust in automation.

METHODOLOGY

Experimental Task

A desktop-based simulation program was developed benchmarking prior research (Wickens, Levinthal, & Rice, 2010). In the simulated task, a human operator performed two tasks simultaneously (Figure 1). The operator monitored the picture streaming from an unmanned ground vehicle (UGV) and detected potential threats in the environment. At the same time, the operator planned the paths for two unmanned aerial vehicles (UAVs), denoted as UAV_A and UAV_B. The two tasks were shown on two separate displays with an angle of 120 degrees between them, such that performing the two tasks required an overt shift of attention.

While performing the UGV task, operators monitored the display depicting the photo stream of the raw environment data, aiming to report potential threats as accurately and as quickly as possible. This detection task was supported by an automated threat detector such that when the detector recognized an enemy, both a visual and an auditory alarm went off (Figure 2). The automated aid had varying degrees of reliability and functioned at one of five levels: a non-automated baseline (BL), a 67 percent reliable aid with false alarms (67FA), a 67 percent reliable aid with misses (67M), a 67 percent reliable aid with equal numbers of false alarms and misses (67MPFA), and a 100 percent reliable aid (100A).

The UAV displays showed the locations and paths of the two UAVs. Operators navigated the two UAVs through a series of branching points resembling a decision tree. UAV flight between waypoints was automated, but operators were required to select the next correct branch upon reaching a new waypoint. To select the next correct waypoint, operators added the last two digits of the x and y coordinates of the UAV, selecting the northernmost waypoint when the sum was greater than or equal to 100 and the southernmost waypoint otherwise. When a UAV reached a waypoint, it stayed in an idle mode indefinitely until the operator selected the next waypoint.

Workload was manipulated through the time interval between waypoints: 7.5 seconds in the high workload condition and 15 seconds in the low workload condition. The experimental design was a 2 by 5 mixed design, with workload as a within-subject factor and automation aid as a between-subject factor.
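For illustration, the branch-selection rule described above can be sketched in Python. The function name and integer coordinate format are assumptions for this sketch, not part of the study materials.

```python
def select_branch(x: int, y: int) -> str:
    """Sketch of the operators' branch-selection rule (integer coordinates assumed).

    Add the last two digits of the UAV's x and y coordinates; choose the
    northernmost branch when the sum is at least 100, southernmost otherwise.
    """
    total = (abs(x) % 100) + (abs(y) % 100)
    return "northernmost" if total >= 100 else "southernmost"

# Example: x = 437, y = 382 -> 37 + 82 = 119 >= 100 -> northernmost
print(select_branch(437, 382))
# Example: x = 412, y = 305 -> 12 + 5 = 17 < 100 -> southernmost
print(select_branch(412, 305))
```

With a 7.5-second interval between waypoints, the operator in the high workload condition had to perform this mental addition roughly twice as often as in the low workload condition.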
Figure 1: Illustration of the UGV display, and the UAV_A and UAV_B displays

Figure 2: Illustration of the automated detector for the UGV task

Participants

80 undergraduate and graduate students participated in the study, and all of them gave signed informed consent. The current paper reports the results of 40 participants, with 8 participants randomly chosen from each condition (mean age = 21.1, SD = 4.7). Among the 40 participants, 9 were female and 31 were male. Participants' mean experience with similar games was 5.025 on a 1-9 Likert scale, where 1 denoted no experience and 9 denoted a significant amount of experience.

Experimental Design

Independent variables. There were two independent variables: workload (2 levels) and automation aid type (5 levels). Workload (low, high) was manipulated within subjects; automation aid type (100A, 67FA, 67M, 67MPFA, BL) was manipulated between subjects. Participants were randomly assigned to one of the automation aid conditions and completed both workload conditions.

Dependent variables. Every condition consisted of 18 trials, each lasting 30 seconds. After each trial, operators reported their trust toward the automated threat detector and their self-confidence in performing the task on their own. The trust and confidence scores were aggregated and averaged over the n = 18 trials of each block:

Trust = (1/n) * sum of Trust_i for i = 1 to n
Confidence = (1/n) * sum of Confidence_i for i = 1 to n

Attention allocation was measured by participants' proportional dwell time on the UGV, UAV_A and UAV_B displays. UGV task performance was measured by the number of correct identifications and rejections of threats, as well as the time taken to identify present enemies. UAV task performance was measured by the number of correct waypoint selections and the accumulated idle time.

Experimental Procedures and Apparatus

Before the experiment, each participant was briefed on the objectives and procedure of the experiment and gave informed consent. Next, participants filled in their demographic information.
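The dependent-measure computations described above (block-averaged trust and confidence ratings, and proportional dwell time) can be sketched as follows; the data values below are illustrative, not study data.

```python
def block_mean(ratings):
    """Average per-trial ratings over a block (n = 18 trials in the study)."""
    return sum(ratings) / len(ratings)

def proportional_dwell(dwell_ms):
    """Convert raw per-display dwell times into proportions of total dwell time."""
    total = sum(dwell_ms.values())
    return {display: t / total for display, t in dwell_ms.items()}

# Illustrative values only (a real block would have 18 ratings):
trust_ratings = [5, 6, 6, 7, 5, 7]
print(block_mean(trust_ratings))  # 6.0
print(proportional_dwell({"UGV": 600, "UAV_A": 250, "UAV_B": 150}))
# {'UGV': 0.6, 'UAV_A': 0.25, 'UAV_B': 0.15}
```

The same block-averaging applies to the self-confidence ratings; greater dependence on the threat detector would show up as a smaller UGV proportion in the dwell-time dictionary.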
Afterwards, each participant was calibrated with and put on a Tobii Glasses 2 eye tracker for the purpose of measuring dwell time on the task-related displays (Figure 1). Each participant then underwent a practice session to become familiar with the simulated task. The practice session included 4 trials on the UGV task alone, 4 trials on the UAV task alone, and 4 trials on the combined task. For the UGV task and the combined task, participants experienced 1 hit, 1 false alarm, 1 miss and 1 correct rejection from the automated threat detector. After the practice session, every participant performed two trial blocks, each consisting of 18 trials with either the low or the high workload UAV task. The sequence of the high and low workload conditions was counterbalanced. Between the two trial blocks, participants were given a short break.

RESULTS

Here we report the results of 40 participants, with 8 of them randomly selected from each of the five conditions: BL, 67FA, 67M, 67MPFA and 100A. Two-way repeated-measures analyses of variance (ANOVA) were conducted, with workload as the within-subject factor and automation aid as the between-subject factor.

Trust in automation

The baseline condition was excluded from the analysis of trust, as no automated aid was present under that condition. A two-way repeated ANOVA revealed non-significant effects of workload (F(1,28) = .009, p = .923) and automation aid type (F(3,28) = 2.249, p = .105) on trust in automation.

Figure 3: Mean value and SE of trust for each automation aid and workload condition

Self-confidence

The two-way repeated ANOVA revealed a significant effect of workload (F(1,35) = , p = .001) on participants' self-confidence scores: high workload led to lower self-confidence in completing the two tasks. The effect of automation aid type (F(4,35) = .708, p = .592) on self-confidence was not significant.

Figure 4: Mean value and SE of self-confidence for each automation aid and workload condition

Attention allocation

Analysis of percentage dwell time indicated that as workload increased, there was a significant decrement of percentage dwell time on the UGV display (F(1,35) = , p < .001) and, correspondingly, a significant increment of percentage dwell time on the UAV_A display (F(1,35) = , p < .001) as well as on the UAV_B display (F(1,35) = , p < .001).

Figure 5: Mean value and SE of percentage dwell time on (a) the UGV display, (b) the UAV_A display and (c) the UAV_B display

Figure 6 presents a visualization of the pattern described above, where attention shifted from the left screen (UGV task) to the right screen with the UAV task. As a result, the UGV task window became less concentrated with red color.

Figure 6: Example of heat maps of attention allocation under the low workload condition and the high workload condition

Dual-task performance

UGV task performance. There were no significant effects of workload (F(1,35) = 1.184, p = .284) or automation aid type (F(4,35) = 1.208, p = .325) on the number of correct identifications. Analysis of UGV task response time revealed a significant effect of workload (F(1,35) = 6.457, p = .016): higher workload increased the response time of identification. Meanwhile, there was no significant effect of automation aid condition (F(4,35) = .279, p = .889) on response time.

Figure 7: Mean value and SE of UGV task performance measured as correct identification of the true state of the world, and response time of identifying a potential threat

UAV task performance. As workload increased, there was a significant drop in UAV task performance measured as percentage of correct branch selections (F(1,35) = 4.12, p = .05). Interestingly, increasing workload facilitated UAV task performance in terms of average idle time (F(1,35) = 8.066, p = .007). The decrease in response time may be explained by participants' increased attention allocation on the UAV task. There were no significant effects of automation aid on percentage of correct selections (F(4,35) = .814, p = .814), nor on average idle time (F(4,35) = .283, p = .887).

Figure 8: Mean value and SE of UAV task performance measured as correct selection of waypoints, and average idle time

DISCUSSION and CONCLUSION

This study aimed to examine how workload and automation aid type affected automation trust, dependence and dual-task performance. A human-subject experiment was conducted with a simulated surveillance task.

H1: Analysis with data from 40 participants supported H1 in that workload affected attention allocation and dual-task performance. Specifically, as workload increased, attention allocation on the automated task decreased and attention allocation on the concurrent task increased. In addition, increasing workload harmed automated task performance in terms of response time. For the concurrent UAV task, higher workload led to lower accuracy in the selection of waypoints, but quicker response times. This interesting pattern may be due to participants' increased attention allocation on the concurrent UAV task: the increased attention allowed for quicker responses, while a speed-accuracy trade-off may have harmed accuracy. More importantly, we found no evidence for an effect of workload on trust in automation, suggesting that workload may moderate the relationship between participants' subjective trust and dependence behaviors.

H2: No evidence was found to support H2, as automation aid type had no significant effect on attention allocation, task performance or trust in automation. This lack of effect could be explained by the experimental design. Since the UAV path selection task was a discrete task as opposed to a continuous one, participants did not appear to have difficulty switching their attention between the two task screens. As a result, UGV task accuracy was not significantly different across conditions. Testing of the effect of automation aid type may therefore be improved by designing a continuous concurrent task.

Likewise, the current study considered attention allocation as a measure of dependence on the automated aid; however, an increase in attention to the concurrent UAV task is expected simply due to the increase in its workload. For this reason, participants' dependence behavior could be more appropriately measured in a future study by the time taken to switch attention to the UGV task after each alarm.

REFERENCES

Barnes, M. J., & Evans III, A. W. (2010). Soldier robot teams in future battlefields: An overview. In M. J. Barnes & F. Jentsch (Eds.), Human-Robot Interactions in Future Military Operations (pp. 9-29). Hampshire, England: Ashgate Publishing.

Chen, J. Y. C. (2010). Robotics operator performance in a multi-tasking environment. In M. J. Barnes & F. Jentsch (Eds.), Human-Robot Interactions in Future Military Operations. Ashgate Publishing.

Chen, J. Y. C., & Terrence, P. I. (2009). Effects of imperfect automation on concurrent performance of military and robotics tasks in a simulated multi-tasking environment. Ergonomics, 52(8).

Dixon, S. R., & Wickens, C. D. (2006). Automation reliability in unmanned aerial vehicle control. Human Factors, 48.

Sheridan, T. B., & Parasuraman, R. (2005). Human-Automation Interaction. Reviews of Human Factors and Ergonomics, 1(1).

Wickens, C. D., & Dixon, S. (2002). Workload demands of remotely piloted vehicle supervision and control. Savoy, IL: University of Illinois, Aviation Research Laboratory.

Wickens, C. D., Levinthal, B., & Rice, S. (2010). Imperfect reliability in unmanned air vehicle supervision and control. In M. J. Barnes & F. Jentsch (Eds.), Human-Robot Interactions in Future Military Operations. Hampshire, England: Ashgate Publishing.