CNN-based People Recognition for Vision Occupancy Sensors

(JBE Vol. 23, No. 2, March 2018) (Regular Paper) ISSN (Online), ISSN (Print)

CNN-based People Recognition for Vision Occupancy Sensors

Seung Soo Lee a), Changyeol Choi a), and Manbae Kim a)

Abstract

Most occupancy sensors installed in buildings, households, and so forth are pyroelectric infra-red (PIR) sensors. One of their disadvantages is that a PIR sensor cannot detect a stationary person, because it senses only variations in thermal radiation. To overcome this problem, the use of camera vision sensors has gained interest, where object tracking is used for detecting stationary persons. However, object tracking has an inherent problem, tracking drift, so recognizing whether a static tracker still contains a human is an important task. In this paper, we propose a CNN-based human recognition method to determine whether a static tracker contains a human. Experimental results validated that humans and non-humans are classified with an accuracy of about 88% and that the proposed method can be incorporated into practical vision occupancy sensors.

Keywords: occupancy sensor, camera, CNN, people recognition, tracking

a) Department of Computer and Communications Eng., Kangwon National University
Corresponding Author: Manbae Kim, manbae@kangwon.ac.kr

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2017R1D1A3B).
Manuscript received October 19, 2017; revised January 24, 2018; accepted March 6.
Copyright © 2016 Korean Institute of Broadcast and Media Engineers. All rights reserved.
This is an Open-Access article distributed under the terms of the Creative Commons BY-NC-ND license, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited and not altered.

I. Introduction

Occupancy sensors (motion sensors) automatically switch lights on and off in buildings and households. Most occupancy sensors in use today are pyroelectric infra-red (PIR) sensors [1,2]. Because a PIR sensor responds only to variations in thermal radiation, it cannot detect a person who remains stationary. Various sensing approaches have been studied to overcome the limitations of PIR sensors [3~6]. Benezeth et al. proposed the CAPTHOM sensor for detecting human presence and characterizing activity [3]. Han and Bhanu fused color and infrared video for moving human detection [4]. Nakashima et al. developed a privacy-preserving sensor for person detection [5]. Amin et al. performed automated people counting using low-resolution infrared and visual cameras [6].

A camera vision sensor can, in principle, detect a stationary person by continuing to track the person after motion stops. Object tracking, however, suffers from drifting: a tracker that has become static may no longer contain the person it was following. Fig. 1 shows examples of tracked bounding boxes, where stationary boxes are marked in blue and moving boxes in red.

Fig. 1. The detected stationary bounding boxes are marked in blue. The moving boxes are marked in red.

To decide whether a static tracker actually contains a human, we propose a human recognition method based on a convolutional neural network (CNN). A static bounding box classified as human keeps the occupancy state active, whereas a non-human box can be discarded. The proposed method is described in Section II.

II. Proposed Method

Fig. 2 shows the overall block diagram of the proposed method. Moving objects are detected and tracked with a motion history image (MHI); when a tracked object becomes stationary, its bounding box is passed to a CNN that decides whether the box contains a human.

The MHI, introduced by Bobick and Davis [7], records how recently motion occurred at each pixel. Let D_t(x, y) = |I_t(x, y) − I_{t−1}(x, y)| be the inter-frame difference. A binary motion mask is obtained by thresholding:

  Ψ_t(x, y) = 1 if D_t(x, y) > T, and 0 otherwise.   (1)

The MHI is then updated as

  MHI_t(x, y) = 255 if Ψ_t(x, y) = 1; max(0, MHI_{t−1}(x, y) − δ) otherwise,   (2)

where δ is a decay constant. A pixel where motion has just occurred is set to the maximum value 255 and then fades gradually while the scene stays still. Fig. 3 shows subsequent images and their MHIs.

Each tracked object is followed by examining a search window in the MHI. The search window is decomposed into 8×8-pixel blocks (Fig. 4(b)), and the sum of the MHI values of all pixels in each block is stored as the block's motion energy (Fig. 4(c)). The candidate window with the best-matching block energies is selected as the new object position (Fig. 4(d)).

Fig. 2. The overall block diagram of the proposed method.
Fig. 3. Subsequent images and their MHIs.
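The MHI update described above — set freshly moving pixels to 255, decay all others toward zero — can be sketched in a few lines of NumPy. The threshold and decay values below are illustrative, not the ones used in the paper:

```python
import numpy as np

def update_mhi(mhi, frame, prev_frame, T=30, decay=8):
    """One MHI update step: pixels whose inter-frame difference
    exceeds T are refreshed to 255; all other pixels decay toward 0.
    T and decay are illustrative defaults, not the paper's values."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    motion = diff > T                                    # binary motion mask
    mhi = np.maximum(mhi.astype(np.int16) - decay, 0)    # decay still pixels
    mhi[motion] = 255                                    # refresh moving pixels
    return mhi.astype(np.uint8)
```

Running this per frame yields images like those in Fig. 3, where recently moving regions appear bright and fade as the scene stays still.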

Fig. 4. Saving the sum of the pixel MHIs of each 8×8 block after decomposition of a search window.

A tracked bounding box BB is declared stationary when the difference between I_t and I_{t−1} inside the box remains below a threshold ρ for a number of consecutive frames; ρ is derived from the MHI energy of the box. Fig. 5 shows tracking results obtained by the MHI energy-based tracking method.

Fig. 5. Resulting tracking images obtained by the MHI energy-based tracking method.

A static bounding box is then classified by a convolutional neural network (CNN) composed of convolution, pooling, and fully connected layers. Two classifiers are designed. The 5-label classification CNN (L5-CNN) assigns an input image to one of the five labels in Table 1, where L1 through L4 cover typical non-human indoor objects and L5 is the human class.

Table 1. Labels and objects for the 5-label classification CNN (L5-CNN)

Label | Object
L1 | floor, wall
L2 | chair
L3 | desk, whiteboard
L4 | bookshelf, computer, box
L5 | human
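The block decomposition of Fig. 4 — summing the MHI values of each 8×8 block of a search window — reduces to a reshape and a sum in NumPy. This is a minimal sketch with names of my own choosing:

```python
import numpy as np

def block_energy(mhi_window):
    """Decompose a search window of the MHI into 8x8-pixel blocks and
    return the sum of pixel MHIs per block (cf. Fig. 4); any remainder
    rows/columns that do not fill a full block are trimmed."""
    h, w = mhi_window.shape
    h8, w8 = h // 8, w // 8
    blocks = mhi_window[:h8 * 8, :w8 * 8].reshape(h8, 8, w8, 8)
    return blocks.sum(axis=(1, 3))  # one motion-energy value per 8x8 block
```

Comparing these per-block energies across candidate window positions is far cheaper than comparing raw pixels, which is presumably why the tracker operates on block sums.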

The 2-label classification CNN (L2-CNN) merges the non-human labels L1 through L4 of Table 1 into a single label L1 and assigns the human label L5 to L2 (Figs. 6 and 7).

The network input is a 64×64 grayscale image (4,096 pixels). A convolution layer produces feature maps that pass through the ReLU activation [8], followed by 2×2 max pooling, a fully connected hidden layer of 50 nodes with ReLU, and an output layer of n nodes with Softmax, where n is the number of labels (Fig. 8). Table 2 lists the parameter values. The weights are initialized with the Xavier method [9] and trained by stochastic gradient descent.

Table 2. Parameter values used in the neural networks

Layer | Parameters | Activation function
input layer | 64 × 64 | –
convolution layer | 3 × 3 × 16 | ReLU
pooling layer | 2 × 2 | max pooling
hidden layer | 50 | ReLU
output layer | n | Softmax

Fig. 6. Training images belonging to [L1, L5] in L5-CNN.
Fig. 7. Training images belonging to [L1, L2] in L2-CNN.
Fig. 8. Basic network structure of the proposed neural network.
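A shape-level sketch of the network in Table 2 (64×64 input, sixteen 3×3 convolution filters with ReLU, 2×2 max pooling, a 50-node hidden layer, and an n-node Softmax output) can be written in NumPy. The random weights here merely stand in for trained ones; in the paper they are Xavier-initialized and learned by SGD:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def forward(x, n_labels=2, rng=np.random.default_rng(0)):
    """Forward pass with the layer shapes of Table 2. Weights are
    random stand-ins for the trained parameters."""
    K = rng.standard_normal((16, 3, 3)) * 0.1            # 16 conv filters, 3x3
    # valid 3x3 convolution -> 16 feature maps of 62x62, then ReLU
    patches = np.lib.stride_tricks.sliding_window_view(x, (3, 3))
    fmap = np.maximum(np.einsum('ijkl,fkl->fij', patches, K), 0.0)
    # 2x2 max pooling -> 16 maps of 31x31
    pooled = fmap.reshape(16, 31, 2, 31, 2).max(axis=(2, 4))
    flat = pooled.reshape(-1)                            # 16*31*31 = 15,376
    W1 = rng.standard_normal((50, flat.size)) * 0.01     # hidden layer, ReLU
    h = np.maximum(W1 @ flat, 0.0)
    W2 = rng.standard_normal((n_labels, 50)) * 0.1       # output layer, Softmax
    return softmax(W2 @ h)
```

With n_labels = 5 the same structure serves L5-CNN; deeper variants (the n-l CNNs of the experiments) stack additional convolution or hidden layers in the same pattern.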

III. Experimental Results

Input videos are 720×480 RGB sequences, and the experiments were run on Windows 10 with MATLAB R2017b. Training images for the non-human labels L1 through L4 were gathered with a 64×64 sliding window over indoor scenes, and human images for L5 were collected separately. About 2,000 images were used, with 20% held out for testing, and the CNNs were trained for 100 epochs.

Table 3 reports the accuracy of L5-CNN as the numbers of convolution and hidden layers vary. The 4-1 CNN achieves the highest accuracy, 62.8%, and the 1-1 CNN the lowest at 50%; the five-label accuracies are too low for practical use.

Table 3. Accuracy measured by the numbers of convolution and hidden layers in L5-CNN. In an n-l CNN, n is the number of convolution layers and l is the number of hidden layers.

CNN type | No. of conv layers | No. of hidden layers | Accuracy
1-1 CNN | 1 | 1 | 50.00%
1-2 CNN | 1 | 2 | 43.60%
2-1 CNN | 2 | 1 | 52.80%
2-2 CNN | 2 | 2 | 56.80%
3-1 CNN | 3 | 1 | 60.00%
3-2 CNN | 3 | 2 | 50.40%
4-1 CNN | 4 | 1 | 62.80%
4-2 CNN | 4 | 2 | 55.60%

For L2-CNN, the non-human labels L1 through L4 were merged into a single label L1 and the human label became L2, and the human training set was additionally expanded by cropping the original images (data augmentation). Table 4 reports the L2-CNN accuracy, which ranges from 83.17% for the 1-1 CNN to 88.17% for the 3-2 CNN. Merging the non-human classes and augmenting the data markedly improve the accuracy over L5-CNN.

Table 4. Accuracy measured by the numbers of convolution and hidden layers in L2-CNN. In an n-l CNN, n is the number of convolution layers and l is the number of hidden layers.

CNN type | No. of conv layers | No. of hidden layers | Accuracy
1-1 CNN | 1 | 1 | 83.17%
1-2 CNN | 1 | 2 | 84.67%
2-1 CNN | 2 | 1 | 83.00%
2-2 CNN | 2 | 2 | 83.00%
3-1 CNN | 3 | 1 | 87.17%
3-2 CNN | 3 | 2 | 88.17%
4-1 CNN | 4 | 1 | 86.00%
4-2 CNN | 4 | 2 | 86.67%

The proposed CNNs were also compared with a support vector machine (SVM) [12].
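The data preparation described above (a held-out 20% test split and crop-based augmentation for the human class) might look like the following sketch; the crop count, crop positions, and random seed are my own illustrative choices:

```python
import numpy as np

def augment_crops(img, size=64, n_crops=4, rng=np.random.default_rng(0)):
    """Crop-based data augmentation: take random size x size sub-images
    of a human image. The number and placement of crops are illustrative."""
    h, w = img.shape
    crops = []
    for _ in range(n_crops):
        y = rng.integers(0, h - size + 1)
        x = rng.integers(0, w - size + 1)
        crops.append(img[y:y + size, x:x + size])
    return crops

def split_train_test(samples, test_ratio=0.2, rng=np.random.default_rng(0)):
    """Shuffle one label's image list and hold out 20% for testing."""
    idx = rng.permutation(len(samples))
    n_test = int(len(samples) * test_ratio)
    test = [samples[i] for i in idx[:n_test]]
    train = [samples[i] for i in idx[n_test:]]
    return train, test
```

Applying such cropping only to the under-represented human class is a common way to balance a two-class training set.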

SVM classifiers were trained for the same two tasks: L5-SVM for the five labels and L2-SVM for the two labels. As summarized in Table 5, L5-SVM reaches 51.6%, slightly above the 1-1 CNN and 1-2 CNN, but the deeper CNNs (2-1 through 4-2) all exceed it. For the two-label task, L2-SVM achieves only 60.5%, whereas the L2-CNNs range from 83.17% to 88.17%; even the shallowest 1-1 CNN clearly outperforms the SVM. This indicates that the CNN is better suited than the SVM for complementing a PIR sensor in a practical vision occupancy sensor.

Table 5. Comparison of classification accuracy for CNN and SVM

5 Labels: Classifier | Accuracy || 2 Labels: Classifier | Accuracy
L5-SVM | 51.60% || L2-SVM | 60.50%
1-1 CNN | 50.00% || 1-1 CNN | 83.17%
1-2 CNN | 43.60% || 1-2 CNN | 84.67%
2-1 CNN | 52.80% || 2-1 CNN | 83.00%
2-2 CNN | 56.80% || 2-2 CNN | 83.00%
3-1 CNN | 60.00% || 3-1 CNN | 87.17%
3-2 CNN | 50.40% || 3-2 CNN | 88.17%
4-1 CNN | 62.80% || 4-1 CNN | 86.00%
4-2 CNN | 55.60% || 4-2 CNN | 86.67%

Fig. 9 shows examples of misclassified images with their ground-truth and predicted labels. Related vision-based occupancy and static-object detection methods are found in [10,11].

Fig. 9. Images with classification errors and their labels. (a) L5-CNN: ground truth (L4, L4, L5, L2, L2, L2) predicted as (L1, L2, L2, L3, L4, L5). (b) L2-CNN: ground truth (L2, L2, L2, L1, L1, L1) predicted as (L1, L1, L1, L2, L2, L2).

IV. Conclusion

We proposed a CNN-based human recognition method for camera vision occupancy sensors that determines whether a static tracking box contains a human. Compared with an SVM using hand-crafted features, the proposed CNN classifies humans and non-humans with an accuracy above 85%, and it can compensate for the inability of PIR sensors to detect stationary persons.

References
[1] P. Liu et al., "Occupancy inference using pyroelectric infrared sensors through hidden Markov model," IEEE Sensors Journal, 16(4), Feb.
[2] F. Wahl, M. Milenkovic, and O. Amft, "A distributed PIR-based approach for estimating people count in office environments," IEEE Conf. on Computational Science and Engineering.
[3] Y. Benezeth et al., "Towards a sensor for detecting human presence and characterizing activity," Energy and Buildings, 43.
[4] J. Han and B. Bhanu, "Fusion of color and infrared video for moving human detection," Pattern Recognition, 40.
[5] S. Nakashima, Y. Kitazono, L. Zhang, and S. Serikawa, "Development of privacy-preserving sensor for person detection," Procedia, 2.
[6] I. Amin, A. Taylor, F. Junejo, A. Al-Habaibeh, and R. Parkin, "Automated people-counting by using low-resolution infrared and visual cameras," Measurement, 41.
[7] A. Bobick and J. Davis, "The recognition of human movement using temporal templates," IEEE Trans. Pattern Analysis and Machine Intelligence, Vol. 23, No. 3, Mar.
[8] A. L. Maas, A. Y. Hannun, and A. Y. Ng, "Rectifier nonlinearities improve neural network acoustic models," Proc. of the 30th Int. Conf. on Machine Learning, Atlanta, USA, June.
[9] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," Int. Conf. on Artificial Intelligence and Statistics, Society for Artificial Intelligence and Statistics.
[10] J. Gil and M. Kim, "Real-time people occupancy detection by camera vision sensor," Journal of Broadcast Engineering, Vol. 22, No. 6, Nov.
[11] T. Tian, R. Schmidt, H. Liu, A. Hampapur, and M. T. Sun, "Robust detection of abandoned and removed objects in complex surveillance videos," IEEE Trans. Systems, Man, and Cybernetics - Part C: Applications and Reviews, Vol. 41, No. 5, Sep.
[12] C.-C. Chang and C.-J.
Lin, "LIBSVM: A library for support vector machines," ACM Trans. on Intelligent Systems and Technology, Vol. 2, No. 3, pp. 27:1-27:27.
