Cross-view gait recognition: a survey

  DDoS defense     |      2023-03-23 11:21

Gait recognition aims to determine a person's identity from the way he or she walks. Compared with face, fingerprint, iris and other biometric recognition methods, gait recognition can be performed at a distance and does not require special acquisition equipment, high image resolution, or subject cooperation. Moreover, a person's gait is difficult to hide or disguise. Because of these advantages, gait recognition has broad application prospects in security monitoring, investigation and evidence collection, and daily attendance. In these practical applications, the performance of gait recognition is easily affected by covariates such as view-point variations, occlusions and segmentation errors, among which view-point variation is one of the main factors. The intra-class differences across different view-points are often greater than the inter-class differences within the same view-point. Therefore, improving the robustness of cross-view gait recognition has become a hot topic of high research and application significance. Based on extensive research, this paper reviews existing cross-view gait recognition methods. First, we briefly introduce the research background of the field in terms of basic concepts, data acquisition methods, application scenarios, development history and existing reviews. The paper then focuses on video-based cross-view gait recognition methods. On this basis, popular cross-view gait databases are organized by data type, sample size, number of view-points, acquisition environment and other covariates, and their characteristics are analyzed in detail. Cross-view gait recognition methods are then categorized and presented in detail.
Unlike most existing review papers, which classify gait recognition methods by basic steps such as data acquisition, feature representation and classification, this paper focuses on methods that address the cross-view recognition problem. Specifically, four categories of cross-view gait recognition methods are analyzed from the perspective of feature representation and classification: 3D gait information construction, view transformation models (VTMs), view-invariant feature extraction, and deep learning-based methods. In 3D gait information methods, gait information is extracted from multi-view gait videos and used to construct 3D gait models. These methods are robust to large view changes, but they often require expensive and complex setups of calibrated, high-resolution multi-camera systems, together with extensive computation and frame synchronization, all of which limit their application in real surveillance scenarios. For VTM methods, singular value decomposition (SVD)-based and regression-based view transformation models applied to local and global features are introduced. Although a VTM minimizes the error between the transformed gait features and the original gait features, it ignores discriminative analysis. For view-invariant feature extraction methods, hand-crafted feature extraction, discriminative subspace learning and metric learning are compared. Among the discriminative subspace learning methods, those related to Canonical Correlation Analysis (CCA) are highlighted. Despite the advantages of these methods, it is sometimes difficult to find a robust view-invariant subspace or metric when the view difference is large. Deep learning methods for cross-view recognition are mainly based on convolutional neural networks (CNNs), recurrent neural networks (RNNs), autoencoders (AEs), generative adversarial networks (GANs), 3D convolutional neural networks (3D CNNs) and graph convolutional networks (GCNs).
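As a rough illustration of the SVD-based view transformation model mentioned above, the following minimal sketch factorizes a matrix of gait features, stacked by view (rows) and subject (columns), into view-specific projection matrices and a subject-specific intrinsic vector. The function names `fit_vtm` and `transform_view`, the array layout, and the synthetic low-rank data are assumptions made for this example; they are not taken from any of the surveyed methods, which differ in the features used and in how the rank is chosen.

```python
import numpy as np

def fit_vtm(train, rank):
    """Fit a truncated-SVD view transformation model.

    train: array of shape (K, M, D) holding a D-dimensional gait feature
           for each of K views and M training subjects (assumed layout).
    Returns P of shape (K, D, rank); P[k] maps a subject's intrinsic
    vector to that subject's feature under view k.
    """
    K, M, D = train.shape
    # Stack the views along the rows: one column per subject.
    G = train.transpose(0, 2, 1).reshape(K * D, M)
    U, s, _ = np.linalg.svd(G, full_matrices=False)
    P = U[:, :rank] * s[:rank]          # shape (K * D, rank)
    return P.reshape(K, D, rank)

def transform_view(P, feature, src, dst):
    """Map a feature observed under view `src` to view `dst`."""
    # Least-squares estimate of the view-independent intrinsic vector.
    v, *_ = np.linalg.lstsq(P[src], feature, rcond=None)
    return P[dst] @ v

# Synthetic data that is exactly low-rank, so the model assumption holds.
rng = np.random.default_rng(0)
K, M, D, r = 3, 20, 8, 4
A = rng.normal(size=(K, D, r))            # hypothetical view-specific factors
V = rng.normal(size=(M, r))               # hypothetical subject-specific factors
train = np.einsum("kdr,mr->kmd", A, V)

P = fit_vtm(train, rank=r)
v_new = rng.normal(size=r)                # a subject not seen in training
probe, target = A[0] @ v_new, A[2] @ v_new
predicted = transform_view(P, probe, src=0, dst=2)
print(np.max(np.abs(predicted - target)))
```

On this idealized data the transformed feature matches the target view closely. The sketch only minimizes reconstruction error and does nothing to separate identities, which is the limitation of VTMs noted above.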
To further understand the performance of different cross-view gait recognition methods, representative state-of-the-art methods are compared and analyzed on the CASIA-B, OU-ISIR LP and OU-MVLP databases. It is found that methods using 3D CNNs or multiple neural network architectures, which represent gait features with a sequence of silhouettes, achieve good performance. Meanwhile, deep neural network methods based on body model representations also perform very well when view variation is the only covariate. Finally, future research directions for cross-view gait recognition are discussed, including the establishment of large-scale gait databases containing complex covariates, cross-database gait recognition, self-supervised learning of gait features, disentangled representation learning of gait features, further development of model-based gait representation methods, new methods for temporal feature extraction, multimodal fusion for gait recognition, and improved security of gait recognition systems.