Introduction

Person identification can be performed using several human parts or traits, which are classified as primary and soft biometric traits [1]. Primary biometric traits include the fingerprint [2], hand [3], body [4], gait [5], face [6], and voice [24]. Soft biometric traits, such as androgenic (arm's) hair patterns, gender, age, weight, skin marks, height, and color (skin, hair, and eye), are used along with the primary biometric traits to improve accuracy [1].

Often, the evidence collected is in the form of digital images captured in uncontrolled situations [7]. As most perpetrators cover their faces, the only available information in these images is their hands. Though hands are primary biometric traits, they exhibit less variability than faces: facial features are generally more complex and more visible, making the face a more robust biometric trait for identification. With the advent of more sophisticated digital cameras and higher-resolution closed-circuit television (CCTV) cameras in public places, several security systems have used hand vein patterns and androgenic hair patterns for person identification [8].

There are several methods to recognize humans from primary and soft biometric traits. Afifi [9] used a two-stream convolutional neural network with a support vector machine classifier for hand-based person identification. However, they treated a subject's two hands as identical, which is unusual and reduces accuracy. Baisa et al. [3] proposed global and part-aware deep feature representation learning for hand-based person identification. Similarly, several other deep learning architectures, such as the part-based convolutional baseline (PCB), multiple granularity network (MGN), pyramidal representations network (PyrNet), attentive but diverse network (ABD-Net), omni-scale network (OSNet), discriminative and generative learning network (DGNet), dual part-aligned representations network (P2Net), and interaction and aggregation network (IANet), have been used for person identification from digital images, but all of these networks must be retrained from scratch whenever new data arrive for person re-identification [6, 8]. In serious crime investigations, new criminals are added over time, and retraining on the entire database for each addition is very tedious. There are very few works on person identification from arm's (androgenic) hair patterns [10,11,12]. The existing methods used grayscale features, local binary patterns (LBP), and histograms of oriented gradients (HOG), i.e., hand-crafted features. It is evident from the literature that state-of-the-art deep learning techniques outperform these machine learning techniques based on hand-crafted features. Earlier methods for person re-ID (re-IDentification) extracted local descriptors, low-level features or high-level semantic attributes, and global representations through sophisticated but time-consuming hand-crafted features. In addition, hand-crafted feature representations failed to perform well when image variants such as occlusion, background clutter, pose, illumination, cultural and regional background, intra-class variations, cropped images, multiple viewpoints, and deformations were present in the data. Deep neural networks were introduced to person re-ID in 2014 and completely changed the feature extraction methodology: deep-learned features perform better in end-to-end learning and are robust to these image variants. This improved feature representation makes deep learning architectures more popular than machine learning methods for person re-ID [6, 8, 13, 14].

To address these issues, we propose and implement a novel architecture based on Siamese networks to identify a person from their arm's hair patterns. Since no standard database dedicated to arm's hair pattern recognition exists, we created one and analyzed arm's hair pattern person identification with several state-of-the-art deep learning architectures.

The key contributions of this paper are as follows:

  • Person identification with a novel color threshold (CT)-twofold Siamese network architecture using arm’s androgenic hair patterns.

  • Creation of a database of hand images collected from Indian subjects for person identification.

The rest of the paper is organized as follows. The next section discusses the literature on existing methods of person identification. The third section describes the proposed methodology. The fourth section discusses the experimental results, and the last section concludes the paper with future directions.

Literature survey

Person re-identification from the arm's androgenic hair falls under closed-world person re-ID. Here, a single modality is used with bounding boxes, sufficient and correctly annotated data exist, and the query exists in the gallery. The three standard components in a closed-world re-ID system are feature representation learning, deep metric learning, and ranking optimization [

Fig. 1 Sample right hand raw image of the created database

Fig. 2 Sample left hand raw image of the created database

The collected high-resolution images are reduced to lower resolutions of around 244 \(\times \) 244 pixels, depending on the preprocessing steps and the deep learning architecture used. Hence, this study's image resolutions range between 12.5 and 40 dpi (dots per inch). After reducing the resolution, we observed that the quality of some cropped images deteriorated drastically. To address this issue, we divided such images into two or more parts (with the same person id), which increased the total number of images from 343 to 424. None of the 424 images contains tattoos or external markings on the hands.
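The splitting can be sketched as follows; the vertical split into equal parts and the output naming are illustrative assumptions, since the text above only states that images were divided into two or more parts:

```python
from PIL import Image

def split_image(path, out_stem, parts=2):
    """Split a low-quality cropped image into `parts` vertical slices,
    all keeping the same person id. The vertical split and the output
    naming scheme are illustrative assumptions."""
    img = Image.open(path)
    w, h = img.size
    step = w // parts
    names = []
    for i in range(parts):
        piece = img.crop((i * step, 0, (i + 1) * step, h))
        name = f"{out_stem}_{i}.jpg"   # hypothetical naming
        piece.save(name)
        names.append(name)
    return names
```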

The naming convention used for the collected images is shown in Fig. 3. The first three digits identify the subject and are therefore the unique part of the image name for each subject. The next two digits are either 00 or 11, representing the right and left hand, respectively. The last two digits are the sequence number of the images taken for each hand. This convention keeps training and validation smooth when using functions such as data generators in deep learning architectures.
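For illustration, this convention can be decoded with a few lines of Python (the function below is a hypothetical helper, not part of our pipeline):

```python
def parse_image_name(filename):
    """Decode the 7-digit convention SSSHHQQ:
    SSS = subject id, HH = hand (00 = right, 11 = left),
    QQ = sequence number of the image for that hand."""
    stem = filename.split('.')[0]  # drop any file extension
    subject = stem[:3]
    hand = 'right' if stem[3:5] == '00' else 'left'
    sequence = stem[5:7]
    return subject, hand, sequence

# Example: parse_image_name('0020003') -> ('002', 'right', '03')
```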

Fig. 3 Naming convention used in the created database

Preprocessing

The database of criminals in forensic analysis is generally created in controlled environments. The created database likewise contains images from a controlled environment, but crime scene data comes from uncontrolled situations and therefore has different angles, resolutions, illuminations, and so on. To make both the database and the deep learning architecture more robust, we used data augmentation techniques and preprocessing.

Rotation range, height shift range, width shift range, zoom range, fill mode, horizontal flip, channel shift range, and ZCA whitening are the eight data augmentation techniques used in this study; the corresponding values/parameters are 40, 0.2, 0.2, 0.2, nearest, true, 20, and true, respectively. These techniques follow the standard literature [13, 14, 36, 44]. We used all the data augmentation techniques given in the TensorFlow documentation (Footnote 2) except color space transformations: changing the color of the hand alters one of its unique features, the skin tone, so such transformations are not recommended for person re-ID in the existing literature. Regarding the augmentation values, we took the standard values from the literature and cross-verified them manually in an empirical study; the standard values also performed best for person re-ID.
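These settings map naturally onto Keras' ImageDataGenerator from the cited TensorFlow documentation; the following sketch is illustrative, and the exact generator configuration may differ:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative mapping of the eight augmentation settings onto Keras'
# ImageDataGenerator; the exact configuration used may differ.
datagen = ImageDataGenerator(
    rotation_range=40,        # rotation range: 40 degrees
    height_shift_range=0.2,   # height shift range
    width_shift_range=0.2,    # width shift range
    zoom_range=0.2,           # zoom range
    fill_mode='nearest',      # fill mode for newly created pixels
    horizontal_flip=True,     # horizontal flip
    channel_shift_range=20,   # channel shift range
    zca_whitening=True,       # ZCA whitening
)
# Note: zca_whitening requires fitting on sample data first,
# e.g., datagen.fit(training_images).
```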

The proposed architecture takes both the color image and the thresholded image as input. The following steps convert the color image to the thresholded image (a code sketch follows the list).

  • Step 1—Grayscale image (Footnote 3): The input color image is first converted to a grayscale image. The Sobel edge detector is then applied to the grayscale image.

  • Step 2—Black-hat transform (Footnote 4): This morphological operation, common in digital image processing, extracts small elements and details from an image; objects darker than their surroundings, such as hair strands on skin, come out highlighted in white, as shown in Fig. 4. The settings used in this study, namely anchor, iterations, borderType, and borderValue, are set to Point\((-1,-1)\), 1, BORDER_CONSTANT, and morphologyDefaultBorderValue(), respectively.

  • Step 3—Binary thresholding (Footnote 5): This produces the thresholded image, where a pixel is set to 255 if it is greater than the threshold and to zero otherwise.
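Put together, the three steps can be sketched with OpenCV as follows; the kernel size and threshold value are illustrative assumptions (only the border-handling settings listed in Step 2 come from the text above):

```python
import cv2

def color_to_thresholded(path, thresh=10):
    """Convert a color hand image into the binary thresholded image used
    as the second input of the proposed architecture. Kernel size and
    threshold value are illustrative assumptions."""
    img = cv2.imread(path)
    # Step 1: grayscale conversion followed by a Sobel pass
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    sobel = cv2.Sobel(gray, cv2.CV_8U, dx=1, dy=1, ksize=3)
    # Step 2: black-hat transform; thin dark hair strands come out bright
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 17))
    blackhat = cv2.morphologyEx(sobel, cv2.MORPH_BLACKHAT, kernel,
                                anchor=(-1, -1), iterations=1,
                                borderType=cv2.BORDER_CONSTANT)
    # Step 3: binary thresholding -- 255 above the threshold, 0 otherwise
    _, binary = cv2.threshold(blackhat, thresh, 255, cv2.THRESH_BINARY)
    return binary
```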

Fig. 4 Sample image snapshot of the results of the preprocessing steps

Figure 4 shows a sample screenshot of all the preprocessing steps. The output images in Fig. 4 correspond to a portion of a single input color image. After preprocessing, these images are stored in a separate folder under the same name as the input color image. Figure 5 shows the sample thresholded image of subject 002.

Fig. 5 Sample thresholded image {subject: 0020003} after preprocessing

The input image given to the preprocessing step is manually cropped to the hand part of the picture. Although we never send the complete picture for preprocessing, to illustrate which part is cropped and removed for the image shown in Fig. 5, the uncropped color image of the same subject was given as input for preprocessing; the output is shown in Fig. 6. The parts not used for the computation are marked as cropped and unused, and only the part containing the arm's hair (the middle of the image) is considered for the computation, as also shown in Fig. 5.

Fig. 6 Sample thresholded image {subject: 0020003} after preprocessing for an uncropped color image as input

After performing preprocessing, we obtained another set of 424 thresholded images. We then applied the eight data augmentation techniques to the 424 color (actual) images and the 424 thresholded images, yielding a total of 6784 images ((424 color \(\times \) 8) + (424 thresholded \(\times \) 8)). Manual verification was performed by two different human observers to avoid bias (mostly in cropping and discarding unrelated areas, judging the similarity of two images after augmentation or thresholding, and discarding distorted images after augmentation or after lowering image resolutions). We calculated the inter-rater reliability using Cohen's kappa and observed that the two human observers agree with a \(\kappa \) value of 0.96 (this \(\kappa \) calculation covers all the steps in which the human observers were involved).
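For reference, this kind of inter-rater agreement can be computed with scikit-learn's cohen_kappa_score; the keep/discard labels below are hypothetical, not the actual annotations:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical keep/discard decisions (1 = keep, 0 = discard) from the
# two observers; the actual annotations are not published.
observer_a = [1, 1, 0, 1, 0, 1, 1, 1]
observer_b = [1, 1, 0, 1, 1, 1, 1, 1]

kappa = cohen_kappa_score(observer_a, observer_b)
print(f"Cohen's kappa: {kappa:.2f}")  # agreement corrected for chance
```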

Table 2 Created database details

After data augmentation and thresholding, all the images were again cross-verified manually. Images that do not contain hair were discarded; for example, after cropping during augmentation, some images contained only the part of the hand close to the wrist, which has little hair. After discarding such images, 6500 images remained (284 images were discarded in this step). The complete details of the created database are given in Table 2.

Fig. 7 Complete flow of the proposed work