Introduction

Auricularia auricula is a large edible fungus used as both medicine and food, with high health value1,2,3,4. The rapid development of the Auricularia auricula industry has greatly promoted its export and sales. The production and processing industries of Auricularia auricula play a positive role in promoting economic efficiency, which is of great significance to the development of the forest economy and the increase of farmers’ income. In addition to conventional features such as color, hue, shape, and size, Auricularia auricula exhibits other features such as wrinkles, diseases, and damage, which directly impact its appearance quality and result in an undesirable taste. Describing and quantifying these features accurately is challenging, and currently only experienced workers can achieve reliable appearance quality classification outcomes. China is the world’s largest Auricularia auricula producer, accounting for more than 90% of the world’s output. In 2018, two adjacent provinces, Heilongjiang Province and Jilin Province, produced 476,000 tons of Auricularia auricula, accounting for 70.6% of the national total5. The planting area is extensive, but the key planting areas are relatively concentrated, which has led to few studies on Auricularia auricula. As far as we know, there have been no studies on the appearance quality of Auricularia auricula.

Therefore, Auricularia auricula with different appearance qualities needs to be graded. At present, this is mainly done manually, which is labor-intensive and inefficient. Due to the contingency, misjudgment, and discontinuity of manual grading, it is of practical significance to realize the automatic appearance quality classification of Auricularia auricula. In this way, we can also improve classification consistency and accuracy while reducing labor costs. After years of effort by researchers, deep learning has made significant progress and found applications in many fields6,7. In view of the superior performance of deep learning methods in image recognition and other fields8,9, they should also have good application value in the appearance quality classification of black fungus, effectively solve the core problems of machine-vision classification of Auricularia auricula, and greatly improve the classification quality. However, to our knowledge, the appearance feature attributes of Auricularia auricula have not yet been studied from the perspective of machine vision, especially through deep learning methods. By making full use of shallow feature information for target detection, this paper establishes a multiscale feature fusion detection model based on Faster RCNN, called the improved Faster RCNN, to achieve the automatic classification of Auricularia auricula based on appearance quality.

Target detection is an important research area of current machine vision, and improving its accuracy and speed is a current research focus10. The faster region-based convolutional neural network (Faster RCNN) algorithm merges a region proposal network (RPN) and the Fast RCNN algorithm into a single network, giving it good accuracy and speed. Therefore, the algorithm has been widely applied. Wan and Goudos applied Faster RCNN directly for multi-class fruit detection using a robotic vision system11. They found that the system achieved higher detection accuracy and lower processing time than traditional detectors. For detection and classification applications, various improvements to Faster RCNN have been proposed. Faster RCNN was improved with a deep reinforcement learning model and used for intelligent video anomaly detection and classification12. By using a high-resolution network as the backbone, Faster RCNN was improved to detect the status of hydroponic lettuce seedlings13. With an adversarial occlusion network, Faster RCNN was improved for underwater target detection, increasing mAP by 2.6% compared with the standard Faster RCNN network14. However, Faster RCNN has not been used for detecting, grading, and classifying Auricularia auricula.

Research on the quality evaluation of Auricularia auricula has mostly focused on quality components such as total saccharide content15, amino acids16, and volatile components17, and the evaluation methods have mainly been based on the electronic tongue and nose17, near-infrared technologies15, and acid hydrolysis16. However, there has been no research on evaluating and classifying the appearance quality of Auricularia auricula by machine vision. To make this deep learning technology more suitable for the specific field of appearance quality evaluation of Auricularia auricula, we propose a multiscale feature fusion detection model that improves the standard Faster RCNN.

In this study, 2000 dried Auricularia auricula samples of three classes are graded according to the national standard of appearance quality, and 6000 images of the samples are collected from three different perspectives. To improve classification accuracy and real-time performance, an appearance quality classification method for Auricularia auricula is constructed based on an improved Faster RCNN framework. The improved Faster RCNN method is compared with four other algorithms. The influence of complex conditions and image resolution on Auricularia auricula detection is also explored, and suggestions and methods are finally proposed to reduce these possible negative effects.

Materials and methods

Experimental data

Data collection

For data acquisition, the primary equipment was a Huawei Honor V30 Pro Android smartphone, which employed automatic white balance and optical focus settings to capture images of Auricularia auricula. The data acquisition platform included a tripod to stabilize the camera and a bracket to position the lighting lamp and relevant accessories. These devices were set up on a horizontal tabletop. Throughout the data collection process, the camera lens was maintained at a distance of 50 cm from the tabletop. The light source was a ring LED light specifically designed for machine vision applications. Under vertical illumination conditions, the light was fixed on the tripod, and the camera lens was positioned directly above the central hollow area of the LED ring light. When capturing images of Auricularia auricula illuminated obliquely, the light was placed on the light bracket near the camera. By adjusting the height, angle, and color of the light on the bracket, Auricularia auricula image data under various complex lighting conditions could be obtained. During the photography process, a sheet of A4 paper was placed on the tabletop directly beneath the camera lens, and the classified Auricularia auricula samples were individually placed on the paper for image capture.

2000 dried Auricularia auricula samples of three classes, graded according to the national standard of appearance quality of dried Auricularia auricula products GB/T 6192-201918, were selected as the experimental materials in this paper.

Following the standard image collection procedure, images of each Auricularia auricula sample were collected from the front, back, and side perspectives. In total, 6000 images of Auricularia auricula samples from the three different perspectives were collected. This dataset has been made publicly available at https://github.com/liyang005/Auricularia_auricula-dateset. Because the original image size of \(2736\times 2648\) pixels was too large, the images were uniformly resized to \(800\times 600\) pixels to facilitate model training. Figure 1 shows the three classes of dried Auricularia auricula products graded by GB/T 6192-201918.
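As an illustration, the following minimal Python sketch performs this uniform resizing step; it assumes OpenCV is available, and the directory names are hypothetical placeholders rather than paths used in this study.

```python
import glob

import cv2  # OpenCV for image I/O and resizing

# Resize every collected image to the 800x600 working resolution
# used for model training. "raw_images" and "resized_images" are
# hypothetical directory names.
for path in glob.glob("raw_images/*.jpg"):
    img = cv2.imread(path)                 # BGR image as a NumPy array
    small = cv2.resize(img, (800, 600),    # dsize is (width, height) in OpenCV
                       interpolation=cv2.INTER_AREA)  # suited to downscaling
    cv2.imwrite(path.replace("raw_images", "resized_images"), small)
```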

Figure 1

Three classes of dried Auricularia auricula products graded based on GB/T 6192-201918. (a) Graded as the 1st-level, (b) the 2nd-level, and (c) the 3rd-level.

Image labeling

Figure 2

Schematic diagram of Auricularia auricula labeling. Each Auricularia auricula is framed by a rectangular box.

In this study, the collected images were labeled as depicted in Fig. 2. By framing the Auricularia auricula in each image, its corresponding coordinates were obtained, representing the upper left corner and the lower right corner of the rectangular box enclosing the Auricularia auricula. The labeling process followed the VOC format.
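For illustration, a minimal Python sketch of reading one VOC-style annotation file and recovering the class label and the two corner coordinates is given below; the file name and label string are hypothetical examples, not files from this study.

```python
import xml.etree.ElementTree as ET

# Parse one VOC-style annotation file (hypothetical file name).
tree = ET.parse("annotations/sample_0001.xml")
root = tree.getroot()
for obj in root.iter("object"):
    label = obj.find("name").text          # e.g. "1st-level" (hypothetical)
    box = obj.find("bndbox")
    xmin = int(box.find("xmin").text)      # upper left corner, x
    ymin = int(box.find("ymin").text)      # upper left corner, y
    xmax = int(box.find("xmax").text)      # lower right corner, x
    ymax = int(box.find("ymax").text)      # lower right corner, y
    print(label, (xmin, ymin), (xmax, ymax))
```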

$$\begin{aligned} k = \lfloor k_0+\log _2\left( \sqrt{\text {wh}}/I\right) \rfloor \end{aligned}$$
(1)

\(\lfloor x \rfloor \) denotes the floor function of a real number x, defined as the greatest integer less than or equal to x. I is the size of the pre-training image, \(k_0\) is the reference value (the level of the pre-training ROI) representing the output layer, and \(\text {w}\) and \(\text {h}\) are the width and height of the ROI.
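A minimal Python sketch of this level-assignment rule, Eq. (1), follows. The default values \(k_0 = 4\) and \(I = 224\) (the ImageNet pre-training image size) are the common choices in feature pyramid networks and are assumptions here, not values stated above.

```python
import math

def fpn_level(w, h, k0=4, I=224):
    """Assign an ROI of width w and height h to a pyramid level using
    Eq. (1): k = floor(k0 + log2(sqrt(w * h) / I)).

    Defaults k0 = 4 and I = 224 are the usual FPN choices (assumed).
    """
    return math.floor(k0 + math.log2(math.sqrt(w * h) / I))

# Example: a 112x112 ROI has sqrt(wh)/I = 0.5, so it maps one level
# below the reference, i.e. k = 4 - 1 = 3.
print(fpn_level(112, 112))  # -> 3
```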

Scores for the assessment of the performances of learning models

To evaluate the object detection models used in this paper, the average precision (AP)30 was used, which involves the following definitions.

A true positive (TP) is a given Auricularia auricula’s grade being correctly identified as its labeled grade. Precision represents the proportion of targets detected by the model that are real target objects; recall represents the proportion of all real targets that are detected by the model. AP is determined by precision and recall30.

Intersection over Union (IoU)31 is the ratio of the overlapping area of two regions to the total area occupied by the two regions, and it measures the degree of overlap between them.
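A minimal Python implementation of IoU for axis-aligned boxes in the (xmin, ymin, xmax, ymax) convention used for labeling is sketched below.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (xmin, ymin, xmax, ymax) tuples."""
    # Corners of the intersection rectangle.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)  # zero if no overlap
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example: two 2x2 boxes offset by one unit overlap in a 1x1 square,
# so IoU = 1 / (4 + 4 - 1) = 1/7.
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # -> 0.142857...
```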

The object detection results were divided into four situations according to the relationship between the IoU value and the confidence threshold, as shown in Table 1. True is defined by the IoU value between a detection and a target box in the data set being greater than 0.5; False is defined by the IoU values with all target boxes in the data set being no greater than 0.5. Positive is defined by the score of the detected rectangular box being greater than the confidence threshold; Negative is defined by the score of the detected rectangular box being less than the confidence threshold. Accordingly, a true positive (TP) has IoU greater than 0.5 and is detected; a true negative (TN) has IoU greater than 0.5 but is not detected; a false positive (FP) is a false target whose rectangular box score is greater than the confidence threshold; and a false negative (FN) is a false target whose rectangular box score is less than the confidence threshold.

Table 1 Conceptual definition table of forecast results.

The precision in the precision–recall (P–R) curve can be calculated by formula (2),

$$\begin{aligned} \text {Precision}=\frac{\text {TP}}{\text {TP}+\text {FP}} \end{aligned}$$
(2)

The recall is shown in formula (3).

$$\begin{aligned} \text {Recall}=\frac{\text {TP}}{\text {TP}+\text {FN}} \end{aligned}$$
(3)
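The following minimal Python sketch computes precision and recall from detection counts according to formulas (2) and (3); the counts in the example are hypothetical, not results from this study.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts, per Eqs. (2) and (3)."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall

# Hypothetical counts for one class: 90 correct detections,
# 10 false detections, and 20 missed targets.
p, r = precision_recall(tp=90, fp=10, fn=20)
print(p, r)  # -> 0.9 0.8181...
```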

Training platform

The improved Faster RCNN Auricularia auricula classification model was run on an AMD R5 3600 central processing unit (CPU) and an NVIDIA GeForce RTX 2070 graphics card (GPU) under the Windows 10 64-bit operating system. The memory was dual-channel 16 GB at 3200 MHz. The test code was written in Python 3.5.6, and the code editor was Visual Studio Code. The deep learning framework was TensorFlow-GPU 1.13.2, and the development environments of the GPU parallel computing framework were CUDA 10.0 and cuDNN 7.4.1.5.