Introduction

Chips are used in almost all electronic and electrical logic equipment. The soldering process bonds the chip to the bare board to form functionally different integrated circuit boards1. The quality of the solder relates to the functional integrity of the entire board; therefore, checking solder quality is a key part of ensuring the quality and efficiency of IC production. Hidden defects inside chip solder joints are among the biggest threats to solder quality. With the development of non-destructive testing technology, irradiating solder joints with X-rays makes it possible to observe the morphology of the different structures inside the joints and thus to expose defects such as bridging, false soldering, and voids2. Voids are one of the most common hidden solder defects: if the void area or the number of voids is too large, solder quality is seriously degraded, so accurate void detection is very important.

The use of X-ray technology to inspect solder joints for hidden defects has gained general acceptance, and some researchers have made outstanding contributions to the advancement of this field. For example, Gao et al.3 pre-processed X-ray images with Gaussian filtering, contrast stretching, and similar operations, used the Canny operator for edge detection to locate the edge region of solder-joint voids, and finally applied digital image processing algorithms to extract the edges of the bubble regions inside the solder ball, achieving high-quality detection results. However, although pre-processing is applied to the X-ray images and the classical Canny operator is used to segment the hard-to-segment void edges, the fuzzy zones at the void edges vary in width and shape and are difficult to segment accurately with traditional image processing methods. To address this, this paper adopts an image cropping and cleaning method to enlarge the training dataset while balancing the ratio of background and foreground training samples. The comparison of images before and after cropping is shown in Fig. 4.

Figure 4

Cropping and cleaning process of the original image.

Observing Fig. 4: first, the non-chip X-ray regions along the edges of the large image are cropped away; the remaining part is then divided equally into 6 sub-images; any sub-image containing no bubble pixels is discarded; finally, the ratio of bubble pixels to total pixels in each sub-image is calculated, and images with a ratio of less than 5% are discarded. After these operations, 4128 trainable images are finally obtained.
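Since this rule is mechanical, it can be sketched directly. The snippet below is only an illustrative sketch, not the authors' released code: the 2 × 3 tile grid, the chip_box bounding box, and the crop_and_clean name are assumptions made for illustration.

```python
import numpy as np

def crop_and_clean(image, mask, chip_box, min_ratio=0.05):
    """Crop away the non-chip border, split the chip region into 6 tiles,
    and keep only tiles whose void (bubble) pixel ratio is >= min_ratio."""
    x0, y0, x1, y1 = chip_box                    # assumed chip bounding box
    image, mask = image[y0:y1, x0:x1], mask[y0:y1, x0:x1]

    kept = []
    h, w = mask.shape
    th, tw = h // 2, w // 3                      # assumed 2 x 3 grid of 6 tiles
    for r in range(2):
        for c in range(3):
            img_t = image[r*th:(r+1)*th, c*tw:(c+1)*tw]
            msk_t = mask[r*th:(r+1)*th, c*tw:(c+1)*tw]
            ratio = (msk_t > 0).mean()           # fraction of void pixels
            if ratio >= min_ratio:               # drop empty / <5% tiles
                kept.append((img_t, msk_t))
    return kept
```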

Experimental setup and model training

All models were trained and evaluated with the open-source PyTorch toolkit, via the PyCharm platform, on a computer with a 12th Gen Intel® Core™ i5-12600KF 3.70 GHz CPU and 16 GB of installed RAM; an NVIDIA GeForce RTX 3070 GPU (8 GB of video memory) under Windows 10 was used to run all the semantic segmentation models involved in the experiments. After extensive exploratory experiments to identify the important hyperparameters of the semantic segmentation model, the model achieves stable and reliable performance with the parameter settings in Table 1.

Table 1 Experimental setup.

During model training, the batch size (the number of images processed at one time) is set to 4; the optimizer is SGD with an initial learning rate of 0.007; the learning rate decays following a cosine ("cos") schedule; and the weight decay is set to 0.0001 to avoid model overfitting. If the validation loss does not decrease for 3 consecutive epochs, training is stopped early. Repeated experiments showed that the loss converges by about 100 training iterations over the dataset, so the number of training epochs is set to 100.
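These settings map directly onto standard PyTorch components. The following is a minimal sketch under stated assumptions, not the authors' training script: the stand-in model, the SGD momentum of 0.9, and the placeholder training/validation steps are all assumptions, and the early-stopping rule is written out because PyTorch has no built-in equivalent.

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import CosineAnnealingLR

# Hyperparameters from Table 1 / the text above.
EPOCHS, LR, WEIGHT_DECAY, PATIENCE = 100, 0.007, 1e-4, 3

model = nn.Conv2d(1, 2, 3, padding=1)      # stand-in for PCB-DeepLabV3
optimizer = torch.optim.SGD(model.parameters(), lr=LR,
                            momentum=0.9,  # momentum value is an assumption
                            weight_decay=WEIGHT_DECAY)
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)   # cosine "cos" decay

best_val, stall = float("inf"), 0
for epoch in range(EPOCHS):
    # ... one pass over the training set with batch size 4 goes here ...
    val_loss = 1.0                         # placeholder for validation loss
    scheduler.step()
    if val_loss < best_val:                # early stopping on validation loss
        best_val, stall = val_loss, 0
    elif (stall := stall + 1) >= PATIENCE:
        break                              # no improvement for 3 epochs
```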

Experimental results and discussion

The experimental model proposed in this paper and its manufacturing-engineering deployment schematic are shown in Fig. 5.

Figure 5

Chip void defect inspection complete process.

Figure 5 covers three phases: data acquisition, model training, and engineering deployment of the mature model. In the data acquisition phase, images are collected by an industrial X-ray acquisition device; in the training phase, the PCB-DeepLabV3 model is fed manually labeled chip X-ray images until it converges; in the engineering deployment phase, the successfully trained model is deployed in a chip-manufacturing industrial scenario to perform automated quality inspection.

To verify the importance of the dataset to the deep learning model, the original data and the cropped-and-cleaned data are fed into the DeeplabV3 model and trained on the same experimental platform with the same settings; after training, the two resulting models are evaluated. Semantic segmentation is pixel-level classification, and its common evaluation metrics are mIOU (mean intersection over union), MPA (mean pixel accuracy), and CPA (class pixel accuracy), where MPA and CPA correspond roughly to mean accuracy (mAP) and precision in a classification model. Recall is not a common metric for semantic segmentation, but it is included here to show how completely the void pixels are segmented. The test results are shown in Fig. 6.
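All four metrics can be derived from a single confusion matrix. The sketch below is one plausible implementation, not the paper's evaluation code; in particular, reporting CPA and Recall for the void class alone (class 1) is an assumption.

```python
import numpy as np

def seg_metrics(pred: np.ndarray, target: np.ndarray, num_classes: int = 2):
    """mIOU, MPA, CPA, and Recall from a confusion matrix.
    Class 1 is 'void' (foreground), class 0 is background."""
    # Rows: true class, columns: predicted class.
    cm = np.bincount(num_classes * target.ravel() + pred.ravel(),
                     minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(cm).astype(float)
    iou = tp / (cm.sum(0) + cm.sum(1) - tp)    # per-class intersection over union
    recall = tp / cm.sum(1)                    # per-class pixel accuracy / recall
    precision = tp / cm.sum(0)                 # per-class precision
    return {"mIOU": iou.mean(), "MPA": recall.mean(),
            "CPA": precision[1], "Recall": recall[1]}  # void class assumed
```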

Figure 6

Impact of dataset on deep learning model performance.

The bar chart in Fig. 6 shows that the model trained on the cropped and cleaned dataset substantially outperforms the model trained on the original dataset on all four metrics (mIOU, MPA, CPA, and Recall), which shows the importance of cropping and cleaning the original dataset for deep learning models. The reason is as follows. First, cropping discards the non-chip X-ray regions on the edges of the large image; these regions contribute nothing to learning the foreground segmentation and worsen the imbalance between foreground and background samples, and a severe imbalance makes it easy for the model to ignore the minority class and thus fail. Then, sub-images that contain no foreground pixels, or only a low proportion of foreground pixels, are discarded, further reducing the background-to-foreground imbalance. A more balanced class ratio helps the model learn foreground features, which is important for learning the semantics of segmenting the pixels in the void regions.

The cleaned LED chip X-ray dataset is fed into the DeeplabV3 model, which is trained on the experimental platform prepared in Sect. “Dataset expansion and cleaning”. After training, the nine kinds of LED chip X-ray images are tested; the results are shown in Fig. 7.

Figure 7

DeeplabV3 model test results on X-ray images of 9 kinds of LED chips.

Observing Fig. 7, the DeeplabV3 model shows a certain segmentation ability for the void defects of all 9 kinds of LED chips. However, fuzzy virtual edges form between the voids and the soldered surface, and the contrast between the voids of the FC-GB2 model chip and the base surface of the solder joints is low; these factors make it difficult to segment the void defects of the different LED chips accurately. Actual testing of the DeeplabV3 model shows that void segmentation is incomplete for most chip types; for the hard-to-segment FC-GB2 chip in particular, the segmentation result is unsatisfactory and the model even fails. The improved PCB-DeeplabV3 model is therefore introduced and trained in the same experimental environment, and the same samples are used to test its performance. The test results are shown in Fig. 8.

Figure 8

PCB-DeeplabV3 model test results on X-ray images of 9 kinds of LED chips.

In Fig. 8 the only experimental variable is the semantic segmentation model. Observation of Fig. 8 reveals that the segmentation performance of the improved PCB-DeeplabV3 model is significantly better: for most chip types the segmented void region is larger than that produced by the DeeplabV3 model, voids with smaller areas are also detected, and even the low-contrast FC-GB2 model chip is segmented successfully. This indicates that the proposed AMTPNet network helps the DeeplabV3 model improve segmentation of hard-to-segment and low-contrast pixels. The next experiment compares the performance of the improved PCB-DeeplabV3 model with DeeplabV3, U-net27, SegNet28, and PSPNet29, to make it easier to choose a suitable model for deployment in real engineering applications. In this part of the experiment, the VTF2 model chip is selected as the basic experimental object, and, with the manually labeled images as reference, the segmentation performance of the different models on the most critical void-defect details is compared in Fig. 9.

Figure 9

Comparison of local details of typical bubble defect segmentation results by different methods.

In Fig. 9, for the VTF2 chip, the regions segmented by Deeplabv3, U-net, SegNet, PSPNet, and the method in this paper are compared against the manually labelled region. The changes in the detail regions show the better detection capability of PCB-DeeplabV3, mainly reflected in a larger segmented area and better area coherence. For example, the Deeplabv3 and U-net models lose many void-defect regions; the SegNet model shows serious incoherence; and the PSPNet model greatly enlarges the detected bubble area but exceeds the actual bubble-defect region, i.e., it over-segments. The method in this paper detects the bubble defects essentially accurately and deconstructs the spatial structure of the hard-to-segment virtual edges well, which indicates that the proposed model handles ambiguous edge details better and resolves edge pixels more accurately.

To verify the impact of the channel attention mechanism (CAM), the twin spatial pyramid pooling network (TSP), and their fused network structure (AMTPNet) on the semantic segmentation model, the following ablation experiments compare the improvements one by one using the controlled-variable method. This experiment trains the models on the dataset enhanced by the cropping and cleaning method, with Deeplabv3 as the baseline. Model a adds the channel attention mechanism alone; model b adds the twin spatial pyramid pooling network alone; model c is the fused structure that combines both. The results of the ablation experiments are shown in Table 2.

Table 2 Effect of different modules on model performance.

In this experiment, void pixels are positive samples and all background pixels are negative samples; separating the two is essentially a binary classification problem. Observing the mPA, mIOU, CPA, and Recall data for models a, b, and c in Table 2, adding the CAM and TSP modules to the baseline DeeplabV3 improves all metrics to some extent, which indicates that each module plays an active role; with the AMTPNet structure, mIOU and mPA improve once more, showing that the channel attention mechanism (CAM) and the twin spatial pyramid pooling network (TSP) combine well and reinforce each other. For binary classification problems, especially when the model focuses on the minority class, accuracy (mPA) loses discriminative value, and the more informative metrics are precision (CPA) and recall. The mIOU, CPA, and Recall values in Table 2 vary considerably, and CPA improves most substantially across models a, b, and c. This suggests that CAM and TSP genuinely improve the accuracy of segmenting void pixels, that their combination in the AMTPNet structure performs even better, and that the improvement in Recall further confirms the positive effect of models a, b, and c on overall network performance.
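The internals of the CAM block are not given here. As a point of reference only, a typical squeeze-and-excitation-style channel attention block looks like the sketch below; this is an assumed stand-in for the paper's CAM, not the authors' exact design.

```python
import torch
from torch import nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: an assumed stand-in for the paper's CAM."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))     # squeeze: global average pooling
        return x * w.view(b, c, 1, 1)       # excite: reweight feature channels

# Usage: attn = ChannelAttention(256); y = attn(torch.randn(4, 256, 32, 32))
```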

To verify that this paper's model also has an advantage during post-deployment inference, it is compared with other major classical models using two indexes, mIOU and FPS, as evaluation criteria. mIOU is the mean over all classes of the intersection-over-union ratio, where the intersection over union of a class is the ratio of the intersection to the union of its true labels and predicted values; the larger the ratio, the more accurate the segmentation. FPS is the number of frames processed per second, which measures model inference speed; the larger the value, the faster the inference. Table 3 records the mIOU and FPS values for U-net, SegNet, PSPNet, DeeplabV3, and the method in this paper under the same experimental conditions.
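FPS figures of this kind are usually obtained by timing repeated forward passes. A minimal sketch of such a measurement follows; the input size, warm-up count, and run count are assumptions, and GPU timing needs explicit synchronization because CUDA kernels run asynchronously.

```python
import time
import torch
from torch import nn

@torch.no_grad()
def measure_fps(model: nn.Module, size=(1, 3, 512, 512),
                warmup: int = 10, runs: int = 100) -> float:
    """Rough FPS measurement; the 512x512 input size is an assumption."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = model.eval().to(device)
    x = torch.randn(size, device=device)
    for _ in range(warmup):              # warm up kernels and caches
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()         # flush queued GPU work before timing
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return runs / (time.perf_counter() - start)
```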

Table 3 Values of mIOU and FPS for different models.

As can be seen from Table 3, the method proposed in this study outperforms the other methods in both mIOU and FPS. The mean intersection over union reaches 81.69%, and the inference speed reaches 78.81 FPS, indicating that the AMTPNet network improves segmentation accuracy, especially on the hard-to-segment fuzzy edge structures, while the lightweight MobileNetV2 network used as the backbone feature extractor accelerates inference and improves the model's suitability for practical deployment.

Conclusion

In this study, based on the Deeplabv3 network model, MobileNetV2 is introduced to optimize the network parameters so that the model is smaller and easier to deploy; on this basis, the attentional multi-scale twin spatial pyramid pooling network (AMTPNet) is embedded into the low-level features to facilitate the fusion and extraction of edge details. During model training, the dataset is enlarged and the imbalance between positive and negative samples is reduced by cropping and cleaning the original data, so that the model can be adequately trained on a small dataset while eliminating the negative impact of the sample imbalance. From the series of experiments and analyses above, comparing the improved PCB-Deeplabv3 with the current excellent U-net, SegNet, PSPNet, and DeeplabV3 classical semantic segmentation models, the following conclusions can be drawn:

(a) When training the deep learning model, the image cropping and cleaning method designed in this paper greatly enlarges the training dataset, so that the model is fully trained and fits the target successfully, with a substantial improvement in all four metrics: mIOU, MPA, CPA, and Recall.

(b) The AMTPNet network proposed in this study better extracts the detailed information that is hard to segment at the edges, generalizes better across X-ray images of different chip models, and successfully detects void defects.

(c) The PCB-Deeplabv3 model proposed in this paper has better segmentation performance than the U-net, SegNet, PSPNet, and DeeplabV3 models, especially on blurred-edge and low-contrast LED chip X-ray void-defect images, reaching 81.69% mIOU and 78.81 FPS, respectively, surpassing the performance of the other classical models.