1 Introduction

Deterioration inevitably accumulates over the service life of bridges exposed to harsh environments, and bridge failures cause considerable loss of both human life and property. Monitoring bridge condition and detecting damage are therefore essential to ensure serviceability and safety. Traditionally, visual inspection conducted by experienced inspectors is the main method adopted for this mission (Xu and ** model among pixel size, shooting distance, and focal length, based on which the actual width of the cracks could be obtained. The results of the verification experiments showed that the recognition precision reached 0.01 mm.

Counting the proportion of pixels that belong to damaged regions among all pixels is a workable way to quantify damage such as corrosion. Wang et al (2020) proposed a standardized structural health evaluation method and used it to quantify the damage in photos of a steel box girder; the photos were synthesized into panoramas by image stitching, and a U-Net was employed to segment the damaged regions in them. For bolt-loosening quantification, a Hough line transform-based image processing algorithm was designed to estimate bolt angles from bolt images cropped by R-CNN (Huynh et al 2019). Huynh (2021) designed an autonomous vision-based bolt-looseness detection method with a Faster R-CNN-based bolt detector, an automatic distortion corrector, an adaptive bolt-angle estimator, and a bolt-looseness classifier. The method was then applied to a real joint of the Dragon Bridge in Danang, Vietnam.
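The pixel-proportion idea mentioned at the start of this paragraph can be illustrated with a minimal sketch. It assumes that a segmentation network has already produced a binary mask; the function name `damage_ratio` and the toy mask are hypothetical and are not taken from any of the cited studies.

```python
import numpy as np

def damage_ratio(mask) -> float:
    """Return the fraction of pixels labelled as damaged in a binary mask."""
    mask = np.asarray(mask, dtype=bool)
    if mask.size == 0:
        raise ValueError("mask must contain at least one pixel")
    return float(mask.sum()) / mask.size

# Toy 4x5 mask (hypothetical data): 1 = corrosion pixel, 0 = sound surface.
example_mask = np.array([
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
])
ratio = damage_ratio(example_mask)  # 6 damaged pixels out of 20
```

In practice the ratio would be computed per component or per panorama tile, so that the value can be related to a rating scale.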

3.2 SHM-GL

3.2.1 Vibration monitoring

Apart from detecting visible damage, vision-based methods are also an efficient way to provide vibration signals for identifying invisible damage. Deng et al (2020c) developed an intelligent non-contact remote sensing method in which a uniaxial automatic cruise acquisition device collected image sequences from the bridge surface, which were then fed into a three-dimensional (3D) CNN to identify the envelope spectrum of the holographic deformation. The deflection curvature difference was then used to identify changes in damage location and degree. Their experiments demonstrated that the holographic deformation is more sensitive in damage identification than measurements from a limited number of measuring points.

Furthermore, Zhang et al (2021) realized cable force estimation for urban bridges from drone-captured video. First, a pre-trained FCN was adopted to identify the bridge cables and extract their displacement. Then, EMD was employed to extract the cable vibration signals and eliminate the effect of drone motion. Finally, the natural frequencies of the cables were obtained by performing Fourier analysis on the extracted cable vibration and were used for cable force estimation.
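The last step, from a vibration signal to a cable force, can be sketched as follows. This is only an illustration under stated assumptions: it uses the classical taut-string relation T = 4 m L^2 (f_n / n)^2, which neglects bending stiffness and sag and is not confirmed to be the formula used by Zhang et al (2021). The function names, the synthetic displacement signal, and the cable properties are all hypothetical.

```python
import numpy as np

def dominant_frequency(signal, fs):
    """Return the frequency (Hz) of the largest non-DC peak of the amplitude spectrum."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                              # remove the static offset
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    return float(freqs[1:][np.argmax(spectrum[1:])])  # skip the DC bin

def cable_tension(f_n, length, mass_per_length, mode=1):
    """Taut-string estimate T = 4 m L^2 (f_n / n)^2 in newtons (assumed model)."""
    return 4.0 * mass_per_length * length ** 2 * (f_n / mode) ** 2

# Synthetic displacement record (hypothetical): 1.25 Hz first mode, 50 Hz sampling.
fs = 50.0
t = np.arange(1000) / fs                          # 20 s record, 0.05 Hz resolution
displacement = 0.002 * np.sin(2 * np.pi * 1.25 * t)

f1 = dominant_frequency(displacement, fs)
tension = cable_tension(f1, length=100.0, mass_per_length=50.0)  # L in m, m in kg/m
```

For real cables, the frequency resolution of the record limits the accuracy of the estimate, because the tension scales with the square of the identified frequency.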

In traditional vision-based vibration measurement methods, template matching and corner detection algorithms are usually used to track and locate the target, but they are sensitive to image quality, which is often poor because of insufficient illumination or fog. Xu et al (2021) thus proposed a distraction-free displacement measurement approach that integrates a DL-based Siamese tracker with correlation-based template matching. The DL-based Siamese tracker applied deep feature representations and learned similarity measures for image matching, and it also updated the template adaptively over time. The method was then implemented on a short-span footbridge and a long-span road bridge, where it demonstrated its potential to handle challenging scenarios including illumination changes, background variations, and shade effects. Shao et al (2021) combined the MagicPoint network and the SuperGlue network to achieve target-free full-field 3D vibration displacement measurement and demonstrated that the combination is accurate when compared with traditional sensors while being more cost-effective. Furthermore, they (Shao et al 2022) employed a phase-based video motion magnification algorithm to measure tiny vibrations at the submillimeter level more accurately.

3.2.2 Component identification

After various types of damage are detected, a structure must be rated through a comprehensive assessment in which the importance of different components is considered (Zhu et al 2010). This requires spatially relating the identified damage to structural elements. However, inspection images, especially those captured by aerial inspection platforms, usually contain complex scenes in which structural elements mix with a cluttered background. Extracting structural elements from complex images and classifying them is thus meaningful for SHM.

With a small dataset labeled by inspectors, Karim et al (2021) transferred a Mask R-CNN to segment multi-class bridge components from videos captured by a UAV. False negatives were recovered by temporal coherence analysis, and a semi-supervised self-training method was developed to engage experienced inspectors in refining the network. The model’s performance reached 91.8% precision, 93.6% recall, and 92.7% F1-score.
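As a quick consistency check, the reported F1-score follows from the reported precision and recall, since F1 is their harmonic mean. The short sketch below recomputes it; the function name is illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)

# Values reported for Karim et al (2021) in the text above.
f1 = f1_score(0.918, 0.936)
f1_percent = round(100.0 * f1, 1)
```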

Point clouds in 3D space can also provide sufficient information for this purpose. Kim et al (2020) extracted a high-resolution set of point clouds from a full-scale bridge by subspace partition and employed PointNet to classify the points in each subspace. Kim and Kim (2020) compared the performance of three DL models, PointNet, PointCNN, and dynamic graph CNN (DGCNN), in classifying a point cloud of bridge components and found that the mean intersection over union (mIoU) of DGCNN was 86.85, which was higher than that of the other two models (see Fig. 10).

Fig. 10 Identification results of point clouds in the research of Kim and Kim (2020)

3.2.3 External load

Moving vehicles are one of the main sources of live loads on bridges, and gathering information about them is essential for SHM. Bridge weigh-in-motion, which exploits bridge components such as decks, girders, and vertical stiffeners as weighing scales, is the most frequently adopted solution for this purpose, and DL offers efficient remedies for some of its drawbacks.

Zhang et al (2019b) proposed a novel methodology for this task, in which a Faster R-CNN transferred from ImageNet was employed to detect different types of vehicles frame by frame. A multiple-object tracking algorithm tracked the vehicles across frames and generated an information sequence containing each vehicle’s coordinates, type, lane number, and frame number. Then, an image calibration method based on moving standard vehicles was developed to calculate vehicle length and speed. With these parameters, the spatiotemporal information could be obtained from the vehicle location and the hypothesis of constant speed (see Fig. 11).
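The constant-speed hypothesis can be sketched in a few lines. This is a simplified illustration, not the calibration procedure of Zhang et al (2019b): it assumes a single hypothetical `metres_per_pixel` scale factor along the lane instead of the calibration based on moving standard vehicles, and all names and numbers are made up for the example.

```python
def speed_from_track(x_px_start, x_px_end, frame_start, frame_end,
                     fps, metres_per_pixel):
    """Average speed (m/s) of a tracked vehicle between two frames."""
    dt = (frame_end - frame_start) / fps
    if dt <= 0:
        raise ValueError("frame_end must be later than frame_start")
    return (x_px_end - x_px_start) * metres_per_pixel / dt

def position_at(x0_m, speed_mps, t0_s, t_s):
    """Longitudinal position (m) at time t_s under the constant-speed hypothesis."""
    return x0_m + speed_mps * (t_s - t0_s)

# Hypothetical track: 300 px travelled in 30 frames at 30 fps, 0.05 m per pixel.
v = speed_from_track(100, 400, 0, 30, fps=30.0, metres_per_pixel=0.05)
x = position_at(0.0, v, 0.0, 2.5)   # position 2.5 s after entering the bridge
```

Once the speed is known, the position of the vehicle on the deck can be extrapolated to instants at which the vehicle is outside the camera view.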

Fig. 11 The framework for obtaining the spatiotemporal information of vehicles by Zhang et al (2019b)

However, the method proposed by Zhang et al (2019b) cannot obtain vehicle weight. Jian et al (2019) combined CV with influence line theory to acquire the time-spatial distribution of vehicle loads on bridges. YOLO V3 was used to identify vehicle positions, types, and axle numbers. Vehicle weight was then calculated by combining the strain influence line calibrated by field tests with the strain time-history. However, since only three scenarios of vehicle distribution were taken into consideration, the method may face obstacles in complicated traffic scenarios. To overcome this problem, a least squares-based identification method that utilizes the redundant strain data measured by a network of strain sensors was proposed to distinguish complicated traffic modes and reduce recognition errors by solving the overdetermined inverse influence equations (Pathirage et al 2019).
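The idea of solving overdetermined influence equations by least squares can be illustrated with a minimal sketch. It assumes a linear model in which each measured strain is the sum of the loads multiplied by the influence-line ordinates at their positions; the matrix values, the load values, and the function name are hypothetical and do not reproduce the cited method.

```python
import numpy as np

def identify_loads(influence, strains):
    """Least-squares solution of influence @ loads = strains.

    influence: (n_sensors, n_loads) influence ordinates, strain per unit load
    strains:   (n_sensors,) measured strains
    """
    A = np.asarray(influence, dtype=float)
    b = np.asarray(strains, dtype=float)
    if A.shape[0] < A.shape[1]:
        raise ValueError("need at least as many sensors as unknown loads")
    loads, *_ = np.linalg.lstsq(A, b, rcond=None)
    return loads

# Hypothetical example: 4 strain sensors, 2 unknown vehicle loads (kN).
influence = np.array([[2.0, 0.5],
                      [1.0, 1.0],
                      [0.5, 2.0],
                      [1.5, 0.8]])
true_loads = np.array([60.0, 110.0])
strains = influence @ true_loads          # noise-free synthetic measurements
estimated = identify_loads(influence, strains)
```

With noisy measurements, the redundant sensors average out part of the error, which is the benefit of an overdetermined system over a single-sensor calculation.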

Zhu et al (2021) also proposed an approach for obtaining the spatiotemporal information of vehicles on bridges based on 3D bounding box reconstruction, in which CNN and YOLO were used to detect vehicles and obtain their 2D bounding boxes. A 3D bounding box reconstruction method based on the relationship between 2D and 3D bounding boxes was then developed to obtain the size and position of the vehicles, and the spatiotemporal information of each vehicle was finally obtained with a multiple-object tracking algorithm.

4 Application of DL in real bridges

The capability of DL has encouraged the exploration of various approaches that can overcome the challenges of traditional SHM, but most of them have been verified only in simulations or laboratories. More details, such as the platform used to collect images and programs with a user interface, need to be taken into consideration to promote the application of these methods in practice (Xu 2018). This section summarizes some efforts devoted to these details and the DL-based systems that have been applied to actual structures.

A framework for autonomous bridge inspection using a UAV was proposed and applied to the Pahtajokk Bridge by Mirzazade et al (2021). Planning the most efficient flight path that could cover the damaged field with the minimum number of images was the first step. Then, three CNN models, SegNet, Inception v3, and U-Net, were trained to conduct bridge component detection, damage area recognition, and crack segmentation, respectively. The third step was to generate a dense point cloud for the damaged areas via intelligent hierarchical dense structure from motion and align it to the overall point cloud for the construction of the digital model of the bridge. Finally, damages were quantified based on the global coordinates of the detected damages.

Kruachottikul et al (2021) described a DL-based visual defect inspection system for reinforced concrete bridges, which consisted of four components. The first was a mobile phone used to take photos. The second identified images with defects via a modified ResNet-50, and the third classified the defects using another modified ResNet-50. Finally, the fourth quantified damage severity with an ANN. The system’s accuracies for defect detection, classification, and severity prediction were 90.4%, 81%, and 78%, respectively, and the system has been accepted by Thailand’s Department of Highways for practical use.

Jang et al (2021) developed a ring-type climbing robot system composed of multiple cameras, a climbing robot, and a control computer. The raw images captured under close-up scanning conditions were processed through feature control-based image stitching, DL-based semantic segmentation, and Euclidean distance transform-based crack quantification algorithms, based on which a digital crack map of the target bridge pier could be established. Tests conducted on the Jang-Duck bridge in South Korea showed that the method successfully evaluated cracks in the bridge pier with a precision of 90.92% and a recall of 97.47%.

Because some parts of bridges, such as the underside of decks, are difficult for inspectors to reach, He et al (2022) proposed a smart unmanned surface vessel (USV) system for damage detection (see Fig. 12). A novel anchor-free network, CenWholeNet, which focuses on center points and holistic information, was proposed, and a parallel attention module was introduced into the model. For the platform, a USV system was designed that does not rely on global positioning system (GPS) navigation and supports real-time transmission of lidar and video information.

Fig. 12 The system developed by He et al (2022)

Vehicle-assisted monitoring is a promising alternative to instrumentation installed on bridges for rapid and low-cost bridge health monitoring. Sarwar and Cantero (2021) developed an indirect bridge monitoring system in which a DAE was trained on the vertical acceleration responses of a fleet of vehicles passing over a healthy bridge. The Kullback-Leibler divergence between the measured and the reconstructed signals was then used for damage detection and severity quantification.
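One simple way to turn two signals into a Kullback-Leibler divergence is to compare their amplitude histograms. The sketch below follows this route as an assumption; the text does not state how Sarwar and Cantero (2021) estimated the divergence, and the function name, bin count, and synthetic signals are hypothetical.

```python
import numpy as np

def kl_divergence_from_signals(measured, reconstructed, bins=20, eps=1e-12):
    """Histogram-based estimate of KL(P || Q) between two signals.

    P is the amplitude distribution of `measured`, Q that of `reconstructed`.
    Both histograms share the same bin edges; eps avoids log(0).
    """
    measured = np.asarray(measured, dtype=float)
    reconstructed = np.asarray(reconstructed, dtype=float)
    lo = min(measured.min(), reconstructed.min())
    hi = max(measured.max(), reconstructed.max())
    p, _ = np.histogram(measured, bins=bins, range=(lo, hi))
    q, _ = np.histogram(reconstructed, bins=bins, range=(lo, hi))
    p = p.astype(float) + eps
    q = q.astype(float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Synthetic accelerations (hypothetical): a well-reconstructed signal and a poor one.
rng = np.random.default_rng(0)
healthy = rng.normal(0.0, 1.0, 2000)
kl_same = kl_divergence_from_signals(healthy, healthy)
kl_changed = kl_divergence_from_signals(healthy, 1.5 * healthy + 0.5)
```

A divergence close to zero indicates that the autoencoder reproduces the measured response well, while larger values suggest a response that deviates from the healthy state it was trained on.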

Mobile devices such as smartphones can serve not only as a sensing platform but also as a computing platform for on-site damage detection. However, because of the limited computing resources of mobile devices, the size of the DNN needs to be reduced. Ye et al (2022) developed a pruned crack recognition network by reducing the DNN size via pruning and designed a DL-based crack detection program for smartphones. To conduct real-time crack detection on Internet of Things (IoT) devices, Kim et al (2021) proposed OleNet by fine-tuning the hyperparameters of LeNet-5. Compared with other pretrained DL models, including VGG16, Inception, and ResNet, OleNet achieved the highest accuracy, 99.8%, with the least computation. Shrestha and Dang (2020) developed a program integrated with a CNN to realize accurate and real-time bridge vibration classification from the multi-channel time-series signals acquired by the built-in accelerometers of smartphones.
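To give a sense of how pruning reduces a network, the following sketch applies unstructured magnitude pruning to a single weight matrix. This technique is chosen here only for illustration; the text does not specify which pruning method Ye et al (2022) used, and the function name and weights are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Ties at the threshold are all pruned, so slightly more than the
    requested fraction can be removed when magnitudes repeat.
    """
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    w = np.asarray(weights, dtype=float)
    k = int(round(sparsity * w.size))      # number of weights to remove
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Hypothetical 2x3 layer: half of the weights are removed.
layer = np.array([[0.8, -0.05, 0.3],
                  [0.01, -0.6, 0.1]])
pruned_layer = magnitude_prune(layer, sparsity=0.5)
zeros = int(np.count_nonzero(pruned_layer == 0.0))
```

In a real workflow the pruned network is usually fine-tuned afterwards to recover accuracy, and the zeroed weights only save computation if the runtime exploits sparsity.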

5 Conclusions

In this paper, the applications of DL models in SHM, particularly damage detection of bridges, have been summarized systematically. These applications, in laboratories and on real bridges alike, demonstrate the capability of DL models to address obstacles in traditional bridge SHM methods. Each of the DL models promotes the realization of a more intelligent SHM. However, every method has drawbacks. Some of the challenges are listed as follows:

  1. Most of the current studies consider only one type of monitoring data in damage detection. If this type of monitoring data is abnormal, the damage detection will fail no matter how good the damage detection method is.

  2. Although several attempts have been made to realize the targets by unsupervised learning, most of the applications still rely on pre-defined damage scenarios and training data, which demand considerable engineering experience and labor.

  3. The conditions of laboratories, where the majority of methods were validated, are idealized. The robustness of DL models needs to be further enhanced to combat environmental interference in practice, such as the vibration induced by external loads and motion blur when UAVs are employed.

  4. The weak connection between the two levels of vision-based SHM results in difficulties in comprehensive condition assessment, for which visible defects and invisible damage need to be considered at the same time.

Considering the limitations listed above and recent achievements in DL, the following directions are promising and worth further investigation:

  1. Fusing multiple types of information collected by the SHM system: With advances in multiple types of sensors, the SHM system can provide multiple types of structural information. Fusing and leveraging this information in structural condition assessment via DL methods is a promising way to enhance the methods’ practicality.

  2. Building larger training databases collected from the real world: Training DL models with data containing actual interference is an efficient path to improving their robustness, and the availability of advanced sensors and UAVs nowadays makes it possible to build larger databases consisting of real samples.

  3. Utilization of mobile and IoT devices: Mobile devices, such as smartphones, can serve not only as a sensing platform with various built-in sensors, including a magnetometer, gyroscope, accelerometer, and GPS, but also as a computing platform. Deploying lightweight DL models on them makes on-site damage detection possible. In addition, IoT devices, which have emerged with innovations in data transmission and cloud-based computation, provide an efficient way to obtain and integrate different types of structural data, which will promote low-cost and automated SHM.

  4. Digital twin: To make a reliable assessment that reflects the true condition of structural elements, an ensemble of multi-scale DL models is needed to interpret and integrate the data from both the local level and the global level of SHM. A digital twin, which tries to replicate a physical entity in the digital world (Lin et al 2021), provides a powerful platform for this mission, in which various types of damage can be reconstructed and evaluated at the same time. Integrating SHM and digital twins may be a promising way to realize smart civil structures and even smart cities.