Introduction

With the continuous development of the power industry, the scale and complexity of power equipment have increased, which places higher demands on the safety and stability of the equipment. Substation power-system maintenance, such as hanging a grounding rod on a high-voltage line for safety and stability testing, is therefore extremely important. Traditionally, grounding rod hanging and removing tasks are performed manually, which poses certain safety risks. To improve both personnel safety and the efficiency of power maintenance, researchers have proposed a number of operation robots. Such robots, which serve as intelligent solutions for high-voltage line maintenance, have the potential to replace manual operations; by integrating advanced robotic technologies, they can improve maintenance efficiency and protect operator safety. However, power maintenance robots still face several technical challenges, including limited automation and sensing capabilities.

Japan was the first country to design and deploy robots for live-line work. In 1989, the Kyushu Electric Power Company in Japan developed two generations of live-line work robots, Phase I [1] and Phase II [2], which are controlled by an operator in a cockpit to perform the operations; by 2006, these robots had evolved into the third generation, Phase III [3]. In 2010, Allan et al. [4] introduced a new robot prototype for the construction of hydroelectric overhead distribution lines in Quebec; the robot includes a climbing system, a cross-arm operating system, and a bolting system. In 2013, the Hydro-Québec Research Institute in Canada integrated a Kinova Jaco manipulator into a power inspection robot vehicle [5], enabling internal disconnect-switch operations to be executed through robotic manipulation. In 2015, the Yunnan Power Grid Chuxiong Power Supply Bureau in China introduced a 500 kV high-altitude grounding wire hanging and removing robot; designed to replace manual labor, it is equipped with an electronically controlled operating handle for remotely controlling a three-coordinate robot during operations. In 2019, Tian et al. [6] proposed an automatic grounding wire hanging robot for voltage levels below 500 kV, in which an industrial computer and a PLC (Programmable Logic Controller) constitute the electrical control system of the live-working robot in the substation. However, because the robot must complete multiple tasks, its mechanical structure is relatively complex, which increases maintenance difficulty and limits operational flexibility. In 2020, Palli et al. [7] studied a robot for automated switchgear wiring, using a 4-degree-of-freedom manipulator to insert and connect equipment wires inside the substation switchgear by means of visual positioning and an end-effector force feedback system; the wiring task requires high precision, which places higher demands on the robot system. Furthermore, in 2022, Qingyu Pan et al. [6] designed an automatic ground wire hooking robot, likewise based on a crawler-type platform. In this paper, a 3D camera is employed for target recognition to obtain an accurate 6D pose of the target. A comparison of the robot designed in this paper with other related robots is shown in Table 1.

Table 1 Product comparison for electric power substation operation robots

Hand-eye calibration must be conducted whenever a camera is newly installed or has been accidentally disturbed. Such disturbances include contact by wild animals during outdoor operation, knocks while the robot is in transit from one substation to another, and contact while the robot is being covered with canvas for protection at night or in inclement weather; all of these situations can loosen the connection between the camera and its mechanical parts or introduce a deviation. Traditional hand-eye calibration relies on dedicated calibration boards, and there are three families of solution methods: the step-by-step method [10], the synchronous method [11], and the iterative method [12]. When a loosened connection or deviation occurs between the camera and its mechanical parts, a human operator typically has to intervene and perform the hand-eye calibration manually. This process is not fully autonomous and can disrupt the operation. If, instead, the robot could perform hand-eye calibration on its own, it would achieve true autonomy and reach the stage of self-maintenance. Hence, to enhance the autonomy of the robot, this paper introduces a hand-eye self-calibration system within the robot framework for the grounding rod hanging and removing task.

Hand-eye self-calibration, also known as extended hand-eye calibration, was initially proposed by Andreff et al. [13]. Their method eliminates the need for a known scene or a calibration board and derives linearized computational formulas. Heller et al. [14] proposed using branch-and-bound methods to minimize the objective function under epipolar constraints; by employing linear programming to bound the search, this method achieves global optimality with respect to the \(L_{\infty }\) norm. Zhi et al. [15] proposed a hand-eye calibration algorithm that can be used in natural scenes without any calibration board: it extracts image feature points, performs feature matching, and finally obtains the transformation matrix using Singular Value Decomposition (SVD). However, this method has a long computation time. Wang et al. [16] proposed a hand-eye calibration method based on quaternion parameterization and eigenvector/eigenvalue analysis, offering a complete analytical solution. This approach eliminates the need for SVD and streamlines the computation, but it applies only when rotation and translation can be partially separated and is not applicable in more complex cases.

In order to improve hand-eye self-calibration and minimize errors in the calibration process for the grounding rod hanging and removing robot, we propose a hand-eye self-calibration algorithm based on multi-stage objective function optimization, building on the method proposed by Zhi et al. [15], and use reprojection-error minimization to further reduce the error. Experiments were conducted on a ground-truth simulation dataset and on the practical grounding rod hanging and removing robot platform. The experimental results show that the proposed algorithm has better robustness and effectiveness.

The main contributions of this paper can be summarized as follows:

(1) To address challenges such as the low autonomy and limited operating range of existing substation operation robots, a grounding rod hanging and removing robot is designed, integrating various cutting-edge technologies to efficiently carry out the specific tasks.

(2) A hand-eye self-calibration algorithm based on multi-stage objective function optimization is proposed to address the connection displacement between the 3D camera and its mechanical parts and to improve the efficiency of current calibration methods.

(3) The designed system and proposed algorithm are applied to dataset verification and practical robot verification, demonstrating the efficiency and effectiveness of our method.

Software and hardware structure of grounding rod hanging and removing robot

A. Hardware structure of grounding rod hanging and removing robot

The hardware structure of our grounding rod hanging and removing robot is shown in Fig. 1. In this system, the various components work together to fulfill the specific task requirements. The system consists of an all-terrain mobile chassis that allows the robot to reach different work areas and adapt to different environmental conditions. Above the mobile chassis is a floating-base lifting platform, which raises the manipulator for aerial operations. The manipulator is an AUBO i10 installed on the floating-base lifting platform, enabling it to execute tasks such as hanging and removing the grounding rod. A custom-designed end-effector is mounted at the end of the manipulator. A Mech-Eye Nano 3D camera is also mounted on the end joint of the manipulator to capture the 6D pose of both the wire and the grounding rod handle. Additionally, a panoramic camera captures comprehensive scene information for environmental data acquisition.

Fig. 1
figure 1

Hardware of our designed robot

The grounding rod is the core executive component of the robot and is designed according to the needs of the application scenario. Unlike the simple hook-like rod introduced by Li et al. [17], our grounding rod is shown in Fig. 2. Its main body is made of aluminum alloy to ensure conductivity. Because the grounding rod must be gripped and tightened simultaneously, it adopts an outer support rod combined with an inner transmission rod; the inner transmission rod drives the clamping block to tighten onto the grounding point. Both the outer support rod and the inner transmission rod are made of insulating epoxy resin to ensure insulation between the grounding side and the clamping side, effectively avoiding the risk that residual charge from grounding interferes with the clamping side and other parts such as the manipulator.

Fig. 2
figure 2

Structural diagram of the grounding rod

Combined with the grounding rod, we designed the end-effector of the manipulator. Its structural diagram is shown in Fig. 3; Fig. 3a, b are the front and top views of the end-effector, respectively. It features a double-ended T-screw drive with optical-shaft guidance, and a miniature brushless geared motor serves as the power source. Steering is achieved through a spur gear to minimize the size of the clamping mechanism. The main body of the end-effector is made of aluminum alloy to ensure conductivity and minimize weight; the total weight is approximately 5 kg.

Fig. 3
figure 3

Structural diagram of the end-effector

B. Software structure of grounding rod hanging and removing robot

The software of the grounding rod hanging and removing robot includes the mobile base control module, the visual recognition module, and the manipulator control module. The mobile base control module is responsible for moving the mobile chassis to the desired location near the operation area for hanging and removing the grounding rod, using path-planning techniques. It then controls the floating base to reach a height at which the manipulator can access the operation area. The visual recognition module estimates the pose of the hanging or removing point of the grounding rod. It begins with the 3D camera mounted at the end of the manipulator capturing RGB-D images of the scene. A LA-Net [18] based 3D recognition method is then employed to extract the region of the point cloud corresponding to the area of interest for hanging or removing. Following this, the center of gravity of this region and its normal direction are calculated to determine the pose of the hanging or removing point. For the manipulator control module, trajectory planning and inverse kinematics of the manipulator take the pose of the hanging or removing point as input to grasp or approach the relevant area. It then executes commands to open or close the grounding rod as required (Fig. 4).
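As an illustration of the pose-estimation step described above, the following is a minimal sketch: given the ROI point cloud segmented by the 3D recognition network, the hanging/removing point is taken as the centroid and its normal as the direction of least variance. The function name, the PCA-based plane fitting, and the normal-orientation rule are assumptions of this sketch, not the exact implementation used on the robot.

```python
import numpy as np

def estimate_point_pose(roi_points: np.ndarray):
    """roi_points: (N, 3) array of XYZ points of the ROI in the camera frame."""
    centroid = roi_points.mean(axis=0)
    # PCA on the centered points: the eigenvector with the smallest eigenvalue
    # approximates the surface normal of the local region.
    cov = np.cov((roi_points - centroid).T)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    normal = eigvecs[:, 0]                      # smallest-eigenvalue direction
    if normal[2] > 0:                           # orient the normal toward the camera (+z assumed forward)
        normal = -normal
    return centroid, normal

# Example with synthetic data: a roughly planar patch with small depth noise.
rng = np.random.default_rng(0)
pts = np.c_[rng.uniform(-0.05, 0.05, (500, 2)), rng.normal(0.0, 1e-3, 500)]
center, n = estimate_point_pose(pts)
```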

Fig. 4
figure 4

The software structure of our designed system

C. Overall process of grounding rod hanging and removing robot

The operational process of the grounding rod hanging and removing robot can be divided into two parts: hanging the grounding rod and removing the grounding rod. The operation processes for hanging and removing are illustrated in Figs. 5 and 6, respectively. Before beginning either process, the robot system needs the transformation matrix from the camera coordinate system to the end-effector coordinate system. This allows the recognized pose of the object to be translated into the robot base coordinate system, enabling precise control of the robot to reach the desired destination. In the hanging process, the sequence begins with controlling the mobile chassis to move to the specified position. The manipulator then grasps the grounding rod from the preset fixed position, moves to the predetermined operational position, and the floating base is lifted so that the manipulator can access the operation point. After that, the 3D camera captures the target scene and the pose of the hanging point is computed. This pose is then converted into the robot base coordinate system using the hand-eye calibration result. Subsequently, the robot autonomously plans its trajectory to hang the grounding rod at the target position. For the removal process, since the rod is already hanging on the high-voltage wire, the recognition target is the handle of the grounding rod rather than the hanging point on the wire. The task involves releasing the rod and then returning it to the preset position.
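A hedged sketch of the coordinate conversion mentioned above: a pose recognized in the camera frame is chained into the robot base frame through the forward kinematics and the hand-eye result. The frame-naming convention (T_a_b maps coordinates from frame b into frame a) and the function name are assumptions for illustration only.

```python
import numpy as np

def to_base_frame(T_base_end: np.ndarray,
                  T_end_cam: np.ndarray,
                  T_cam_obj: np.ndarray) -> np.ndarray:
    """Chain 4x4 homogeneous transforms: base <- end-effector <- camera <- object."""
    return T_base_end @ T_end_cam @ T_cam_obj

# T_end_cam is the hand-eye calibration result X, T_base_end comes from the
# manipulator's forward kinematics, and T_cam_obj is the recognized pose of the
# hanging point (or of the grounding rod handle during removal).
```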

Fig. 5
figure 5

The flowchart for hanging grounding rod

Fig. 6
figure 6

The flowchart for removing grounding rod

Proposed hand-eye self-calibration algorithm

Before the automatic grounding rod hanging and removing operation, it is essential to estimate the pose transformation between the robot camera coordinate system and the end-effector coordinate system through hand-eye calibration. We propose a hand-eye self-calibration algorithm based on the optimization of a multi-stage objective function, which reformulates hand-eye self-calibration as an optimization problem with multiple stages according to the mathematical model of hand-eye calibration. The overall flowchart of the multi-stage objective-function-based algorithm is shown in Fig. 7. In the pose acquisition stage, the Oriented FAST and Rotated BRIEF (ORB) algorithm [19] is employed for feature point extraction, and a neighborhood-scoring-mechanism-based algorithm is introduced to discard mismatched points. In the objective-function solving stage, Singular Value Decomposition (SVD) is utilized, and an optimization method centered on minimizing the reprojection error is introduced to refine the results.

Fig. 7
figure 7

The overall flowchart of algorithm based on multi-stage objective function

(1) Mathematical model

The traditional hand-eye calibration schematic is shown in Fig. 8. According to Fig. 8, the mathematical model of hand-eye calibration can be written as Eq. (1):

$$ A_{ij}X = XB_{ij} $$
(1)

where \(A_{ij}\) denotes the pose transformation of the camera coordinate system from position i to j, \(B_{ij} \) denotes the pose transformation of the end of the manipulator from position i to j, and \(X\) denotes the solved hand-eye transformation.
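For illustration, the relative motions in Eq. (1) can be assembled from absolute poses as sketched below, assuming T_cam[i] is the camera pose at station i (e.g. recovered from scene feature matching) and T_end[i] is the end-effector pose from the robot's forward kinematics. The composition order shown is one common convention and is stated here as an assumption rather than the paper's exact definition.

```python
import numpy as np

def relative_motion(T_i: np.ndarray, T_j: np.ndarray) -> np.ndarray:
    """Relative 4x4 transform describing the motion from station i to station j."""
    return np.linalg.inv(T_i) @ T_j

def hand_eye_pairs(T_cam: list, T_end: list):
    """Build the (A_ij, B_ij) sequence pairs used in A_ij X = X B_ij."""
    pairs = []
    for i in range(len(T_cam) - 1):
        A_ij = relative_motion(T_cam[i], T_cam[i + 1])   # camera motion
        B_ij = relative_motion(T_end[i], T_end[i + 1])   # end-effector motion
        pairs.append((A_ij, B_ij))
    return pairs
```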

Fig. 8
figure 8

The mathematical model of hand-eye calibration

In order to address the poor flexibility and limited accuracy of traditional hand-eye calibration, an algorithm based on multi-stage objective function optimization is proposed. This approach reformulates self-calibration as a multi-stage optimization problem. According to the schematic of hand-eye calibration in Fig. 8, Eq. (1) is rewritten, and a unique solution is obtained from multiple “pairs of hand-eye sequences”, as expressed in Eq. (2):

$$ \mathop {\arg \min }\limits_{X} C\left( {A_{ij},B_{ij},X} \right) $$
(2)

where C represents the minimization objective function, and X denotes the hand-eye transformation matrix. The structural nature of the manipulator determines \(B_{ij}\), while \(A_{ij}\) is obtained through the extraction and matching of scene feature points. To minimize errors and enhance accuracy, a matching point judgment mechanism is introduced after the initial matching of feature points. With the application of this mechanism, Eq. (2) can be reformulated as follows in Eq. (3):

$$ \mathop {\arg \min }\limits_{X} C\left( {f_{i},f_{j},S_{ij},B_{ij},X} \right) $$
(3)

where \(f_{i}\) denotes the description of the feature points in the ith view, \(f_{j}\) denotes the description of the feature points in the jth view, and \(S_{ij}\) denotes the matching score between the ith view and the corresponding matching points in the jth view. We optimize the objective function to solve the hand-eye transformation matrix X to reduce the error.

(2) Feature extraction

In the feature extraction module, commonly used methods include the Scale Invariant Feature Transform (SIFT), Oriented FAST and Rotated BRIEF (ORB) [19], and Accelerated-KAZE (AKAZE). Among these, ORB is the fastest: it combines FAST (Features from Accelerated Segment Test) for rapid keypoint detection with BRIEF (Binary Robust Independent Elementary Features) binary descriptors, providing a quick and robust feature extraction algorithm with both scale and rotation invariance. However, it tends to concentrate feature points in regions of high texture complexity, leading to inaccuracies in pose estimation. To address this, we adopt the quadtree structure from ORB-SLAM [20] when storing FAST corner points. The BRIEF descriptor vector is defined as shown in Eqs. (4) and (5):

$$ f_{n}\left( p \right) = \sum\limits_{1 \le i \le n} 2^{i - 1} \tau \left( {p;x_{i},y_{i}} \right) $$
(4)
$$ \tau \left( p;x_{i},y_{i} \right) = \begin{cases} 1, & p\left( x_{i} \right) < p\left( y_{i} \right) \\ 0, & p\left( x_{i} \right) \ge p\left( y_{i} \right) \end{cases} $$
(5)

where \(p\left( x \right)\) represents the pixel intensity of the image patch \(p\) at location \(x\), \(\tau\) is the binary intensity test whose sampling point pairs \(\left( x_{i},y_{i} \right)\) are drawn from a Gaussian distribution centered on the image patch, and \(n\) denotes the dimension (number of bits) of the descriptor. The lack of rotational invariance of the BRIEF descriptor under image rotation degrades matching performance. To overcome this problem, ORB uses the orientation of each key point to rotate the sampling pattern of the image patch. In order to generate more discriminative descriptors, ORB explores candidate \(\tau\) tests through a greedy search and selects tests with high variance, means close to 0.5, and low correlation with each other to construct the final descriptor vector. This ensures that the features possess rotational invariance and makes ORB more robust to image rotation. The final \(f\) in Eq. (3) is obtained through this feature extraction method.
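A brief sketch of this feature-extraction stage using OpenCV's ORB (FAST keypoints plus rotated BRIEF descriptors). Plain OpenCV ORB does not include the quadtree-based keypoint distribution from ORB-SLAM, so it is used here only as a simplified stand-in; the function name and parameter values are assumptions of this sketch.

```python
import cv2
import numpy as np

def extract_orb_features(image_bgr: np.ndarray, n_features: int = 2000):
    """Detect FAST corners and compute rotated-BRIEF descriptors with OpenCV ORB."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=n_features)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors
```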

(3) Feature matching

In SIFT, ORB, and other methods, after feature extraction the subsequent step typically involves brute-force matching, in which distances between descriptors are calculated to assess the quality of the match between two key points. However, it is difficult to distinguish correct from incorrect matches in brute-force matching. Currently, many algorithms utilize the RANdom SAmple Consensus (RANSAC) algorithm, but incorrect matches need to be eliminated in advance; this not only leads to computational challenges due to a large number of iterations but also poses stability issues.

Considering that image pixel points exhibit motion smoothness, the majority of matches in the neighborhood of a correct match are usually also correct, whereas most matches in the neighborhood of an incorrect match are incorrect. Leveraging this characteristic, a neighborhood-based scoring mechanism is applied after ORB coarse matching [21]. This mechanism aims to reduce incorrect matches and retain more correct matches by assigning scores to the matching points and their respective neighborhoods.

The matching model is shown in Fig. 9. After feature points are extracted in two images \(I_{a}\) and \(I_{b}\), brute-force matching is executed. For a matching point \(x_{i}\), small neighborhoods a and b are identified in images \(I_{a}\) and \(I_{b}\), respectively, and a scoring method \(S_{i}\) is introduced. If \(x_{i}\) is matched correctly, the number of matched points in the small neighborhoods a and b is higher. Conversely, if \(x_{j}\) is matched incorrectly, the corresponding small neighborhoods contain more incorrectly matched points. As shown in Fig. 9, the number of correctly matched points in the neighborhood corresponding to match \(x_{i}\) is 3, so \(S_{i}\) = 3. In contrast, the number of correctly matched points in the neighborhood corresponding to match \(x_{j}\) is 0, yielding \(S_{j}\) = 0.

Fig. 9
figure 9

The illustration of matching model

According to the above principle, the image is partitioned into small grid cells, and neighborhood matching is conducted for each cell. The expression for the matching score \(S_{ij}\) is given in Eq. (6):

$$ S_{ij} \sim \begin{cases} B\left( Kn,p_{t} \right), & x_{ij} = \text{true} \\ B\left( Kn,p_{f} \right), & x_{ij} = \text{false} \end{cases} $$
(6)

where \(x_{ij}\) represents the matching points in the grid cell, K is the number of grid cells surrounding the matching point, n is the number of matching points in the neighborhood, \(p_{t}\) is the probability of a correct match, and \(p_{f}\) is the probability of an incorrect match. Figure 10 shows the distribution of matching scores, indicating that they follow a binomial distribution. Accordingly, the mean and standard deviation of the matching scores \(S_{ij}\) are introduced as shown in Eq. (7). By setting a threshold between correct and incorrect matches and introducing the parameter P, the separability of the matching scores \(S_{ij}\) is expressed in Eq. (8):

$$ \begin{cases} \left\{ m_{t} = Knp_{t},\; s_{t} = \sqrt{Knp_{t}\left( 1 - p_{t} \right)} \right\}, & x_{ij} = \text{true} \\ \left\{ m_{f} = Knp_{f},\; s_{f} = \sqrt{Knp_{f}\left( 1 - p_{f} \right)} \right\}, & x_{ij} = \text{false} \end{cases} $$
(7)
$$ P = \frac{{m_{t} - m_{f} }}{{s_{t} + s_{f} }} = \frac{{Knp_{t} - Knp_{f} }}{{\sqrt {Knp_{t} \left( {1 - p_{t} } \right)} + \sqrt {Knp_{f} \left( {1 - p_{f} } \right)} }} $$
(8)

where \(m_{t}\) and \(s_{t}\) denote the mean and standard deviation of correct matches, and \(m_{f}\) and \(s_{f}\) denote the mean and standard deviation of incorrect matches, respectively. P is positively related to the number of surrounding grid cells and to the number of regional feature point matches. Maximizing the value of P makes correct and incorrect matches more clearly distinguishable. Whether a matching point is correct is then determined according to Eq. (9):

$$ P\left\{ i,j \right\} \in \begin{cases} \text{true}, & S_{ij} \ge \tau \\ \text{false}, & \text{otherwise} \end{cases} $$
(9)

where \(\tau\) represents the threshold separating correct and incorrect matches; its value can be determined from the total number of features within the grid cell. This method reduces incorrect matches and retains more correct matches.
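A simplified, hedged sketch of this neighborhood scoring idea (in the spirit of grid-based motion statistics): after brute-force matching, each pair of grid cells accumulates the matches that fall into it, and matches whose cell-pair support falls below an adaptive threshold are discarded. The grid size, the threshold rule, and all names here are illustrative assumptions, not the exact formulation of Eqs. (6)–(9).

```python
from collections import defaultdict
import numpy as np

def grid_cell(pt, img_shape, grid=(20, 20)):
    """Map an (x, y) keypoint location to a coarse grid-cell index."""
    h, w = img_shape[:2]
    gx = min(int(pt[0] / w * grid[0]), grid[0] - 1)
    gy = min(int(pt[1] / h * grid[1]), grid[1] - 1)
    return gx, gy

def filter_matches_by_grid_score(kp1, kp2, matches, shape1, shape2, alpha=6.0):
    """kp1/kp2: OpenCV KeyPoints; matches: DMatch list from a brute-force matcher."""
    cell_pairs = defaultdict(list)
    for m in matches:
        c1 = grid_cell(kp1[m.queryIdx].pt, shape1)
        c2 = grid_cell(kp2[m.trainIdx].pt, shape2)
        cell_pairs[(c1, c2)].append(m)
    # Adaptive threshold: proportional to the square root of the mean support per
    # occupied cell pair (an assumption standing in for the tau of Eq. (9)).
    tau = alpha * np.sqrt(len(matches) / max(len(cell_pairs), 1))
    kept = []
    for ms in cell_pairs.values():
        if len(ms) >= tau:       # cell-pair support plays the role of S_ij
            kept.extend(ms)
    return kept
```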

Fig. 10
figure 10

Distribution map of matching scores

(4) Hand-eye transformation matrix solution

Through the extraction and matching of scene feature points, the pose transformations between images can be obtained, giving the relative pose \(A_{ij}\) of the camera at different stations. Next, the corresponding relative poses of the end-effector are combined to form sequence pairs, and singular value decomposition is performed to obtain the initial hand-eye transformation matrix. The process of solving the hand-eye transformation matrix X is as follows:

First, the translational part of the hand-eye calibration equation is separated from Eq. (1) and expressed with a skew-symmetric matrix, yielding Eq. (10):

$$ \hat{t}_{{A_{ij} }} R_{{A_{ij} }} t_{X} = \hat{t}_{{A_{ij} }} R_{X} t_{{B_{ij} }} + \hat{t}_{{A_{ij} }} t_{X} $$
(10)

where \(\hat{t}_{{A_{ij} }}\) denotes the skew-symmetric matrix of the translational part of \(A_{ij}\), \(R_{{A_{ij} }}\) denotes the rotational part of \(A_{ij}\), \(t_{X}\) denotes the translational part of the desired hand-eye transformation X, \(R_{X}\) denotes the rotational part of the hand-eye transformation X, and \(t_{{B_{ij} }}\) denotes the translational part of \(B_{ij}\).

Then, the matrix identity in Eq. (11) is utilized:

$$ vec(XYZ) = (X \otimes Z^{T} )vec(Y) $$
(11)

where X, Y, and Z denote arbitrary matrices of compatible dimensions, \(\otimes\) denotes the Kronecker product, and \(vec\) denotes the operator that vectorizes a matrix.

Next, the rotational and translational parts of the hand-eye calibration equation are expressed in Eq. (12) using Eq. (11):

$$ \underbrace{\begin{bmatrix} I_{9} - R_{A_{ij}} \otimes R_{B_{ij}} & 0_{9 \times 3} \\ \hat{t}_{A_{ij}} \otimes t_{B_{ij}}^{T} & \hat{t}_{A_{ij}} \left( I_{3} - R_{A_{ij}} \right) \end{bmatrix}}_{N} \begin{bmatrix} vec\left( R_{X} \right) \\ t_{X} \end{bmatrix} = \begin{bmatrix} 0_{9 \times 1} \\ 0_{3 \times 1} \end{bmatrix} $$
(12)

Equation (12) has a total of 12 unknowns. The \(12 \times 12\) matrix on the left-hand side is denoted N. Therefore, for i movements, the following \(12i \times 12\) matrix T can be constructed, as shown in Eq. (13).

$$ T = \left( {N_{1}^{T} N_{2}^{T} \cdots N_{i}^{T} } \right)^{T} $$
(13)

After that, the singular value decomposition of the matrix T in Eq. (13) is performed. Since the rotation matrix \(R_{X}\) must have a determinant equal to 1, the unique solution X can be obtained.
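A hedged sketch of this linear solution following Eqs. (10)–(13): each motion pair contributes one 12 × 12 block N, the blocks are stacked into T, and the right singular vector associated with the smallest singular value of T yields vec(R_X) and t_X. The row-major vec convention, the scale recovery, and the final projection onto SO(3) are assumptions of this sketch rather than the paper's exact implementation.

```python
import numpy as np

def skew(t: np.ndarray) -> np.ndarray:
    """Skew-symmetric matrix t_hat such that t_hat @ v = t x v."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def solve_hand_eye(pairs):
    """pairs: list of (A_ij, B_ij) 4x4 relative motions; returns a 4x4 estimate of X."""
    blocks = []
    for A, B in pairs:
        RA, tA = A[:3, :3], A[:3, 3]
        RB, tB = B[:3, :3], B[:3, 3]
        N = np.zeros((12, 12))
        N[:9, :9] = np.eye(9) - np.kron(RA, RB)           # rotation constraint block
        N[9:, :9] = np.kron(skew(tA), tB.reshape(1, 3))    # couples vec(R_X) and t_X
        N[9:, 9:] = skew(tA) @ (np.eye(3) - RA)            # translation constraint block
        blocks.append(N)
    T = np.vstack(blocks)
    _, _, Vt = np.linalg.svd(T)
    x = Vt[-1]                                             # null-space direction of T
    if np.linalg.det(x[:9].reshape(3, 3)) < 0:             # fix the overall sign
        x = -x
    R_raw = x[:9].reshape(3, 3)                            # row-major vec(R_X)
    scale = np.linalg.det(R_raw) ** (1.0 / 3.0)            # enforce det(R_X) = 1
    U, _, Vt2 = np.linalg.svd(R_raw)
    X = np.eye(4)
    X[:3, :3] = U @ Vt2                                    # nearest rotation matrix
    X[:3, 3] = x[9:] / scale
    return X
```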

(5) Optimization results

After obtaining the initial hand-eye transformation matrix, and because the camera coordinate system and the end-effector coordinate system of the manipulator are not on the same scale, an optimization method is proposed to enhance the accuracy of the result. This optimization centers on minimizing the reprojection error, taking into account the overall structure of the projective camera system. When the base coordinate system of the manipulator coincides with the world coordinate system, the transformation \(A_{j}\) from the camera to the base coordinate system is expressed in terms of the transformation \(B_{j}\) from the end of the manipulator to the base coordinate system and the hand-eye transformation matrix X, as shown in Eq. (14). The objective function C is then minimized, as given in Eq. (15):

$$ A_{j} = B_{j} X^{ - 1} $$
(14)
$$ \mathop {\arg \min }\limits_{X} C = \mathop {\arg \min }\limits_{X} \sum\limits_{i = 1}^{m} \sum\limits_{j = 1}^{n} v_{ij} \left\| P_{C_{ij}} - X B_{j}^{-1} P_{W_{ij}} \right\|_{2}^{2} $$
(15)

where \(m\) denotes the number of feature points, \(n\) denotes the number of views, and \(v_{ij}\) indicates whether the ith feature point is observed in the jth view. \(P_{C_{ij}}\) denotes the coordinates of the ith feature point in the jth view in the camera coordinate system, which is obtained from the camera. \(P_{W_{ij}}\) denotes the coordinates of the ith feature point in the jth view in the world coordinate system, which is obtained by triangulation. Finally, the problem is solved with the Levenberg–Marquardt (LM) optimization algorithm.
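A hedged sketch of this refinement stage: the initial hand-eye matrix is re-parameterized as a rotation vector plus a translation and refined with Levenberg–Marquardt so that the points predicted through X B_j^{-1} P_W agree with the observed camera-frame points, as in Eq. (15). The data layout, the parameterization, and the function names are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def refine_hand_eye(X0, B_list, P_cam, P_world, visible):
    """
    X0:      (4,4) initial hand-eye transform from the linear solution.
    B_list:  list of (4,4) end-effector-to-base transforms, one per view j.
    P_cam:   (m, n, 3) observed camera-frame coordinates of point i in view j.
    P_world: (m, 3) triangulated world-frame coordinates of point i.
    visible: (m, n) boolean visibility mask v_ij.
    """
    def to_matrix(params):
        X = np.eye(4)
        X[:3, :3] = Rotation.from_rotvec(params[:3]).as_matrix()
        X[:3, 3] = params[3:]
        return X

    def residuals(params):
        X = to_matrix(params)
        res = []
        for j, B in enumerate(B_list):
            M = X @ np.linalg.inv(B)          # X B_j^{-1}, mapping world points to the camera frame
            for i in range(P_world.shape[0]):
                if visible[i, j]:
                    pred = M[:3, :3] @ P_world[i] + M[:3, 3]
                    res.append(P_cam[i, j] - pred)
        return np.concatenate(res)

    p0 = np.r_[Rotation.from_matrix(X0[:3, :3]).as_rotvec(), X0[:3, 3]]
    sol = least_squares(residuals, p0, method="lm")   # Levenberg-Marquardt refinement
    return to_matrix(sol.x)
```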

Experiments

Two experiments are conducted to validate the effectiveness and performance of the algorithm: one on a single-camera dataset and the other on the automated grounding rod hanging and removing robot platform. The method from the literature [15] is referred to as “ZHI” and the method proposed in this paper as “OURS”.

Dataset verification

In this paper, we conduct single-camera dataset experiments with images captured by the Mech-Eye Nano 3D camera at a resolution of 1280 × 1024 pixels. The experimental scene includes a chessboard (calibration board) whose squares measure 20 × 20 mm. The camera pose computed from the checkerboard plays the role of the “hand”, while the camera pose obtained by extracting and matching scene feature points, ignoring the checkerboard, plays the role of the “eye”. To evaluate the effectiveness and robustness of the algorithm, the norm of the difference between the calibration result and the ground truth is used as the calibration error. The rotational error is given in Eq. (16) and the translational error in Eq. (17).

$$ e_{R} = \left\| \widetilde{R}_{X} - I_{3} \right\|_{2} $$
(16)
$$ e_{t} = \left\| {\widetilde{t}} \right\|_{2} ,\widetilde{t} = \left( {\widetilde{x},\widetilde{y},\widetilde{z}} \right)^{T} $$
(17)

where \(\widetilde{R}_{X}\) is the rotational part of the residual hand-eye transformation and \(I_{3}\) is the 3 × 3 identity matrix. The rotation error is defined from the difference between two rotation matrices, which describe the rotation of an object in three-dimensional space. When calculating the rotation error in Eq. (16), we use the Euclidean norm, a unitless measure of the distance between the two matrices. \(\widetilde{t} = \left( {\widetilde{x},\widetilde{y},\widetilde{z}} \right)^{T}\) is the translational part of the residual hand-eye transformation.
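A small sketch of these error measures, assuming the ground-truth hand-eye transform X_gt is available (as in the simulated/checkerboard dataset) and X_est is the estimate being evaluated; forming the residual as X_gt relative to X_est, and taking the translational residual as the difference of the translation vectors, are assumptions of this sketch.

```python
import numpy as np

def calibration_errors(X_est: np.ndarray, X_gt: np.ndarray):
    R_tilde = X_gt[:3, :3].T @ X_est[:3, :3]      # residual rotation (assumed definition)
    e_R = np.linalg.norm(R_tilde - np.eye(3))     # rotation error, cf. Eq. (16)
    t_tilde = X_est[:3, 3] - X_gt[:3, 3]          # residual translation (assumed definition)
    e_t = np.linalg.norm(t_tilde)                 # translation error, cf. Eq. (17)
    return e_R, e_t
```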

Before the experiments on the single-camera dataset, the matching performance of the scoring-mechanism-based feature matching method proposed in this paper is compared with ORB + BF, SIFT + BF, and AKAZE + RANSAC. The experiments use images captured by the Mech-Eye Nano 3D camera from Mech-Mind, and the comparative results are illustrated in Fig. 11.

Fig. 11
figure 11

Feature extraction and matching

The comparison results in Fig. 11 demonstrate that the matching method proposed in this paper yields a higher number of matching points with a more uniform distribution than the other methods. A detailed comparison of the feature matching algorithms is presented in Table 2. From the table, it is evident that the algorithm introduced in this paper improves the inlier matching rate by more than 20% compared with the other algorithms, while also being more efficient in terms of processing time. Hence, the proposed feature matching algorithm is validated to be more accurate.

Table 2 Matching statistics for different algorithms

To investigate the impact of noise on the error, Gaussian noise with a mean of 0 and a standard deviation varying from 0.05 to 0.3 in increments of 0.05 was added to the images. A total of 100 experiments were conducted for each noise level, each consisting of 18 hand-eye motions. An example of a noise-corrupted image is shown in Fig. 12b. The box plots in Fig. 13a show the translation error for different noise levels, while Fig. 13b shows the rotation error. The figures reveal that the error of our method is smaller than that of “ZHI”, and its distribution is more uniform, indicating greater robustness to noise.
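A minimal sketch of the noise-injection step used in this experiment: zero-mean Gaussian noise with a chosen standard deviation is added to the images (normalized to [0, 1]) before feature extraction. The normalization convention and the function name are assumptions of this sketch.

```python
import numpy as np

def add_gaussian_noise(image_u8: np.ndarray, sigma: float, seed: int = 0) -> np.ndarray:
    """Add zero-mean Gaussian noise of standard deviation sigma to an 8-bit image."""
    rng = np.random.default_rng(seed)
    img = image_u8.astype(np.float64) / 255.0
    noisy = np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)
    return (noisy * 255).astype(np.uint8)

# Noise levels used above: sigma in {0.05, 0.10, 0.15, 0.20, 0.25, 0.30}.
```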

Fig. 12
figure 12

Single camera experimental scene

Fig. 13
figure 13

Comparison of method errors under Gaussian noise

To assess the impact of the number of movements on the solution error, 12–18 images are selected from the original set, and the corresponding poses of the manipulator are determined. Subsequently, 100 experiments are conducted, and the results are presented in Fig. 14a for the translation error and Fig. 14b for the rotation error. The figures show that when the number of movements is fewer than 16, both the “ZHI” method and our method exhibit significant fluctuations in translation and rotation errors; once the number of movements exceeds 16, the errors gradually stabilize. A comparison between the two methods reveals that the error distribution of the “ZHI” method is notably uneven and more dispersed. This unevenness results from its use of the Scale Invariant Feature Transform (SIFT) to extract image feature points, which yields fewer matched feature points.

Fig. 14
figure 14

Effect of the number of movements on the error

To gain a clearer insight into the performance across different datasets, we conducted calibrations on two datasets, comparing our method with the “ZHI” method. Dataset 1 comprises images captured by the Mech-Eye Nano camera, while dataset 2 consists of images used in the “ZHI” method from literature [15], captured by the rear camera of a Samsung Galaxy phone.

Both datasets underwent 100 experiments, and the average values were computed. The results are presented in Table 3. Notably, on dataset 1, our method improved the average translation error by 25.76% and the average rotation error by 17.34% compared with the “ZHI” method. On dataset 2, our method improved the average translation error by 25.33% and the average rotation error by 24.56% compared with the “ZHI” method. This comparison underscores the higher accuracy of the method proposed in this paper. Note that the accuracy of the hand-eye self-calibration algorithm reaches approximately \(10^{-1}\) mm without relying on a dedicated calibration board or human intervention, whereas the accuracy of traditional hand-eye calibration algorithms with calibration boards is approximately \(10^{-2}\) mm.

Table 3 Calibration results with different evaluation indicators

Practical robot verification

The experimental scene is depicted in Fig. 15a. Within the substation there is a 500 kV high-voltage wire. The task of our robot is to hang the grounding rod onto the wire for the associated test. After the test, the robot removes the grounding rod and returns it to the preset location. Figure 15b captures a moment during the hanging process, where the robot is preparing to attach the rod to the high-voltage wire. Figure 15c illustrates the removal process, in which the robot uses the 3D camera to recognize the handle of the grounding rod, then grasps it and removes the rod.

Fig. 15
figure 15

Scene and example of the automatic hanging and removing experiment

To evaluate the effectiveness of the hand-eye self-calibration algorithm proposed in this paper for the grounding rod hanging and removing robot system, a two-step validation was conducted. First, the algorithm was assessed within the operational context of the robot system. Then, the efficiency of the grounding rod hanging and removing robot system itself was validated.

A. Validation of the hand-eye self-calibration algorithm

To validate the efficacy of the proposed hand-eye self-calibration algorithm, experiments were conducted on the automated grounding rod hanging and removing robot platform. The experiment comprised three phases: the robot hand-eye self-calibration phase, the grounding rod hanging phase, and the grounding rod removing phase. In the hand-eye self-calibration phase, the relative pose transformation between the robot end-effector and the camera was determined. Subsequently, the grounding rod was hung on the high-voltage power line in the hanging phase, followed by the removal and return of the grounding rod to its original position in the removing phase.

For a newly installed camera, hand-eye calibration needs to be conducted in advance. During operation, in order to simulate real-world scenarios such as loosened connections or deviations between the camera and its mechanical parts caused by accidental contact, the hand-eye mounting device was randomly and slightly displaced through manual intervention. As a baseline, we also recorded the success rate of hanging and removing without the self-calibration algorithm; this baseline allows us to assess the improvement achieved when the algorithm is applied. The experimental procedure is illustrated in Fig. 16. A total of fifty tests were conducted for each of the hanging and removing experiments.

Fig. 16
figure 16

Validation process of the hand-eye self-calibration algorithm

The results of the experiments are presented in Table 4. Utilizing the self-calibration method proposed in this paper increased the success rate of hanging by 16% and removing by 24%. This improvement highlights the enhanced efficiency and success rate of the robot self-maintenance facilitated by our method.

Table 4 Experimental comparison before and after introducing self-calibration method

B. Experimental validation of automatic grounding rod hanging and removing robot

Before hanging and removing the grounding rod, the robot system needs to recognize the pose of the hanging point and of the grounding rod handle, respectively. We visualize this pose-recognition process in the robot system. The RGB images of the high-voltage wire and the grounding rod handle, along with the corresponding ROI point clouds, are shown in Figs. 17 and 18.

Fig. 17
figure 17

An example of the RGB image and its ROI point cloud of the hanging area in a high-voltage wire

Fig. 18
figure 18

An example of the RGB image and its ROI point cloud of the handle of grounding rod

In practical scenarios, the success rate of robot operations can be affected by wind speed. Elevated wind speeds may cause both the floating platform and the high-voltage wire to sway, potentially causing the automatic hanging and removing operations to fail. Experimental validation showed that the success rate of automatic hanging and removing was higher in environments where the wind speed did not exceed 2 m/s. Consequently, we recommend that the designed robot be used preferably under this condition. Our verification experiments were likewise conducted at wind speeds not exceeding 2 m/s.

Twenty automated hanging and removing experiments were conducted with the practical robot, of which 19 were successful, a success rate of 95%. Each hanging and removing operation takes less than 5 min, satisfying the task requirements. The failure can be attributed to significant errors in the recognition accuracy of the visual recognition system induced by varying light conditions. Therefore, we also conducted experiments under varying light conditions. To assess the robustness of the robot's automatic hanging and removing operations, tests were conducted during different periods of daylight (on sunny days in November in the Northern Hemisphere), while ensuring that wind speeds remained below 2 m/s. The detailed test results are presented in Table 5:

Table 5 Success rate and average time of automatic operation under different light conditions

As indicated in Table 5, hanging and removing failures occurred only between 6:00 p.m. and 7:00 p.m. All other experiments were successful, with completion times consistently within five minutes. The average time required for manual operations by workers is also around five minutes. This highlights the utility of the automated hanging and removing system proposed in this paper, showcasing its efficiency in saving time and mitigating safety concerns for workers.

Conclusion

In this paper, a grounding rod hanging and removing robot with hand-eye self-calibration capability is designed for substation operation, and its hardware and software structures are introduced. To improve the accuracy and stability of self-calibration under autonomous robot operation, a hand-eye self-calibration algorithm based on multi-stage objective function optimization is proposed, which transforms the self-calibration problem into the minimization of an objective function. Dataset validation and practical robot experiments confirm the superior performance of the proposed self-calibration algorithm. Furthermore, the overall operation was verified under various lighting conditions and wind speeds, demonstrating the efficiency of our system. In the future, we intend to consider more operating conditions, extend the system to a broader range of scenarios, and incorporate more cutting-edge technologies such as VR, AR, and teleoperation.