1 Introduction

Gastrointestinal diseases have long threatened human health, with incidence rising year by year, and gastric cancer is the third most common cause of cancer death worldwide [1]. Gastroscopy is a common surveillance and treatment method for gastrointestinal disorders. The incidence and mortality of diseases such as gastric ulcer, atrophic gastritis and gastric cancer can be dramatically reduced by gastroscopic interventions [2, 3].

However, traditional gastroscopy has several limitations. First, the fish-eye effect of the wide-angle lens severely distorts endoscopic images. Second, the narrow field of view of a gastroscope makes it difficult to locate and diagnose lesions. Third, gastroscopy often requires repeated examinations and invasive biopsy markers, which increase patient suffering. In recent years, many researchers have tried to address these issues and improve the treatment experience with computer-aided diagnosis [4]. Examples include three-dimensional reconstruction techniques that provide three-dimensional vision for gastroscopy [5], endoscopic instruments with a spatial positioning device that achieve non-invasive biopsy marking [6], and automatic tracking and detection of lesions [7].

Image registration is a basic technology underlying the above methods, and it remains an open problem in the endoscopic environment. Zitova [8] summarized the advantages and drawbacks of recent and classic image registration methods. Mikolajczyk [9] and Fransisco [10] evaluated classical image registration methods, which are likely to suit static scenes or scenes with periodic deformation. Deformable registration methods have been developed in recent years [11, 12], but they are difficult to apply to gastroscopy because of the potentially large anatomical differences between individual images and the unpredictable deformation during surgery. Several studies addressed endoscopic image registration with optical flow, which may produce incorrect registrations under inhomogeneous lighting [13]. Other methods tried to handle the discontinuity of image content between successive frames by marking anatomical landmarks (skin markers, screw markers, dental adapters, etc.), which usually requires additional auxiliary equipment and manual tagging by an expert [8]. The homography matrix provides significant robustness in registration [14], yet its direct application to endoscopy remains unsolved because of the complex motion characteristics and changing visual appearance.

This paper makes three contributions. First, we propose a registration method named homographic triangle and epipolar constraint registration (HTECR), which detects and matches homographic features iteratively. Second, the method can be applied directly to existing gastroscopy devices without any extra instruments. Third, HTECR can be applied to other soft organs with smooth surfaces, such as the heart, liver and lung.

2 Methods

2.1 Overview

Based on the observation that the internal gastric surface is smooth enough to be approximated by many small homographic triangular planes, we propose an iterative registration algorithm. The method starts by establishing initial correspondences in gastroscopic sequences and triangulating the initial feature points. It then verifies the homographic hypothesis for each pair of corresponding triangles. If the hypothesis holds, the vertices of the corresponding triangles are marked as matching pairs (MPs); otherwise the method registers the inscribed circle centers of the corresponding triangles under the epipolar constraint. Afterwards, the corresponding triangles are re-triangulated using the inscribed circle centers and the vertices. Finally, the newly generated triangles are fed into the next iteration. Figure 1 shows the workflow of our method.

Fig. 1. Workflow of HTECR

2.2 Initial Feature Detection and Delaunay Triangulation

Initial MPs between gastroscopic sequences are produced by an existing registration method and are then grouped into triangles by Delaunay triangulation. Because the endoscope moves flexibly in the stomach and captures images from arbitrary viewpoints, the adopted descriptor should be robust to rotation and scale. Commonly used feature detection methods include Shi-Tomasi (also known as GFTT, Good Features to Track) [15], FAST (Features from Accelerated Segment Test) [16], SIFT (Scale-Invariant Feature Transform) [17], SURF (Speeded-Up Robust Features) [18] and CenSurE (Center Surround Extremas for Realtime Feature Detection and Matching) [19]. We evaluated these five methods in earlier work and reported the results in [7].

Figure 2 shows the triangulation of an endoscopic image. For the two point sets V1 and V2 in the MPs, an edge E (the green line segments in Fig. 2) is defined by two endpoints A and B in V1 and, correspondingly, in V2. If there is a circle through A and B that contains no other points of the set, the edge AB is a Delaunay edge. By triangulating the points in V1 and V2 so that all edges are Delaunay edges, we obtain the matching triangles composed of the initial MPs.
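The empty-circle criterion can be illustrated with a minimal sketch. It uses the equivalent triangle form of the criterion (the circumcircle of a Delaunay triangle contains no other point) and a brute-force search that is only practical for a handful of points. The function names are hypothetical and not part of the original method; a practical system would more likely rely on a library routine such as `scipy.spatial.Delaunay`.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Return (cx, cy, r_squared) of the circle through points a, b and c."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear")
    a2 = ax * ax + ay * ay
    b2 = bx * bx + by * by
    c2 = cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2


def is_delaunay_triangle(tri, points, eps=1e-9):
    """True if no other point lies strictly inside the circumcircle of tri."""
    ux, uy, r2 = circumcircle(*tri)
    for p in points:
        if p in tri:
            continue
        if (p[0] - ux) ** 2 + (p[1] - uy) ** 2 < r2 - eps:
            return False
    return True


def delaunay_triangles(points):
    """Brute-force triangulation (illustration only, O(n^4))."""
    result = []
    for tri in combinations(points, 3):
        try:
            if is_delaunay_triangle(tri, points):
                result.append(tri)
        except ValueError:
            continue  # skip degenerate (collinear) triples
    return result
```

Because the matching pairs share indices between V1 and V2, one plausible choice (an assumption here) is to triangulate V1 only and reuse the same vertex indices in V2 to obtain the corresponding triangles.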

Fig. 2. Delaunay triangulation on gastroscopic image

2.3 Homographic Registration

Let \({\text{m}}_{\text{i}}\) and \({\text{m}}_{\text{i}}^{\prime}\) be the matching image points of a point i lying on a plane in the real world, and let H be a 3 × 3 homography matrix. According to [20], the coordinates of the matching points satisfy:

$$ {\text{m}}_{\text{i}} = {\text{H}}\,{\text{m}}_{\text{i}}^{\prime} $$
(1)

Under the assumption that the triangles generated by Delaunay triangulation are small enough to satisfy the homographic model [21], H can be estimated from the three vertices of each triangle, and every point inside a triangle of the reference image should be warped onto and aligned with the corresponding triangle of the target image.

Let (r1, r2, r3) be the vertices of a triangle in the reference image. Each point r(λ, β) inside the triangle can be written as a unique convex combination of the three vertices:

$$ {\text{r}}\left( {\lambda ,\,\beta } \right) = \lambda \,{\text{r}}_{1} + \beta \,{\text{r}}_{2} + \left( {1 - \lambda - \beta } \right){\text{r}}_{3} $$
(2)

where 0 < λ, β, (1 − λ − β) < 1. The intensity at point r(λ, β) is denoted I(λ, β). Normalized cross correlation (NCC) is used to validate the homographic assumption:

$$ {\text{NCC}}\left( {\lambda ,\,\beta } \right) = \frac{{\sum \sum {\text{I}}\left( {\lambda ,\,\beta } \right)\,{\text{I}}\left( {{\text{H}}\left( {\lambda ,\,\beta } \right)} \right)}}{{\sqrt {\sum \sum \left[ {{\text{I}}\left( {\lambda ,\,\beta } \right)} \right]^{2} } \,\sqrt {\sum \sum \left[ {{\text{I}}\left( {{\text{H}}\left( {\lambda ,\,\beta } \right)} \right)} \right]^{2} } }} $$
(3)

where λ and β each range from 0 to 1. The value of NCC(λ, β) measures the similarity of two matching triangular planes. The closer the value is to 1, the more reasonable the homographic assumption for the two planes, in which case we record the vertices of the matching planes in the MPs set. Conversely, a value closer to 0 indicates a greater probability of inaccurate matching, in which case we redo the matching. The most common cause is that the matching triangles are not small enough to be regarded as planes in real 3D space, so they are subdivided into smaller triangles for which the homographic assumption is more likely to hold.
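As a rough illustration of Eqs. (2) and (3), the sketch below samples both triangles on the same barycentric grid and computes the NCC of the sampled intensities. It rests on a simplifying assumption that is not stated in the original: the mapping H(λ, β) is approximated by reusing the same (λ, β) in the target triangle, which is the affine map fixed by the three vertex pairs. Images are assumed to be lists of rows indexed as `image[y][x]`, and all function names are hypothetical.

```python
import math


def barycentric_point(tri, lam, beta):
    """Point r(lam, beta) = lam*r1 + beta*r2 + (1 - lam - beta)*r3, as in Eq. (2)."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    gamma = 1.0 - lam - beta
    return (lam * x1 + beta * x2 + gamma * x3,
            lam * y1 + beta * y2 + gamma * y3)


def sample_triangle(image, tri, steps=8):
    """Nearest-pixel intensities on a regular barycentric grid strictly inside tri."""
    values = []
    for i in range(1, steps):
        for j in range(1, steps - i):
            x, y = barycentric_point(tri, i / steps, j / steps)
            col = int(math.floor(x + 0.5))
            row = int(math.floor(y + 0.5))
            values.append(image[row][col])
    return values


def ncc(a, b):
    """Normalized cross correlation of two equally long intensity lists, as in Eq. (3)."""
    num = sum(u * v for u, v in zip(a, b))
    den = math.sqrt(sum(u * u for u in a)) * math.sqrt(sum(v * v for v in b))
    return num / den if den > 0 else 0.0


def triangle_ncc(ref_img, ref_tri, tgt_img, tgt_tri, steps=8):
    """NCC between a reference triangle and its candidate target triangle.

    Assumption: the same (lam, beta) is reused in the target triangle,
    i.e. H is approximated by the affine map defined by the vertex pairs.
    """
    return ncc(sample_triangle(ref_img, ref_tri, steps),
               sample_triangle(tgt_img, tgt_tri, steps))
```

A triangle pair would then be accepted when `triangle_ncc(...)` exceeds a threshold close to 1; the threshold value itself is not given in the text and would have to be chosen empirically.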

2.4 Epipolar Constraint Registration

We denote the two image planes of a binocular stereo vision system as P and P′; the line segment between the optical centers Ol and Or is the baseline. A given point pl in P corresponds to a space point p that lies on the line determined by pl and Ol. The epipolar plane is determined by the point p and the baseline. The intersection of the epipolar plane with P′ is the epipolar line. The epipolar constraint states that the matching point \({\text{p}}_{\text{l}}^{\prime}\) in P′ of the point pl must lie on this epipolar line.

In this paper, P and P′ are corresponding triangles generated by Delaunay triangulation, and pl is taken to be the inscribed circle center, which is equidistant from the three edges of the triangle. The candidate matching point \({\text{p}}_{\text{l}}^{\prime}\) lies on the epipolar line of pl. Figure 3 shows this constraint. The constraint greatly accelerates point matching between P and P′ by reducing the search for possible matches from two dimensions to one, which also greatly reduces the number of candidate matches.
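A minimal sketch of this step is given below. It computes the inscribed circle center of a triangle and keeps only those candidate points of the second image that lie close to the corresponding epipolar line. The fundamental matrix `F` (with the convention that the line in the second image is F·p), the tolerance `tol` and the function names are assumptions introduced for illustration; the text does not specify how the epipolar geometry is estimated.

```python
import math


def incenter(tri):
    """Inscribed circle center: vertices weighted by the lengths of the opposite sides."""
    a_pt, b_pt, c_pt = tri
    a = math.hypot(b_pt[0] - c_pt[0], b_pt[1] - c_pt[1])  # side opposite a_pt
    b = math.hypot(a_pt[0] - c_pt[0], a_pt[1] - c_pt[1])  # side opposite b_pt
    c = math.hypot(a_pt[0] - b_pt[0], a_pt[1] - b_pt[1])  # side opposite c_pt
    s = a + b + c
    return ((a * a_pt[0] + b * b_pt[0] + c * c_pt[0]) / s,
            (a * a_pt[1] + b * b_pt[1] + c * c_pt[1]) / s)


def epipolar_line(F, p):
    """Homogeneous line (a, b, c) = F * (x, y, 1) in the second image."""
    x, y = p
    return tuple(F[r][0] * x + F[r][1] * y + F[r][2] for r in range(3))


def distance_to_line(line, q):
    """Perpendicular distance from point q to the line a*x + b*y + c = 0."""
    a, b, c = line
    return abs(a * q[0] + b * q[1] + c) / math.hypot(a, b)


def candidates_on_epipolar(F, p, candidates, tol=1.0):
    """Keep only the candidates within tol pixels of the epipolar line of p."""
    line = epipolar_line(F, p)
    return [q for q in candidates if distance_to_line(line, q) <= tol]
```

For a purely horizontal camera translation, for example, `F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]` maps every point to the horizontal line through the same row, so the search for the match of an inscribed circle center is restricted to that row.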

Fig. 3. Epipolar constraint on gastroscopic image

The quality of a candidate MP is measured by the descriptor distance of Eq. (4), computed on SIFT descriptors whose vectors are reduced from 128 to 32 elements:

$$ {\text{D}}_{\text{i}} = \sum\nolimits_{{\text{k}} = 1}^{32} \left| {{\text{p}}_{{\text{i}},{\text{k}}} - {\text{p}}_{{\text{i}},{\text{k}}}^{\prime} } \right| $$
(4)

where \({\text{p}}_{\text{i}}\) and \({\text{p}}_{\text{i}}^{\prime}\) are the reduced descriptors of the two points of the i-th candidate MP, \({\text{p}}_{{\text{i}},{\text{k}}}\) and \({\text{p}}_{{\text{i}},{\text{k}}}^{\prime}\) are their k-th elements, and Di is the sum over the 32 elements. Let n be the number of points on the line segment and ε the global threshold. A reliable matching result should satisfy max[D1, D2, D3 … Dn] < ε, which avoids incorrect correspondences in regions that have no counterpart in the other image.
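The distance of Eq. (4) and the threshold test can be sketched as follows. The text does not say how the 128-element descriptor is reduced to 32 elements; summing each group of four consecutive elements is an assumption made here purely for illustration, and the function names are hypothetical.

```python
def reduce_descriptor(desc, size=32):
    """Reduce a descriptor to `size` elements.

    Assumption: consecutive elements are summed in equal groups
    (groups of four for a 128-element SIFT descriptor).
    """
    step = len(desc) // size
    return [sum(desc[k * step:(k + 1) * step]) for k in range(size)]


def descriptor_distance(p, p_prime):
    """Sum of absolute element-wise differences, as written in Eq. (4)."""
    return sum(abs(u - v) for u, v in zip(p, p_prime))


def is_reliable(distances, eps):
    """Acceptance condition max[D1, ..., Dn] < eps from the text."""
    return max(distances) < eps
```

In use, each candidate pair would be reduced with `reduce_descriptor`, scored with `descriptor_distance`, and the resulting list of distances passed to `is_reliable` together with the chosen ε.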

2.5 Iterative Registration

The matching points set is the collection of candidate matches, and the MPs set is the final matching result computed by our registration method. The iterative process is as follows:

  1. Obtain the initial features and matching points in planes P and P′ with a feature point detection method and a registration method.

  2. Triangulate the matching points with Delaunay triangulation; the resulting triangular planes are regarded as patches of the stomach wall surface.

  3. Match the triangles with the homographic registration method described in Sect. 2.3. The vertices of matching triangles are recorded in the MPs set.

  4. For the unmatched triangles, apply the epipolar constraint registration method detailed in Sect. 2.4 and add the resulting matching points to the MPs set.

  5. If neither method yields a match, subdivide the triangles into smaller triangles using the inscribed circle center and the vertices, and repeat from step 3 until no new triangles or MPs are generated.
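The loop formed by the steps above can be sketched as follows. The homographic test and the epipolar search are passed in as callbacks (`is_homographic` and `match_incenter`, both hypothetical names), so the sketch only shows the control flow and the re-triangulation at the inscribed circle center. Following the overview in Sect. 2.1, a triangle that fails the homographic test is split at its inscribed circle center once that center has been matched. Dropping a triangle whose center cannot be matched, and capping the number of iterations, are assumptions of this sketch.

```python
import math


def incenter(tri):
    """Inscribed circle center of a triangle given as three (x, y) vertices."""
    a_pt, b_pt, c_pt = tri
    a = math.hypot(b_pt[0] - c_pt[0], b_pt[1] - c_pt[1])
    b = math.hypot(a_pt[0] - c_pt[0], a_pt[1] - c_pt[1])
    c = math.hypot(a_pt[0] - b_pt[0], a_pt[1] - b_pt[1])
    s = a + b + c
    return ((a * a_pt[0] + b * b_pt[0] + c * c_pt[0]) / s,
            (a * a_pt[1] + b * b_pt[1] + c * c_pt[1]) / s)


def triangle_area(tri):
    """Unsigned area of a triangle."""
    (ax, ay), (bx, by), (cx, cy) = tri
    return 0.5 * abs((bx - ax) * (cy - ay) - (cx - ax) * (by - ay))


def refine(pairs, is_homographic, match_incenter, max_iter=30):
    """Iteratively collect matching pairs from corresponding triangles.

    pairs          -- list of (reference_triangle, target_triangle)
    is_homographic -- callback(ref_tri, tgt_tri) -> bool (e.g. an NCC test)
    match_incenter -- callback(center, tgt_tri) -> matched point or None
    """
    matched = set()
    for _ in range(max_iter):
        if not pairs:
            break
        next_pairs = []
        for ref_tri, tgt_tri in pairs:
            if is_homographic(ref_tri, tgt_tri):
                matched.update(zip(ref_tri, tgt_tri))  # record vertex pairs
                continue
            center = incenter(ref_tri)
            q = match_incenter(center, tgt_tri)
            if q is None:
                continue  # assumption: give up on this triangle
            matched.add((center, q))
            for k in range(3):  # split both triangles into three sub-triangles
                a, b = k, (k + 1) % 3
                next_pairs.append(((ref_tri[a], ref_tri[b], center),
                                   (tgt_tri[a], tgt_tri[b], q)))
        pairs = next_pairs
    return matched
```

With a toy `is_homographic` that accepts any triangle whose `triangle_area` is below a threshold, a single large triangle is split once and its three sub-triangles are then accepted, which mirrors how the number of MPs grows with the number of iterations.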

3 Experiments and Results

The registration method was applied to in vivo gastroscopy image sequences. The data were from Sir Run Run Shaw Hospital in Zhejiang Province, China. The gastroscopy video was captured at 25 fps. To ensure confidentiality, examination information such as the examination date and the patient's name was removed from the original gastroscopy images, and the processed images were 470 × 410 pixels. The initial MPs were detected by SIFT in our experiments.

Figure 4 shows the workflow of our proposed method on an image of the angularis. SIFT detected 392 initial MPs, and HTECR obtained 451 MPs after 30 iterations. The final MPs cover the vast majority of the image, which benefits further image analysis such as 3D modeling [5] and mosaicking [6]. Moreover, as the number of iterations increases, more MPs are registered by HTECR.

Fig. 4. Workflow of HTECR on gastroscopic image

Figure 5 shows the precision evaluation of HTECR and SIFT. An affine transformation was applied to the gastroscopic image to simulate the change of viewpoint during gastroscopy. The upper half of each panel is the original image and the bottom half is the transformed image. In our experiment, the affine transformation is a rotation by an angle of 30 degrees. Matching the original and transformed images with SIFT and HTECR yields the MPs shown in Fig. 5 (green lines). The blue points in the upper half (MPs1) and the red points in the bottom half (MPs2) are the matching points detected by SIFT and HTECR, and the blue points in the bottom half (MPs3) are the correct matches of MPs1 computed with the transformation matrix. If MPs2 and MPs3 coincide, the registration is highly accurate. Fig. 5(a) shows the 183 initial MPs, and Fig. 5(b) shows the 209 MPs obtained by HTECR.

Fig. 5. Precision evaluation of HTECR (b) compared with SIFT (a)

The Euclidean distance between corresponding points of MPs2 and MPs3 is used to evaluate precision. Points whose distance is less than five pixels are counted as correct matching pairs, because five pixels corresponds to the width of the thinnest vessel [22]. The proportions of correct matches for SIFT and HTECR are 0.958 and 0.961 respectively, which indicates that our registration method achieves slightly higher precision.
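The evaluation protocol can be summarized in a short sketch: the known transformation maps each point of MPs1 to its ground-truth position (MPs3), and a detected match (MPs2) is counted as correct when it lies within five pixels of that position. The rotation center and the function names below are assumptions for illustration; the text only states that the transformation is a 30 degree rotation.

```python
import math


def rotate_point(p, angle_deg, center):
    """Rotate point p about center by angle_deg degrees."""
    t = math.radians(angle_deg)
    dx, dy = p[0] - center[0], p[1] - center[1]
    return (center[0] + dx * math.cos(t) - dy * math.sin(t),
            center[1] + dx * math.sin(t) + dy * math.cos(t))


def matching_precision(src_points, matched_points, transform, tol=5.0):
    """Fraction of matches lying within tol pixels of the ground truth.

    src_points     -- points in the original image (MPs1)
    matched_points -- detected matches in the transformed image (MPs2)
    transform      -- callable mapping a source point to its true position (MPs3)
    """
    correct = 0
    for p, q in zip(src_points, matched_points):
        gx, gy = transform(p)
        if math.hypot(q[0] - gx, q[1] - gy) < tol:
            correct += 1
    return correct / len(src_points)
```

For a 470 × 410 image one might, for instance, rotate about the image center with `transform = lambda p: rotate_point(p, 30.0, (235.0, 205.0))` and pass the SIFT or HTECR matches as `matched_points`.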

4 Conclusion

This paper proposes an iterative registration method. Initial MPs are acquired by common registration methods and grouped into triangles by Delaunay triangulation. Homographic assumptions and the epipolar constraint are then applied in turn to find further MPs based on the initial MPs. The final registration results are the vertices of homographic triangles. Experiments on gastroscopic images show promising performance. However, HTECR still has limitations. Its speed could be improved with a GPU implementation, and the mucus covering the internal gastric surface may cause incorrect MPs. Further work is needed to address these limitations.