Background

Forest inventory is an important approach for determining the quantity, quality, and distribution of forest resources [1]. In forest surveys, the diameter at breast height (DBH) of individual trees is one of the most important indicators of tree attributes [2]. Accurate DBH measurement is essential for forest resource inventory and management and for modeling tree growth and the carbon cycle [1, 3]. Currently, measuring the DBH of individual trees with traditional tapes is time-consuming and labor-intensive, so devices that can measure DBH rapidly and accurately are in high demand [4].

Methods for measuring DBH can be divided into two categories: contact and non-contact. Contact dendrometers must physically touch the tree trunk; conventional calipers and diameter tapes are the most widely used contact dendrometers in forest surveys. Usually, two people are required to perform a DBH measurement (one measuring, the other recording), so contact dendrometers suffer from low efficiency and high labor cost. Non-contact dendrometers, such as optical calipers [5], rangefinder dendrometers [6], and optical forks [7], are based on the principle of optical measurement. They do not touch the trunk; instead, they apply perspective geometry to measured angles and distances to calculate the trunk diameter [8]. Photographs taken with conventional film cameras have also been used for non-contact DBH measurement. However, additional tools, such as a reference stick of known length or control points, must be placed near the trunk as a reference scale to relate distances on the image to those in the real world before the DBH can be calculated [9, 10]. Moreover, lens distortion and film non-flatness can decrease the accuracy of DBH measurements [11].

With the development of digital imaging technology, digital cameras have been used to measure the DBH of individual trees. These methods commonly require auxiliary tools such as reference sticks and calibration poles, and trunk contours are usually extracted manually or by color-based approaches. For example, Clark [4] developed an instrument that combined a digital camera, a 3-axis magnetometer, and a laser rangefinder. The tree photo, together with the range and orientation data, was fed into the “Tree Measurement System” processing program to calculate parameters such as DBH, height, and stem volume. However, the trunk contour still had to be extracted manually. Juujärvi et al. and Varjo et al. [12, 13] developed an image-based tree measurement system consisting of a digital camera, a laser rangefinder, and a calibration stick. Color and stem-form models were combined into a histogram separation model to locate the trunk curves and automatically extract the trunk frame. The camera geometry parameters and viewing geometry had to be determined before the color image information could be transformed into a three-dimensional trunk model that yielded tree height and DBH. Brownlie et al. [11] designed a photogrammetric image-based dendrometry system called “TreeD” for measuring the features of individual standing trees. The “TreeD” system requires additional tools such as a transponder and a height pole. Field parameters, including the horizontal distance from the camera to the tree and the height of the transponder above the ground, must be measured in the field. These parameters are used to register the tree images in three-dimensional space through complex trigonometric calculations and coordinate transformations. Parameters such as DBH, height, and crown size can then be measured in the “TreeD” system using stereogram-displaying software.
Gazda and Kedra [14] developed a method for describing tree architecture by image photogrammetry, which includes image transformation (converting a non-metric image into a metric one), calibration with a reference object, and vectorization. In recent years, smartphone-based passive monocular vision methods have also been used to measure DBH. For example, Wu et al. [3] proposed a method for measuring the DBH of multiple trees from a single smartphone image using machine vision and close-range photogrammetry. In their approach, a visual segmentation method based on an improved frequency-tuned saliency algorithm extracted the trunk contour from color features, and an adaptive feature coordinate system together with the trunk color information was used to measure DBH.

Several studies have used multiple images taken from different directions to generate point-cloud data for measuring the DBH of individual trees at the plot level. For example, Liang et al. [15] collected photos taken at different positions around a forest plot using an uncalibrated digital camera. These photographs were used to generate point-cloud data through the automated image-matching process of the commercial software Agisoft PhotoScan Professional. The point cloud in camera space was then transformed into a three-dimensional (3D) point cloud in real-world space, which was used to measure the DBH of each tree in the plot. Mulverhill et al. [16] also used Agisoft PhotoScan to construct accurate photogrammetric point clouds and derived the DBH, height, taper, and volume of trees in a plot. Forsman et al. [17] used a prototype multi-camera rig to record images in multiple directions from the center of field plots; the images were then used to generate point clouds for estimating tree attributes. Fan et al. [18] used a smartphone with a Google Tango sensor (a combination of an RGB (red, green, blue) camera, a time-of-flight camera, and a motion-tracking camera, called a vision sensor) to record images of trees, and designed an algorithm to estimate the DBH and location of trees in the plot from the point cloud generated by the time-of-flight camera and the camera pose. The advantages of image-based point clouds include low equipment cost and simple field measurement; the disadvantages include the difficulty of mapping small trees and trees occluded in complex forest stands, and the time required for data processing [15].

In the past few years, with the development of light detection and ranging (LiDAR) technology, a growing number of studies have used ground-based or unmanned aerial vehicle (UAV) LiDAR scanning to obtain 3D point clouds of trees and derive height and DBH [19,20,21,22,23]. LiDAR can describe the 3D structure of trees and obtain multiple tree parameters (such as height, DBH, and crown size) at the plot level. However, LiDAR equipment is expensive, its field operation is complicated, and its data processing is complex and specialized. At present, it is still difficult to use LiDAR widely in forest surveys [21].

As the above discussion shows, early image-based DBH measurements required reference sticks, calibration poles, and auxiliary measurements such as angles and distances, and the trunk contour had to be extracted manually. This resulted in a low degree of automation. Furthermore, for many earlier instruments the calculation stage had to be conducted on a computer [4, 11,12,13], which reduced working efficiency in field surveys. Smartphone-based machine vision and close-range photogrammetry have since improved the degree of automation in image-based DBH measurement, and reference sticks and calibration poles are rarely used in the field. However, conversions between coordinate systems (e.g., image plane, image space, photogrammetric, and object space coordinate systems) are complicated, and the accuracy of three-dimensional coordinates derived from two-dimensional image coordinates cannot be guaranteed [24], which decreases the accuracy of DBH measurements. Trunk contour extraction is also a vital step in DBH measurement; however, most current trunk contour extraction algorithms (such as histogram comparison) rely on the color information of the trunk, which makes trunk identification error-prone. In addition, the instruments developed in previous works were mostly loose collections of separate hardware (e.g., a digital camera, a laser rangefinder, a transponder, a tripod, and a calibration pole), and no highly integrated handheld device has been developed for easy and convenient DBH measurement. A compact, user-friendly device could bring image-based DBH measurement to a wider range of users.
Therefore, in this research, we aimed to develop a handheld, highly integrated DBH measurement device based on image recognition and laser ranging. We employed convolutional neural networks (CNNs) to identify tree trunks using both color and texture information. The device records the longitude and latitude of each measurement site together with the measured DBH value in a text file and stores it, along with the tree images, on the memory card. We believe that our device can improve the accuracy and efficiency of DBH measurements in forest resource surveys.

Materials and methods

General introduction

The proposed device combines laser ranging and image recognition to perform non-contact DBH measurements. The measured DBH values and the latitudes and longitudes of the measurement sites are written into a text file, which can easily be transferred to an external flash disk.

The core software component is an object detection algorithm that uses CNNs to detect tree trunks precisely. The core hardware includes a digital camera, a laser rangefinder, an embedded development board, a global positioning system (GPS) receiver, a battery, a liquid crystal display (LCD), and a memory module (Fig. 1). The device measures 10.5 cm × 5.5 cm × 14.5 cm (length × width × height) and weighs 600 g, light enough for a single person to carry and operate in the field without a tripod. It can work continuously for about 12 h at temperatures of 0–40 °C, which meets the requirements of field forest surveys.

Fig. 1

Structure of the device. LCD liquid crystal display, USB universal serial bus

When the device is powered on, the microprocessor continuously reads low-resolution video through the digital camera interface and displays it in real time on the LCD. When the operator presses the virtual “Take photo” button on the LCD, the digital camera captures a high-resolution photo of the targeted tree trunk. The microprocessor processes the photo with the CNN-based algorithm to identify and extract the trunk, and the number of pixels between the edges of the extracted trunk contour is recorded. Meanwhile, the laser rangefinder measures the horizontal distance between the digital camera and the targeted trunk. This information is sent to the microprocessor, which calculates the DBH based on geometrical optics. The DBH value is then displayed on the LCD and written into a text file together with the recorded latitude and longitude. The workflow of the device is shown in Fig. 2.
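Two of the steps above, counting the trunk pixels and writing the result record, can be sketched in Python. This is a minimal illustration, not the device firmware: it assumes the CNN output has already been converted into a binary trunk mask (1 = trunk pixel), that breast height is framed at the vertical centre of the photo, and that each record is one comma-separated line of DBH (mm), latitude, and longitude. The function names and the record layout are hypothetical.

```python
from typing import Sequence


def trunk_width_px(mask: Sequence[Sequence[int]]) -> int:
    """Return N, the number of pixels spanned by the trunk in the breast-height row.

    Hypothetical helper: assumes `mask` is a binary trunk mask (1 = trunk pixel)
    and that breast height has been framed at the vertical centre of the photo.
    """
    row = mask[len(mask) // 2]
    trunk_cols = [col for col, value in enumerate(row) if value]
    if not trunk_cols:
        return 0  # no trunk detected in this row
    # Count every pixel from the left edge to the right edge, inclusive.
    return trunk_cols[-1] - trunk_cols[0] + 1


def format_record(dbh_mm: float, lat: float, lon: float) -> str:
    """Format one text-file line: DBH (mm), latitude, longitude (hypothetical layout)."""
    return f"{dbh_mm:.1f},{lat:.6f},{lon:.6f}"


if __name__ == "__main__":
    toy_mask = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 1, 1, 0],
        [0, 0, 0, 0, 0, 0],
    ]
    print(trunk_width_px(toy_mask))               # 4
    print(format_record(210.5263, 30.5, 120.25))  # 210.5,30.500000,120.250000
```

Counting inclusively from the first to the last trunk pixel is one reasonable reading of "the number of pixels between the edges"; an implementation could equally use the edge-to-edge difference.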

Fig. 2

Workflow of the presented device. CNNs convolutional neural networks, LCD liquid crystal display, GPS global positioning system

Theoretical basis

The theoretical basis of the proposed device is illustrated in Fig. 3. The DBH is calculated from the horizontal distance from the device to the targeted tree trunk, the intrinsic camera parameters (focal length and pixel size), and the number of pixels between the edges of the trunk at breast height. In Fig. 3, \(L\) is the projection of the trunk radius onto the charge-coupled device (CCD) sensor of the digital camera, \(f\) is the focal length, \(D\) is the horizontal distance from the camera to the near surface of the targeted trunk (so that the distance to the trunk axis is \(R + D\)), and \(R\) is the radius of the trunk.

Fig. 3

Working principle of the device

In geometrical optics, the relationship between \(L\), \(f\), \(R\), and \(D\) is expressed by Eq. (1). Based on the imaging principle of a digital camera, \(L\) can be calculated using Eq. (2).

$$\frac{L}{f} = \frac{R}{R + D},$$
(1)
$$L = \frac{1}{2}\left( {N \times \mu } \right).$$
(2)

In Eq. (2), \(N\) is the number of pixels between the edges of the tree trunk at breast height, and \(\mu\) is the pixel size. Rearranging Eq. (1) gives \(R = LD/(f - L)\); substituting Eq. (2) for \(L\) yields Eq. (3).

$$R = \frac{DN\mu }{{2f - N\mu }}.$$
(3)

\(f_{x}\), the normalized focal length along the abscissa (x) axis, is the focal length expressed in pixels and is calculated using Eq. (4). Dividing the numerator and denominator of Eq. (3) by \(\mu\) and substituting Eq. (4) gives Eq. (5).

$$f_{x} = f/\mu ,$$
(4)
$$R = \frac{DN}{2f_{x} - N}.$$
(5)

\(f_{x}\) is one of the intrinsic camera parameters. Although the manufacturer provides nominal intrinsic parameters such as pixel size and focal length, the camera must be calibrated to determine them precisely. In this research, we used the method described by Zhang [25] to calibrate the camera and obtain \(f_{x}\). \(N\) is counted at breast height on the extracted trunk contour.
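Eqs. (4) and (5) translate directly into code. The sketch below is a minimal Python illustration, not the device's implementation; the function names, the nominal focal length (4.0 mm) and pixel size (1.4 µm), and the example distance and pixel count are hypothetical values chosen only to show the arithmetic. \(D\) and the returned radius share the same length unit, while \(N\) and \(f_{x}\) are in pixels.

```python
def normalized_focal_length(focal_length_mm: float, pixel_size_mm: float) -> float:
    """Eq. (4): f_x = f / mu, i.e. the focal length expressed in pixels."""
    return focal_length_mm / pixel_size_mm


def trunk_radius(distance: float, n_px: float, fx_px: float) -> float:
    """Eq. (5): R = D * N / (2 * f_x - N).

    `distance` is the horizontal distance D to the trunk surface; the result
    has the same unit. `n_px` is the trunk width N in pixels.
    """
    if not 0 < n_px < 2 * fx_px:
        raise ValueError("N must satisfy 0 < N < 2 * f_x")
    return distance * n_px / (2 * fx_px - n_px)


def dbh(distance: float, n_px: float, fx_px: float) -> float:
    """DBH is twice the radius from Eq. (5)."""
    return 2 * trunk_radius(distance, n_px, fx_px)


if __name__ == "__main__":
    # Nominal value from hypothetical datasheet numbers (calibration would refine it).
    print(round(normalized_focal_length(4.0, 0.0014), 1))  # 2857.1
    # Hypothetical measurement: D = 2000 mm, N = 300 px, calibrated f_x = 3000 px.
    r = trunk_radius(2000.0, 300, 3000.0)
    print(round(r, 2), round(dbh(2000.0, 300, 3000.0), 2))  # 105.26 210.53
    # Consistency check with Eq. (1): L/f = (N/2)/f_x should equal R/(R + D).
    print(abs(150 / 3000.0 - r / (r + 2000.0)) < 1e-12)     # True
```

The last check confirms numerically that Eq. (5) is an exact rearrangement of Eq. (1) once \(L\) is expressed in pixels.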

Detection of the tree trunk

Detecting tree trunks is demanding because of variations in trunk texture and color, occlusion by other objects in the forest scene, complex backgrounds, and diverse lighting conditions. CNNs offer an effective solution for object detection [26] because they automatically learn features from the training data that represent the nature of the target. Compared with hand-crafted features, deep features learned by CNNs describe the characteristics of target objects more robustly [27]. Several researchers have used CNNs to detect objects in image interpretation [28,29,30,31,32]. In the present study, we adopted a lightweight CNN-based algorithm with an attention mechanism to detect tree trunks.

Dataset construction

CNNs are data-driven deep-learning algorithms that require sample data to train the object detection model. We collected 200 pictures of trees, including Cerasus serrulata, Amygdalus persica, Pinus tabuliformis, Ailanthus altissima, and Fraxinus chinensis. We manually extracted sub-images of the tree trunks from these pictures and used half of them as training data and the remaining half as test data.
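The text does not state how the sub-images were divided between training and test sets. If a random split was used, a reproducible 50/50 split could look like the Python sketch below; the file names, the count of 200 sub-images, and the fixed seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_half(items: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """Shuffle a copy of `items` with a fixed seed and split it into two halves."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded RNG makes the split repeatable
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


if __name__ == "__main__":
    sub_images = [f"trunk_{i:03d}.jpg" for i in range(200)]  # hypothetical names
    train, test = split_half(sub_images)
    print(len(train), len(test))  # 100 100
```

Fixing the seed means the same images land in the test set every time the model is retrained, which keeps accuracy comparisons between model variants fair.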

Construction of the CNNs

A large receptive field is key to effectively extracting semantic edges, and the receptive field grows as convolutional layers are added. The receptive field size is calculated using Eq. (6):

$$l_{k} = l_{k - 1} + \left[ {\left( {f_{k} - 1} \right) \times \mathop \prod \limits_{i = 1}^{k - 1} s_{i} } \right].$$
(6)

In Eq. (6), \(l_{k - 1}\) is the receptive field size of layer \(k - 1\), \(f_{k}\) is the kernel or pooling size of layer \(k\), and \(s_{i}\) is the stride of convolution or pooling layer \(i\). The receptive field can be enlarged by increasing either the kernel size or the stride. However, the computation load grows quadratically with the kernel size (see Eq. (8)), which is difficult for an embedded device to handle. We therefore enlarged the receptive field by increasing the stride. A larger stride also reduces the size of the feature map and thus effectively decreases the amount of computation.
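Eq. (6) can be evaluated layer by layer. The minimal Python sketch below assumes \(l_{0} = 1\) (a single input pixel) and uses a hypothetical five-layer stack of 3 × 3 convolutions (stride 1) and 2 × 2 pooling layers; it is not the network used in this study. Giving the pooling layers stride 2 instead of 1 doubles the receptive field of the last layer without enlarging any kernel.

```python
from typing import Sequence, Tuple


def receptive_field(layers: Sequence[Tuple[int, int]]) -> int:
    """Receptive field of the last layer according to Eq. (6).

    `layers` lists (kernel or pool size f_k, stride s_k) for k = 1..K.
    """
    rf = 1    # l_0: one input pixel
    jump = 1  # product of the strides s_1 ... s_{k-1}
    for size, stride in layers:
        rf += (size - 1) * jump
        jump *= stride
    return rf


if __name__ == "__main__":
    # Hypothetical stack: conv3, pool2, conv3, pool2, conv3.
    strided = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
    unstrided = [(3, 1), (2, 1), (3, 1), (2, 1), (3, 1)]
    print(receptive_field(strided), receptive_field(unstrided))  # 18 9
```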

Pooling down-sampling and convolutional (strided) down-sampling are two common approaches to increasing the stride. Because pooling down-sampling is more conducive to model convergence, we used it in our CNN. Although pooling reduced the computation load, the computation and memory overheads were still very large for an embedded device, so we compressed the CNN further. Following the approach of MobileNet [33], we replaced standard convolution filters with depthwise separable convolutions, each composed of a depthwise convolution followed by a 1 × 1 (pointwise) convolution.

The number of parameters (Params) and the computational cost (Cost) of a standard convolution are given by Eqs. (7) and (8), respectively.

$${\text{Params}} = D_{k} \times D_{k} \times M \times N,$$
(7)
$${\text{Cost}} = D_{k} \times D_{k} \times M \times N \times D_{f} \times D_{f} .$$
(8)

In Eqs. (7) and (8), \(D_{k}\) is the size of the convolution kernel, \(M\) is the number of input channels, \(N\) is the number of output channels (not the pixel count used in Eq. (5)), and \(D_{f}\) is the size of the feature map.

The number of parameters and the computational cost of a depthwise separable convolution are given by Eqs. (9) and (10), respectively.

$${\text{Params}} = D_{k} \times D_{k} \times M + 1 \times 1 \times M \times N,$$
(9)
$${\text{Cost}} = D_{k} \times D_{k} \times M \times D_{f} \times D_{f} + M \times N \times D_{f} \times D_{f} .$$
(10)

In our study, \(D_{k}\) was set to 3. According to Hollemans [34], using the same number of input and output channels increases computational speed and reduces memory overhead; therefore, \(M\) and \(N\) were both set to 32 for down-sampling and up-sampling. Compared with standard convolution, this reduced the number of parameters per layer from 9216 (3 × 3 × 32 × 32) to 1312 (3 × 3 × 32 + 32 × 32).
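The parameter counts quoted above follow from Eqs. (7) to (10). The Python sketch below reproduces them for \(D_{k} = 3\) and \(M = N = 32\); the feature-map size \(D_{f} = 64\) is a hypothetical value used only to compare costs, and bias terms are ignored, as in the equations. The resulting cost ratio equals \(1/N + 1/D_{k}^{2}\), the reduction factor given for depthwise separable convolution in the MobileNet paper.

```python
from typing import Tuple


def standard_conv(dk: int, m: int, n: int, df: int) -> Tuple[int, int]:
    """Eqs. (7) and (8): parameters and cost of a standard convolution."""
    params = dk * dk * m * n
    cost = params * df * df
    return params, cost


def separable_conv(dk: int, m: int, n: int, df: int) -> Tuple[int, int]:
    """Eqs. (9) and (10): depthwise (dk x dk x m) plus pointwise (1 x 1 x m x n)."""
    params = dk * dk * m + m * n
    cost = dk * dk * m * df * df + m * n * df * df
    return params, cost


if __name__ == "__main__":
    dk, m, n, df = 3, 32, 32, 64  # df = 64 is a hypothetical feature-map size
    std_params, std_cost = standard_conv(dk, m, n, df)
    sep_params, sep_cost = separable_conv(dk, m, n, df)
    print(std_params, sep_params)  # 9216 1312
    print(round(sep_cost / std_cost, 4), round(1 / n + 1 / dk**2, 4))  # 0.1424 0.1424
```

Because both costs scale with \(D_{f}^{2}\), the ratio does not depend on the feature-map size chosen.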

Spatial attention module

Prior research has found that placing the targeted tree in the middle of the image reduces image distortion and the influence of complex backgrounds on trunk extraction, thereby increasing measurement accuracy [3]. We therefore recommended holding the device so that the part of the trunk used for the DBH measurement lies in the middle of the photo. This allowed us to use this characteristic of the captured photos to filter out non-targeted trees and the background, further improving object detection accuracy. Based on this, we adopted the spatial attention module proposed by Woo et al. […]

[…] Liang et al. [15] estimated the DBH using images with the G (green) channel and the RGB channels. The BIAS and relBIAS were 4.8 mm and 1.33% for DBH values estimated from the G image and 19.8 mm and 5.39% for the RGB image, respectively. The RMSE and relRMSE were 23.9 mm and 6.6% for the G image and 44.7 mm and 12.14% for the RGB image, respectively. Adilson et al. [37] used vertical fisheye images to measure DBH and achieved an RMSE of 14.6 mm. In the study by Fan et al. [18], the RMSE of smartphone-based DBH estimates was 12.6 mm, relRMSE was 6.39%, BIAS was 3.3 mm, and relBIAS was 1.78%. Wu et al. [3] developed a smartphone-based DBH measuring device, and the RMSE of the measured DBH values was 2.17 mm. Compared with most previous studies, our device showed higher measurement accuracy (Table 1).
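The accuracy statistics compared above (BIAS, relBIAS, RMSE, and relRMSE) are commonly computed as in the Python sketch below. Sign conventions and the normalizer for the relative values vary between studies; this sketch assumes the error is the estimate minus the reference value and that relative values are percentages of the mean reference DBH. The example diameters are hypothetical.

```python
import math
from typing import Dict, Sequence


def accuracy_metrics(estimated: Sequence[float], reference: Sequence[float]) -> Dict[str, float]:
    """BIAS, relBIAS (%), RMSE and relRMSE (%) of estimated vs. reference DBH."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(reference)
    errors = [e - r for e, r in zip(estimated, reference)]  # estimate minus tape value
    mean_ref = sum(reference) / n
    bias = sum(errors) / n
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    return {
        "BIAS": bias,
        "relBIAS": 100.0 * bias / mean_ref,
        "RMSE": rmse,
        "relRMSE": 100.0 * rmse / mean_ref,
    }


if __name__ == "__main__":
    tape_mm = [100.0, 200.0, 300.0]    # hypothetical reference (tape) DBH
    device_mm = [102.0, 198.0, 305.0]  # hypothetical device estimates
    for name, value in accuracy_metrics(device_mm, tape_mm).items():
        print(f"{name}: {value:.3f}")
```

BIAS captures systematic over- or underestimation, while RMSE also penalizes random scatter, which is why studies usually report both.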

In the field accuracy test of the newly developed device, we took only one picture of each tree, from a single direction, to estimate its DBH. This inevitably affected measurement accuracy, because the trunk is not a perfect cylinder and its cross-section is not a perfect circle, so the apparent diameter changes with the viewing angle [8]. We therefore recommend repeated measurements from multiple directions when using the device for field DBH measurements.
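A toy calculation shows why multiple viewing directions help. The Python sketch below assumes an elliptical cross-section with hypothetical semi-axes of 110 mm and 90 mm and orthographic (parallel) projection, so the apparent width seen from angle \(\theta\) is \(2\sqrt{a^{2}\sin^{2}\theta + b^{2}\cos^{2}\theta}\). The tape-equivalent diameter is taken as the perimeter divided by \(\pi\), using Ramanujan's perimeter approximation. All of these are simplifying assumptions for illustration only.

```python
import math


def apparent_width(a: float, b: float, theta: float) -> float:
    """Silhouette width of an ellipse (semi-axes a, b) viewed from angle theta.

    theta is measured from the a-axis; orthographic projection is assumed.
    """
    return 2.0 * math.sqrt((a * math.sin(theta)) ** 2 + (b * math.cos(theta)) ** 2)


def tape_diameter(a: float, b: float) -> float:
    """Perimeter / pi, with Ramanujan's approximation of the ellipse perimeter."""
    return 3.0 * (a + b) - math.sqrt((3.0 * a + b) * (a + 3.0 * b))


def multi_view_mean(a: float, b: float, n_views: int) -> float:
    """Mean apparent width over n_views directions evenly spread over 180 degrees."""
    angles = [k * math.pi / n_views for k in range(n_views)]
    return sum(apparent_width(a, b, t) for t in angles) / n_views


if __name__ == "__main__":
    a, b = 110.0, 90.0  # hypothetical semi-axes in mm
    print(f"tape-equivalent: {tape_diameter(a, b):.2f} mm")
    print(f"single views:    {apparent_width(a, b, 0.0):.2f} to {apparent_width(a, b, math.pi / 2):.2f} mm")
    print(f"two-view mean:   {multi_view_mean(a, b, 2):.2f} mm")
```

Under these assumptions a single view can differ from the tape-equivalent diameter by about 20 mm, whereas the mean of two perpendicular views is within 1 mm of it. Opposite views (180° apart) give identical widths, so extra views are only useful when spread over a half circle.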

We emphasize that the measurement accuracy reported here was based on data collected in a deciduous broad-leaved forest in winter. Measurements taken in summer or autumn might give slightly different accuracy, because light conditions under the canopy may differ. In addition, the test site is a semi-natural forest, where tree density may be lower than in natural forests, particularly subtropical or tropical forests, so the background behind the targeted trees may be less complex. Previous research has shown that complex backgrounds hinder trunk extraction [3]. Therefore, the accuracy of the new device may change slightly in natural forests because of differences in forest structure.

Economic cost of the newly developed device

Among the dendrometers used in forest surveys, the cheapest is the conventional tape, which costs less than five dollars in China. LiDAR is a popular high-throughput technique for DBH measurement, but a LiDAR instrument costs about fifty thousand dollars. Another high-throughput technique is the image-based point-cloud method, which requires one or several cameras to capture multiple images of the plot from different directions; a single camera costs from five hundred to six thousand dollars or more. The hardware for our device cost about two thousand dollars, and with mass production the cost could fall to about five hundred dollars per device.

Although the new device is cheaper than a LiDAR system, it is not a replacement for LiDAR, because their designs and application scenarios differ. Our device is suitable for measuring the diameters of individual trees at the quadrat level (usually 20 m × 30 m), whereas LiDAR is usually used to measure tree diameters at the plot level (100 m × 100 m or larger).

Analysis of major error sources

The horizontal distance between the laser rangefinder and the targeted tree strongly affects measurement accuracy. However, the coded spot emitted by the laser is easily swamped by sunlight [3], so ranging precision cannot be guaranteed when the device faces the sun or specular reflections.

The device is handheld, and neither a tripod nor a spirit level is required, which makes field measurement convenient and efficient. However, without a tripod and spirit level, it is difficult to keep the device parallel to the targeted trunk, which may introduce errors into the distance measurement.

Errors in detecting the trunk edges also propagate into the pixel count between the edges, which reduces measurement accuracy.
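The sensitivity of Eq. (5) to these error sources can be checked numerically. Because \(R\) is proportional to \(D\), a relative distance error passes one-to-one into the DBH, whereas a one-pixel edge error changes the DBH by roughly \(1/N\) when \(N\) is much smaller than \(2f_{x}\). The Python sketch below uses hypothetical values (\(D\) = 2000 mm, \(N\) = 300 px, \(f_{x}\) = 3000 px) and, for the tilt case, assumes a vertical trunk surface, so that a laser pitched by angle \(\varphi\) reads \(D/\cos\varphi\) instead of \(D\).

```python
import math


def trunk_radius(distance: float, n_px: float, fx_px: float) -> float:
    """Eq. (5): R = D * N / (2 * f_x - N)."""
    return distance * n_px / (2 * fx_px - n_px)


def relative_error(distance: float, n_px: float, fx_px: float,
                   delta_d: float = 0.0, delta_n: float = 0.0) -> float:
    """Relative change in R (and DBH) caused by errors delta_d and delta_n."""
    r_true = trunk_radius(distance, n_px, fx_px)
    r_meas = trunk_radius(distance + delta_d, n_px + delta_n, fx_px)
    return (r_meas - r_true) / r_true


def tilted_reading(distance: float, tilt_deg: float) -> float:
    """Range read by a laser pitched by tilt_deg at a vertical surface (assumption)."""
    return distance / math.cos(math.radians(tilt_deg))


if __name__ == "__main__":
    d, n, fx = 2000.0, 300, 3000.0  # hypothetical D (mm), N (px), f_x (px)
    print(f"+20 mm range error: {100 * relative_error(d, n, fx, delta_d=20.0):.2f} %")
    print(f"+1 px edge error:   {100 * relative_error(d, n, fx, delta_n=1):.2f} %")
    tilt_dd = tilted_reading(d, 5.0) - d
    print(f"5 deg tilt:         {100 * relative_error(d, n, fx, delta_d=tilt_dd):.2f} %")
```

With these numbers, a 1% range error, a single-pixel edge error, and a 5° tilt all produce DBH errors of well under a few percent, but they add up, and the pixel term grows for thin or distant trunks where \(N\) is small.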

The trunk form also causes differences between the DBH values measured by the new device and those measured with a conventional tape. This issue is discussed in the “Discussion” section, and more details are presented in the “Evaluation of measurement accuracy” section.

Proposed application scenarios

Our device is designed to provide an alternative to, or a replacement for, the conventional tape in measuring tree diameter. It is suitable for measuring the DBH of individual trees in forest inventories at the quadrat level. It can also be applied in forestry- and agriculture-related industries, for example to measure plant traits in plant breeding, since it can efficiently measure both stem and fruit diameters.

Conclusion

The newly developed handheld device achieved efficient, accurate, real-time, non-contact DBH measurement, and the CNNs proved effective at detecting tree trunks in our research. The measured diameter values and the recorded longitudes and latitudes of the measurement sites were written into a text file that can be conveniently exported to an external flash disk. We believe that the device can meet the precision requirements of forest surveys and that its application can improve the efficiency of DBH measurement.