Introduction

Plant phenotyping helps in understanding plant-environment interactions and translating them into applications in crop management, biostimulants, microbial communities, etc. [1,2,3]. Traditional phenotypic measurements are conducted manually at the single-plant level, which is inefficient and imprecise, limiting the analysis of the genetics of quantitative traits, especially those associated with yield and stress tolerance [4]. Therefore, high-throughput, automated phenotypic measurement techniques should be developed. Maize is grown extensively around the world as an important food and feed crop and is one of the three major food crops, along with wheat and rice [5]. An important feature of maize is the structure of the male tassel. Maize tassels produce the pollen necessary for reproduction, and traits such as the number and length of tassel branches are related to grain yield [6]. Studies have shown that the yield of plants with removed male tassels is 50.6% higher than that of intact plants, and that smaller tassels usually release more energy for grain production and reduce light shading, making a good tassel structure one of the components of an ideal plant type [7]. In addition, regional trials are an important link in breeding and new-variety promotion, and manual investigation of distinctness, uniformity, and stability (DUS) testing traits is time-consuming and laborious work [8]. The DUS testing traits related to maize tassels include spindle length, branch number, tassel weight, tassel density, branch length, and branch angle; among these, tassel spindle length and branch number are two important traits. Therefore, the tassel structure should be comprehensively evaluated and understood.

In the field planting environment, maize tassels can be recognized, counted, and located based on image-based deep learning techniques [13]. These image-based methods have greatly advanced phenotypic studies of maize tassels. However, two-dimensional methods cannot accurately acquire phenotypic traits in three dimensions. As a result, 3D reconstruction methods, including depth-camera-based methods [14, 15], LiDAR-based methods [16, 17], and multi-view image reconstruction methods [18,19,20], have been developed to avoid the loss of dimensional information. These 3D data acquisition methods avoid the plant self-occlusion that may be encountered in 2D images and provide a basis for extracting complex plant phenotypes. Traditionally, plant phenotypic traits (leaf area, length, width, inclination, etc.) are obtained from 3D data through manual operations in 3D processing software, such as organ segmentation on plant 3D data. However, this approach is time-consuming and hinders high-throughput phenotype acquisition. Therefore, a method that can improve the efficiency of phenotype extraction is necessary [21]. The DBSCAN (density-based spatial clustering of applications with noise) algorithm has been applied to automatic branch segmentation of maize tassel point clouds, but it struggled to segment the branches of compact tassels, so more robust algorithms need to be studied [22]. The DFSP (distance field-based segmentation pipeline) algorithm was proposed for automated segmentation of maize stem and leaf point clouds with different directional structures [23].

Automated segmentation techniques based on 3D deep learning can be used to efficiently acquire phenotypic data. 3D deep learning networks can be divided into three broad categories: 3D voxel-grid-based frameworks [24], convolution-based methods [25,26,27], and frameworks that take point clouds as direct input [28]. PointNet extracts a single global feature from all point cloud data without considering the relationships between local point clouds [28]. PointNet++ can extract local features at different scales of a point cloud and obtain deep features through a multilayer network structure [29]. Meanwhile, several researchers have applied deep learning on 3D point clouds to plant organ segmentation. A novel pattern-based deep neural network, Pattern-Net, was designed for the segmentation of wheat point clouds [30]. DeepSeg3DMaize was developed to extract six phenotypic traits of maize based on PointNet [31]. A dual-function deep neural network, PlantNet, was proposed for semantic and instance segmentation of two dicotyledons and one monocotyledon from point clouds.

Materials

Maize tassel samples (180) were selected from a maize association analysis population for the construction of training and test sets to increase the diversity of sample morphological structure. The materials were planted in the experimental field of the Beijing Academy of Agricultural and Forestry Sciences (39°56′N, 116°16′E). The materials were sown on May 10, 2021, at a uniform density (row spacing: 60 cm; plant spacing: 30 cm).

Data acquisition

At the maize tassel loose-pollen stage, the MVS-Pheno V2 platform [35], an easy-to-assemble automated acquisition device with a supplemental light system, was used to obtain multi-view images of maize tassels (Fig. 1a1). The device was deployed in a tent 3 m long, 3 m wide, and 2 m high next to the maize planting area. The maize tassels were cut from the maize plants in the field during the pollen-shedding period, held in a metal frame table (Fig. 1a2), and placed in the central acquisition area of the device for multi-view image acquisition. Two side cameras (Canon 77D, Canon, Japan) are mounted on the swivel arm of the MVS-Pheno V2 platform, 80 cm from the center of the device, with a vertical distance of 15 cm between the two cameras. For each maize tassel sample, each camera acquires 30 side images at 12° intervals, totaling 60 side images, and each cycle usually takes 90 s (Fig. 1a3). Twenty-five maize tassel samples were manually measured to evaluate the reliability of our method: branch angle and tassel volume were measured with a three-dimensional digitizer (FASTSCAN, Polhemus, USA) (Fig. 2b) [36], and branch length and branch number were measured using image-based methods (Fig. 2a) [37].

Fig. 2

Validation data. a Validation data shot; branches are placed in order of their position on the stalk. b Maize tassel branch skeleton points; the red dots represent the upper node and the green dots the lower node of each branch

The Structure-from-Motion (SfM) algorithm [38] and the Multi-View Stereo (MVS) algorithm [39] were used to reconstruct point clouds from the multi-view images. A batch point cloud reconstruction pipeline (PC_MVS) was integrated and developed on the basis of the open-source libraries openMVS [40] and openMVG [41]. Dense point clouds of maize tassels were reconstructed using PC_MVS (Fig. 1a4).

Data set production

Point cloud pre-processing

The maize tassel point cloud reconstructed from multi-view image data contained some noise points. There are three types of noise: surrounding enclosure and ground noise; support frame and calibration plate noise; and maize tassel attachment noise (color noise and outlier noise). The noise was removed as follows. First, the surrounding enclosure and ground noise was removed based on the rule that the maize tassel point cloud lies at the center of the reconstructed scene (Fig. 1b1). Next, the maize tassel point cloud was separated from the scene using HSV-space segmentation of the point cloud color information: the vertex colors were converted from RGB space to HSV space, and threshold masks were set for the H, S, and V channels (Fig. 1b2). Under conventional indoor lighting, the thresholds were set to H: 15–180, S: 0.05–1, and V: 0–1; these thresholds quickly eliminated the background cloth and calibrator point clouds from the scene and removed the color noise caused by light reflection. The calibration plate point cloud was then separated in HLS space, and the maize tassel point cloud was scaled based on the actual size of the calibration plate [35]. Finally, the outlier noise points were removed using a statistical filtering algorithm, as follows:

A point \({p}_{i}\) was selected from the point cloud \(P\), and the average distance between \({p}_{i}\) and its \(n\) neighboring points \(\{{m}_{1},{ m}_{2}, {m}_{3}, \dots ,{ m}_{n}\}\) was calculated as follows:

$${d}_{mean}=\frac{{d}_{1}+{d}_{2}+\dots +{d}_{n}}{n}$$
(1)

where \({d}_{i}\) denotes the distance between \({p}_{i}\) and the neighboring point \({m}_{i}\).

The standard deviation \(\sigma\) of that point and neighboring points was determined as follows:

$$\sigma =\sqrt{\frac{1}{n-1}{\sum }_{i=1}^{n}{\left({d}_{i}-{d}_{mean}\right)}^{2}}$$
(2)

The neighboring point \({m}_{i}\) was removed if its distance to \({p}_{i}\) exceeded the average distance by more than \(\alpha\) standard deviations (\({d}_{i}>{d}_{mean}+\alpha \times \sigma\)). The noise reduction effect was best when \(\alpha =0.5\), which gave a relatively smooth edge while preserving as much information as possible (Fig. 3c).

Fig. 3

Denoising effect of different parameters. a Original point cloud; b \(n\) = 20, \(\alpha\) = 0; c \(n\) = 20, \(\alpha\) = 0.5; d \(n\) = 20, \(\alpha\) = 1
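The statistical filtering step above can be sketched in Python. This is a minimal illustration, not the authors' code: it follows the common library formulation of statistical outlier removal (as in PCL or Open3D), where a point itself is rejected when its mean distance to its \(n\) nearest neighbors exceeds the global mean of those distances by more than \(\alpha\) standard deviations; the function name and the use of `scipy.spatial.cKDTree` are our own choices.

```python
import numpy as np
from scipy.spatial import cKDTree

def statistical_outlier_removal(points, n=20, alpha=0.5):
    """Sketch of statistical outlier filtering on an (N, 3) point array."""
    tree = cKDTree(points)
    # k = n + 1 because the nearest hit of each query is the point itself
    dists, _ = tree.query(points, k=n + 1)
    d_mean = dists[:, 1:].mean(axis=1)       # Eq. (1): mean distance to n neighbors
    mu = d_mean.mean()                       # global mean of the per-point averages
    sigma = d_mean.std(ddof=1)               # Eq. (2): sample standard deviation
    keep = d_mean <= mu + alpha * sigma      # reject points far from their neighbors
    return points[keep]
```

Smaller \(\alpha\) removes more points; \(\alpha = 0.5\) corresponds to the setting reported above.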

The point clouds generated via multi-view reconstruction were dense: each maize tassel sample point cloud had more than 500,000 points, which cannot be processed for deep learning training on a typically configured computer. First, the random sample consensus (RANSAC) algorithm [42] was used to quickly down-sample each point cloud from 500,000 to 100,000 points. Down-sampling with the farthest point sampling (FPS) algorithm [43] reduces the number of points without destroying the point cloud distribution, so it was used to further down-sample each point cloud from 100,000 to 40,000 points. In this paper, each sample point cloud was finally reduced to 4000 points to improve the efficiency of model training (Fig. 1b4).
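Farthest point sampling can be sketched as follows (a generic illustration, not the authors' implementation): each iteration adds the point farthest from the already-selected set, which preserves the spatial extent and distribution of the cloud.

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Select m points by iteratively taking the point farthest from the chosen set."""
    n = len(points)
    selected = np.empty(m, dtype=int)
    min_dist = np.full(n, np.inf)          # distance of each point to the selected set
    idx = start                            # seed point (could also be chosen at random)
    for i in range(m):
        selected[i] = idx
        d = np.linalg.norm(points - points[idx], axis=1)
        min_dist = np.minimum(min_dist, d) # update distance to the selected set
        idx = int(np.argmax(min_dist))     # farthest remaining point becomes next pick
    return points[selected]
```

On a unit square with a center point, sampling four points recovers the four corners and skips the center, illustrating how FPS favors coverage over density.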

Fig. 4

Overview of the network model of maize tassel segmentation. The encoder part consists of three SA layers. Each SA layer sets a different number of sampling points, sampling radius, and Multi-Layer Perceptron (MLP) layer size. The decoder part consists of three IP layers. Each IP layer connects the features extracted by the corresponding SA layer and sets a different MLP layer size

Point cloud annotation

Although deep learning algorithms for point clouds have attracted much attention, they have not been widely used in plant phenotyping. This may be because plant point clouds are more complex than those of buildings, furniture, etc., and few open-source datasets exist. Moreover, plant species differ greatly from one another, so separate networks must be trained for them. In addition, manual annotation is a challenge for self-shading plants. The morphology of maize tassels is divided into compact and spread types, so it is difficult and time-consuming to completely label each branch of a tassel manually. Therefore, because the roots of tassel branches are compact, only the top point cloud of each branch was selected as the representative of that branch (Fig. 1c1). The segmented branch-top point cloud was continuous (3–5 cm long) and did not cross the main stem of the tassel, which accelerated labeling. The branch point clouds of the maize tassels were annotated using CloudCompare [44]: the top point cloud of each branch was marked as 1, and the remaining points were marked as 0.

Dataset enhancement and composition

Data augmentation was performed via random point sampling for all labeled branch point cloud data, since the maize tassel point clouds obtained via multi-view reconstruction were dense and non-uniform. In the first down-sampling stage, different initial points were selected in RANSAC, so different initial sampling sequences generated different point cloud samples and ensured dataset diversity. Finally, a new augmented dataset was obtained, comprising 1260 training samples, 360 test samples, and 180 validation samples, totaling 1800.

Maize TasselSeg system

Segmentation network framework

The specific network framework used for segmentation is shown in Fig. 4. The network input contained the 3D coordinates and normal vectors of N points. An encoder-decoder structure was used to improve the performance of the model on the segmentation task. The encoder module of the network consisted of multiple set abstraction (SA) layers. The point set in each SA layer was processed and abstracted to produce a smaller-scale point set with more channels. Each SA layer consisted of three parts: a sampling layer, a grouping layer, and a PointNet layer. The sampling layer down-samples the point cloud using the farthest point sampling algorithm, and each sampled point is then used as the centroid of a local region. The grouping layer constructs a local neighborhood by finding the nearest neighbors around each centroid sampled by the sampling layer, and the PointNet layer finally abstracts these local neighborhoods. In addition, PointNet++ proposes two methods, multi-scale grouping (MSG) and multi-resolution grouping, which enhance the generalization ability of the network by concatenating local features at different scales to better abstract the local features of point clouds. In this paper, MSG was used as an extension of the SA layer: in each sampling layer, neighborhood features at different radii around each centroid were abstracted.
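The sampling and grouping stages of an SA layer can be illustrated with the following sketch. This is our own NumPy illustration under stated assumptions, not the authors' network code: the learned PointNet part is omitted, and the function name and the pad-by-repetition choice for sparse neighborhoods (as in the PointNet++ ball query) are ours.

```python
import numpy as np

def sample_and_group(points, n_centroids, radius, n_samples):
    """Sampling layer (FPS) + grouping layer (ball query) of one SA layer."""
    # --- Sampling layer: farthest point sampling selects the centroids ---
    sel = [0]
    min_dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(n_centroids - 1):
        sel.append(int(np.argmax(min_dist)))
        min_dist = np.minimum(min_dist,
                              np.linalg.norm(points - points[sel[-1]], axis=1))
    centroids = points[sel]
    # --- Grouping layer: gather n_samples neighbors within `radius` of each centroid ---
    groups = np.empty((n_centroids, n_samples, points.shape[1]))
    for i, c in enumerate(centroids):
        idx = np.where(np.linalg.norm(points - c, axis=1) < radius)[0]
        idx = np.resize(idx, n_samples)   # pad by repetition if too few neighbors
        groups[i] = points[idx] - c       # express neighbors in the centroid's local frame
    return centroids, groups
```

In MSG, this grouping would simply be repeated with several radii per centroid and the resulting features concatenated.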

Interpolate and PointNet (IP) layers were designed for the decoder part. PointNet++ adopts reverse interpolation and skip connections to up-sample point cloud features and obtain discriminative point-wise features. Reverse interpolation computes the interpolated C-dim point feature as an inverse-distance-weighted (inverse of distance squared) mean over the k nearest neighbors. The feature can be calculated as follows:

$${f}^{\left(j\right)}\left(x\right)=\frac{{\sum }_{i=1}^{k}{w}_{i}\left(x\right){f}_{i}^{\left(j\right)}}{{\sum }_{i=1}^{k}{w}_{i}\left(x\right)}, j=1,\dots ,C$$
(3)

where:

$${w}_{i}\left(x\right)=\frac{1}{{d\left(x,{x}_{i}\right)}^{2}}$$
(4)

Local features were obtained by directly concatenating the representations from the corresponding encoder layer onto the interpolated features, which were then passed through a PointNet to obtain the output. This process was repeated until the features were relayed back to the original point set. Encoding was performed sequentially through the three SA layers to obtain features at three different scales. Decoding was then performed through the three IP layers to fuse the features at all scales. Finally, segmentation prediction was performed through a fully connected layer. The model output parameter k for the branch-top segmentation of maize tassels was set to 2.
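Equations (3) and (4) correspond to the following minimal sketch (a generic NumPy/SciPy illustration with hypothetical names, not the network implementation; the `1e-10` floor is our own guard against division by zero when a query point coincides with a known point):

```python
import numpy as np
from scipy.spatial import cKDTree

def interpolate_features(sparse_xyz, sparse_feat, dense_xyz, k=3):
    """Inverse-distance-weighted feature interpolation, Eqs. (3)-(4)."""
    tree = cKDTree(sparse_xyz)
    d, idx = tree.query(dense_xyz, k=k)        # k nearest sparse points per query
    if k == 1:                                 # keep shapes (N, k) even for k = 1
        d, idx = d[:, None], idx[:, None]
    w = 1.0 / np.maximum(d, 1e-10) ** 2        # Eq. (4): w_i(x) = 1 / d(x, x_i)^2
    w /= w.sum(axis=1, keepdims=True)          # Eq. (3) denominator (normalization)
    return (sparse_feat[idx] * w[..., None]).sum(axis=1)   # Eq. (3) numerator
```

A point midway between two known points receives the average of their features, as expected from the symmetric weights.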

Loss function

Segmentation networks essentially perform per-point classification. Herein, the softmax cross-entropy function, which is commonly used in classification tasks, was used as the loss function during training:

$$Loss= -\sum_{i=1}^{n}p\left({x}_{i}\right)\log q\left({x}_{i}\right)$$
(5)

where \(n\), \(p({x}_{i})\), and \(q({x}_{i})\) represent the number of points input to the network, the probability of the true label of point \({x}_{i}\), and the predicted probability for that point, respectively.
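Equation (5) reduces, for one-hot true labels, to the negative log of the predicted probability of the correct class. A minimal NumPy sketch (our own illustration, using a standard log-sum-exp stabilization):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean per-point softmax cross-entropy, Eq. (5), for integer class labels."""
    z = logits - logits.max(axis=1, keepdims=True)           # numerical stability
    log_q = z - np.log(np.exp(z).sum(axis=1, keepdims=True)) # log softmax
    # pick log q(x_i) of the true class for each point; p(x_i) is one-hot
    return -log_q[np.arange(len(labels)), labels].mean()
```

For a two-class point with equal logits, the loss is \(\log 2 \approx 0.693\), the entropy of a uniform prediction.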

Network training

The segmentation network was trained on a server with a 16-core, 32-thread CPU, 128 GB RAM, and one NVIDIA GeForce RTX 3090 GPU running Windows 10, using PyTorch as the training framework.

Point cloud data containing XYZ coordinates, normal vectors, and label values, with a configurable number of points, were used as the input for network training. The training batch size and initial learning rate were set to 24 and 0.001, respectively. The learning rate was reduced by 50% every 20 epochs. The ADAM solver was used to optimize the network. The weight decay and momentum were set to 0.0001 and 0.9, respectively. Network training was terminated when the training loss fell below a fixed threshold; otherwise, training continued until all epochs were completed.

Evaluation indicators

Ground-truth annotations were used to assess the accuracy of the segmentation results based on four quantitative metrics: Intersection over Union (IoU), segmentation accuracy, precision, and recall. The IoU represents the overlap rate between the predicted and true values of the segmentation network. The accuracy reflects the proportion of correctly segmented points among all points. Precision represents the proportion of predicted positive points that are truly positive, while recall represents the proportion of ground-truth positive points that are correctly predicted.

The four indicators were determined as follows:

$$IoU=\frac{TP}{TP+FP+FN}$$
(6)
$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
(7)
$$Precision=\frac{TP}{TP+FP}$$
(8)
$$Recall=\frac{TP}{TP+FN}$$
(9)

A point in the maize tassel point cloud was defined as a true positive (TP) if it was predicted in the same category as its label. A point was defined as a false negative (FN) if it was mislabeled and was ground-truth positive, and as a false positive (FP) if it was mislabeled and was not ground-truth positive. Higher IoU, segmentation accuracy, precision, and recall values represent better accuracy.
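Equations (6)-(9) can be computed directly from the per-point confusion counts; a minimal sketch for the binary branch-top labels used here (1 = branch top, 0 = rest; our own illustrative function name):

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """IoU, accuracy, precision, and recall (Eqs. 6-9) for binary point labels."""
    tp = np.sum((pred == 1) & (truth == 1))   # correctly predicted branch-top points
    tn = np.sum((pred == 0) & (truth == 0))   # correctly predicted background points
    fp = np.sum((pred == 1) & (truth == 0))   # background predicted as branch top
    fn = np.sum((pred == 0) & (truth == 1))   # branch top predicted as background
    return {
        "IoU": tp / (tp + fp + fn),
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
        "Precision": tp / (tp + fp),
        "Recall": tp / (tp + fn),
    }
```

Note that IoU is always at most precision and recall, since its denominator counts both error types.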

Branch extraction of maize tassels

The top point cloud of each branch of the maize tassel was obtained through the segmentation network. Inspired by the shortest path (SP) algorithm [45] and the Median Normalized-Vectors Growth (MNVG) algorithm [46], a bottom-up minimum path algorithm (Algorithm 1) was constructed to extract the skeleton point cloud of the tassel and then extract the complete point cloud of each branch based on the skeleton.


The top point cloud of each branch was obtained by segmenting the maize tassel point cloud using the learned network. The segmented top point cloud was clustered using a density clustering algorithm. The point cloud was considered to have a uniform density since the maize tassel point cloud was down-sampled using the farthest point sampling.

The average point distance \({d}_{mean}\) of the point cloud of a single maize tassel was calculated, and \(\varepsilon =3\times {d}_{mean}\) was used as the search radius for density clustering, while \(minPts=5\) was set as the minimum neighborhood density.

In addition, a minimum cluster size threshold was selected to reject over-clustered point clouds. The branch-top point clouds segmented by the PointNet++ network were density-clustered to obtain the branch-top point cloud instances and the number of branches.
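The density clustering step can be sketched with a minimal DBSCAN-style routine. This is our own simplified illustration, not the authors' implementation: `eps` plays the role of \(\varepsilon = 3\times{d}_{mean}\), `min_pts` of \(minPts\), and `min_cluster` of the minimum-cluster-size threshold mentioned above.

```python
import numpy as np
from scipy.spatial import cKDTree

def density_cluster(points, eps, min_pts=5, min_cluster=10):
    """DBSCAN-style clustering; returns per-point labels and the cluster count.
    Labels: cluster id >= 0, or -2 for noise / rejected fragments."""
    tree = cKDTree(points)
    neighbors = tree.query_ball_point(points, r=eps)  # eps-neighborhood of each point
    labels = np.full(len(points), -1)
    cid = 0
    for seed in range(len(points)):
        if labels[seed] != -1 or len(neighbors[seed]) < min_pts:
            continue                       # already assigned, or not a core point
        queue = [seed]
        labels[seed] = cid
        while queue:                       # grow the cluster from core points
            p = queue.pop()
            if len(neighbors[p]) >= min_pts:
                for q in neighbors[p]:
                    if labels[q] == -1:
                        labels[q] = cid
                        queue.append(q)
        if np.sum(labels == cid) < min_cluster:
            labels[labels == cid] = -2     # reject over-clustered fragments as noise
        else:
            cid += 1
    return labels, cid
```

Each surviving cluster corresponds to one branch-top instance, so `cid` estimates the branch count.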

The bottom-up minimum path algorithm was used to obtain the skeleton of the tassel point clouds, as follows:

First, the maize tassel point cloud was cut into two parts: the branch tops \(T\) and the remaining part \(R\). For each branch-top instance, the point \(p\) in \(R\) closest to \(T\) was found and saved as the initial growth point of that branch top (Fig. 5a, b).

Fig. 5

Schematic of the bottom-up minimal path algorithm for maize tassels branch nodes searching. a the point cloud is clustered from the network segmentation at the top of branch; b the initial growth point at the top of the branch; c the nodal convex hull; d the root node is selected through traversing the maximum convex hull; e the root node is added to the path search queue; f the R nearest contraction region; g the next search node; h the fork node; i the multiple branch nodes; j the multiple shortest path from the branch top nodes to the root node

All points \(n\) in \(R\) were traversed. For each \(n\), the volume of the minimum convex hull containing \(n\) and the initial growth points \(p\) of all branches was calculated. The point \(n\) whose minimum convex hull had the largest volume was selected as the root node of the tassel point cloud. The parent node of the root node was set as the termination node, and the root node was put into the traversal list \(O\) (Fig. 5c–d). The traversal list \(O\) was then traversed cyclically. In each traversal, all points \(p{\prime}\) within the neighborhood \(r\) of the current traversal point were examined; for each \(p{\prime}\), the nearest point \(p{\prime}{\prime}\) whose parent node was not empty was found, and the parent node of \(p{\prime}\) was set to \(p{\prime}{\prime}\). The point \(p{\prime}\) was stored in the traversal list \(O\), and the current traversal point was removed (Fig. 5e–g). When the traversed point was a fork node (Fig. 5h), multiple branch nodes were searched, and each branching point was assigned its parent node (Fig. 5i). This process was repeated until the traversal list \(O\) was empty and the parent node of every point had been obtained. Finally, starting from the initial growth point \(p\) of each branch-top instance, the shortest path to the root was traced to obtain the skeleton of each branch (Fig. 5j). The longest branch skeleton was set as the main stem skeleton, and skeleton points overlapping with the main stem skeleton were set as the stem part. After the skeleton point cloud was obtained, each original point was assigned to its nearest branch for fusion. Finally, all branch point clouds were obtained.
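Once every point has a parent node, tracing a branch skeleton reduces to following parent pointers from the branch-top growth point down to the root. A minimal sketch (hypothetical names; `parent` maps each node id to its parent id, as produced by the traversal above):

```python
def trace_branch(parent, tip, root):
    """Follow parent pointers from a branch-top growth point down to the root."""
    path = [tip]
    while path[-1] != root:
        path.append(parent[path[-1]])  # step one node toward the root
    return path                        # skeleton node sequence, tip first
```

Running this for every branch-top instance yields the per-branch skeletons of Fig. 5j; the longest such path serves as the main stem skeleton.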

The algorithm uses a multi-layer traversal neighborhood range to improve robustness on unevenly distributed tassel point clouds, and the initial point cloud is down-sampled with farthest point sampling to improve the efficiency of the algorithm.

Extraction of phenotypic traits

Six phenotypic traits, including the number of branches, branch length, branch curvature degree, branch angle, tassel volume, and tassel dispersion, were extracted based on the stem and branch point clouds. The extraction methods are described in Table 1.

Table 1 Phenotypic traits calculation method