1 Introduction

In recent years, the development of computational pathology has strongly influenced the progress in quantitative digital pathology [1]. The availability of whole slide image technology has encouraged the development of many systems that perform computer-aided diagnosis. However, certain obstacles and limitations still exist in achieving a reliable result. One of the primary problems is the overlapping of structures, also called clusters of objects (commonly cell nuclei), that are segmented for quantification. The problem arises when multiple objects are counted as one due to clustering. Clustering is observed regardless of the applied tissue stain, whether hematoxylin and eosin (H&E) or one of the immunohistochemical (IHC) stains. Because brightfield microscopy provides no information about the depth of the sample, it is only possible to observe the cell nuclei as overlapping flat objects.

In this study, we present a new approach for cluster splitting, based on the distance transform and a modified watershed algorithm. The method requires no image preprocessing and is further strengthened by cluster classification and a recurrent approach. We compared our method with other known methods and validated its efficiency on two datasets: (1) the publicly available benchmark dataset BBBC0004 [2] and (2) a manually labeled set of images of breast cancer cells with IHC staining against FOXP3 with 3,3’-diaminobenzidine and hematoxylin (DAB&H) [3]. Figure 1 shows two examples from the latter dataset.

Fig. 1
figure 1

Two examples of images from the dataset with expert annotations. Positive nuclei are marked with red dots, while blue dots represent negative nuclei. The selected nuclei with boundary annotation are shown with a blue and a red overlay for each expert’s marking, which results in a pink intersection

IHC staining is often employed to augment the diagnosis and allow disease discrimination. The typical examination of the antigen expression is based on the cell’s distribution and morphology, as well as on the tissue architecture. Quantitative analysis of such sections can be used to support the diagnosis and evaluation of the disease progression.

The rest of this article is organized as follows. The remaining part of this section describes the related works and reference methods. Section 2 outlines the proposed method along with the dataset description. Section 3 describes the results and Section 4 presents the discussion. Finally, Section 5 presents the conclusion.

1.1 Related works

Cluster splitting remains a troublesome and unsolved problem in digital pathology. Many state-of-the-art automated methods for nuclei/cell detection, segmentation, or classification are reported in the literature, but unfortunately, they are mostly developed for typical H&E staining and are difficult to adapt to other staining techniques, such as IHC.

In order to further enhance the capability and accuracy of diagnostic decisions, many image analysis techniques have been developed to minimize the problems of over-segmentation and under-segmentation. Some researchers tackle the problem of clustering directly, whereas others try to manage it within their frameworks. In 2018, Salvi and Molinari [4] proposed the MANA method for nuclei segmentation. Their approach is quite flexible, processing images of different tissue types and scales, but it is limited to H&E-stained images. Swiderska et al. [5] presented a method for hot-spot selection in which separate nuclei segmentation also played a crucial role. Cheng and Rajapakse [6] developed a solution for fluorescence microscopy images in which an adaptive H-minima transform cooperates with a watershed-like algorithm. Ali et al. [7] proposed a multilevel set method to solve the problem of clustering. Wienert et al. [8] proposed a contour-based minimum model with minimal a priori information. Furthermore, Yan et al. [17] use oval-fitted nuclei segmentation and quantification functions with automatic processing. QuPath is one of the most popular free software tools used in the analysis of whole slide images [18].

1.2 Cluster split methods based on the boundary shape

Among cluster split methods, we can distinguish those based on pixel intensity values and those based solely on the shape of the cluster boundary. Independence from pixel intensity values can be crucial in the case of histopathological images, as this type of biological data varies in contrast, brightness, and color representation.

1.2.1 “Mouelhi”

This algorithm was presented by Mouelhi et al. [19]. It is based on the construction of a concave vertex graph followed by the selection of the shortest path. The separating edges produced by the watershed algorithm are treated as candidates from which the best fit is selected. To do so, the endpoints of the edges are investigated and matched, based on the Euclidean distance, to nearby concave points of the outer boundary of the cluster. Then, by applying Dijkstra's algorithm, the shortest, optimal path that separates the clustered nuclei is computed. This method was developed specifically for clustered breast cancer cells and produces very good results without losing any geometrical cell features.

1.2.2 “Kong”

This iterative cluster splitting algorithm was introduced by Kong et al. [20]. It starts by finding the most likely concave points (mlcp), which are the midpoints of the detected concave regions. Cluster splitting is related to two essential points of the cluster: the radial symmetry center and the geometrical center. The line between these two points is the basis for selecting the mlcp, as they should lie on either side of this line. Then, the cluster is cut along the line between the selected mlcp. This procedure is applied iteratively until the size of the resulting object is satisfactorily small. This algorithm was part of a framework that was tested on pathological images of follicular lymphoma and produced excellent results [21].

1.3 Cluster split methods based on intensity

To prove the efficiency of our approach, we compared our results with those of state-of-the-art cluster splitting algorithms that are based on pixel intensity.

1.3.1 Seeded watershed algorithm

This method was applied as presented in the MATLAB Marker-Controlled Watershed Segmentation tutorial [22]. It follows a simple procedure. First, the gradient magnitude is used as the segmentation function. Then, the foreground markers are computed by performing a series of morphological operations (opening, erosion, reconstruction, closing, dilation, reconstruction, and complementing) and finding the local maxima. Next, the background markers are estimated with the distance transform. With the foreground and background markers, the watershed transformation is computed on the gradient magnitude with the imposed minima.
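To make the procedure concrete, the listing below gives a minimal Python/scikit-image sketch of a marker-controlled watershed of this kind. It is not the MATLAB implementation used in this study; the structuring-element radius, the way regional maxima are found, and the assumption of bright objects on a dark background are illustrative choices.

```python
# Sketch of a marker-controlled (seeded) watershed; assumes bright objects
# on a dark background (complement a brightfield image first).
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import sobel
from skimage.morphology import disk, opening, closing
from skimage.segmentation import watershed

def seeded_watershed(gray, binary_mask, selem_radius=5):
    gradient = sobel(gray)                       # segmentation function

    # Foreground markers: morphological cleaning, then regional maxima.
    cleaned = closing(opening(gray, disk(selem_radius)), disk(selem_radius))
    fgm = ndi.maximum_filter(cleaned, size=2 * selem_radius + 1) == cleaned
    fgm &= binary_mask
    fgm_labels, n_fgm = ndi.label(fgm)

    # Background markers: ridge lines of a watershed run on the distance map.
    dist = ndi.distance_transform_edt(binary_mask)
    bgm = watershed(-dist, watershed_line=True) == 0

    # Passing labeled markers to the watershed acts like imposing minima on
    # the gradient magnitude (the role of imimposemin in MATLAB).
    markers = fgm_labels.copy()
    markers[bgm & (fgm_labels == 0)] = n_fgm + 1  # background as one extra label
    return watershed(gradient, markers=markers)
```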

1.3.2 “Huang”

The algorithm was presented by Huang et al. [23] and is quite similar to the seeded watershed algorithm. However, the difference lies in the calculation of the foreground markers. Instead of using morphological operations, the foreground markers are estimated by finding the local maxima of the distance transform (calculated on the complement of the binary image).
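A minimal sketch of this variant is shown below; scipy's distance_transform_edt plays the role of MATLAB's bwdist applied to the complemented image, and the min_distance used for peak detection is an assumed value, not a parameter from the original paper.

```python
# Sketch of "Huang"-style foreground markers: local maxima of the distance
# transform of the binary mask, followed by a marker-controlled watershed.
import numpy as np
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.filters import sobel
from skimage.segmentation import watershed

def huang_like_split(gray, binary_mask, min_distance=10):
    dist = ndi.distance_transform_edt(binary_mask)
    regions, _ = ndi.label(binary_mask)
    peaks = peak_local_max(dist, min_distance=min_distance, labels=regions)

    fgm = np.zeros(binary_mask.shape, dtype=bool)
    fgm[tuple(peaks.T)] = True                   # one marker per detected peak
    markers, _ = ndi.label(fgm)
    return watershed(sobel(gray), markers=markers, mask=binary_mask)
```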

1.3.3 H-minima

This method [24] applies the H-minima transform to the distance transform of the binary image. We suppressed minima using a threshold set relative to 0.8 of the distance transform value. The H-minima transform was followed by the watershed algorithm.
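A sketch of this variant is given below. The exact interpretation of the 0.8 factor (here taken as a fraction of the distance-map maximum) and the use of the negated distance transform are our assumptions.

```python
# Sketch of the H-minima variant: suppress shallow minima of the negated
# distance transform, then run a marker-controlled watershed.
import numpy as np
from scipy import ndimage as ndi
from skimage.morphology import h_minima
from skimage.segmentation import watershed

def h_minima_split(binary_mask, h_fraction=0.8):
    dist = ndi.distance_transform_edt(binary_mask)
    h = h_fraction * dist.max()                  # suppression depth (assumed rule)
    minima = h_minima(-dist, h)                  # minima that survive suppression
    markers, _ = ndi.label(minima)
    return watershed(-dist, markers=markers, mask=binary_mask)
```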

1.4 DL approach to cluster splitting

Deep learning (DL) is a novel diagnostic tool in the automated analysis of histological images; therefore, we compared our method with two methods applying DL algorithms: a pretrained model proposed by Chen et al. [25] and a U-net model trained from scratch on the ScienceBowl2018 dataset [26]. Both solutions were run in Python (training and inference performed with an Nvidia GeForce GTX 850M) using implementations that are available online.

1.4.1 “Chen”

Chen et al. [25] proposed a modification of the U-net model that applies contour enhancement in the loss function. Random clipping and rotation were applied for data augmentation. This is a multi-scale model, with variations in the size of the input image during training. Since an overlapped patch-based strategy was used, each region in the output mask was combined from the inference of multiple patches. We used the implementation available online [27].

1.4.2 U-net

In this study, we used the U-net model as described in the original paper [28] and followed the training procedure made available online [29]. The network was trained on the ScienceBowl2018 dataset [26], which is very challenging, consisting of nuclei segmented from a large number of images acquired under a variety of conditions. The nuclei vary in cell type, magnification, and imaging modality (brightfield and fluorescence).

The U-net model was created in Keras. The Adam optimizer, with a mini-batch size of 4, was used to train the network for 500 epochs. The binary cross-entropy loss function and the Dice coefficient metric were evaluated during training. The learning rate was set to 1e–5, and the rest of the hyperparameters were left at their default values. Since the prediction maps are output as grayscale images, we applied a threshold equal to half of the maximum intensity value, namely 127, to obtain the binary output maps used for evaluation.
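The fragment below sketches this configuration in TensorFlow/Keras. It is not the implementation from [29]: tiny_unet() is a deliberately downsized stand-in for the full U-net, x_train/y_train stand for the ScienceBowl2018 images and masks loaded elsewhere, and the Dice smoothing constant is an assumption.

```python
# Sketch of the training configuration (Adam, lr 1e-5, batch 4, 500 epochs,
# binary cross-entropy loss, Dice metric, threshold at 127).
import numpy as np
import tensorflow as tf
from tensorflow.keras import backend as K, layers, models

def dice_coef(y_true, y_pred, smooth=1.0):
    y_true_f, y_pred_f = K.flatten(y_true), K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def tiny_unet(input_shape=(256, 256, 1)):
    # Downsized U-net-style stand-in, not the architecture of [28].
    inputs = layers.Input(input_shape)
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(inputs)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(layers.MaxPooling2D()(c1))
    u1 = layers.concatenate([layers.UpSampling2D()(c2), c1])
    c3 = layers.Conv2D(16, 3, activation='relu', padding='same')(u1)
    return models.Model(inputs, layers.Conv2D(1, 1, activation='sigmoid')(c3))

model = tiny_unet()
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
              loss='binary_crossentropy', metrics=[dice_coef])
# model.fit(x_train, y_train, batch_size=4, epochs=500)

# Predictions are grayscale probability maps; threshold at half of the 8-bit
# range (127) to obtain the binary masks used for evaluation:
# binary_masks = (model.predict(x_test) * 255).astype(np.uint8) > 127
```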

2 Materials and methods

All classical image processing algorithms were implemented in MATLAB R2015b and are available at http://ibib.waw.pl/en/scientific-activity/projects/167-umo-2013-11-n-st7-02797 and on the Medical Image Analysis Platform (MIAP) [30] and MATLAB File Exchange.

2.1 Proposed cluster split methods

As cluster split methods cannot be applied to raw images, we first use segmentation to separate objects of interest from the background. In our previous work [31], we tested various adaptive threshold methods and concluded that Bradley’s [32] method yields the most reliable results; therefore, we used this method in this investigation. A local threshold value, based on the intensity of the pixel and its neighborhood, is calculated at every point of the image using a sliding window. Moreover, we use simple post-processing to remove small artifacts from the output image. The chosen segmentation algorithm yields reasonably smooth boundaries, so no further preprocessing is performed. We excluded objects consisting of fewer than 50 pixels or more than 10,000 pixels.
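The listing below is a rough Python sketch of this segmentation step, approximating Bradley's method with scikit-image's local-mean threshold; the window size and offset are assumed values, while the 50- and 10,000-pixel limits follow the text. Nuclei are assumed to be darker than the background in the processed channel.

```python
# Sketch of adaptive-threshold segmentation with size filtering.
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import threshold_local

def segment_nuclei(value_channel, block_size=101, offset=0.02,
                   min_area=50, max_area=10_000):
    # threshold_local returns "local mean - offset"; keep locally dark pixels.
    local_thr = threshold_local(value_channel, block_size,
                                method='mean', offset=offset)
    binary = value_channel < local_thr

    labels, _ = ndi.label(binary)
    areas = np.bincount(labels.ravel())
    keep = (areas >= min_area) & (areas <= max_area)
    keep[0] = False                              # drop the background label
    return keep[labels]
```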

The method presented in this article tackles the difficult problem of distinguishing separate objects based only on the shape of the clustered structure. It is mainly based on the distance transform, cluster classification, and a modified watershed algorithm. Assigning a proper class to a cluster with a particular composition increases the robustness of the procedure. We further developed the foreground marker estimation procedure and used recursive invocation of the algorithm. Figure 2 presents the workflow of the proposed method.

Fig. 2
figure 2

The processing scheme of our algorithm, with miniature images for better understanding. The cropped binary input contains one cluster. Next, the distance transform (D) is calculated. Foreground markers (FGM) are established by thresholding the distance transform with D_tresh (a value based on the maximum value present in the distance transform map, the parameter T, and the recurrence index recur_idx). Background markers (BGM) are obtained by applying the watershed algorithm to the distance transform. Then, we impose minima (FGM and BGM) on the gradient magnitude of the input image. If a ridge is present in the result after applying the watershed algorithm, it is used to split the objects. If the resulting objects (tested in terms of area, eccentricity, and roundness) are still classified as clusters, the algorithm starts again with an increased recurrence index

Our proposed method treats clusters one by one. The input is a cropped binary image containing the currently processed cluster. The features of the cluster’s shape are calculated, and the cluster is categorized according to Table 1. We proceed with the calculation of the distance transform on the input image, which is a binary mask of the cluster. Then, we establish a threshold value (D_tresh) based on the following: the maximum value present in the distance transform map (D), the threshold modifier (T), and the recurrence index (recur_idx, initially set to 1). The threshold value calculation is coupled with the cluster classification, as presented in Algorithm 1. The multiplier in the form of the parameter a allows for the most exhaustive processing of clusters with a vast area; as the threshold increment step is decreased, the separation of clustered objects is more precise. The values of the threshold modifier (T) and the parameter a were set experimentally to optimize the number of recurrent iterations. The evaluated threshold value is then applied to the distance transform map, creating foreground markers (FGMs). Concurrently, the distance map is processed with the watershed algorithm to obtain background markers (BGMs), which in most cases are nonexistent. Next, we calculate the gradient magnitude of the input image, which is a binary map of the cluster. Then, we impose minima on the gradient magnitude so that it has regional minima only at certain desired locations, namely at the FGMs. Finally, the watershed algorithm is applied. If a ridge is present in the result of the watershed algorithm, its dilated instance is inserted into the input image to split the objects. The resulting objects are tested in terms of area, eccentricity, and roundness. If they are still classified as clusters, the algorithm starts again for them with the recurrence index reset to 1. Alternatively, if the object was not split, the algorithm starts again with an incremented recurrence index.

Table 1 The classification of clusters

Because the algorithm is applied recurrently, the threshold value is incremented with every cycle. This results in more restrictive FGMs at every step, thereby allowing discrimination between closely and loosely clumped objects. The increment is set to 1, as it permits relatively dynamic calculations with an acceptable processing time. Figure 3 presents an example of such a recurrent execution.
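A minimal Python/scikit-image sketch of this recurrent loop is given below. The class-dependent threshold rule of Algorithm 1 (with parameters T and a) and the cluster test of Table 1 are not reproduced: the simple d_thresh rule and is_cluster() are illustrative placeholders only, and passing labeled markers to the watershed stands in for imposing minima on the gradient magnitude.

```python
# Sketch of the recurrent cluster-splitting loop; binary_cluster is a boolean
# mask of a single cluster (placeholders noted in the comments).
import numpy as np
from scipy import ndimage as ndi
from skimage.filters import sobel
from skimage.measure import regionprops
from skimage.morphology import binary_dilation, disk
from skimage.segmentation import watershed

def is_cluster(region, max_area=2000, max_ecc=0.9):
    # Placeholder for the area/eccentricity/roundness classification (Table 1).
    return region.area > max_area or region.eccentricity > max_ecc

def split_cluster(binary_cluster, T=2, recur_idx=1, max_recur=20):
    dist = ndi.distance_transform_edt(binary_cluster)
    d_thresh = T + recur_idx                     # placeholder for Algorithm 1
    if recur_idx > max_recur or d_thresh >= dist.max():
        return binary_cluster                    # give up, return unsplit

    fgm, n_fgm = ndi.label(dist > d_thresh)      # foreground markers (FGM)
    bgm = watershed(-dist, watershed_line=True) == 0   # background markers (BGM)
    markers = fgm.copy()
    markers[bgm & (fgm == 0)] = n_fgm + 1

    # Marker-controlled watershed on the gradient magnitude of the binary input.
    gradient = sobel(binary_cluster.astype(float))
    labels = watershed(gradient, markers=markers, watershed_line=True)
    ridge = (labels == 0) & binary_cluster

    if not ridge.any():                          # not split: retry with stricter FGM
        return split_cluster(binary_cluster, T, recur_idx + 1, max_recur)

    split = binary_cluster & ~binary_dilation(ridge, disk(1))
    out = np.zeros_like(split)
    lab, _ = ndi.label(split)
    for region in regionprops(lab):
        obj = lab == region.label
        if is_cluster(region):                   # still a cluster: recurse anew
            obj = split_cluster(obj, T, recur_idx=1, max_recur=max_recur)
        out |= obj
    return out
```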

Fig. 3
figure 3

Example of a recurrent call of the proposed method, in which the distance transform cutoff threshold is increased, resulting in cluster separation in the second iteration. In this case, the processing would stop after the second iteration, but every possible step of the recurrent call is presented here. From left to right: binary image of the cluster, contours estimated with all recurrent calls of the proposed method, the same contours shown in 3D

2.2 Datasets

To validate our proposed method, we used two image datasets. Both datasets consist of images with objects (nuclei) in sparse and compact arrangements, including many clustered objects. The dataset consisting of synthetic images (BBBC0004) provides a more controlled environment, whereas the other dataset (IISPV) allows us to test how the implementation deals with real data.

2.2.1 Dataset BBBC0004

We used image set BBBC004v1 [33] from the Broad Bioimage Benchmark Collection [34]. To help assess the performance of the algorithms with regard to cluster splitting, the synthetic image set with known probability of overlap was used. This image set consists of five subsets with increasing degree of clustering, with the following overlap probabilities: 0, 0.15, 0.30, 0.45, and 0.60. From each subset, we randomly selected five images for evaluation purposes. Each grayscale image contains 300 objects, but the objects overlap and cluster with different probabilities in the five subsets. The images were generated with the SIMCEP simulating platform for fluorescent cell population images [35].

2.2.2 Dataset IISPV

In this study, the images of breast cancer tissue used for the validation of the experiments were obtained from the Molecular Biology and Research Section, Hospital de Tortosa Verge de la Cinta, Institut d’Investigacio Sanitaria Pere Virgili (IISPV), URV, Spain, and from the Pathology Department of the same hospital. The dataset was formerly acquired during the project [36], grant number PI11/0488, conducted at the Instituto de Salud Carlos III, Spain, approved by the Ethics Committee of the Hospital Joan XXIII de Tarragona (reference 22p/2011). The subset used in this study was obtained based on the international cooperation within grant number 2013/11/N/ST7/02797 (Polish National Science Center).

The histological tissue microarrays used for image acquisition were prepared from formalin-fixed, paraffin-embedded tissue blocks of breast and axillary node biopsies coming from patients with the four major molecular subtypes [37]: luminal A, luminal B, triple negative, and HER2-enriched. Tissue sample preparation is not crucial for this investigation and has been extensively described before [38].

The digital images were captured under ×40 magnification on an automated whole slide scanning system (Aperio ScanScope XT). Then, they were split into images of separate punches using two designated programs [39, 40]. Although the images are originally 3-channel Red-Green-Blue (RGB) images, we converted them to the Hue-Saturation-Value color space and used only the value layer for processing, similarly to previous research [41].
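For completeness, this conversion step can be sketched in Python as follows (scikit-image's rgb2hsv; the original processing was done in MATLAB).

```python
# Sketch of the colour-space step: keep only the value (V) channel of HSV.
from skimage.color import rgb2hsv

def value_channel(rgb_image):
    return rgb2hsv(rgb_image)[..., 2]            # channels: H = 0, S = 1, V = 2
```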

All methods presented in this study were applied to a total of 7557 nuclei within 13 randomly selected regions of interest (ROIs) extracted from the dataset. The images differ in degree of complexity, architecture compactness, global contrast, and brightness, which is typical for this type of biological data. To crop ROIs out of the original tissue punches, we created two virtual circles, centered on the tissue punch, with radii of 500 and 2000 pixels. We randomly drew a point from the coordinates of the circles’ circumferences; this point becomes the upper-left corner of a 1000×1000 pixel ROI.
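The sampling procedure can be sketched as follows; bounds checking at the punch border is omitted, and the random-number handling is an illustrative choice.

```python
# Sketch of the ROI sampling: a random point on the circumference of one of
# two circles (radii 500 and 2000 px) centred on the punch becomes the
# upper-left corner of a 1000x1000 pixel ROI.
import numpy as np

def sample_roi(punch_image, roi_size=1000, radii=(500, 2000), rng=None):
    rng = np.random.default_rng() if rng is None else rng
    cy, cx = punch_image.shape[0] // 2, punch_image.shape[1] // 2
    r = rng.choice(radii)
    angle = rng.uniform(0.0, 2.0 * np.pi)
    y0, x0 = int(cy + r * np.sin(angle)), int(cx + r * np.cos(angle))
    return punch_image[y0:y0 + roi_size, x0:x0 + roi_size]
```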

For evaluation purposes, ground truth templates were generated based on the manual annotations performed by two experts, see Fig. 1. Each expert marked every location of the positive and negative nuclei with an indicator for object-wise evaluation. Moreover, for pixel-wise evaluation, in each ROI the boundaries of every immunopositive object and of 31 randomly selected immunonegative objects were marked, thereby creating a binary mask. The mean of both experts’ annotations was assumed to be the ground truth, creating a set of 808 manually marked nuclei (405 immunopositive and 403 immunonegative).

2.3 Evaluation

In this study, we conducted three types of evaluation. Our results are compared with the ground truth and the results of the reference methods, both object-wise and pixel-wise. As there is no publicly available benchmark database with a well-established ground truth for this type of material, the method evaluation is performed on the available experimental material with the ground truth annotated by the experts. Moreover, we compared the processing times of the proposed method with those of the state-of-the-art methods.

In total, there were 7557 manually marked nuclei in 13 images. For 808 randomly selected nuclei, the boundary was marked as well. True positive (TP) objects are those that have one matching object in the manually labeled ground truth. A segmented nucleus region is counted as matched if the manual mark is located within its area.

For the purposes of object-level evaluation, we used the most common criteria, including precision or positive predictive value (PPV), recall (TPR), and the F1 score. PPV is the ratio of TPs to the number of detected objects (PPV = TP/(TP+FP)). Sensitivity or true positive rate (TPR) shows what proportion of the objects of interest is found (TPR = TP/(TP+FN)). For both metrics, values closer to 1 imply a better outcome. The F1 score is the harmonic mean of precision and recall, F1 = 2·PPV·TPR/(PPV+TPR); it ranges between 0 and 1, with higher values representing a better method.
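These object-wise metrics can be computed directly from the TP/FP/FN counts, as in the small helper below (a restatement of the formulas above).

```python
# Object-wise detection metrics computed from TP/FP/FN counts.
def detection_metrics(tp, fp, fn):
    ppv = tp / (tp + fp) if (tp + fp) else 0.0   # precision
    tpr = tp / (tp + fn) if (tp + fn) else 0.0   # recall / sensitivity
    f1 = 2 * ppv * tpr / (ppv + tpr) if (ppv + tpr) else 0.0
    return ppv, tpr, f1
```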

Unfortunately, the metrics described above are not always sufficient to discriminate between two methods. We therefore calculated the following additional pixel-wise criteria introduced by Cui et al. [15]: missing detection rate (MDR), false detection rate (FDR), under-segmentation rate (USR), and over-segmentation rate (OSR). This approach divides the causes of false positives into two types of errors: false detections and over-segmentation. Likewise, false negatives can be split into two categories: missing detections and under-segmentation. Figure 4 presents examples of all cases.

Fig. 4
figure 4

All possible cases of segmentation (yellow overlay) and ground truth scoring (red dots). From left to right: true positive, false negative — under-segmentation, false negative — missed detection, false positive — over-segmentation, false positive — false detection

In addition, we used the Jaccard index to measure how closely the segmented boundaries fit the 808 reference ground truth nuclei.
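For binary masks, the Jaccard index is the ratio of the intersection to the union of the segmented and ground-truth pixels, as sketched below.

```python
# Pixel-wise Jaccard index between a segmentation mask and a ground-truth mask.
import numpy as np

def jaccard_index(pred_mask, gt_mask):
    pred, gt = pred_mask.astype(bool), gt_mask.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else np.logical_and(pred, gt).sum() / union
```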

3 Results

In this study, we measured the performance of the algorithm on microscopic images by comparing the number of nuclei and their segmentation results (binary images) with the annotated ground truth. The performance was evaluated using object-wise and pixel-wise metrics.

The process of cluster splitting was first analyzed by comparing performance on the BBBC0004 dataset. It consists of synthetic images, which provides the most controlled environment for assessing the algorithms’ performance. The dataset is divided into five subsets with increasing overlap probability, from 0 to 0.6; examples are presented in Fig. 5. Based on our analysis, we were able to distinguish the strengths and weaknesses of the cluster splitting methods in relation to increasing clustering. Table 2 presents the comparison.

Fig. 5
figure 5

Cluster splitting results comparison. The top row (A) shows two clusters. The upper one is an example of class III: objects are heavily packed with a high degree of overlap. The lower one in the top row is from class II: a more typical situation, with multiple objects touching and overlapping. In the bottom row (B), the cluster is from class IV: numerous objects with various degrees of overlap

Table 2 Comparison of performance of cluster splitting methods on BBBC0004 dataset (None—images without applied cluster splitting method)

Furthermore, we evaluated the methods on real images from the IISPV dataset. The process of cluster splitting was performed on the results of the Bradley thresholding algorithm in 13 ROIs. The initial detection was performed by applying a threshold to the value layer of the Hue-Saturation-Value color space. The mean ratios of the possible errors, presented in Fig. 4, were calculated for each method. To prove the superiority of the proposed method, the results were compared with those of segmentation without any cluster splitting method, with the results of the baseline seeded watershed algorithm, and with other methods from the literature. Alongside the quality metrics, we also compared the time consumption of the evaluated cluster splitting methods. Table 3 presents the comparison.

Table 3 Performance of the proposed method in comparison to the state-of-the-art frameworks

In the second part of Table 3, we present the comparison of the proposed method with state-of-the-art frameworks for quantification in histopathology: TMARKER and QuPath. We present two versions of results for the latter. First, we used the default parameter values (qupath_def). Second, an expert with experience in both histopathology and this software manually adjusted the processing parameters to achieve the best results (qupath_exSep). We also compared the results with the widely available ImageJ plugin ICHtoolbox, which uses oval-fitted nuclei segmentation; its nuclei segmentation and quantification functions are processed automatically. Table 3 also presents the results of the DL approaches. The model pretrained by Chen et al. was tested as is, and the results of inference were extracted as binary maps. The U-net model achieved a loss of 0.0286 and a Dice coefficient of 0.9393 after 500 epochs on the ScienceBowl2018 dataset. The results of inference on the IISPV dataset were thresholded at the value typical in this situation, equal to half of the maximum intensity, namely 127, to obtain the binary output maps used for evaluation.

4 Discussion

The performance comparison in the controlled environment of the BBBC0004 dataset gave us much insight into the strengths and drawbacks of the proposed method. First, we showed that all methods have similar results when there is no overlap and the objects are only touching (as presented in Table 2, section ov00). With increasing overlap, the effectiveness of all methods diminished in terms of all metrics. When some of the objects overlap (ov15 and ov30), our method achieved the best results in TPR, PPV, and F1 score (over 0.95). For images with more overlapping objects forming complex structures, the Huang method had the best recall (TPR). However, our proposed method, with much better precision, achieved a superior F1 score, as presented in Table 2. As cluster splitting along with segmentation is usually a preliminary step for object classification, it is critical to lower the missed detections as much as possible. This is because it is possible to discriminate false positive objects during further processing, whereas false negative (missed) objects will not be processed further.

Moving on from synthetic to real images, we validated the superiority of our proposed method on the IISPV dataset. Based on the comparison of the performance on this dataset, it was established that the proposed method outperforms the other methods, both those based on the boundary shape and those incorporating intensity values. Nuclei detection was best, as indicated by the highest F1 score (0.734).

To differentiate the methods even more precisely, we compared metrics focused on cluster splitting. They rely on object counts, yet they provide insight into pixel-level statistics. As presented in Table 3, the use of our proposed method significantly lowers the value of USR. The slight increase in the value of OSR could be remedied by introducing a subsequent post-processing step. In general, improvement in terms of under-segmentation comes at the cost of deterioration in terms of over-segmentation. The boundary fit assessed by the Jaccard index stays at a similar level for all methods (except “Huang”). In summary, even though some reference methods have better values of OSR or USR, the proposed method is the most balanced solution, as shown by its superior F1 score.

According to the time evaluation, our proposed method is outperformed only by the “Huang” method. The time evaluation is based on a batch of images of size 1000×1000 pixels; the “Kong” and “Mouelhi” methods take considerably longer, which is discouraging. Apart from these two, the other methods have comparable processing times.

To prove the reliability of the proposed method, we also compared it with known state-of-the-art frameworks that are capable of producing quantification and object boundary masks for digital histopathology. In this comparison, only QuPath with extensive manual parameter tuning by an expert was able to outperform the proposed method according to some metrics, as presented in Table 3. TMARKER’s results were drastically low in all metrics. The ImageJ plugin has rather satisfactory results in terms of over- and under-segmentation, but, on the downside, it produces many missed detections. Overall, our approach has the best F1 score and MDR, with very good results in terms of false detections.

Nowadays, DL is applied to a variety of computer vision problems, among which are nuclei detection and segmentation. We acknowledge that it is a very powerful tool with vast capabilities, but we tried to show that adapting DL to a new or even a modified problem is not a simple task. Herein, we performed inference with a network pretrained on H&E tissue images (Chen). The results are relatively good but do not outperform the proposed method. However, tackling the problem from the starting point and training a deep network from scratch requires a vast amount of resources and labeled data; with limited access to these resources, the results are quite poor. In summary, establishing a DL model with good performance is not a simple solution when tackling a problem with limited data, such as the DAB&H-stained tissue sections from the IISPV dataset.

Finally, based on our results, we can say that the proposed method outperforms the other methods from the literature and the state-of-the-art frameworks.

5 Conclusion

The accuracy of the employed automated nuclei segmentation technique is critical in obtaining high-quality and efficient diagnostic performance. Unfortunately, algorithms based on thresholding produce results that almost never perfectly fit the real object boundaries. In addition, the situation where multiple objects are connected (clustered) and counted as one is frequently observed. Therefore, we proposed a novel cluster splitting method.

We provide an efficient method for dividing clustered nuclei in digital images of histopathological slides, where the results of initial nuclei detection are improved by applying cluster splitting. The proposed method achieved better results in terms of average F1 score (0.734) than all other referenced methods: seeded watershed, “Huang”, H-minima, “Mouelhi”, and “Kong”. We managed to keep both USR and OSR very low.

Moreover, the established state-of-the-art frameworks also did not outperform our proposed method when processing the data automatically. Only QuPath with manual parameter tuning was able to achieve superior results. Using DL techniques for the instance segmentation of overlapping cell nuclei is not a simple task and requires a vast amount of resources and labeled data. Simply using the available solutions results in mediocre performance, especially with even slightly different data (H&E vs. DAB&H staining), as shown in this study.

The primary disadvantage of the proposed method is the strict categorization of the clusters, which is based on a time-consuming statistical analysis of the processed database. In the future, we plan to overcome this problem with automatic parameter setting. We also plan to test the algorithm on other datasets consisting of different types of cells.

In summary, the primary achievement of this study is the establishment of a processing step that can be incorporated into an image processing framework to improve segmentation results.