Introduction

Petrographic image analysis involves delineating mineral grains in thin-section petrographic images to identify grain size and shape and thereby determine the rock's composition and structure. Petrographic image segmentation is the first and crucial step of petrographic image analysis and can be reduced to a binary classification task, with images being separated into edge and background categories. However, this seemingly simple binary classification task remains an intriguing problem, especially given the blurred and overlapping edges of three-dimensional grains observed in a two-dimensional image. It becomes even more challenging because of the large-scale color and intensity variations of the grains, which depend on several factors (explained in detail below).

Automated solutions, such as scanning electron microscopy (SEM), are constrained by minerals that are chemically equivalent but have different optical reflectance properties. This is why, in the past, petrographers had to manually segment hundreds of grains by tracing their contours. Moreover, this time-consuming and tedious task requires the petrographer to combine the features of plane- and cross-polarized light at various polarization angles, a subjective process that puts a high premium on the petrographer's expertise and knowledge.

Most anisotropic minerals lack color when seen under a plane-polarized light microscope, but grains' hue and intensity vary continually when viewed through a cross-polarized light microscope at different angles of polarized light [1]. Due to the continuously varying angle between the orientation of the crystal axes and the polarized light, the brightness of a grain peaks at a particular polarization angle and fades completely to dark as the angle gradually shifts, a phenomenon referred to as extinction. Other factors affecting the color of a grain include the thickness of the thin section, the optical properties of different grains, and the crystal structure of the minerals. As a result, segmentation results are prone to both over- and under-segmentation. On the one hand, the edges of neighboring grains may become blurry, and the texture of certain types of rock may be mistakenly treated as an edge. On the other hand, over-segmentation frequently occurs when impurities within grains are confused with smaller grains.

Due to the intricacy of grain edge detection, previous approaches fail to generate accurate and reliable segmentation results. This paper presents an automatic two-class edge segmentation model for plane- and cross-polarized petrographic images. The remainder of this paper is organized as follows. “Literature review” reviews previous work on grain segmentation and grain edge detection, and “Theoretical background and methodology” gives a detailed description of the extinction consistency perception network proposed in this paper. The relevant ablation and comparison experiments are shown in “Experiments and results”. Finally, “Conclusion and future work” concludes the paper.

In this study, we propose a CNN-based computer vision methodology for generating reliable edge segmentation maps, with a specially designed block for perceiving the extinction phenomena across several thin-section petrographic images under cross-polarized light. A general summary of our primary contributions is as follows:

  • The extinction consistency perception network (ECPN) is proposed and trained from scratch. The main part of this model, namely the multi-scale edge perception network (MEPN), is composed of the modified EfficientNetV2 and the proposed BiDecoder. It has been proven to be an effective framework for improving feature extraction and aggregation.

  • The proposed multi-angle extinction consistency (MAEC) block can be seen as a preprocessing stage to capture the extinction of grains and augment pixels within the same grains into edge-enhanced features. It consists of two parts: the extinction consistency enhancement (ECE) block and the squeeze and excitation (SE) block, which function in the spatial and channel dimensions respectively.

  • The distance map penalized compound loss function is constructed to direct the network’s attention toward the grains’ boundaries. In comparison to the widely used cross-entropy loss function, it penalizes dissimilarity not only in terms of statistical distributions but also in terms of mismatches across overlap zones.

  • A dataset, named the cross-polarized petrographic image datasets (CPPID) (see Note 1), has been shared with the community with precise ground truth of mineral grain edges. Experiments on CPPID demonstrate that the proposed ECPN model outperforms several classical edge detectors by a large margin, achieving 0.940 ODS and 0.941 OIS.

Literature review

We first review several models for detecting grain edges in petrographic images using traditional (non-deep-learning) methods, which may be divided into four types: edge-based methods [1,2,3], energy-based methods [4], region-based methods [5,6,7,8] and machine learning methods [9,10,11].

Edge-based approaches utilize changes in intensity or luminance to determine grain boundaries, but they depend heavily on hand-crafted feature filters and hence cannot ensure boundary closure. Zhou et al. [2] first passed phi- and max-images through an enhanced Canny detector. A region-growing method was then applied to segment the generated edge maps. In the subsequent stage, the segmented phi- and max-images were merged into a single image that outperformed either image alone. Goodchild et al. [1] extracted edges from a gradient image using a fundamental gradient operator and estimated closed edges using a number of image processing techniques. Heilbronner [3] proposed a method for creating grain boundary maps from petrographic thin sections called the lazy grain boundary (LGB) method. It extracts boundaries by applying gradient filtering to sets of polarization micrographs and combining the most significant grain boundaries from each image in a given input set into a single grain boundary map.

Energy-based approaches aim to minimize an energy function, but they are computationally costly and may struggle to converge to the optimal solution. Jungmann et al. [4] sought to reduce the value of an energy function to achieve segmentation and suggested extending the MDL-based region merging approach, which merges edge features across nearby areas.

Region-based methods cluster pixels with comparable attributes and are capable of generating closed borders; however, the resulting boundaries are often imprecise. Jiang et al. [6, 7] proposed a two-stage technique that first generates superpixels using a modified simple linear iterative clustering (SLIC) algorithm and then combines them using a designed region-merging scheme. Utilizing a series of cross-polarized images, Lumbreras et al. [8] over-segmented images and combined the over-segmented blocks based on the grains’ preferred direction. Zhou et al. [5] first defined a set of edge operators with varying mask sizes and then employed a colored edge detection algorithm with a large neighborhood to extract colored edges while reducing noise. The image was then segmented using a seed region growing algorithm based on the color edge information and the distance between edge and non-edge pixels. Finally, an elimination mechanism was developed to merge adjacent regions across their shared boundaries.

Fig. 1 ECPN overall architecture is made up of two parts: MAEC and MEPN. Note that numbers 0–6 in the modified EfficientNetV2 of the MEPN block denote stages 0–6 of the original EfficientNetV2 network structure

Learning-based approaches detect edges by employing data-supervised algorithms on manually designed features, so their segmentation maps are typically limited by the quality of those features. Fueten et al. [9] refined the output of a standard segmentation procedure: an artificial neural network was used to classify pixels with varying patterns and color attributes as edge or non-edge. Izadi et al. [10] generated max images from plane-light and cross-polarized light images; twelve color features extracted from the max images were then clustered incrementally to segment the minerals. Das et al. [11] applied a psychophysics model to obtain a binary segmented output and utilized a k-means clustering algorithm with a selected threshold to generate the final segmentation map.

Recent advances in CNN backbone architectures, such as VGGNet [12], GoogLeNet [13], and ResNet [14], have resulted in remarkable gains for computer vision tasks. Numerous works, including MobileNet [15, 16], Xception [17], DenseNet [18], and EfficientNet [19], have made major strides toward higher accuracy through effective architectures with lower complexity, rather than through larger and more complicated networks.

Turning to the CNN approaches most relevant to our work, HED [20] is the first CNN-based edge detection network that provides state-of-the-art performance as a deeply supervised extension of FCN [21]. Following that, other efforts [22,23,24,25,26,27,28,29] continued to set records using boosted encoders and altered backbones. RCF [22] enhanced HED’s skip-layer structure and tested it using image pyramids; BDCN [23] supervised each layer independently at a given scale and made use of dilated convolution to generate multi-scale features; instead of employing a pre-trained model, DexiNed [24] modified the backbone and trained it from scratch, achieving competitive and satisfactory results. A straightforward, lightweight, and effective edge detection architecture is presented by PiDiNet [29], built on its proposed pixel difference convolutions, which integrate conventional edge detection filters into normal convolutional operations.

CNN-based methods [30,31,32,33] for semantic segmentation of petrographic images are also noteworthy. Note that semantic segmentation partitions images into background and several mineral classes, whereas edge detection separates images into only two categories: edge and background.

Rubo et al. [30] utilized discrete convolutional filters, neural networks, and random forest classifiers to generate semantic segmentation maps of petrographic images; results were evaluated by 10-fold cross-validation and chemical microscopy. Tang et al. [31] employed a three-layer neural network that received plane- and cross-polarized light images, producing segmentation maps composed of several different rock classes. Saxena et al. [32] studied the potential of using a convolutional neural network to predict pore space in rock images (two classes) as well as a 10-class semantic segmentation map. Das et al. [33] created the Deep Semantic Grain Segmentation network (DSGSN) to semantically segment XPL and PPL images of sandstone into two classes (i.e., grain and background).

In summary, traditional methods usually apply image processing techniques or machine learning to extract useful features and then predict segmentation maps, so their results are limited by hand-designed features and are often unsatisfactory, whereas data-supervised CNN algorithms benefit from an end-to-end learning paradigm that predicts more refined segmentation results. However, CNN-based methods are limited by large data requirements, and there is no publicly available dataset of petrographic images. In addition, both previous image processing methods and CNN-based approaches ignore the intrinsic properties of mineral grains and their specific optical behavior in petrographic images, such as extinction phenomena, which are widely used by geologists for mineral naming and identification. Furthermore, the majority of earlier models typically take only a few rock images as input (plane- or cross-polarized, 1–5 images) without considering the intrinsic connection between the input images.

Theoretical background and methodology

This section discusses the extinction consistency perception network (ECPN), the proposed method for edge detection in thin-section petrographic images, which takes a succession of petrographic images and predicts the grains’ edge map. The ECPN model can be subdivided into two sub-networks (as seen in Fig. 1): (1) the multi-angle extinction consistency (MAEC) block fuses the input of seven image patches into a three-channel feature map based on the continuous extinction phenomena; (2) the multi-scale edge perception network (MEPN) performs effective feature extraction and strong feature aggregation.

Multi-angle extinction consistency block

The multi-angle extinction consistency (MAEC) block augments pixels within the same particles into an enhanced feature by capturing the extinction phenomena of grains in petrographic images taken at multiple angles of cross-polarized light.

This block is inspired by the practice of petrologists manually rotating polarizers and observing multiple sequential images to detect and partition grains. Furthermore, a large number of prior studies [2, 5,6,7,8, 10, 31] used multiple images to determine grain boundaries. To provide the model with additional information about the grains’ edges, we construct the input of this block as seven polarized petrographic images at various angles, which are illustrated in Fig. 5.

Given a stack of input images X containing seven image patches, \(X\in \mathbb {R}^{m\times n\times 21}\), it is separated into three portions \(X_\textrm{R}\in \mathbb {R}^{m\times n\times 7}\), \(X_\textrm{G}\in \mathbb {R}^{m\times n\times 7}\) and \(X_\textrm{B}\in \mathbb {R}^{m\times n\times 7}\) based on the Red, Green, and Blue color spaces. The separated features are then transformed by the proposed extinction consistency enhancement (ECE) block and the squeeze and excitation (SE) block [34], which improve feature representation in the spatial and channel dimensions, respectively.

$$\begin{aligned} X_{i}^{\prime } = \textrm{SE}(\textrm{ECE}(X_{i})). \end{aligned}$$
(1)

The ECE block outputs an enhanced feature and is explained in “Extinction consistency enhancement block”. The SE block has proved to be an excellent channel attention operation, enabling the model to automatically learn the importance of channel features. It is divided into two steps: squeeze and excitation. The squeeze operation compresses the features by applying global average pooling to the feature map; the excitation operation generates a weight for each channel using a two-layer bottleneck structure.

Each enhanced feature map is used as the input of a subsequent one-layer convolution, which condenses its channels into a single channel representing that color space. Finally, the R, G, and B color space features are combined to create edge-enhanced grain features.

$$\begin{aligned} X^{\prime \prime } = \textrm{concat}(W_\textrm{R}X^{\prime }_\textrm{R},W_\textrm{G}X^{\prime }_\textrm{G},W_\textrm{B}X^{\prime }_\textrm{B}), \end{aligned}$$
(2)

where \(W_\textrm{R}, W_\textrm{G}\) and \(W_\textrm{B}\) denote the one-layer convolution transform matrices.
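To make the data flow of Eqs. (1)–(2) concrete, a minimal PyTorch sketch of the MAEC block is given below. The channel layout of the 21-channel input (seven angles per colour plane), the SE reduction ratio, and the 14-channel ECE output (the concatenation in Eq. (6)) are our assumptions, since they are not fixed in the text; `ece_block_cls` stands for any module with that interface, such as the ECE block sketched in the next subsection.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation over the stacked angle channels (channel attention)."""
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):                        # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                   # squeeze: global average pooling per channel
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                             # excitation: per-channel re-weighting


class MAECBlock(nn.Module):
    """Fuses a 21-channel stack (7 angles x R, G, B planes) into a 3-channel,
    edge-enhanced feature map, following Eqs. (1)-(2)."""
    def __init__(self, ece_block_cls):
        super().__init__()
        # one ECE + SE branch per colour plane, plus a 1x1 conv squeezing to one channel
        self.branches = nn.ModuleList(
            nn.ModuleDict({
                "ece": ece_block_cls(),                      # (B, 7, H, W) -> (B, 14, H, W), Eq. (6)
                "se": SEBlock(channels=14),                  # channel attention, Eq. (1)
                "proj": nn.Conv2d(14, 1, kernel_size=1),     # W_R / W_G / W_B in Eq. (2)
            }) for _ in range(3)
        )

    def forward(self, x):                        # x: (B, 21, H, W), assumed order: R angles, G angles, B angles
        planes = torch.split(x, 7, dim=1)        # X_R, X_G, X_B, each (B, 7, H, W)
        outs = [b["proj"](b["se"](b["ece"](p))) for b, p in zip(self.branches, planes)]
        return torch.cat(outs, dim=1)            # (B, 3, H, W), the concatenation of Eq. (2)
```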

Extinction consistency enhancement block

Extinction is a term used in optical mineralogy and petrology that refers to the dimming of cross-polarized light, as viewed through a thin section of a mineral under a petrographic microscope [35]. During extinction, the pixel values (R, G, B, and gray value) within a grain change continuously, whereas pixels belonging to adjacent grains change differently, which is referred to as extinction consistency in this paper. We create the extinction consistency enhancement (ECE) block to enhance features within the same grains based on the extinction phenomenon as a beneficial reference for grain edge detection. For each pixel \(\left( i,j \right) \) belonging to the input of the MAEC, let \(F_{({i,j})}\in \mathbb {R}^{1 \times 1 \times 7}\) denote its value vector and \(F_{({i - x,j - y})} \in \mathbb {R}^{1 \times 1 \times 7}\) the value vectors of its nearby pixels, where \(\left( {x,y} \right) \in \Omega = \left\{ {\left\{ {- 1, - 1} \right\} ,\left\{ {- 1,0} \right\} ,\left\{ {- 1,1} \right\} ,\left\{ {0, - 1} \right\} } \right\} \). Figure 2 illustrates the location distribution of pixel \(\left( i,j \right) \) and pixels \(\left( {i - x,j - y} \right) \).

Fig. 2 a The location distribution of pixel \(\left( i,j \right) \) and pixel \(\left( {i - x,j - y} \right) \); b illustration of the proposed MAEC block

Since the pixels within the same rock particle have the same trend of continuous extinction, the more similar the value vector \(F_{({i,j})}\) of one pixel is to those of its nearby pixels, the more likely they belong to the same particle. For this reason, we require a function \(f\left( \cdot \right) \) that measures the similarity between \(F_{({i,j})}\) and \(F_{({i - x,j - y})}\).

Considering the amount of computation, we simply use the inner product between \(F_{({i,j})}\) and \(F_{({i - x,j - y})}\) as the similarity measurement \(f\left( \cdot \right) \). To obtain a better feature representation, the value vectors are first transformed by linear transformations, and the inner product is then taken between the transformed features to calculate their similarity:

$$\begin{aligned} f\left( {F_{({i,j})},F_{({i - x,j - y})}} \right) = \left( {W_{({i,j})}F_{({i,j})}} \right) ^{T} \left( {W_{({i - x,j - y})}F_{({i - x,j - y})}} \right) , \end{aligned}$$
(3)

where \(W_{({i,j})}\) and \(W_{({i-x,j-y})}\) are the weight matrices of the linear transformations.

The softmax function is utilized to normalize the calculated similarity scores, where \(\alpha _{({i - x,j - y})}\) intuitively measures how similar pixel \(\left( i,j \right) \) and pixel \(\left( i-x,j-y \right) \) are and can also be seen as a weight:

$$\begin{aligned} \alpha _{({i - x,j - y})} = \frac{\exp \left( {f\left( {F_{({i,j})},F_{({i - x,j - y})}} \right) } \right) }{\sum _{{({x,y})} \in \Omega }{\exp \left( {f\left( {F_{({i,j})},F_{({i - x,j - y})}} \right) } \right) }}. \end{aligned}$$
(4)

In the next step, we fuse the nearby pixels’ transformed value vectors \(W_{({i - x,j - y})}F_{({i - x,j - y})}\) by weighting and summing them:

$$\begin{aligned} F_{({i,j})}^{\prime } = \sum _{{({x,y})} \in \Omega } \alpha _{({i - x,j - y})}W_{({i - x,j - y})}F_{({i - x,j - y})}. \end{aligned}$$
(5)

Finally, the aggregated feature \(F_{({i,j})}^{\prime }\) is concatenated with the value vector \(F_{({i,j})}\) and the ReLU activation function is employed to obtain a powerful feature representation \(F_{({i,j})}^{\prime \prime }\) of pixel \(\left( i,j \right) \):

$$\begin{aligned} F_{({i,j})}^{\prime \prime } = \textrm{ReLU}\left( \textrm{concat}\left( {F_{({i,j})},F_{({i,j})}^{\prime }} \right) \right) . \end{aligned}$$
(6)

To reduce computation, all calculations are conducted in parallel, and the feature vectors \(F_{({i-x,j-y})}\) are generated by the accelerating scheme proposed by Dai et al. [36].
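Below is a minimal PyTorch sketch of how Eqs. (3)–(6) could be implemented, with the four neighbour offsets of \(\Omega \) realised as feature-map shifts. The use of shared 1\(\times \)1 convolutions for the linear transforms \(W_{(i,j)}\) and \(W_{(i-x,j-y)}\) and replication padding at the image border are assumptions not stated in the paper, and the accelerating scheme of [36] is omitted for clarity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ECEBlock(nn.Module):
    """Extinction consistency enhancement (Eqs. 3-6): a local, 4-neighbour attention
    over the 7 polarization-angle channels of one colour plane."""
    # neighbours (i - x, j - y) of Omega, written as (row, col) shifts of the feature map
    OFFSETS = [(1, 1), (1, 0), (1, -1), (0, 1)]

    def __init__(self, channels: int = 7):
        super().__init__()
        # linear transforms W_(i,j) and W_(i-x,j-y), realised as 1x1 convolutions
        self.w_center = nn.Conv2d(channels, channels, kernel_size=1, bias=False)
        self.w_neigh = nn.Conv2d(channels, channels, kernel_size=1, bias=False)

    @staticmethod
    def _shift(x, dr, dc):
        # after this shift, position (i, j) holds the value of its (i + dr, j + dc) neighbour;
        # borders are handled by replication padding
        x = F.pad(x, (1, 1, 1, 1), mode="replicate")
        return x[:, :, 1 + dr: x.size(2) - 1 + dr, 1 + dc: x.size(3) - 1 + dc]

    def forward(self, x):                                   # x: (B, 7, H, W)
        q = self.w_center(x)                                # W_(i,j) F_(i,j)
        k = self.w_neigh(x)                                 # W_(i-x,j-y) F_(i-x,j-y), before shifting
        neigh = torch.stack([self._shift(k, dr, dc) for dr, dc in self.OFFSETS], dim=1)  # (B, 4, C, H, W)
        sim = (q.unsqueeze(1) * neigh).sum(dim=2)           # Eq. (3): inner products, (B, 4, H, W)
        alpha = torch.softmax(sim, dim=1)                   # Eq. (4): normalise over the 4 neighbours
        fused = (alpha.unsqueeze(2) * neigh).sum(dim=1)     # Eq. (5): weighted aggregation, (B, C, H, W)
        return torch.relu(torch.cat([x, fused], dim=1))     # Eq. (6): concat + ReLU -> (B, 2C, H, W)
```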

Multi-scale edge perception network

The multi-scale edge perception network (MEPN) can be broken down into two distinct components: (1) the encoder, developed from EfficientNetV2, has a series of stages that effectively extract grain edge information at different scales; (2) the BiDecoder is built to make use of the multi-level features and produce an accurate prediction of the edge map.

Efficient encoder: We customize EfficientNetV2 as the encoder component of MEPN due to its small parameter count and effective performance. The following adjustments are made: (1) adopting EfficientNetV2-S, whose smaller number of parameters fits our small dataset; (2) utilizing feature maps from stages 1–6 (except stage 4) as the inputs of the BiDecoder block, with ascending receptive fields; stage 4 is not used because it is the first stage to utilize the MBConv6 block and its features are not yet stable and informative; (3) replacing the final classification head with a semantic segmentation stage.
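As an illustration only, the following sketch collects such side outputs using the torchvision implementation of EfficientNetV2-S; the mapping between torchvision's `features` indices and the paper's stage numbering, as well as the tap indices `(1, 2, 3, 5, 6)`, are our assumptions.

```python
import torch.nn as nn
from torchvision.models import efficientnet_v2_s


class EfficientEncoder(nn.Module):
    """Collects side outputs from selected EfficientNetV2-S stages.

    Assumes torchvision's `features[0]` is the stem (stage 0) and `features[1..6]`
    are stages 1-6; the 3-channel MAEC output is fed in place of an RGB image."""
    def __init__(self, taps=(1, 2, 3, 5, 6)):
        super().__init__()
        backbone = efficientnet_v2_s(weights=None)   # trained from scratch, no ImageNet weights
        self.stages = backbone.features[:7]          # drop the final 1x1 conv / classifier head
        self.taps = set(taps)

    def forward(self, x):                            # x: (B, 3, H, W) from the MAEC block
        side_outputs = []
        for idx, stage in enumerate(self.stages):
            x = stage(x)
            if idx in self.taps:
                side_outputs.append(x)               # features at increasing receptive fields
        return side_outputs
```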

BiDecoder: It is vital to merge features at different scales to generate an accurate edge map, yet naive fusion is inefficient because features only flow from small to large spatial maps. Recently, He et al. [23] utilized the loss function to ensure bi-directional feature flow, which empowers the model to predict multi-scale edge segmentation maps. Inspired by this, we enhance feature representation from the standpoint of decoder structure by utilizing both top-down and bottom-up information circulation paths and adding extra skip links.

Fig. 3 The detailed architecture of the BiDecoder. The input features come from stages 1, 2, 3, 5, and 6 with ascending receptive fields, and the output is the concatenated hierarchical features

As illustrated in Fig. 3, the BiDecoder is connected to the encoder’s five side outputs \(h_{i}, i=1,2,3,4,5\). Each input \(h_{i}\), except the bottom one with the smallest spatial size (\(h_{1}\)), is added to the features upsampled by a transposed convolution (the yellow square). This is followed by a separable convolution block (the pink square) and a Batch Normalization operation.

$$\begin{aligned} h_{i}^{\prime } = {\left\{ \begin{array}{ll} h_{i} & i=1 \\ \textrm{BN}(\textrm{SepConv}(h_{i} + \textrm{Trans}(h_{i-1}))) & i=2,3,4,5 \\ \end{array}\right. } \end{aligned}$$
(7)

After that, the features \(h_{i+1}^{\prime }\) with larger spatial size are downsampled and re-added to the features \(h_{i}^{\prime }, i=1,2,3,4\). Additionally, there is a skip connection (the orange arrow) between the encoder feature \(h_{i}, i=2,3,4\) and the decoder feature \(h_{i}^{\prime }, i=2,3,4\), intended to improve feature propagation. A series of operations then transforms and upsamples the generated features, which can be formulated as follows:

$$\begin{aligned} h_{i}^{\prime \prime } = {\left\{ \begin{array}{ll} \textrm{Trans}(\textrm{Conv}(\textrm{BN}(\textrm{SepConv}(h_{i}^{\prime } + \textrm{Conv}(h_{i+1}^{\prime }))))) & i=1 \\ \textrm{Trans}(\textrm{Conv}(\textrm{BN}(\textrm{SepConv}(h_{i}^{\prime } +\textrm{Conv}(h_{i+1}^{\prime }))) + h_{i})) & i=2,3,4 \\ \textrm{Trans}(\textrm{Conv}(h_{i}^{\prime })) & i=5 \\ \end{array}\right. } \end{aligned}$$
(8)

Finally, all features from different levels are concatenated together to form the final result. It is worth mentioning that the last convolution layer compresses the channels into one, followed by a Sigmoid activation to generate the final two-class (edge and background) segmentation map.

$$\begin{aligned} \textrm{Output} = \textrm{Sigmoid}(\textrm{Conv}(\textrm{BN}(\textrm{Concat}(h_{i}^{\prime \prime })))), \quad i=1,2,3,4,5. \end{aligned}$$
(9)
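A compact, functional PyTorch sketch of Eqs. (7)–(9) is given below. It assumes the five side outputs have already been projected to a common channel width and replaces the transposed and strided convolutions (Trans and Conv in the figure) with bilinear resizing, so it illustrates the bidirectional information flow rather than reproducing the exact BiDecoder layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BiDecoder(nn.Module):
    """Functional sketch of Eqs. (7)-(9). Expects five side outputs with a common
    channel width `c`, listed from the smallest spatial size (h_1) to the largest (h_5)."""

    def __init__(self, c: int = 32, levels: int = 5):
        super().__init__()
        self.sep_td = nn.ModuleList(self._sep(c) for _ in range(levels))   # top-down pass, Eq. (7)
        self.bn_td = nn.ModuleList(nn.BatchNorm2d(c) for _ in range(levels))
        self.sep_bu = nn.ModuleList(self._sep(c) for _ in range(levels))   # bottom-up pass, Eq. (8)
        self.bn_bu = nn.ModuleList(nn.BatchNorm2d(c) for _ in range(levels))
        self.mix = nn.ModuleList(nn.Conv2d(c, c, 1) for _ in range(levels))
        self.head = nn.Sequential(nn.BatchNorm2d(levels * c), nn.Conv2d(levels * c, 1, 1))

    @staticmethod
    def _sep(c):
        # depthwise-separable 3x3 convolution (SepConv)
        return nn.Sequential(nn.Conv2d(c, c, 3, padding=1, groups=c), nn.Conv2d(c, c, 1))

    @staticmethod
    def _resize(x, ref):
        # resize x to the spatial size of the reference tensor (stand-in for Trans / strided Conv)
        return F.interpolate(x, size=ref.shape[-2:], mode="bilinear", align_corners=False)

    def forward(self, h):                          # h: [h_1, ..., h_5], smallest spatial size first
        # Eq. (7): top-down pass - each side output is fused with the smaller one below it
        hp = [h[0]]
        for i in range(1, len(h)):
            hp.append(self.bn_td[i](self.sep_td[i](h[i] + self._resize(h[i - 1], h[i]))))

        # Eq. (8): bottom-up pass - each fused map absorbs the larger one above it,
        # with an extra skip from the raw encoder feature at the middle levels (i = 2, 3, 4)
        out_size_ref = h[-1]
        hpp = []
        for i in range(len(h)):
            if i == len(h) - 1:                    # i = 5 in the paper's indexing
                x = self.mix[i](hp[i])
            else:
                x = self.bn_bu[i](self.sep_bu[i](hp[i] + self._resize(hp[i + 1], hp[i])))
                if i in (1, 2, 3):
                    x = x + h[i]                   # skip connection (orange arrow)
                x = self.mix[i](x)
            hpp.append(self._resize(x, out_size_ref))

        # Eq. (9): concatenate all levels and predict the one-channel edge probability map
        return torch.sigmoid(self.head(torch.cat(hpp, dim=1)))
```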

Distance map penalized compound loss function

Grain edge segmentation is a challenging class-imbalance problem because edge pixels account for less than 10% of total pixels, with non-edge pixels accounting for the majority. As a result, we first inherit the extensively used class-balanced cross-entropy loss function [20, 22,23,24,25,26,27,28,29]. Let X refer to the input image patch stack, \(Y \in \left\{ 0,1\right\} \) to the ground truth, and \(Y^{\prime }\) to the predicted grain edge map, where \(Y^{\prime } = \left[ y_{1}^{\prime }, y_{2}^{\prime },\ldots , y_{N}^{\prime } \right] \), \(y_{i}^{\prime } \in \left[ 0,1 \right] \) represents the probability that pixel i belongs to an edge, and N denotes the number of pixels in the predicted edge map. The following equation describes the class-balanced cross-entropy loss function:

$$\begin{aligned} l_{1}\left( X,W \right) = -\beta \sum \limits _{i \in Y^{+}} \log \sigma \left( y_{i}^{\prime } = 1 \mid X;W \right) - \left( 1 - \beta \right) \sum \limits _{i \in Y^{-}} \log \sigma \left( y_{i}^{\prime } = 0 \mid X;W \right) , \end{aligned}$$
(10)

where \(\beta = |Y^{-} |/ |Y |\) and \(1-\beta = |Y^{+} |/ |Y |\), in which \(Y^{+}\) and \(Y^{-}\) denote the edge and non-edge ground truth pixel sets, \(\sigma ({\cdot })\) represents the sigmoid activation function, and W defines the set of all network parameters.

The main drawback of \(l_{1}\left( X,W \right) \) is that it only penalizes differences between two statistical distributions over the entire image, ignoring discrepancies in the regions where the predicted segmentation map \(Y^{\prime }\) and the ground truth Y overlap. To provide additional information about the edges, we introduce a distance map penalty loss function that directs the network to concentrate on the object boundaries during training. It is defined by the following equations:

$$\begin{aligned} l_{2}\left( {X,W} \right) = - \sum _{i = 1}^{N} D_{i}\log \sigma \left( {y_{i}^{\prime }}\mid X;W \right) , \end{aligned}$$
(11)
$$\begin{aligned} D = \alpha \left( 1/\left( {D^{\prime } + \epsilon } \right) \right) , \quad D_{i} \in D,\ i \in \left\{ 1,2,\ldots , N \right\} , \end{aligned}$$
(12)

where D is the distance penalty mask and \(D_{i} \in D,~i \in \left\{ 1,2,\ldots , N \right\} \) represents the distance-map-penalized weight of pixel i; \(D^{\prime }\) is the distance map, obtained by applying the Euclidean Distance Mapping algorithm [37] to the inverted ground truth mask. D and \(D^{\prime }\) are illustrated in Fig. 4; \(\epsilon \) is set to 1e−9 to prevent the denominator \(\left( {D^{\prime } + \epsilon } \right) \) from being zero; \(\alpha ({\cdot })\) is an assignment function defined as follows:

$$\begin{aligned} \alpha \left( x \right) = \left\{ \begin{matrix} {x,~x \ne 0} \\ {1,x = 0} \\ \end{matrix} \right. \end{aligned}$$
(13)
Fig. 4 Visual display of intermediate results of \(l_{2}\left( X,W\right) \). a The ground truth Y; b the inverted ground truth; c the distance map \(D^\prime \) transformed from the inverted ground truth; d the distance penalty mask D, note that the edge width of D is much wider than the ground truth Y because of the non-zero weight near edges

The proposed distance map penalized loss function \(l_{2}\left( {X,W} \right) \) is coupled with the class-balanced cross-entropy loss function \(l_{1}\left( X,W \right) \) so as to reduce training instability. The distance map penalized compound loss function \(l \left( X,W \right) \) is therefore given by the following equation:

$$\begin{aligned} l\left( {X,W} \right) = l_{1}\left( {X,W} \right) + l_{2}\left( {X,W} \right) . \end{aligned}$$
(14)
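The following sketch shows one way to compute the compound loss, using SciPy's `distance_transform_edt` as the Euclidean Distance Mapping of [37]. Because Eqs. (11)–(13) do not fully specify how the weight of on-edge pixels (where \(D^{\prime } = 0\)) is capped or which class probability is penalized, this version clamps the on-edge weight to 1 and weights the log-likelihood of each pixel's true class; both choices are assumptions.

```python
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt


def compound_loss(pred, gt, eps=1e-9):
    """Sketch of Eqs. (10)-(14). `pred` holds edge probabilities after the sigmoid,
    `gt` is the binary ground-truth edge map; both are (B, 1, H, W) tensors."""
    pos, neg = gt == 1, gt == 0
    beta = neg.float().mean()                       # |Y-| / |Y|: weights the rare edge class strongly
    clamped = pred.clamp(1e-7, 1 - 1e-7)

    # Eq. (10): class-balanced cross-entropy
    l1 = -(beta * torch.log(clamped[pos]).sum()
           + (1 - beta) * torch.log(1 - clamped[neg]).sum())

    # Eq. (12): distance penalty mask built from the inverted ground truth
    inv = (1 - gt).cpu().numpy().astype(np.uint8)
    d_prime = np.stack([distance_transform_edt(m[0]) for m in inv])[:, None]  # distance map D'
    d = 1.0 / (d_prime + eps)
    d[d_prime == 0] = 1.0                           # assignment function alpha, Eq. (13) (our reading)
    d = torch.as_tensor(d, dtype=pred.dtype, device=pred.device)

    # Eq. (11): distance-map-penalized cross-entropy on each pixel's true class
    l2 = -(d * torch.where(gt.bool(), torch.log(clamped), torch.log(1 - clamped))).sum()

    return l1 + l2                                  # Eq. (14)
```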

Evaluation metrics

We utilize the same F-measure and accuracy (A) that were specified in classic studies [6] to evaluate our proposed model. The evaluation metrics are as follows:

$$\begin{aligned} P = \textrm{TP}/\left( \textrm{TP} + \textrm{FP} \right) , \end{aligned}$$
(15)
$$\begin{aligned} R = \textrm{TP}/\left( \textrm{TP} + \textrm{FN} \right) , \end{aligned}$$
(16)
$$\begin{aligned} F = \frac{2PR}{P + R}, \end{aligned}$$
(17)
$$\begin{aligned} A = \frac{\textrm{TP} + \textrm{TN}}{\textrm{TP} + \textrm{TN} + \textrm{FP} + \textrm{FN}}, \end{aligned}$$
(18)

where TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively, and P and R denote precision and recall.

Given an edge probability map, a threshold is used to produce the predicted binary map: if a pixel’s probability is above the threshold, it is regarded as an edge pixel, and otherwise as background. There are two ways to set this threshold: the optimal dataset scale (ODS), which employs a fixed threshold for all test images, and the optimal image scale (OIS), which calculates an optimal threshold for every image. In this paper, we use the F-measure at both ODS and OIS to measure segmentation performance.

The F-measure penalizes the model according to statistical distributions. The average IoU is used as an additional quantitative performance indicator to monitor the area where the ground truth and predicted grain edge maps overlap.

$$\begin{aligned} \textrm{IoU} = \frac{1}{N}\sum \limits _{i = 1}^{N} \frac{G_{i}\wedge P_{i}}{G_{i}\vee P_{i}}, \end{aligned}$$
(19)

where \(P_{i}\) and \(G_{i}\) are the predicted and ground truth edge map values at pixel i, respectively, and N is the total number of pixels in the segmentation edge map.
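For reference, a simplified NumPy sketch of the three metrics is given below. It averages per-image F-measures when sweeping thresholds and reads Eq. (19) as the intersection over union of the binarised edge maps; the boundary-matching tolerance used in standard edge-detection benchmarks is omitted, so the numbers it produces are only indicative.

```python
import numpy as np


def f_measure(pred, gt, thr):
    """Precision, recall and F-measure at one binarisation threshold (Eqs. 15-17)."""
    p, g = pred >= thr, gt.astype(bool)
    tp = np.logical_and(p, g).sum()
    fp = np.logical_and(p, ~g).sum()
    fn = np.logical_and(~p, g).sum()
    prec = tp / (tp + fp + 1e-12)
    rec = tp / (tp + fn + 1e-12)
    return 2 * prec * rec / (prec + rec + 1e-12)


def ods_ois(preds, gts, thresholds=np.linspace(0.01, 0.99, 99)):
    """ODS: best threshold shared by the whole test set; OIS: best threshold per image."""
    per_img = np.array([[f_measure(p, g, t) for t in thresholds] for p, g in zip(preds, gts)])
    ods = per_img.mean(axis=0).max()   # average F over images, maximised over one fixed threshold
    ois = per_img.max(axis=1).mean()   # each image's best F, averaged over the test set
    return ods, ois


def edge_iou(pred, gt, thr=0.5):
    """IoU between the binarised prediction and the ground truth edge map (our reading of Eq. 19)."""
    p, g = pred >= thr, gt.astype(bool)
    inter = np.logical_and(p, g).sum()
    union = np.logical_or(p, g).sum()
    return inter / (union + 1e-12)
```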

Materials and implementing methods

Cross-polarized petrographic image datasets (CPPID)

The carefully annotated cross-polarized petrographic image dataset for grain edge detection is one of this paper's key contributions. The dataset was collected using a CAIKON XP-330C polarizing microscope equipped with a Daheng Image MER-2000-19U3C camera. CPPID includes five different kinds of rock thin sections. Each thin section was captured at 25 different fields of view, with images taken at seven polarization angles (0\(^{\circ }\), 15\(^{\circ }\), 30\(^{\circ }\), 45\(^{\circ }\), 60\(^{\circ }\), 75\(^{\circ }\), and 90\(^{\circ }\)) for each field of view. There are 875 petrographic images in total, each measuring \(4666 \times 3672 \times 3\). Table 1 displays detailed information on the original dataset.

Table 1 Details of the cross-polarized petrographic image datasets (CPPID)

Preprocessing

To increase the effective receptive field of the CNN and to account for the GPU hardware's memory constraints, we applied several pre-processing steps to the raw dataset: image resizing, image patch cropping, and division into training and test sets.

We first decreased the input data size so that a larger batch size could be used to accelerate training. Given the relatively high resolution of the original images, we downsampled each image to a quarter of its original size (2048 \(\times \) 1536 \(\times \) 3) without losing the image's main information. Since the downsampled image is still of high resolution, 35 image patches (512 \(\times \) 512 \(\times \) 3) were obtained from each downsampled image by a sliding-window cropping operation. With each image patch set containing seven image patches at different polarization angles, we obtained 4375 image patch sets. Another advantage of the sliding-window cropping is that object edges are low-level semantics, so unlike high-level semantic tasks (e.g., segmentation, detection), cropping does not generally introduce information loss or model accuracy degradation on an edge detection dataset.
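A sliding-window stride of 256 pixels reproduces the reported patch count (7 \(\times \) 5 = 35 positions on a 2048 \(\times \) 1536 image); the stride value itself and the use of OpenCV for resizing are our assumptions in the sketch below.

```python
import cv2
import numpy as np


def make_patches(img, patch=512, stride=256, size=(2048, 1536)):
    """Downsample one full-resolution polarized image and cut it into overlapping
    512 x 512 patches; a 256-pixel stride gives 7 x 5 = 35 patches per image."""
    img = cv2.resize(img, size, interpolation=cv2.INTER_AREA)   # dsize is (width, height)
    h, w = img.shape[:2]                                        # 1536, 2048
    patches = [img[y:y + patch, x:x + patch]
               for y in range(0, h - patch + 1, stride)
               for x in range(0, w - patch + 1, stride)]
    return np.stack(patches)                                    # (35, 512, 512, 3)
```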

The image patch sets were then divided into training and test sets at a 4:1 ratio, with the former used for model training only and the latter for model testing only. Considering the small size of the dataset, we discarded the validation set and instead used the test set to monitor the training metrics. A test-to-training ratio between 10 and 30% is generally considered reasonable. The influence of the above operations on the dataset can be found in Table 1.

Finally, with the help of Southwest Jiaotong University (SWJTU) petrologists, the grain edges were also meticulously labeled, and the images were downscaled and cropped using the same process.

The generation of the max image

References [1, 2, 5, 9, 10] made use of max petrographic images since they provide more information on the grain edges. Since there is no CNN model specifically created for grain edge detection, we used this type of image as input to conventional edge detection models for natural images to compare them with the proposed framework. The fused max image, the image patches at the seven polarization angles, and their corresponding ground truths are shown in Fig. 5.
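The sketch below shows our reading of how such a max image can be fused, namely a pixel-wise maximum over the seven angle images; the cited works may differ in detail.

```python
import numpy as np


def fuse_max_image(angle_images):
    """Fuse the seven polarization-angle images of one field of view into a single
    'max image' by taking the pixel-wise maximum over the angle axis."""
    stack = np.stack(angle_images, axis=0)   # (7, H, W, 3)
    return stack.max(axis=0)                 # (H, W, 3)
```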

Fig. 5 Graphical representations of CPPID dataset. From top to bottom, each line represents quartz sandstone, micritic limestone, coarse-grained quartz sandstone, magnetite quartzite, and quartzite respectively. a–g 0\(^{\circ }\), 15\(^{\circ }\), 30\(^{\circ }\), 45\(^{\circ }\), 60\(^{\circ }\), 75\(^{\circ }\), and 90\(^{\circ }\) polarized light image patches; h fused max image; i ground truth

Implementation details

PyTorch was used for the implementation. Using the Adam optimizer with a batch size of 4, the model converged after 120 epochs. Note that the relatively small batch size of 4 was set because of limited GPU resources; a larger batch size may give better accuracy and faster convergence, but the learning rate then needs to be adjusted accordingly. The initial learning rate was 1e−6 and was then adjusted using a warm-start cosine annealing schedule.
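A sketch of this optimisation setup is shown below; the warm-up length and the decay floor are not reported above and are therefore placeholders.

```python
import math
import torch


def build_optimizer(model, base_lr=1e-6, warmup_epochs=5, total_epochs=120):
    """Adam with a warm-start cosine annealing schedule: linear warm-up, then cosine decay."""
    optimizer = torch.optim.Adam(model.parameters(), lr=base_lr)

    def lr_lambda(epoch):
        if epoch < warmup_epochs:                      # warm-up: ramp linearly up to base_lr
            return (epoch + 1) / warmup_epochs
        t = (epoch - warmup_epochs) / max(1, total_epochs - warmup_epochs)
        return 0.5 * (1.0 + math.cos(math.pi * t))     # cosine decay towards zero

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

In training, `scheduler.step()` would be called once per epoch after the optimisation steps.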

Data availability

The data that support the findings of this study are available at https://github.com/ELOESZHANG/ECPN.

Notes

  1. Codes and datasets are available at https://github.com/ELOESZHANG/ECPN.

References

  1. Goodchild JS, Fueten F (1998) Edge detection in petrographic images using the rotating polarizer stage. Comput Geosci 24(8):745–751

  2. Zhou Y, Starkey J, Mansinha L (2004) Identification of mineral grains in a petrographic thin section using phi-and max-images. Math Geol 36(7):781–801

  3. Heilbronner R (2000) Automatic grain boundary detection and grain size analysis using polarization micrographs or orientation images. J Struct Geol 22(7):969–981

  4. Jungmann M, Pape H, Wißkirchen P, Clauser C, Berlage T (2014) Segmentation of thin section images for grain size analysis using region competition and edge-weighted region merging. Comput Geosci 72:33–48

  5. Zhou Y, Starkey J, Mansinha L (2004) Segmentation of petrographic images by integrating edge detection and region growing. Comput Geosci 30(8):817–831

  6. Jiang F, Gu Q, Hau H, Li N (2017) Grain segmentation of multi-angle petrographic thin section microscopic images. In: 2017 IEEE international conference on image processing (ICIP). IEEE, pp 3879–3883

  7. Jiang F, Gu Q, Hao H, Li N, Wang B, Hu X (2018) A method for automatic grain segmentation of multi-angle cross-polarized microscopic images of sandstone. Comput Geosci 115:143–153

  8. Lumbreras F, Serrat J (1996) Segmentation of petrographical images of marbles. Comput Geosci 22(5):547–558

  9. Fueten F, Mason J (2007) An artificial neural net assisted approach to editing edges in petrographic images collected with the rotating polarizer stage. Comput Geosci 33(9):1176–1188

  10. Izadi H, Sadri J, Mehran N-A (2015) A new intelligent method for minerals segmentation in thin sections based on a novel incremental color clustering. Comput Geosci 81:38–52

  11. Das R, Shankar BU, Chakraborty T, Ghosh K (2021) Automatic grain segmentation in cross-polarized photomicrographs of sedimentary rocks using psychophysics inspired models. Innov Syst Softw Eng 17(2):167–183

  12. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556

  13. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9

  14. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  15. Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H (2017) MobileNets: efficient convolutional neural networks for mobile vision applications. arXiv preprint arXiv:1704.04861

  16. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C (2018) MobileNetv2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4510–4520

  17. Chollet F (2017) Xception: deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1251–1258

  18. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ (2017) Densely connected convolutional networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 4700–4708

  19. Tan M, Le Q (2019) EfficientNet: rethinking model scaling for convolutional neural networks. In: International conference on machine learning. PMLR, pp 6105–6114

  20. Xie S, Tu Z (2015) Holistically-nested edge detection. In: Proceedings of the IEEE international conference on computer vision, pp 1395–1403

  21. Long J, Shelhamer E, Darrell T (2015) Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3431–3440

  22. Liu Y, Cheng M-M, Hu X, Wang K, Bai X (2017) Richer convolutional features for edge detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3000–3009

  23. He J, Zhang S, Yang M, Shan Y, Huang T (2019) Bi-directional cascade network for perceptual edge detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 3828–3837

  24. Poma XS, Riba E, Sappa A (2020) Dense extreme inception network: towards a robust CNN model for edge detection. In: Proceedings of the IEEE/CVF winter conference on applications of computer vision, pp 1923–1932

  25. Maninis K-K, Pont-Tuset J, Arbeláez P, Gool LV (2016) Convolutional oriented boundaries. In: European conference on computer vision. Springer, pp 580–596

  26. Wang Y, Zhao X, Huang K (2017) Deep crisp boundaries. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 3892–3900

  27. Kelm AP, Rao VS, Zölzer U (2019) Object contour and edge detection with refinecontournet. In: International conference on computer analysis of images and patterns. Springer, pp 246–258

  28. Deng R, Liu S (2020) Deep structural contour detection. In: Proceedings of the 28th ACM international conference on multimedia, pp 304–312

  29. Su Z, Liu W, Yu Z, Hu D, Liao Q, Tian Q, Pietikäinen M, Liu L (2021) Pixel difference networks for efficient edge detection. In: Proceedings of the IEEE/CVF international conference on computer vision, pp 5117–5127

  30. Rubo RA, de Carvalho Carneiro C, Michelon MF, dos Santos Gioria R (2019) Digital petrography: mineralogy and porosity identification using machine learning algorithms in petrographic thin section images. J Petrol Sci Eng 183:106382

  31. Tang DG, Milliken KL, Spikes KT (2020) Machine learning for point counting and segmentation of arenite in thin section. Mar Petrol Geol 120:104518

  32. Saxena N, Day-Stirrat RJ, Hows A, Hofmann R (2021) Application of deep learning for semantic segmentation of sandstone thin sections. Comput Geosci 152:104778

  33. Das R, Mondal A, Chakraborty T, Ghosh K (2022) Deep neural networks for automatic grain-matrix segmentation in plane and cross-polarized sandstone photomicrographs. Appl Intell 52(3):2332–2345

  34. Hu J, Shen L, Sun G (2018) Squeeze-and-excitation networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 7132–7141

  35. Ness W (1991) Introduction to optical mineralogy. Oxford University Press, New York

  36. Dai Y, Wu Y, Zhou F, Barnard K (2021) Attentional local contrast networks for infrared small target detection. IEEE Trans Geosci Remote Sens 59(11):9813–9824

  37. Danielsson P-E (1980) Euclidean distance map**. Comput Graph Image Process 14(3):227–248

  38. Loshchilov I, Hutter F (2017) Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101

  39. Winnemöller H (2011) XDoG: advanced image stylization with extended difference-of-gaussians. In: Proceedings of the ACM SIGGRAPH/Eurographics symposium on non-photorealistic animation and rendering, pp 147–156

  40. Dollár P, Zitnick CL (2014) Fast edge detection using structured forests. IEEE Trans Pattern Anal Mach Intell 37(8):1558–1570


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant 62075031, and in part by the Intelligent Terminal Keys Laboratory of Sichuan Province Open Project under Grant SCITLAB-0016 and Grant SCITLAB-0009.

Author information

Corresponding author

Correspondence to P. Zhang.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article


Cite this article

Zhang, P., Zhou, J., Zhao, W. et al. The edge segmentation of grains in thin-section petrographic images utilising extinction consistency perception network. Complex Intell. Syst. 10, 1231–1245 (2024). https://doi.org/10.1007/s40747-023-01208-y

