Introduction

With the advent of the big data era, extracting valuable and latent knowledge from massive data has become increasingly important. Cluster analysis is a multivariate statistical method for dividing data into groups: according to the degree of similarity between abstract objects, the data are partitioned into several groups, and similar objects are gathered into the same set. Clustering [1] is the process of aggregating the data points of a data set around several centers sharing the same features, that is, the process of dividing a data set into classes consisting of similar objects. Cluster analysis is therefore often used as a preprocessing step for other data mining operations [2]. Because cluster analysis can mine useful, implicit, and previously undiscovered information and knowledge from large amounts of raw data, it has been widely applied in many fields, including image pattern recognition, Web search, biology, security, data analysis, and text mining. Clustering algorithms are generally divided into partition-based, hierarchy-based, density-based, and grid-based algorithms, followed by neural network clustering, statistics-based clustering, and fuzzy clustering algorithms [3,4,5,6,7,8,9]. The K-means algorithm is one of the most widely used partition-based clustering algorithms [10]. It takes as its objective function the minimization of the sum of squared Euclidean distances between the data points and the cluster centers. The K-means algorithm converges quickly, is simple, and can effectively process large data sets [11]; however, the number of clusters must be determined manually and the initial cluster centers are selected randomly. The DBSCAN algorithm [12] is the most classic density-based clustering algorithm. It requires two parameters, a density threshold and a distance parameter, and these parameters strongly affect cluster quality; moreover, it cannot handle high-dimensional data well, and if the sample set has uneven density and large differences in inter-cluster spacing, the clustering quality deteriorates. Another density-based method is the density peak clustering algorithm DPC (density peak clustering) proposed by Rodriguez and Laio [13]. The algorithm is simple and efficient, and it does not need to iterate over an objective function to determine the cluster centers. However, its cutoff distance and cluster centers must be set manually or chosen empirically. The choice of cutoff distance determines the calculation of the local density and relative distance, which in turn affects the clustering results, so finding the optimal cutoff distance is a challenge. To improve the DPC algorithm, several approaches have been proposed in the past three years, many of which still set the truncation (cutoff) distance manually. For example, Xu [14] proposed a k-nearest neighbor method based on a Gaussian kernel to calculate the local density and used a difference vector to select the cluster centers automatically. Du [15] proposed combining the FJP algorithm and using fuzzy neighborhood relations to define fuzzy joint points for calculating the local density.
Li [16] proposed a new comparative model to replace the original relative-distance calculation in comparative density peak clustering. Bai [17] proposed an accelerated algorithm based on concept approximation that requires fewer distance calculations, and then an approximate algorithm based on sample clustering to obtain the approximate results of the original algorithm quickly, in order to improve scalability. Swarm intelligence algorithms have great advantages in finding global optima and are insensitive to parameters, so they can be used to optimize the cutoff distance and the cluster centers. Swarm intelligence algorithms include the ant colony optimization algorithm [18], artificial bee colony (ABC) algorithm [19], firefly algorithm [20], particle swarm optimization (PSO) algorithm [21], and teaching–learning-based optimization (TLBO) algorithm [22]. Among them, the TLBO algorithm is an optimization tool inspired by the knowledge transfer between teachers and students in a classroom. Compared with other optimization algorithms, its most prominent feature is that no special control parameters need to be tuned before execution: the PSO algorithm must adjust velocity and inertia weights, the ABC algorithm requires parameters for its three types of bees, and the GA must adjust crossover and mutation rates. However, the TLBO algorithm is easily trapped in local optima and converges slowly. To increase the diversity of new populations, Zou [23] proposed a teaching–learning-based optimization algorithm with differential and exclusion learning (DRLTLBO). Ender [24] proposed combining extreme learning machines with TLBO to solve data classification problems. Kumar [25] used several chaotic mechanisms to address the TLBO algorithm's lack of balance between premature convergence and local versus global search. Bourahla [26] applied the random local search mechanism of the bat algorithm to the TLBO algorithm and demonstrated its effectiveness. Niu [27] proposed an improved teaching-based optimization algorithm (MTLBO) based on the actual "teaching–learning" situation to speed up convergence. Zhang [28] integrated a number-spiral strategy and a triangle mutation rule into the TLBO algorithm to improve its convergence rate and its tendency to fall into local optima. At present, many researchers combine swarm intelligence algorithms with clustering algorithms for data mining. Anari proposed a new clustering algorithm based on learning automata and ant colony intelligence and solved the optimal placement of data items in a grid by the LA method [29]. Majdi used a genetic algorithm and particle swarm optimization to optimize a pre-determined FCM clustering model, improving the accuracy of model estimation [30]. Zang [31] proposed a new clustering method based on the potential entropy of the data field to extract the optimal threshold value and used a Gaussian method to determine the cluster centers automatically. In this paper, the TLBO algorithm, with its fast convergence and search speed, is used to optimize the cutoff distance of the DPC algorithm. On this basis, a clustering algorithm combining the TLBO algorithm and the density gap (NSTLBO-DGDPC) is proposed. Because the Euclidean distance does not consider the influence of data point attributes and neighborhoods, a density gap distance is used instead of the Euclidean distance.
Then, the standard deviation of the high-density distance is used to determine the cluster centers, which avoids the influence of manually selecting cluster centers on the clustering results. Finally, to prevent the TLBO algorithm from falling into local optima, a niche selection strategy is introduced to exclude similar individuals when the population density reaches a certain threshold, and a nonlinear decreasing strategy is then used to update the individual students in the teaching and learning stages to find the optimal dc value.

The main contributions of this paper are summarized as follows:

  1. (1)

In this paper, we propose using a weighted density difference distance instead of the Euclidean distance to calculate the local density of the algorithm.

  2. (2)

To select the cluster centers correctly, the final cluster centers are chosen using the standard deviation of the high-density distance.

  3. (3)

To avoid manually setting the truncation (cutoff) distance in the DPC algorithm, the TLBO algorithm is used to find the optimal dc value.

  4. (4)

The TLBO algorithm easily falls into local optima. Therefore, we propose introducing a niche selection strategy to eliminate similar individuals when the population density reaches a certain threshold, and then using a nonlinear decreasing strategy to update the individual students in the teaching and learning stages, finally obtaining the optimal dc solution.

  5. (5)

The NSTLBO algorithm improves accuracy and convergence. Simulation experiments on benchmark functions show that the NSTLBO algorithm has better performance.

  6. (6)

Synthetic and real data sets are used to verify the proposed clustering algorithm that fuses the teaching–learning-based optimization algorithm with the density difference distance. The results show that the algorithm can determine the number of clusters in real data sets without manually setting the dc value, and its ACC, AMI, and ARI values are improved, indicating better clustering quality.

The organization of this paper is as follows: the second section briefly introduces the DPC algorithm and the TLBO algorithm. The third section presents the clustering algorithm that combines the teaching–learning-based optimization algorithm with the density difference distance; its subsections introduce the density difference distance and the standard-deviation-based selection of cluster centers, the improved TLBO algorithm (NSTLBO), and the algorithm steps. The fourth section discusses the simulation experiments and analysis. Finally, the fifth section covers the conclusion and future work.

Preliminaries

Density Peak Clustering Algorithm DPC

The algorithm is based on two simple and intuitive assumptions: cluster centers are surrounded by neighbors with lower local density, and the distance between a cluster center and any point with higher local density is relatively large. According to these two hypotheses, the algorithm first finds the potential density peaks, labels them as cluster centers, and then assigns the remaining points to the clusters. For each data point xi, the DPC clustering algorithm must calculate the local density ρi and the distance δi to the nearest point of higher density. The local density of a data point is defined as Eq. (1):

$$\rho_{i} = \sum\limits_{j} {\chi \left( {d\left( {x_{i} ,x_{j} } \right) - d_{c} } \right)} ,\quad \chi \left( x \right) = \left\{ {\begin{array}{*{20}c} {1,\quad x < 0} \\ {0,\quad x \ge 0} \\ \end{array} } \right.$$
(1)

dc is the cutoff distance, and d(xi, xj) is the Euclidean distance between data points xi and xj. The δi of data point xi is determined by Eq. (2).

$$\delta_{i} = \left\{ {\begin{array}{*{20}l} {{\min} \left( {d\left( {x_{i} ,x_{j} } \right)} \right),} \hfill & {\rho_{j} > \rho_{i} } \hfill \\ {{\max} \left( {d\left( {x_{i} ,x_{j} } \right)} \right),} \hfill & {x_{j} \in X,\;{\text{otherwise}}} \hfill \\ \end{array} } \right.$$
(2)
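For concreteness, the sketch below shows one straightforward way to compute ρi and δi from Eqs. (1) and (2) with NumPy. It is an illustration under our own conventions (for example, subtracting the self-count from ρi), not the authors' implementation.

```python
import numpy as np

def dpc_density_and_delta(X, dc):
    """Local density rho (Eq. 1) and high-density distance delta (Eq. 2) for each point."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise Euclidean distances
    rho = (d < dc).sum(axis=1) - 1       # chi counts neighbors closer than dc; subtract self
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]       # points with strictly higher density
        if higher.size > 0:
            delta[i] = d[i, higher].min()        # Eq. (2), first case
        else:
            delta[i] = d[i].max()                # Eq. (2), second case (densest point)
    return rho, delta
```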

Teaching and Learning Optimization Algorithms

The TLBO algorithm is inspired by the classroom teaching process. It has two important stages: the teaching stage and the learning stage. The algorithm uses these two stages to drive the population toward the optimal solution. The score of each student in the class is mapped to the solution space, and each student's score is the fitness value of the corresponding solution. The best student is regarded as the teacher, who then teaches the other students to improve their performance. The dimension of the solution space is d, the total number of students (the population size) is n, xi is the ith student, f(xi) is the academic achievement (fitness value), and xteacher is the teacher. \(mean = \frac{1}{{n}}\sum\nolimits_{i = 1}^{n} {x_{i} }\) is the average of the students.

In the teaching stage, the student with the best score among all students is selected as the teacher for the next round of the learning process. The teacher improves the performance of the whole class by imparting knowledge so that the students' grades approach the teacher's level. For any student in the class, the difference between the teacher and the student mean in the tth iteration is shown in Eq. (3):

$$Difference\left( t \right) = r_{i} \times \left( {x_{teacher} \left( t \right) - TF \times mean\left( t \right)} \right)$$
(3)
$$TF = round\left[ {1 + rand\left( {0,1} \right)} \right],\quad r_{i} = rand\left( {0,1} \right)$$
(4)

where \(TF\) takes the value 1 or 2, and t is the number of iterations. The resulting update of the students' knowledge level in the teaching stage is shown in Eq. (5):

$$x_{i + 1} \left( t \right) = x_{i} \left( t \right) + Difference\left( t \right)$$
(5)
$${\text{If}}\;f\left( {x_{i + 1} \left( t \right)} \right) > f\left( {x_{i} \left( t \right)} \right),\;{\text{then}}\;x_{i} (t) = x_{i + 1} (t),\;{\text{otherwise}}\;x_{i} (t) = x_{i} (t)$$
(6)

In the learning phase, any student xi in the class randomly selects another student xj according to his or her own understanding of the learning content and enhances his or her knowledge. If student xj's score is better than student xi's score, xi learns from xj to improve his or her knowledge. The student update formulas obtained by communicating with different students in the learning phase are shown in Eqs. (7) and (8):

$${\text{If}}\;f\left( {x_{i} \left( t \right)} \right) < f\left( {x_{j} \left( t \right)} \right),\;{\text{then}}\;x_{i + 1} \left( t \right) = x_{i} \left( t \right) + r_{i} \left( {x_{j} \left( t \right) - x_{i} \left( t \right)} \right)$$
(7)
$${\text{If}}\;f\left( {x_{i} \left( t \right)} \right) > f\left( {x_{j} \left( t \right)} \right),\;{\text{then}}\;x_{j} \left( t \right) = x_{j} \left( t \right) + r_{i} \left( {x_{i} \left( t \right) - x_{j} \left( t \right)} \right)$$
(8)

Algorithm 1: TLBO algorithm

  • Step 1: Initialize the student population n, dimension d;

  • Step 2: Calculate the student average mean, xteacher;

  • Step 3: Update the student individual by using Eq. (5), and calculate the updated student individual score;

  • Step 4: If the individual score of the student is improved after the update, the student individual replaces the student individual before the update with Eq. (6);

  • Step 5: Randomly select one student individual to compare with another student individual and use Eqs. (7) and (8) to update the student individual and calculate the student individual score;

  • Step 6: If the individual achievement of the student is improved after the update, the updated student individual replaces the student individual before the update;

  • Step 7: If the algorithm satisfies the iteration condition, output the optimal student individual;
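To make the update rules above concrete, the following is a minimal sketch of Algorithm 1 for a minimization problem. It is illustrative only: the learner phase is written in the common minimization form rather than in the score-maximizing notation of Eqs. (6)–(8), and the sphere objective and all parameter values are examples, not settings from the paper.

```python
import numpy as np

def tlbo(f, bounds, n=30, d=10, iters=100, seed=0):
    """Basic TLBO for minimization: teacher phase (Eqs. 3-6) and learner phase (Eqs. 7-8)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n, d))               # student population
    fit = np.apply_along_axis(f, 1, X)
    for _ in range(iters):
        teacher = X[fit.argmin()]                 # best student acts as teacher
        mean = X.mean(axis=0)
        for i in range(n):
            # ---- teacher phase ----
            TF = rng.integers(1, 3)               # teaching factor, 1 or 2 (Eq. 4)
            r = rng.random(d)
            new = np.clip(X[i] + r * (teacher - TF * mean), lo, hi)   # Eqs. (3) and (5)
            fn = f(new)
            if fn < fit[i]:                       # greedy acceptance, minimization form
                X[i], fit[i] = new, fn
            # ---- learner phase ----
            j = rng.integers(n)
            while j == i:
                j = rng.integers(n)
            r = rng.random(d)
            if fit[j] < fit[i]:                   # x_j is better: move x_i toward x_j
                new = X[i] + r * (X[j] - X[i])
            else:                                 # x_i is better: move x_i away from x_j
                new = X[i] + r * (X[i] - X[j])
            new = np.clip(new, lo, hi)
            fn = f(new)
            if fn < fit[i]:
                X[i], fit[i] = new, fn
    return X[fit.argmin()], fit.min()

# example usage on the sphere function
best_x, best_f = tlbo(lambda x: np.sum(x ** 2), bounds=(-100, 100))
```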

Niche Selection Strategy

The niche concept originates from biology: organisms of the same kind form a small living environment and are separated from individuals of other kinds [32]. Applied to computer science, this idea means dividing data with the same characteristics into one class and separating data with different characteristics. DeJong [33] proposed a niche method based on an exclusion (crowding) mechanism in 1975, in which, within a limited space, various biological individuals survive and compete for resources. Applying this idea to intelligent algorithms can prevent premature convergence and improve population diversity.

Algorithm 2: Selection strategy based on crowding mechanism niche algorithm

  • Step 1: Set the total number of groups, the crowding factor CF (value 2 or 3);

  • Step 2: randomly select 1/CF individuals of the total population as the crowded individuals;

  • Step 3: Execute Eq. (9) to calculate the Hamming distance between the new individual and the crowded individual:

    $$L = \left\| {x_{i} - x_{j} } \right\| = \sum\limits_{k = 1}^{N} {\left( {x_{k,i} - x_{k,j} } \right)}^{2}$$
    (9)
  • Step 4: Based on the similarity of Hamming distance, the individuals similar to the crowded individuals are excluded to form a new current group;
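A rough sketch of Algorithm 2 is given below. The distance of Eq. (9) is computed as written (a sum of squared coordinate differences), and "excluding" an individual that is too similar to a crowding member is interpreted here as re-initializing it at random; the similarity threshold and the re-initialization are assumptions on our part, not details specified by the authors.

```python
import numpy as np

def niche_crowding_selection(X, bounds, CF=2, sim_threshold=1e-3, rng=None):
    """Niche selection via a crowding mechanism (Algorithm 2, one interpretation).

    X: (n, d) population. Individuals too similar to a randomly chosen set of
    crowding members are replaced by random re-initialization (assumed behaviour)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    lo, hi = bounds
    # Step 2: randomly pick n / CF individuals as the crowding members
    crowd_idx = rng.choice(n, size=max(1, n // CF), replace=False)
    crowd = X[crowd_idx]
    new_X = X.copy()
    for i in range(n):
        if i in crowd_idx:
            continue
        # Step 3: Eq. (9)-style distance to every crowding member
        dists = np.sum((crowd - X[i]) ** 2, axis=1)
        # Step 4: exclude individuals that are too similar to a crowding member
        if dists.min() < sim_threshold:
            new_X[i] = rng.uniform(lo, hi, d)
    return new_X
```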

Clustering Algorithm Integrating Teaching and Learning Optimization Algorithm and Density Gap

Standard Deviation Optimization Selection Cluster Center

The DPC algorithm uses the local density and relative distance of the data set to determine the cluster centers. The similarity measure between objects is therefore the key of clustering, and in DPC it is computed by the simple Euclidean distance, so when the density varies across the data set the clustering result is greatly affected. Suppose d(x1, x2) = d(x2, x3) = d(x3, x4). Since the DPC algorithm measures similarity by the Euclidean distance, the similarity between x2 and x3 equals the similarity between x3 and x4. Generally, the similarity between data points of the same class is large, and the similarity between data points of different classes is small. Suppose further that x1 and x2 lie in a high-density region while x3 and x4 lie in a low-density region. According to the literature [34], data points of the same class or on the same manifold have similar densities and large similarity, whereas data points of different classes differ greatly in density and have small similarity. Treating these pairs as equally similar therefore leads to incorrect clustering. According to [35], the similarity measure should be influenced by the environment and the neighbors. Therefore, the reciprocal of the weighted distance of the K nearest neighbors of a data point can be regarded as its neighborhood density, and the distance between two points can be adjusted according to the difference of their neighborhood densities. This takes into account the manifold neighborhood of the data points and is less sensitive to noise.

To describe the proposed method, suppose the data set \(D = \left\{ {x_{1} ,x_{2} , \ldots ,x_{n} } \right\} \in R^{n \times m}\) is given.

Definition 1

The distance between any two data points is:

$$d\left( {x_{i} ,x_{j} } \right) = \sqrt {\left( {x_{i} - x_{j} } \right)^{T} \left( {x_{i} - x_{j} } \right)}$$

To take the neighborhood information of a data point into account, one could use the Euclidean distance to the kth neighbor xk of data point xi, but this distance is susceptible to noise; therefore, the weighted distance over the K nearest neighbors of xi is used.

Definition 2

The weighted distance over the K nearest neighbors of any data point xi is:

$$\varphi {}_{{x_{i} }} = \sum\limits_{j = 1}^{K} {\omega \left( {x_{i} ,x_{j} } \right)} d\left( {x_{i} ,x_{j} } \right),\;{\text{in}}\;\omega \left( {x_{i} ,x_{j} } \right) = 1 - \ln \left( {\frac{{d\left( {x{}_{i},x_{j} } \right)}}{{\sum\nolimits_{k = 1}^{K} {d\left( {x{}_{i},x_{k} } \right)} }}} \right)$$

Definition 3

Density of arbitrary data points

\(\theta_{{x_{i} }} = \frac{1}{{\phi_{{x_{i} }} }}.\) When the density of the neighborhood of the data point xi is high and the distribution is dense, ϕxi is small. When the density of the neighborhood where the data point xi is located is low and the distribution is sparse, ϕxi is large.

Definition 4

Density difference distance

\(d^{\prime } \left( {x_{i} ,x_{j} } \right) = d\left( {x_{i} ,x_{j} } \right)\left( {1 + \frac{{\left| {\theta_{{x_{i} }} - \theta_{{x_{j} }} } \right|}}{{\theta_{{\max} } }}} \right),\) where \(\theta_{{\max} } = {\max} \left\{ {\left| {\theta_{{x_{i} }} - \theta_{{x_{j} }} } \right|;\;i,j = 1, \ldots ,n} \right\}\) is the maximum value of the density difference. The ratio of |θxi − θxj| to θmax adjusts the distance: the larger the density difference, the smaller the similarity, which yields the density difference distance. The local density ρi of data point xi then becomes:

$$\rho_{i} = \sum\limits_{j} {\chi \left( {d^{{\prime }} \left( {x_{i} ,x_{j} } \right) - d_{c} } \right)}$$
(10)
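The sketch below illustrates Definitions 1–4 and the modified local density of Eq. (10). It assumes distinct data points (so no pairwise distance is zero) and is not the authors' implementation.

```python
import numpy as np

def density_gap_distance(X, K=5):
    """Density difference distance d' (Definitions 1-4), a sketch of the paper's formulas."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # Definition 1
    theta = np.zeros(n)
    for i in range(n):
        nn = np.argsort(d[i])[1:K + 1]                          # K nearest neighbors (skip self)
        dsum = d[i, nn].sum()
        w = 1.0 - np.log(d[i, nn] / dsum)                       # Definition 2: weights
        phi = np.sum(w * d[i, nn])                              # Definition 2: weighted distance
        theta[i] = 1.0 / phi                                    # Definition 3: neighborhood density
    diff = np.abs(theta[:, None] - theta[None, :])
    theta_max = diff.max()                                      # maximum density difference
    d_prime = d * (1.0 + diff / theta_max) if theta_max > 0 else d.copy()   # Definition 4
    return d_prime

def local_density(d_prime, dc):
    """Local density of Eq. (10): count of points within dc under the new distance."""
    return (d_prime < dc).sum(axis=1) - 1
```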

To solve the problem that the DPC algorithm requires manually selecting the cluster centers from a decision graph, which makes it difficult to obtain accurate cluster centers, this paper selects the exact number of clusters without manual intervention, based on the following Eqs. (11) and (12):

$$EC_{i} = \delta_{i} \ge 2\sigma \left( {\delta_{i} } \right)$$
(11)

ECi denotes the expected cluster centers, and σ(δi) is the standard deviation of all high-density distances. The final cluster centers are then obtained according to Eq. (12).

$$DC_{i} = EC{}_{i} \ge \mu \left( {\rho_{i} } \right)$$
(12)

DCi denotes the cluster centers after this separation, and μ(ρi) is the mean of ρi. Because the DPC algorithm may produce points with a large δi value but a low density as cluster centers, Eq. (12) is used to separate such points from the expected cluster centers ECi. In this way, DCi contains only points with both a higher density and a larger high-density distance than neighboring data points. After the cluster centers are determined, each remaining point is assigned to the cluster of its nearest point with higher local density.
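The following sketch shows one reading of the center selection of Eqs. (11)–(12) and of the subsequent assignment rule; the interpretation that a point qualifies when δi is at least twice the standard deviation of all δ values and ρi is at least the mean density is ours, as is the singleton fallback in the assignment.

```python
import numpy as np

def select_centers(rho, delta):
    """Cluster-center selection following Eqs. (11) and (12) (one reading of the criteria)."""
    ec = delta >= 2.0 * np.std(delta)      # Eq. (11): expected centers
    dc_mask = ec & (rho >= np.mean(rho))   # Eq. (12): keep only dense expected centers
    return np.where(dc_mask)[0]

def assign_points(d_prime, rho, centers):
    """DPC-style assignment: each non-center point joins the cluster of its nearest
    neighbor with higher local density (a sketch of the paper's description)."""
    n = len(rho)
    labels = -np.ones(n, dtype=int)
    for k, c in enumerate(centers):
        labels[c] = k
    order = np.argsort(-rho)               # process points from high to low density
    for i in order:
        if labels[i] != -1:
            continue
        higher = np.where(rho > rho[i])[0]
        if higher.size == 0:
            labels[i] = 0                  # fallback for the densest unlabeled point
            continue
        nearest = higher[np.argmin(d_prime[i, higher])]
        labels[i] = labels[nearest]
    return labels
```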

Improvement of Teaching and Learning Optimization Algorithm

Nonlinear Decreasing Strategy Learning

Because the TLBO algorithm focuses on global exploration in the early stage, its local exploitation ability is weakened during the search. Even under the influence of the teaching factor, as the number of iterations increases the algorithm may still fall into local optima. To let the algorithm learn adaptively in both the teaching and learning stages, increase the search range, and prevent local optimal deadlock, this paper introduces a weight with a nonlinear decreasing strategy in the teacher stage and the student stage, as shown in Eq. (13).

$$w\left( t \right) = \frac{1}{2}\left[ {\left( {w_{{\max} } - w_{{\min} } } \right)\left( {1 - \frac{t}{{t_{{\max} } }}} \right)^{2} + \left( {w_{{\max} } - w_{{\min} } } \right)\left( {\frac{{t_{{\max} } - t}}{{t_{{\max} } }}} \right)} \right] + w_{{\min} }$$
(13)

where wmax is the maximum weight and wmin is the minimum weight.
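A small helper illustrating Eq. (13); the default values wmax = 0.9 and wmin = 0.1 are taken from the experimental settings reported later in the paper, and the function itself is only a sketch.

```python
def nonlinear_weight(t, t_max, w_max=0.9, w_min=0.1):
    """Nonlinear decreasing weight of Eq. (13): decays from w_max at t = 0 to w_min at t = t_max."""
    frac = 1.0 - t / t_max                     # equals (t_max - t) / t_max
    return 0.5 * ((w_max - w_min) * frac ** 2 + (w_max - w_min) * frac) + w_min
```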

In this way, the algorithm balances exploration and exploitation: it uses larger weights to explore more regions in the early stage and smaller weights for finer local exploitation in the later stage. The improved individual update formula of the teaching stage is shown in Eq. (14):

$$x_{i + 1} \left( t \right) = w\left( t \right) \times x_{i} \left( t \right) + Difference_{{}} \left( t \right)$$
(14)

In the learning phase, the student individual xi and the different student individual xj exchange learning and get the knowledge update description formula as shown in (15, 16):

$$x_{i + 1} \left( t \right) = w\left( t \right) \times x_{i} \left( t \right) + r_{i} \left( {x_{j} \left( t \right) - x_{i} \left( t \right)} \right),f\left( {x_{i}^{{}} \left( t \right)} \right) < f\left( {x_{j} \left( t \right)} \right)$$
(15)
$$x_{j} \left( t \right) = w\left( t \right) \times x_{j} \left( t \right) + r_{i} \left( {x_{i} \left( t \right) - x_{j} \left( t \right)} \right),f\left( {x_{i} \left( t \right)} \right) > f\left( {x_{j} \left( t \right)} \right)$$
(16)

By introducing the dynamic w(t), the algorithm dynamically updates the knowledge level of the individual students taught by the teacher, increases the diversity of the population and the search range, thereby reducing the chance of falling into local optima and speeding up convergence.

Opportunity for Niche Selection Strategy

The teaching–learning-based optimization algorithm may fall into local optima in the later iterations, which causes the population diversity to decline. Therefore, the niche strategy is used to alleviate this shortcoming. If, as the number of iterations increases, a large number of individual fitness values become close to one another and cluster together, the niche selection strategy is introduced at that moment. To combine the teaching–learning-based optimization algorithm with the niche selection strategy, a variable is needed to measure the degree of aggregation between individuals. In this paper, the discrete coefficient from statistics is used to describe the aggregation of the population, Eq. (17):

$$g\left( t \right) = \frac{1}{{f_{avg} \left( t \right)}} \times \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {f\left( {x_{i} \left( t \right)} \right) - \left( {f_{avg} \left( t \right)} \right)} \right)^{2} } }$$
(17)

where f(xi(t)) is the fitness function value of the current ith student, favg(t) is the average fitness of the current student population, and t is the number of iterations. The discrete coefficient reflects the degree of dispersion between individuals within a population. During teaching and learning, the knowledge levels of the students tend to approach the same value and the fitness values gradually become similar, so g(t) becomes smaller and the degree of dispersion decreases, indicating that the population is becoming more concentrated. A threshold q is set, usually between 0 and 0.3 depending on the actual situation. When g(t) < q, the population aggregation is high and the niche selection strategy is introduced: 30% of the total population is randomly selected to form the crowding members, the similarity between the newly generated population (after applying w(t)) and the crowding members is compared, and students with high similarity are crowded out, thus avoiding local optima and maintaining the diversity of the whole population.
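A direct transcription of Eq. (17) as a sketch; it assumes a non-zero average fitness.

```python
import numpy as np

def aggregation_coefficient(fitness):
    """Discrete (dispersion) coefficient of Eq. (17): population standard deviation of the
    fitness values divided by their mean; small values indicate a crowded population."""
    f = np.asarray(fitness, dtype=float)
    return np.sqrt(np.mean((f - f.mean()) ** 2)) / f.mean()

# the niche selection strategy is triggered when g(t) falls below the threshold q,
# e.g. q = 0.2 as used later in Algorithm 4 (Step 1)
```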

Algorithm Implementation

Algorithm 3: NSTLBO algorithm

  • Steps 1–6: This process is consistent with the original TLBO algorithm, see “Teaching and Learning Optimization Algorithms” section;

  • Step 7: Determine the degree of population aggregation. If g(t) is less than q, execute Algorithm 2, “Niche Selection Strategy” section. Continue to step 4;

  • Step 8: update the student individual by using Eq. (14), and calculate the updated student individual score;

  • Step 9: If the individual achievement of the student is improved after the update, the student individual replaces the student individual before the update by using Eq. (6);

  • Step 10: randomly select one student individual to compare with another student individual, and use Eqs. (15) and (16) to update the student individual and calculate the student individual score;

  • Step 11: If the individual score of the student is improved after the update, the updated student individual replaces the student individual before the update;

  • Step 12: t = t + 1; if the iteration condition is satisfied, output the optimal student individual;

Clustering Algorithm Integrating Teaching and Learning Optimization Algorithm and Density Gap

The contour (silhouette) index reflects the compactness and separation of the data points. Therefore, this paper uses the contour index as the fitness function of the swarm intelligence algorithm in order to find the best dc value. The contour index of a sample o is shown in Eq. (18).

$$S\left( o \right) = \frac{{\left[ {b\left( o \right) - a\left( o \right)} \right]}}{{{\max} \left\{ {a\left( o \right),b\left( o \right)} \right\}}}$$
(18)

First, the data set of n sample points is divided into k clusters ci (i = 1, 2, …, k). a(o) is the average distance between the sample point o and all other sample points in its own cluster. d(o, ci) is the average distance between the sample point o and all samples of another cluster ci, and b(o) = min d(o, ci), where ci ranges over the clusters not containing o. The S(o) value reflects the quality of the clustering result: the larger S(o) is, the better the clustering quality of the algorithm.
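A sketch of the contour index of Eq. (18), averaged over all samples so that it can serve as the fitness of a candidate dc; the handling of singleton clusters and of the single-cluster case is our own convention, not specified in the paper.

```python
import numpy as np

def silhouette_fitness(d, labels):
    """Mean contour (silhouette) index of Eq. (18).

    d: (n, n) distance matrix; labels: cluster assignment of each sample."""
    n = len(labels)
    scores = np.zeros(n)
    for o in range(n):
        same = (labels == labels[o])
        same[o] = False
        if not same.any():                        # singleton cluster: S(o) taken as 0 here
            continue
        a = d[o, same].mean()                     # a(o): mean distance within its own cluster
        b = np.inf
        for c in np.unique(labels):
            if c == labels[o]:
                continue
            b = min(b, d[o, labels == c].mean())  # b(o): smallest mean distance to another cluster
        if not np.isfinite(b):                    # only one cluster: index undefined, keep 0
            continue
        scores[o] = (b - a) / max(a, b)           # Eq. (18)
    return scores.mean()                          # larger is better
```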

The idea of the proposed algorithm is as follows. First, following the literature [11], an interval [dlow, dhigh] for the cutoff distance dc is set based on the average local density of the data objects. Then, a dc value is selected randomly from this interval, the density difference distance d′(xi, xj) is calculated using Definitions 1–4, the local density ρi and the relative distance δi are calculated according to Eqs. (10) and (2), and the cluster centers are selected according to Eqs. (11) and (12). The clustering result is evaluated by the fitness function, i.e., the contour index S(o). Different dc values in the interval are tried repeatedly until the optimal dc value is selected. In the whole process the NSTLBO algorithm serves as the medium for selecting the optimal dc; the optimal dc value is then substituted into the improved density peak clustering algorithm, and the final clustering result is obtained. The algorithm steps are as follows:

Algorithm 4: NSTLBO-DGDPC algorithm

  • Step 1: Set the dc interval [dlow, dhigh]; initialize the NSTLBO population size to 10, the number of iterations to 10, and the threshold q to 0.2;

  • Step 2: randomly select a dc value, and calculate the density difference distance d’(xi, xj) by using the Definitions 1–4;

  • Step 3: Calculate the local density ρi and the relative distance δi according to Eqs. (10) and (2);

  • Step 4: Select a cluster center according to Eq. (11) and Eq. (12);

  • Step 5: According to the fitness Eq. (18), determine the fitness function value corresponding to each student;

  • Step 6: Run the NSTLBO algorithm to find the current optimal student individual and calculate the individual fitness function value of the student;

  • Step 7: Determine the current population aggregation degree, and if the aggregation degree is less than the threshold, use the niche selection strategy to form a new group;

  • Step 8: Perform the teaching stage: use the nonlinear decreasing strategy to update the student individuals and calculate the updated fitness function values;

  • Step 9: If the updated student individual score is improved, the student individual updated by Eq. (14) replaces the student individual before the update;

  • Step 10: Perform the learning stage: randomly select two different student individuals, compare their academic scores, and use Eqs. (15) and (16) to update the student individuals and calculate their scores;

  • Step 11: If the algorithm has reached the maximum number of iterations, terminate and output the optimal dc value; otherwise, return to Step 2;
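Putting the pieces together, the sketch below is a deliberately simplified driver for Algorithm 4: the candidate dc values play the role of the students, and a plain TLBO-style update stands in for the full NSTLBO (no niche selection or nonlinear weight). It reuses the helper functions sketched earlier in this section (density_gap_distance, local_density, select_centers, assign_points, silhouette_fitness) and is meant only to show the data flow, not to reproduce the authors' implementation.

```python
import numpy as np

def dc_fitness(X, dc, K=5):
    """Fitness of a candidate cutoff distance dc: cluster with the density-gap DPC
    helpers sketched above and score the result with the contour index of Eq. (18)."""
    d_prime = density_gap_distance(X, K)              # Definitions 1-4
    rho = local_density(d_prime, dc)                  # Eq. (10)
    n = X.shape[0]
    delta = np.zeros(n)
    for i in range(n):                                # Eq. (2) with d' in place of d
        higher = np.where(rho > rho[i])[0]
        delta[i] = d_prime[i, higher].min() if higher.size else d_prime[i].max()
    centers = select_centers(rho, delta)              # Eqs. (11)-(12)
    if centers.size < 2:
        return -1.0                                   # degenerate clustering, poor fitness
    labels = assign_points(d_prime, rho, centers)
    return silhouette_fitness(d_prime, labels)        # Eq. (18)

def nstlbo_dgdpc(X, d_low, d_high, pop=10, iters=10, seed=0):
    """Simplified driver for Algorithm 4: search [d_low, d_high] for the best dc value."""
    rng = np.random.default_rng(seed)
    dc_pop = rng.uniform(d_low, d_high, pop)          # students = candidate dc values
    fit = np.array([dc_fitness(X, dc) for dc in dc_pop])
    for _ in range(iters):
        teacher, mean = dc_pop[fit.argmax()], dc_pop.mean()
        for i in range(pop):
            TF = rng.integers(1, 3)
            new = np.clip(dc_pop[i] + rng.random() * (teacher - TF * mean), d_low, d_high)
            fn = dc_fitness(X, new)
            if fn > fit[i]:                           # larger contour index is better
                dc_pop[i], fit[i] = new, fn
    best = fit.argmax()
    return dc_pop[best], fit[best]
```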

Simulation Experiment and Validation

According to the above solution, the flowchart of the algorithm is designed as shown in Fig. 1.

Fig. 1
figure 1

The flowchart of algorithm

NSTLBO Algorithm Experimental Proof Validation

Firstly, the optimization performance of NSTLBO is verified; secondly, the clustering performance of the proposed algorithm is verified. All the experiments in this paper are compiled and debugged in MATLAB R2014a under a Windows 10 64-bit operating system. The hardware environment is an AMD A10 (2.30 GHz) with 8 GB of memory.

To verify the optimization performance of the NSTLBO algorithm, the PSO, ABC, TLBO, and CTLBO algorithms are compared with it. Ten standard test functions are selected from the CEC benchmarks to verify the performance of the algorithm. Among them, f1–f3, f7, and f8 are continuous unimodal functions, and f4–f6, f9, and f10 are continuous multimodal functions. A unimodal function has only one global optimum and is used to test the exploitation ability of an algorithm; a multimodal function has many local optima and is used to study the exploration ability of an algorithm. To assess performance, the average value and standard deviation over runs of each test function are calculated: a smaller average indicates a better algorithm, and a lower standard deviation indicates a more stable algorithm. The expressions and related parameters are shown in Table 1.

Table 1 Details of test function from CEC

The population size is set to 50, the maximum number of iterations to 500, and the dimensions to 50 and 100, respectively. For the PSO algorithm the inertia weight is w = 0.7289 and c1 = c2 = 1.5968. For the NSTLBO algorithm, wmax is 0.9, wmin is 0.1, and q is 0.3. In the experiment, each algorithm is run 50 times on each test function. The optimal value, average value, and variance obtained by the algorithms are shown in Table 2 (50 dimensions) and Table 3 (100 dimensions); bold fonts mark the best results in the experiments. To further compare the optimization performance visually, the average fitness evolution curves and statistical box diagrams for f1–f10 are drawn in Figs. 2, 3, 4, and 5.

Table 2 Comparisons of algorithm test results (50 Dim)
Table 3 Comparisons of algorithm test results (100 Dim)
Fig. 2
figure 2

Evolution curve of average fitness of algorithms (50 Dim)

Fig. 3
figure 3

Statistical box diagram of the average fitness of the algorithms (50 Dim)

Fig. 4
figure 4

Average fitness of the algorithms (100 Dim)

Fig. 5
figure 5

Statistical box diagram of the average fitness of the algorithms (100 Dim)

As can be seen from the data in Table 2, although f1 is relatively easy to optimize, the averages and standard deviations of ABC and PSO on it are relatively poor, whereas the NSTLBO algorithm proposed in this paper reaches the global optimum in both mean and standard deviation. The test function f2 is a unimodal function that is difficult to optimize; the optimal value of the NSTLBO algorithm is lower than that of PSO, its average value is slightly lower than that of the CTLBO algorithm, with the NSTLBO algorithm being the most prominent, and its standard deviation is the highest among the compared algorithms. On the test functions f3 and f4, where f4 is a multimodal function, the NSTLBO algorithm maintains the good performance shown on f1, and both the mean and the standard deviation reach the global optimal value of zero. On the test function f5, the average and standard deviation of the NSTLBO solutions are higher than those of the ABC and PSO algorithms and slightly higher than those of the TLBO and CTLBO algorithms, so the NSTLBO algorithm performs only moderately on this function. On the test function f6, its mean and standard deviation are superior to those of the other four algorithms. On the test function f7, the average value of the NSTLBO algorithm is smaller than that of the ABC algorithm and larger than those of the TLBO and CTLBO algorithms, while its standard deviation is smaller than those of the PSO, TLBO, and CTLBO algorithms. On the test function f8, the average and standard deviation of the NSTLBO algorithm are similar to those of the PSO, TLBO, and CTLBO algorithms. On the test function f9, the other algorithms perform better than the NSTLBO algorithm. On the test function f10, the average and standard deviation of the NSTLBO algorithm are markedly better than those of the other algorithms, indicating better performance.

As can be seen from the data in Table 3, when the dimension is increased to 100, the standard deviation of the NSTLBO algorithm on f1 is still 0, but its overall solution ability on the test functions f1 and f2 is reduced compared with the 50-dimensional case. On the test function f2, compared with the other four algorithms, only the optimal value and standard deviation of the CTLBO algorithm are better than those of the NSTLBO algorithm; the other three algorithms do not reach the precision of the NSTLBO algorithm. On the test functions f3 and f4, the means and standard deviations of the TLBO, CTLBO, and NSTLBO algorithms are all zero, in contrast to the ABC and PSO algorithms. On the test functions f5 and f6, the means and standard deviations of the solutions do not change much relative to the 50-dimensional case, indicating that the performance of the NSTLBO algorithm is essentially unchanged between 50 and 100 dimensions and remains moderate on f5 and f6. On the test function f7, the average value of the NSTLBO algorithm is smaller than in Table 2, while the standard deviation is larger, indicating reduced stability in higher dimensions. On the test function f8, the average value of the NSTLBO algorithm is larger than in Table 2; its standard deviation is smaller than those of PSO, TLBO, and CTLBO, and larger than that of ABC. On the test function f9, the average and standard deviation of the NSTLBO algorithm are similar to those of PSO, TLBO, and CTLBO, and smaller than the corresponding values in Table 2. On the test function f10, the average and standard deviation of the NSTLBO algorithm are markedly better than those of the other algorithms, similar to the values of the CTLBO algorithm in Table 2 and smaller than those of the ABC, PSO, and TLBO algorithms, indicating that the NSTLBO algorithm performs better than the other algorithms. After the dimension increases, apart from the CTLBO algorithm, the NSTLBO algorithm shows the higher stability.

From the convergence curves in panels (a), (c), (d), (e), (f), and (j) of Fig. 2, it can be seen that the convergence curves of the NSTLBO algorithm on the test functions f1, f3, f4, f5, f6, and f10 decline fastest and reach the optimal value with fewer iterations. Panels (a), (c), (d), (f), (g), and (j) of Fig. 2 show that the convergence curves on the test functions f1, f3, f4, f6, f7, and f10 are relatively smooth. This indicates that the NSTLBO algorithm performs best on the test functions f1, f3, f4, and f6 and performs moderately on the test functions f5, f6, and f10. From the statistical box diagrams (a), (b), (e), (f), (g), (h), (i), and (j) in Fig. 3, it can be seen that the ranges of the NSTLBO algorithm on the test functions f1, f2, f5, f6, f7, f8, f9, and f10 are larger; except for f8 and f9, its minimum values are smaller than those of the other algorithms, and only on f3 and f4 does it show good stability.

From the convergence curves in Fig. 4, it can be seen that in 100 dimensions, except for f7, f8, and f9, the convergence curves of the NSTLBO algorithm all lie below those of the other algorithms. In Fig. 4b, the curve has already flattened into a straight line while the number of iterations is still far less than 100, and in Fig. 4e the curve becomes a straight line when the number of iterations approaches 100. Therefore, on the two test functions f2 and f5, the convergence speed and solution accuracy of the NSTLBO algorithm are basically unchanged. From the statistical box diagrams in Fig. 5, when the function dimension is increased to 100, the NSTLBO algorithm still shows good stability on f3 and f4, and except for f7, f8, and f9, its fitness values are the smallest among the compared algorithms.

Analysis of the Experimental Proof of Algorithm NSTLBO-DGDPC

To verify the clustering performance of the proposed algorithm, the DBSCAN and DPC algorithms are used for comparison. In the experiments, the parameters are Eps = 0.05 and MinPts = 25, and the parameters dc or k take the values of Ref. [36]. Sixteen standard data sets, including eight synthetic data sets and eight real data sets, are selected as shown in Table 4. Three commonly used clustering evaluation indicators are chosen: clustering accuracy (ACC), adjusted mutual information (AMI), and adjusted Rand index (ARI) [37], and simulation experiments are carried out on the three clustering algorithms. The experimental results on the synthetic data sets are shown in Table 5 and those on the real data sets in Table 6; the clustering effects on the synthetic data sets are shown in Fig. 6.

Table 4 Data set
Table 5 Comparison of composite data sets ACC, ARI, and AMI
Table 6 Comparison of real data sets ACC, ARI, and AMI
Fig. 6
figure 6

Clustering effect of two algorithms on synthetic data sets

From the three clustering evaluation indices in Table 5, compared with the DPC algorithm, the ACC value of the proposed algorithm increases by 0.4% on D31, so its accuracy is similar to that of the DPC algorithm, while its ARI and AMI values are slightly higher. Compared with the DBSCAN algorithm, the ACC and AMI values increase by 26% and 19%, and the ARI values are significantly higher. This shows that the algorithm can find the cluster centers on complex data sets and partition the data well. On the Aggregation and Spiral synthetic data sets, the clustering evaluation indexes of the three algorithms are equal. On the Flame synthetic data set, the ACC and ARI values are equal to those of the DPC algorithm, and the ACC, ARI, and AMI values are significantly higher than those of the DBSCAN algorithm. On the Pathbased, R15, and Jain synthetic data sets, the ACC, ARI, and AMI values are significantly higher than those of the other two algorithms. This shows that, compared with the DPC and DBSCAN algorithms, combining the improved swarm intelligence algorithm with the clustering algorithm can handle these data sets and achieve better clustering results.

From the three clustering evaluation indexes in Table 6, the ACC, ARI, and AMI values of the proposed algorithm are higher than those of the DPC and DBSCAN algorithms on the Iris and Wine real data sets. On the Wdbc real data set, the ACC and ARI values are higher than those of the other two algorithms, but the AMI value is lower than that of the DPC algorithm. On the Parkinsons, Seeds, Pima, Glass, and Segmention real data sets, the three clustering indexes are significantly higher than those of the DPC and DBSCAN algorithms. On the Glass real data set, the ACC and ARI values of each algorithm are smaller than the corresponding values on the Seeds, Pima, and Segmention real data sets. This also shows that the proposed algorithm achieves better clustering results on both synthetic and real data sets.

Figure 6 shows the clustering effect of the DPC algorithm and the proposed algorithm on eight synthetic data sets of different complexity. Compared with the DPC algorithm in Fig. 6b, the proposed algorithm in Fig. 6a can divide the D31, Aggregation, Jain, and Compound data sets into the true classes. The clustering effect of the two algorithms is similar on the Spiral, Flame, Pathbased, and R15 data sets.

Conclusion and Future Work

The cutoff distance dc determines the local density and high-density distance of the DPC algorithm, so the setting of the dc value is very important. The DPC algorithm requires the cutoff distance and the cluster centers to be set manually, and its distance calculation does not consider the impact of data attributes and neighborhoods on the clustering results. To address these problems, based on the DPC and TLBO algorithms, this paper proposes a clustering algorithm that combines the teaching–learning-based optimization algorithm with the density difference distance. The range of dc values is set as the population search interval, and the NSTLBO algorithm is used to find the optimal dc value. Second, considering the influence of data attributes and neighborhood factors, a weighted density difference distance is established to replace the Euclidean distance, and this weighted distance is used to calculate the local density of the algorithm. Because the DPC algorithm should select points with both a larger high-density distance and a larger local density as cluster centers, the cluster centers are selected using the standard deviation of the distance and the mean density to separate these data points, thereby improving the correctness of the DPC clustering. The simulation results show that the proposed algorithm can determine the number of classes of real data sets without manually setting the dc value and has good clustering quality and effect.

In future work, we will focus on an adaptive parameter selection method to avoid manually setting the parameters of the density peak clustering algorithm. In addition, as data sets grow, calculating the distances between data points incurs a large overhead. We will therefore also study how to improve the computing speed of the density peak clustering algorithm on large data sets; whether to use high-performance hardware or MapReduce to process big data in parallel is worth further study.