Introduction

With the advent of the big data era, extracting valuable and latent knowledge from massive data has become increasingly important. Cluster analysis is a multivariate statistical method for dividing data into groups: according to the degree of similarity between abstract objects, the data are partitioned into several groups, and similar objects are gathered into the same set. Clustering [1] is the process of aggregating the data points of a data set around several centers sharing the same features, that is, the process of dividing a data set into classes consisting of similar objects. Cluster analysis is therefore often used as a preprocessing step for other data mining operations [2]. Because cluster analysis can mine useful, implicit, and previously undiscovered information and knowledge from large amounts of raw data, it has been widely applied in many fields, including image pattern recognition, Web search, biology, security, data analysis, and text mining. Clustering algorithms are generally divided into partition-based, hierarchy-based, density-based, and grid-based algorithms, followed by neural network clustering, statistics-based clustering, and fuzzy clustering algorithms [3,4,5,6,7,8,9]. The K-means algorithm is one of the most widely used partition-based clustering algorithms [10]. It takes as its objective function the minimization of the sum of squared Euclidean distances between the data points and the cluster centers. The K-means algorithm converges quickly, is simple, and can effectively process large data sets [11]; however, the number of clusters must be determined manually and the initial cluster centers are selected randomly. The DBSCAN algorithm [12] is the most classic density-based clustering algorithm. It requires two parameters, a density threshold and a distance parameter, and these parameters strongly affect cluster quality; moreover, it cannot handle high-dimensional data well, and if the sample set has uneven density and large differences in inter-cluster spacing, the clustering quality deteriorates. Another density-based method is the density peak clustering algorithm DPC (density peak clustering) proposed by Rodriguez and Laio [13]. The algorithm is simple and efficient, and it does not need to iterate over an objective function to determine the cluster centers. However, its cutoff distance and cluster centers must be set manually or chosen empirically. The choice of cutoff distance determines the calculation of the local density and relative distance, which in turn affects the clustering results, so finding the optimal cutoff distance is a challenge. To improve the DPC algorithm, several approaches have been proposed in the past three years, many of which still set the truncation (cutoff) distance manually. For example, Xu [14] proposed a k-nearest neighbor method based on a Gaussian kernel to calculate the local density and used a difference vector to select the cluster centers automatically. Du [15] proposed combining the FJP algorithm and using fuzzy neighborhood relations to define fuzzy joint points for calculating the local density.
Li [16] proposed a new comparative model to replace the original relative-distance calculation in comparative density peak clustering. Bai [17] proposed an accelerated algorithm based on concept approximation that requires fewer distance calculations, and then an approximate algorithm based on sample clustering to obtain the approximate results of the original algorithm quickly, in order to improve scalability. Swarm intelligence algorithms have great advantages in finding global optima and are insensitive to parameters, so they can be used to optimize the cutoff distance and the cluster centers. Swarm intelligence algorithms include the ant colony optimization algorithm [18], artificial bee colony (ABC) algorithm [19], firefly algorithm [20], particle swarm optimization (PSO) algorithm [21], and teaching–learning-based optimization (TLBO) algorithm [22]. Among them, the TLBO algorithm is an optimization tool inspired by the knowledge transfer between teachers and students in a classroom. Compared with other optimization algorithms, its most prominent feature is that no special control parameters need to be tuned before execution: the PSO algorithm must adjust velocity and inertia weights, the ABC algorithm requires parameters for its three types of bees, and the GA must adjust crossover and mutation rates. However, the TLBO algorithm is easily trapped in local optima and converges slowly. To increase the diversity of new populations, Zou [23] proposed a teaching–learning-based optimization algorithm with differential and exclusion learning (DRLTLBO). Ender [24] proposed combining extreme learning machines with TLBO to solve data classification problems. Kumar [25] used several chaotic mechanisms to address the TLBO algorithm's lack of balance between premature convergence and local versus global search. Bourahla [26] applied the random local search mechanism of the bat algorithm to the TLBO algorithm and demonstrated its effectiveness. Niu [27] proposed an improved teaching-based optimization algorithm (MTLBO) based on the actual "teaching–learning" situation to speed up convergence. Zhang [28] integrated a number-spiral strategy and a triangle mutation rule into the TLBO algorithm to improve its convergence rate and its tendency to fall into local optima. At present, many researchers combine swarm intelligence algorithms with clustering algorithms for data mining. Anari proposed a new clustering algorithm based on learning automata and ant colony intelligence and solved the optimal placement of data items in a grid by the LA method [29]. Majdi used a genetic algorithm and particle swarm optimization to optimize a pre-determined FCM clustering model, improving the accuracy of model estimation [30]. Zang [31] proposed a new clustering method based on the potential entropy of the data field to extract the optimal threshold value and used a Gaussian method to determine the cluster centers automatically. In this paper, the TLBO algorithm, with its fast convergence and search speed, is used to optimize the cutoff distance of the DPC algorithm. On this basis, a clustering algorithm combining the TLBO algorithm and the density gap (NSTLBO-DGDPC) is proposed. Because the Euclidean distance does not consider the influence of data point attributes and neighborhoods, a density gap distance is used instead of the Euclidean distance.
Then, the standard deviation of the high-density distance is used to determine the cluster centers, which avoids the influence of manually selecting cluster centers on the clustering results. Finally, to prevent the TLBO algorithm from falling into local optima, a niche selection strategy is introduced to exclude similar individuals when the population density reaches a certain threshold, and a nonlinear decreasing strategy is then used to update the individual students in the teaching and learning stages to find the optimal dc value.

The main contributions of this paper are summarized as follows:

  1. (1)

In this paper, we propose using a weighted density difference distance instead of the Euclidean distance to calculate the local density of the algorithm.

  2. (2)

To select the cluster centers correctly, the final cluster centers are chosen using the standard deviation of the high-density distance.

  3. (3)

To avoid manually setting the truncation (cutoff) distance in the DPC algorithm, the TLBO algorithm is used to find the optimal dc value.

  4. (4)

The TLBO algorithm easily falls into local optima. Therefore, we propose introducing a niche selection strategy to eliminate similar individuals when the population density reaches a certain threshold, and then using a nonlinear decreasing strategy to update the individual students in the teaching and learning stages, finally obtaining the optimal dc solution.

  5. (5)

The NSTLBO algorithm improves accuracy and convergence. Simulation experiments on benchmark functions show that the NSTLBO algorithm has better performance.

  6. (6)

Synthetic and real data sets are used to verify the proposed clustering algorithm that fuses the teaching–learning-based optimization algorithm with the density difference distance. The results show that the algorithm can determine the number of clusters in real data sets without manually setting the dc value, and its ACC, AMI, and ARI values are improved, indicating better clustering quality.

The organization of this paper is as follows: the second section briefly introduces the DPC algorithm and the TLBO algorithm. The third section presents the clustering algorithm that combines the teaching–learning-based optimization algorithm with the density difference distance; its subsections introduce the density difference distance and the standard-deviation-based selection of cluster centers, the improved TLBO algorithm (NSTLBO), and the algorithm steps. The fourth section discusses the simulation experiments and analysis. Finally, the fifth section covers the conclusion and future work.

Preliminaries

Density Peak Clustering Algorithm DPC

The algorithm is based on two simple and intuitive assumptions: cluster centers are surrounded by neighbors with lower local density, and the distance between a cluster center and any point with higher local density is relatively large. According to these two hypotheses, the algorithm first finds the potential density peaks, labels them as cluster centers, and then assigns the remaining points to the clusters. For each data point xi, the DPC clustering algorithm must calculate the local density ρi and the distance δi to the nearest point of higher density. The local density of a data point is defined as Eq. (1):

$$\rho_{i} = \sum\limits_{j} {\chi \left( {d\left( {x_{i} ,x_{j} } \right) - d_{c} } \right)} ,\quad \chi \left( x \right) = \left\{ {\begin{array}{*{20}c} {1,\quad x < 0} \\ {0,\quad x \ge 0} \\ \end{array} } \right.$$
(1)

dc is the cutoff distance, and d(xi, xj) is the Euclidean distance between data points xi and xj. The δi of data point xi is determined by Eq. (2).

$$\delta_{i} = \left\{ {\begin{array}{*{20}l} {{\min} \left( {d\left( {x_{i} ,x_{j} } \right)} \right),} \hfill & {\rho_{j} > \rho_{i} } \hfill \\ {{\max} \left( {d\left( {x_{i} ,x_{j} } \right)} \right),} \hfill & {x_{j} \in X,\;{\text{otherwise}}} \hfill \\ \end{array} } \right.$$
(2)
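For concreteness, the sketch below shows one straightforward way to compute ρi and δi from Eqs. (1) and (2) with NumPy. It is an illustration under our own conventions (for example, subtracting the self-count from ρi), not the authors' implementation.

```python
import numpy as np

def dpc_density_and_delta(X, dc):
    """Local density rho (Eq. 1) and high-density distance delta (Eq. 2) for each point."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise Euclidean distances
    rho = (d < dc).sum(axis=1) - 1       # chi counts neighbors closer than dc; subtract self
    delta = np.zeros(n)
    for i in range(n):
        higher = np.where(rho > rho[i])[0]       # points with strictly higher density
        if higher.size > 0:
            delta[i] = d[i, higher].min()        # Eq. (2), first case
        else:
            delta[i] = d[i].max()                # Eq. (2), second case (densest point)
    return rho, delta
```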

Teaching and Learning Optimization Algorithms

The TLBO algorithm is inspired by the classroom teaching process. It has two important stages: the teaching stage and the learning stage. The algorithm uses these two stages to drive the population toward the optimal solution. The score of each student in the class is mapped to the solution space, and each student's score is the fitness value of the corresponding solution. The best student is regarded as the teacher, who then teaches the other students to improve their performance. The dimension of the solution space is d, the total number of students (the population size) is n, xi is the ith student, f(xi) is the academic achievement (fitness value), and xteacher is the teacher. \(mean = \frac{1}{{n}}\sum\nolimits_{i = 1}^{n} {x_{i} }\) is the average of the students.

In the teaching stage, the student with the best score among all students is selected as the teacher for the next round of the learning process. The teacher improves the performance of the whole class by imparting knowledge so that the students' grades approach the teacher's level. For any student in the class, the difference between the teacher and the student mean in the tth iteration is shown in Eq. (3):

$$Difference\left( t \right) = r_{i} \times \left( {x_{teacher} \left( t \right) - TF \times mean\left( t \right)} \right)$$
(3)
$$TF = round\left[ {1 + rand\left( {0,1} \right)} \right],\quad r_{i} = rand\left( {0,1} \right)$$
(4)

where \(TF\) takes the value 1 or 2, and t is the number of iterations. The resulting update of the students' knowledge level in the teaching stage is shown in Eq. (5):

$$x_{i + 1} \left( t \right) = x_{i} \left( t \right) + Difference\left( t \right)$$
(5)
$${\text{If}}\;f\left( {x_{i + 1} \left( t \right)} \right) > f\left( {x_{i} \left( t \right)} \right),\;{\text{then}}\;x_{i} (t) = x_{i + 1} (t),\;{\text{otherwise}}\;x_{i} (t) = x_{i} (t)$$
(6)

In the learning phase, any student xi in the class randomly selects another student xj according to his or her own understanding of the learning content and enhances his or her knowledge. If student xj's score is better than student xi's score, xi learns from xj to improve his or her knowledge. The student update formulas obtained by communicating with different students in the learning phase are shown in Eqs. (7) and (8):

$${\text{If}}\;f\left( {x_{i} \left( t \right)} \right) < f\left( {x_{j} \left( t \right)} \right),\;{\text{then}}\;x_{i + 1} \left( t \right) = x_{i} \left( t \right) + r_{i} \left( {x_{j} \left( t \right) - x_{i} \left( t \right)} \right)$$
(7)
$${\text{If}}\;f\left( {x_{i} \left( t \right)} \right) > f\left( {x_{j} \left( t \right)} \right),\;{\text{then}}\;x_{j} \left( t \right) = x_{j} \left( t \right) + r_{i} \left( {x_{i} \left( t \right) - x_{j} \left( t \right)} \right)$$
(8)

Algorithm 1: TLBO algorithm

  • Step 1: Initialize the student population n, dimension d;

  • Step 2: Calculate the student average mean, xteacher;

  • Step 3: Update the student individual by using Eq. (5), and calculate the updated student individual score;

  • Step 4: If the individual score of the student is improved after the update, the student individual replaces the student individual before the update with Eq. (6);

  • Step 5: Randomly select one student individual to compare with another student individual and use Eqs. (7) and (8) to update the student individual and calculate the student individual score;

  • Step 6: If the individual achievement of the student is improved after the update, the updated student individual replaces the student individual before the update;

  • Step 7: If the algorithm satisfies the iteration condition, output the optimal student individual;
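To make the update rules above concrete, the following is a minimal sketch of Algorithm 1 for a minimization problem. It is illustrative only: the learner phase is written in the common minimization form rather than in the score-maximizing notation of Eqs. (6)–(8), and the sphere objective and all parameter values are examples, not settings from the paper.

```python
import numpy as np

def tlbo(f, bounds, n=30, d=10, iters=100, seed=0):
    """Basic TLBO for minimization: teacher phase (Eqs. 3-6) and learner phase (Eqs. 7-8)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n, d))               # student population
    fit = np.apply_along_axis(f, 1, X)
    for _ in range(iters):
        teacher = X[fit.argmin()]                 # best student acts as teacher
        mean = X.mean(axis=0)
        for i in range(n):
            # ---- teacher phase ----
            TF = rng.integers(1, 3)               # teaching factor, 1 or 2 (Eq. 4)
            r = rng.random(d)
            new = np.clip(X[i] + r * (teacher - TF * mean), lo, hi)   # Eqs. (3) and (5)
            fn = f(new)
            if fn < fit[i]:                       # greedy acceptance, minimization form
                X[i], fit[i] = new, fn
            # ---- learner phase ----
            j = rng.integers(n)
            while j == i:
                j = rng.integers(n)
            r = rng.random(d)
            if fit[j] < fit[i]:                   # x_j is better: move x_i toward x_j
                new = X[i] + r * (X[j] - X[i])
            else:                                 # x_i is better: move x_i away from x_j
                new = X[i] + r * (X[i] - X[j])
            new = np.clip(new, lo, hi)
            fn = f(new)
            if fn < fit[i]:
                X[i], fit[i] = new, fn
    return X[fit.argmin()], fit.min()

# example usage on the sphere function
best_x, best_f = tlbo(lambda x: np.sum(x ** 2), bounds=(-100, 100))
```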

Niche Selection Strategy

The niche concept originates from biology: organisms of the same kind form a small living environment and are separated from individuals of other kinds [32]. Applied to computer science, this idea means dividing data with the same characteristics into one class and separating data with different characteristics. DeJong [33] proposed a niche method based on an exclusion (crowding) mechanism in 1975, in which, within a limited space, various biological individuals survive and compete for resources. Applying this idea to intelligent algorithms can prevent premature convergence and improve population diversity.

Algorithm 2: Selection strategy based on crowding mechanism niche algorithm

  • Step 1: Set the total number of groups, the crowding factor CF (value 2 or 3);

  • Step 2: randomly select 1/CF individuals of the total population as the crowded individuals;

  • Step 3: Execute Eq. (9) to calculate the Hamming distance between the new individual and the crowded individual:

    $$L = \left\| {x_{i} - x_{j} } \right\| = \sum\limits_{k = 1}^{N} {\left( {x_{k,i} - x_{k,j} } \right)}^{2}$$
    (9)
  • Step 4: Based on the similarity of Hamming distance, the individuals similar to the crowded individuals are excluded to form a new current group;
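A rough sketch of Algorithm 2 is given below. The distance of Eq. (9) is computed as written (a sum of squared coordinate differences), and "excluding" an individual that is too similar to a crowding member is interpreted here as re-initializing it at random; the similarity threshold and the re-initialization are assumptions on our part, not details specified by the authors.

```python
import numpy as np

def niche_crowding_selection(X, bounds, CF=2, sim_threshold=1e-3, rng=None):
    """Niche selection via a crowding mechanism (Algorithm 2, one interpretation).

    X: (n, d) population. Individuals too similar to a randomly chosen set of
    crowding members are replaced by random re-initialization (assumed behaviour)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    lo, hi = bounds
    # Step 2: randomly pick n / CF individuals as the crowding members
    crowd_idx = rng.choice(n, size=max(1, n // CF), replace=False)
    crowd = X[crowd_idx]
    new_X = X.copy()
    for i in range(n):
        if i in crowd_idx:
            continue
        # Step 3: Eq. (9)-style distance to every crowding member
        dists = np.sum((crowd - X[i]) ** 2, axis=1)
        # Step 4: exclude individuals that are too similar to a crowding member
        if dists.min() < sim_threshold:
            new_X[i] = rng.uniform(lo, hi, d)
    return new_X
```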

Clustering Algorithm Integrating Teaching and Learning Optimization Algorithm and Density Gap

Standard Deviation Optimization Selection Cluster Center

The DPC algorithm uses the local density and relative distance of the data set to determine the cluster centers. The similarity measure between objects is therefore the key of clustering, and in DPC it is computed by the simple Euclidean distance, so when the density varies across the data set the clustering result is greatly affected. Suppose d(x1, x2) = d(x2, x3) = d(x3, x4). Since the DPC algorithm measures similarity by the Euclidean distance, the similarity between x2 and x3 equals the similarity between x3 and x4. Generally, the similarity between data points of the same class is large, and the similarity between data points of different classes is small. Suppose further that x1 and x2 lie in a high-density region while x3 and x4 lie in a low-density region. According to the literature [34], data points of the same class or on the same manifold have similar densities and large similarity, whereas data points of different classes differ greatly in density and have small similarity. Treating these pairs as equally similar therefore leads to incorrect clustering. According to [35], the similarity measure should be influenced by the environment and the neighbors. Therefore, the reciprocal of the weighted distance of the K nearest neighbors of a data point can be regarded as its neighborhood density, and the distance between two points can be adjusted according to the difference of their neighborhood densities. This takes into account the manifold neighborhood of the data points and is less sensitive to noise.

To describe the proposed method, suppose the data set \(D = \left\{ {x_{1} ,x_{2} , \ldots ,x_{n} } \right\} \in R^{n \times m}\) is given.

Definition 1

The distance between any two data points is:

$$d\left( {x_{i} ,x_{j} } \right) = \sqrt {\left( {x_{i} - x_{j} } \right)^{T} \left( {x_{i} - x_{j} } \right)}$$

To take the neighborhood information of a data point into account, one could use the Euclidean distance to the kth neighbor xk of data point xi, but this distance is susceptible to noise; therefore, the weighted distance over the K nearest neighbors of xi is used.

Definition 2

The weighted distance over the K nearest neighbors of any data point xi is:

$$\varphi {}_{{x_{i} }} = \sum\limits_{j = 1}^{K} {\omega \left( {x_{i} ,x_{j} } \right)} d\left( {x_{i} ,x_{j} } \right),\;{\text{in}}\;\omega \left( {x_{i} ,x_{j} } \right) = 1 - \ln \left( {\frac{{d\left( {x{}_{i},x_{j} } \right)}}{{\sum\nolimits_{k = 1}^{K} {d\left( {x{}_{i},x_{k} } \right)} }}} \right)$$

Definition 3

Density of arbitrary data points

\(\theta_{{x_{i} }} = \frac{1}{{\phi_{{x_{i} }} }}.\) When the density of the neighborhood of the data point xi is high and the distribution is dense, ϕxi is small. When the density of the neighborhood where the data point xi is located is low and the distribution is sparse, ϕxi is large.

Definition 4

Density difference distance

\(d^{\prime } \left( {x_{i} ,x_{j} } \right) = d\left( {x_{i} ,x_{j} } \right)\left( {1 + \frac{{\left| {\theta_{{x_{i} }} - \theta_{{x_{j} }} } \right|}}{{\theta_{{\max} } }}} \right),\) where \(\theta_{{\max} } = {\max} \left\{ {\left| {\theta_{{x_{i} }} - \theta_{{x_{j} }} } \right|;\;i,j = 1, \ldots ,n} \right\}\) is the maximum value of the density difference. The ratio of |θxi − θxj| to θmax adjusts the distance: the larger the density difference, the smaller the similarity, which yields the density difference distance. The local density ρi of data point xi then becomes:

$$\rho_{i} = \sum\limits_{j} {\chi \left( {d^{{\prime }} \left( {x_{i} ,x_{j} } \right) - d_{c} } \right)}$$
(10)
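The sketch below illustrates Definitions 1–4 and the modified local density of Eq. (10). It assumes distinct data points (so no pairwise distance is zero) and is not the authors' implementation.

```python
import numpy as np

def density_gap_distance(X, K=5):
    """Density difference distance d' (Definitions 1-4), a sketch of the paper's formulas."""
    n = X.shape[0]
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)   # Definition 1
    theta = np.zeros(n)
    for i in range(n):
        nn = np.argsort(d[i])[1:K + 1]                          # K nearest neighbors (skip self)
        dsum = d[i, nn].sum()
        w = 1.0 - np.log(d[i, nn] / dsum)                       # Definition 2: weights
        phi = np.sum(w * d[i, nn])                              # Definition 2: weighted distance
        theta[i] = 1.0 / phi                                    # Definition 3: neighborhood density
    diff = np.abs(theta[:, None] - theta[None, :])
    theta_max = diff.max()                                      # maximum density difference
    d_prime = d * (1.0 + diff / theta_max) if theta_max > 0 else d.copy()   # Definition 4
    return d_prime

def local_density(d_prime, dc):
    """Local density of Eq. (10): count of points within dc under the new distance."""
    return (d_prime < dc).sum(axis=1) - 1
```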

To solve the problem that the DPC algorithm requires manually selecting the cluster centers from a decision graph, which makes it difficult to obtain accurate cluster centers, this paper selects the exact number of clusters without manual intervention, based on the following Eqs. (11) and (12):

$$EC_{i} = \delta_{i} \ge 2\sigma \left( {\delta_{i} } \right)$$
(11)

ECi denotes the expected cluster centers, and σ(δi) is the standard deviation of all high-density distances. The final cluster centers are then obtained according to Eq. (12).

$$DC_{i} = EC{}_{i} \ge \mu \left( {\rho_{i} } \right)$$
(12)

DCi denotes the cluster centers after this separation, and μ(ρi) is the mean of ρi. Because the DPC algorithm may produce points with a large δi value but a low density as cluster centers, Eq. (12) is used to separate such points from the expected cluster centers ECi. In this way, DCi contains only points with both a higher density and a larger high-density distance than neighboring data points. After the cluster centers are determined, each remaining point is assigned to the cluster of its nearest point with higher local density.
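The following sketch shows one reading of the center selection of Eqs. (11)–(12) and of the subsequent assignment rule; the interpretation that a point qualifies when δi is at least twice the standard deviation of all δ values and ρi is at least the mean density is ours, as is the singleton fallback in the assignment.

```python
import numpy as np

def select_centers(rho, delta):
    """Cluster-center selection following Eqs. (11) and (12) (one reading of the criteria)."""
    ec = delta >= 2.0 * np.std(delta)      # Eq. (11): expected centers
    dc_mask = ec & (rho >= np.mean(rho))   # Eq. (12): keep only dense expected centers
    return np.where(dc_mask)[0]

def assign_points(d_prime, rho, centers):
    """DPC-style assignment: each non-center point joins the cluster of its nearest
    neighbor with higher local density (a sketch of the paper's description)."""
    n = len(rho)
    labels = -np.ones(n, dtype=int)
    for k, c in enumerate(centers):
        labels[c] = k
    order = np.argsort(-rho)               # process points from high to low density
    for i in order:
        if labels[i] != -1:
            continue
        higher = np.where(rho > rho[i])[0]
        if higher.size == 0:
            labels[i] = 0                  # fallback for the densest unlabeled point
            continue
        nearest = higher[np.argmin(d_prime[i, higher])]
        labels[i] = labels[nearest]
    return labels
```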

Improvement of Teaching and Learning Optimization Algorithm

Nonlinear Decreasing Strategy Learning

Because the TLBO algorithm focuses on global exploration in the early stage, its local exploitation ability is weakened during the search. Even under the influence of the teaching factor, as the number of iterations increases the algorithm may still fall into local optima. To let the algorithm learn adaptively in both the teaching and learning stages, increase the search range, and prevent local optimal deadlock, this paper introduces a weight with a nonlinear decreasing strategy in the teacher stage and the student stage, as shown in Eq. (13).

$$w\left( t \right) = \frac{1}{2}\left[ {\left( {w_{{\max} } - w_{{\min} } } \right)\left( {1 - \frac{t}{{t_{{\max} } }}} \right)^{2} + \left( {w_{{\max} } - w_{{\min} } } \right)\left( {\frac{{t_{{\max} } - t}}{{t_{{\max} } }}} \right)} \right] + w_{{\min} }$$
(13)

where wmax is the maximum weight and wmin is the minimum weight.
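A small helper illustrating Eq. (13); the default values wmax = 0.9 and wmin = 0.1 are taken from the experimental settings reported later in the paper, and the function itself is only a sketch.

```python
def nonlinear_weight(t, t_max, w_max=0.9, w_min=0.1):
    """Nonlinear decreasing weight of Eq. (13): decays from w_max at t = 0 to w_min at t = t_max."""
    frac = 1.0 - t / t_max                     # equals (t_max - t) / t_max
    return 0.5 * ((w_max - w_min) * frac ** 2 + (w_max - w_min) * frac) + w_min
```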

In this way, the algorithm balances exploration and exploitation: it uses larger weights to explore more regions in the early stage and smaller weights for finer local exploitation in the later stage. The improved individual update formula of the teaching stage is shown in Eq. (14):

$$x_{i + 1} \left( t \right) = w\left( t \right) \times x_{i} \left( t \right) + Difference_{{}} \left( t \right)$$
(14)

In the learning phase, the student individual xi and the different student individual xj exchange learning and get the knowledge update description formula as shown in (15, 16):

$$x_{i + 1} \left( t \right) = w\left( t \right) \times x_{i} \left( t \right) + r_{i} \left( {x_{j} \left( t \right) - x_{i} \left( t \right)} \right),f\left( {x_{i}^{{}} \left( t \right)} \right) < f\left( {x_{j} \left( t \right)} \right)$$
(15)
$$x_{j} \left( t \right) = w\left( t \right) \times x_{j} \left( t \right) + r_{i} \left( {x_{i} \left( t \right) - x_{j} \left( t \right)} \right),f\left( {x_{i} \left( t \right)} \right) > f\left( {x_{j} \left( t \right)} \right)$$
(16)

By introducing the dynamic w(t), the algorithm dynamically updates the knowledge level of the individual students taught by the teacher, increases the diversity of the population and the search range, thereby reducing the chance of falling into local optima and speeding up convergence.

Opportunity for Niche Selection Strategy

The teaching–learning-based optimization algorithm may fall into local optima in the later iterations, which causes the population diversity to decline. Therefore, the niche strategy is used to alleviate this shortcoming. If, as the number of iterations increases, a large number of individual fitness values become close to one another and cluster together, the niche selection strategy is introduced at that moment. To combine the teaching–learning-based optimization algorithm with the niche selection strategy, a variable is needed to measure the degree of aggregation between individuals. In this paper, the discrete coefficient from statistics is used to describe the aggregation of the population, Eq. (17):

$$g\left( t \right) = \frac{1}{{f_{avg} \left( t \right)}} \times \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left( {f\left( {x_{i} \left( t \right)} \right) - \left( {f_{avg} \left( t \right)} \right)} \right)^{2} } }$$
(17)

where f(xi(t)) is the fitness function value of the current ith student, favg(t) is the average fitness of the current student population, and t is the number of iterations. The discrete coefficient reflects the degree of dispersion between individuals within a population. During teaching and learning, the knowledge levels of the students tend to approach the same value and the fitness values gradually become similar, so g(t) becomes smaller and the degree of dispersion decreases, indicating that the population is becoming more concentrated. A threshold q is set, usually between 0 and 0.3 depending on the actual situation. When g(t) < q, the population aggregation is high and the niche selection strategy is introduced: 30% of the total population is randomly selected to form the crowding members, the similarity between the newly generated population (after applying w(t)) and the crowding members is compared, and students with high similarity are crowded out, thus avoiding local optima and maintaining the diversity of the whole population.
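A direct transcription of Eq. (17) as a sketch; it assumes a non-zero average fitness.

```python
import numpy as np

def aggregation_coefficient(fitness):
    """Discrete (dispersion) coefficient of Eq. (17): population standard deviation of the
    fitness values divided by their mean; small values indicate a crowded population."""
    f = np.asarray(fitness, dtype=float)
    return np.sqrt(np.mean((f - f.mean()) ** 2)) / f.mean()

# the niche selection strategy is triggered when g(t) falls below the threshold q,
# e.g. q = 0.2 as used later in Algorithm 4 (Step 1)
```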

Algorithm Implementation

Algorithm 3: NSTLBO algorithm

  • Steps 1–6: This process is consistent with the original TLBO algorithm, see “Teaching and Learning Optimization Algorithms” section;

  • Step 7: Determine the degree of population aggregation. If g(t) is less than q, execute Algorithm 2, “Niche Selection Strategy” section. Continue to step 4;

  • Step 8: update the student individual by using Eq. (14), and calculate the updated student individual score;

  • Step 9: If the individual achievement of the student is improved after the update, the student individual replaces the student individual before the update by using Eq. (6);

  • Step 10: randomly select one student individual to compare with another student individual, and use Eqs. (15) and (16) to update the student individual and calculate the student individual score;

  • Step 11: If the individual score of the student is improved after the update, the updated student individual replaces the student individual before the update;

  • Step 12: t = t + 1; if the iteration condition is satisfied, output the optimal student individual;

Clustering Algorithm Integrating Teaching and Learning Optimization Algorithm and Density Gap

The contour (silhouette) index reflects the compactness and separation of the data points. Therefore, this paper uses the contour index as the fitness function of the swarm intelligence algorithm in order to find the best dc value. The contour index of a sample o is shown in Eq. (18).

$$S\left( o \right) = \frac{{\left[ {b\left( o \right) - a\left( o \right)} \right]}}{{{\max} \left\{ {a\left( o \right),b\left( o \right)} \right\}}}$$
(18)

First, the data set of n sample points is divided into k clusters ci (i = 1, 2, …, k). a(o) is the average distance between the sample point o and all other sample points in its own cluster. d(o, ci) is the average distance between the sample point o and all samples of another cluster ci, and b(o) = min d(o, ci), where ci ranges over the clusters not containing o. The S(o) value reflects the quality of the clustering result: the larger S(o) is, the better the clustering quality of the algorithm.
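A sketch of the contour index of Eq. (18), averaged over all samples so that it can serve as the fitness of a candidate dc; the handling of singleton clusters and of the single-cluster case is our own convention, not specified in the paper.

```python
import numpy as np

def silhouette_fitness(d, labels):
    """Mean contour (silhouette) index of Eq. (18).

    d: (n, n) distance matrix; labels: cluster assignment of each sample."""
    n = len(labels)
    scores = np.zeros(n)
    for o in range(n):
        same = (labels == labels[o])
        same[o] = False
        if not same.any():                        # singleton cluster: S(o) taken as 0 here
            continue
        a = d[o, same].mean()                     # a(o): mean distance within its own cluster
        b = np.inf
        for c in np.unique(labels):
            if c == labels[o]:
                continue
            b = min(b, d[o, labels == c].mean())  # b(o): smallest mean distance to another cluster
        if not np.isfinite(b):                    # only one cluster: index undefined, keep 0
            continue
        scores[o] = (b - a) / max(a, b)           # Eq. (18)
    return scores.mean()                          # larger is better
```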

The idea of the proposed algorithm is as follows. First, following the literature [11], an interval [dlow, dhigh] for the cutoff distance dc is set based on the average local density of the data objects. Then, a dc value is selected randomly from this interval, the density difference distance d′(xi, xj) is calculated using Definitions 1–4, the local density ρi and the relative distance δi are calculated according to Eqs. (10) and (2), and the cluster centers are selected according to Eqs. (11) and (12). The clustering result is evaluated by the fitness function, i.e., the contour index S(o). Different dc values in the interval are tried repeatedly until the optimal dc value is selected. In the whole process the NSTLBO algorithm serves as the medium for selecting the optimal dc; the optimal dc value is then substituted into the improved density peak clustering algorithm, and the final clustering result is obtained. The algorithm steps are as follows:

Algorithm 4: NSTLBO-DGDPC algorithm

  • Step 1: Set the dc interval [dlow, dhigh]; initialize the NSTLBO population size to 10, the number of iterations to 10, and the threshold q to 0.2;

  • Step 2: randomly select a dc value, and calculate the density difference distance d’(xi, xj) by using the Definitions 1–4;

  • Step 3: Calculate the local density ρi and the relative distance δi according to Eqs. (10) and (2);

  • Step 4: Select a cluster center according to Eq. (11) and Eq. (12);

  • Step 5: According to the fitness Eq. (18), determine the fitness function value corresponding to each student;

  • Step 6: Run the NSTLBO algorithm to find the current optimal student individual and calculate the individual fitness function value of the student;

  • Step 7: Determine the current population aggregation degree, and if the aggregation degree is less than the threshold, use the niche selection strategy to form a new group;

  • Step 8: Perform the teaching stage: use the nonlinear decreasing strategy to update the student individuals and calculate the updated fitness function values;

  • Step 9: If the updated student individual score is improved, the student individual updated by Eq. (14) replaces the student individual before the update;

  • Step 10: Perform the learning stage: randomly select two different student individuals, compare their academic scores, and use Eqs. (15) and (16) to update the student individuals and calculate their scores;

  • Step 11: If the algorithm has reached the maximum number of iterations, terminate and output the optimal dc value; otherwise, return to Step 2;
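Putting the pieces together, the sketch below is a deliberately simplified driver for Algorithm 4: the candidate dc values play the role of the students, and a plain TLBO-style update stands in for the full NSTLBO (no niche selection or nonlinear weight). It reuses the helper functions sketched earlier in this section (density_gap_distance, local_density, select_centers, assign_points, silhouette_fitness) and is meant only to show the data flow, not to reproduce the authors' implementation.

```python
import numpy as np

def dc_fitness(X, dc, K=5):
    """Fitness of a candidate cutoff distance dc: cluster with the density-gap DPC
    helpers sketched above and score the result with the contour index of Eq. (18)."""
    d_prime = density_gap_distance(X, K)              # Definitions 1-4
    rho = local_density(d_prime, dc)                  # Eq. (10)
    n = X.shape[0]
    delta = np.zeros(n)
    for i in range(n):                                # Eq. (2) with d' in place of d
        higher = np.where(rho > rho[i])[0]
        delta[i] = d_prime[i, higher].min() if higher.size else d_prime[i].max()
    centers = select_centers(rho, delta)              # Eqs. (11)-(12)
    if centers.size < 2:
        return -1.0                                   # degenerate clustering, poor fitness
    labels = assign_points(d_prime, rho, centers)
    return silhouette_fitness(d_prime, labels)        # Eq. (18)

def nstlbo_dgdpc(X, d_low, d_high, pop=10, iters=10, seed=0):
    """Simplified driver for Algorithm 4: search [d_low, d_high] for the best dc value."""
    rng = np.random.default_rng(seed)
    dc_pop = rng.uniform(d_low, d_high, pop)          # students = candidate dc values
    fit = np.array([dc_fitness(X, dc) for dc in dc_pop])
    for _ in range(iters):
        teacher, mean = dc_pop[fit.argmax()], dc_pop.mean()
        for i in range(pop):
            TF = rng.integers(1, 3)
            new = np.clip(dc_pop[i] + rng.random() * (teacher - TF * mean), d_low, d_high)
            fn = dc_fitness(X, new)
            if fn > fit[i]:                           # larger contour index is better
                dc_pop[i], fit[i] = new, fn
    best = fit.argmax()
    return dc_pop[best], fit[best]
```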

Simulation Experiment and Validation

According to the above solution, the flowchart of the algorithm is designed as shown in Fig. 1.

Fig. 1
figure 1

The flowchart of algorithm

NSTLBO Algorithm Experimental Proof Validation

Firstly, the optimization performance of NSTLBO is verified; secondly, the clustering performance of the proposed algorithm is verified. All the experiments in this paper are compiled and debugged in MATLAB R2014a under a Windows 10 64-bit operating system. The hardware environment is an AMD A10 (2.30 GHz) with 8 GB of memory.

To verify the optimization performance of the NSTLBO algorithm, the PSO, ABC, TLBO, and CTLBO algorithms are compared with it. Ten standard test functions are selected from the CEC benchmarks to verify the performance of the algorithm. Among them, f1–f3, f7, and f8 are continuous unimodal functions, and f4–f6, f9, and f10 are continuous multimodal functions. A unimodal function has only one global optimum and is used to test the exploitation ability of an algorithm; a multimodal function has many local optima and is used to study the exploration ability of an algorithm. To assess performance, the average value and standard deviation over runs of each test function are calculated: a smaller average indicates a better algorithm, and a lower standard deviation indicates a more stable algorithm. The expressions and related parameters are shown in Table 1.

Table 1 Details of test function from CEC

The population size is set to 50, the maximum number of iterations to 500, and the dimensions to 50 and 100, respectively. For the PSO algorithm the inertia weight is w = 0.7289 and c1 = c2 = 1.5968. For the NSTLBO algorithm, wmax is 0.9, wmin is 0.1, and q is 0.3. In the experiment, each algorithm is run 50 times on each test function. The optimal value, average value, and variance obtained by the algorithms are shown in Table 2 (50 dimensions) and Table 3 (100 dimensions); bold fonts mark the best results in the experiments. To further compare the optimization performance visually, the average fitness evolution curves and statistical box diagrams for f1–f10 are drawn in Figs. 2, 3, 4, and 5.

Table 2 Comparisons of algorithm test results (50 Dim)
Table 3 Comparisons of algorithm test results (100 Dim)
Fig. 2
figure 2

Evolution curve of average fitness of algorithms (50 Dim)

Fig. 3
figure 3

Statistical box diagram of the average fitness of the algorithms (50 Dim)

Fig. 4
figure 4

Average fitness of the algorithms (100 Dim)

Fig. 5
figure 5

Statistical box diagram of the average fitness of the algorithms (100 Dim)

As can be seen from the data in Table 2, although f1 is relatively easy to optimize, the averages and standard deviations of ABC and PSO on it are relatively poor, whereas the NSTLBO algorithm proposed in this paper reaches the global optimum in both mean and standard deviation. The test function f2 is a unimodal function that is difficult to optimize; the optimal value of the NSTLBO algorithm is lower than that of PSO, its average value is slightly lower than that of the CTLBO algorithm, with the NSTLBO algorithm being the most prominent, and its standard deviation is the highest among the compared algorithms. On the test functions f3 and f4, where f4 is a multimodal function, the NSTLBO algorithm maintains the good performance shown on f1, and both the mean and the standard deviation reach the global optimal value of zero. On the test function f5, the average and standard deviation of the NSTLBO solutions are higher than those of the ABC and PSO algorithms and slightly higher than those of the TLBO and CTLBO algorithms, so the NSTLBO algorithm performs only moderately on this function. On the test function f6, its mean and standard deviation are superior to those of the other four algorithms. On the test function f7, the average value of the NSTLBO algorithm is smaller than that of the ABC algorithm and larger than those of the TLBO and CTLBO algorithms, while its standard deviation is smaller than those of the PSO, TLBO, and CTLBO algorithms. On the test function f8, the average and standard deviation of the NSTLBO algorithm are similar to those of the PSO, TLBO, and CTLBO algorithms. On the test function f9, the other algorithms perform better than the NSTLBO algorithm. On the test function f10, the average and standard deviation of the NSTLBO algorithm are markedly better than those of the other algorithms, indicating better performance.

As can be seen from the data in Table 3, when the dimension is increased to 100, the standard deviation of the NSTLBO algorithm on f1 is still 0, but its overall solution ability on the test functions f1 and f2 is reduced compared with the 50-dimensional case. On the test function f2, compared with the other four algorithms, only the optimal value and standard deviation of the CTLBO algorithm are better than those of the NSTLBO algorithm; the other three algorithms do not reach the precision of the NSTLBO algorithm. On the test functions f3 and f4, the means and standard deviations of the TLBO, CTLBO, and NSTLBO algorithms are all zero, in contrast to the ABC and PSO algorithms. On the test functions f5 and f6, the means and standard deviations of the solutions do not change much relative to the 50-dimensional case, indicating that the performance of the NSTLBO algorithm is essentially unchanged between 50 and 100 dimensions and remains moderate on f5 and f6. On the test function f7, the average value of the NSTLBO algorithm is smaller than in Table 2, while the standard deviation is larger, indicating reduced stability in higher dimensions. On the test function f8, the average value of the NSTLBO algorithm is larger than in Table 2; its standard deviation is smaller than those of PSO, TLBO, and CTLBO, and larger than that of ABC. On the test function f9, the average and standard deviation of the NSTLBO algorithm are similar to those of PSO, TLBO, and CTLBO, and smaller than the corresponding values in Table 2. On the test function f10, the average and standard deviation of the NSTLBO algorithm are markedly better than those of the other algorithms, similar to the values of the CTLBO algorithm in Table 2 and smaller than those of the ABC, PSO, and TLBO algorithms, indicating that the NSTLBO algorithm performs better than the other algorithms. After the dimension increases, apart from the CTLBO algorithm, the NSTLBO algorithm shows the higher stability.

From the convergence curves in panels (a), (c), (d), (e), (f), and (j) of Fig. 2, it can be seen that the convergence curves of the NSTLBO algorithm on the test functions f1, f3, f4, f5, f6, and f10 decline fastest and reach the optimal value with fewer iterations. Panels (a), (c), (d), (f), (g), and (j) of Fig. 2 show that the convergence curves on the test functions f1, f3, f4, f6, f7, and f10 are relatively smooth. This indicates that the NSTLBO algorithm performs best on the test functions f1, f3, f4, and f6 and performs moderately on the test functions f5, f6, and f10. From the statistical box diagrams (a), (b), (e), (f), (g), (h), (i), and (j) in Fig. 3, it can be seen that the ranges of the NSTLBO algorithm on the test functions f1, f2, f5, f6, f7, f8, f9, and f10 are larger; except for f8 and f9, its minimum values are smaller than those of the other algorithms, and only on f3 and f4 does it show good stability.

From the convergence curves in Fig. 4, it can be seen that in 100 dimensions, except for f7, f8, and f9, the convergence curves of the NSTLBO algorithm all lie below those of the other algorithms. In Fig. 4b, the curve has already flattened into a straight line while the number of iterations is still far less than 100, and in Fig. 4e the curve becomes a straight line when the number of iterations approaches 100. Therefore, on the two test functions f2 and f5, the convergence speed and solution accuracy of the NSTLBO algorithm are basically unchanged. From the statistical box diagrams in Fig. 5, when the function dimension is increased to 100, the NSTLBO algorithm still shows good stability on f3 and f4, and except for f7, f8, and f9, its fitness values are the smallest among the compared algorithms.

Analysis of the Experimental Proof of Algorithm NSTLBO-DGDPC

To verify the clustering performance of the proposed algorithm, the DBSCAN and DPC algorithms are used for comparison. In the experiments, the parameters are Eps = 0.05 and MinPts = 25, and the parameters dc or k take the values of Ref. [36]. Sixteen standard data sets, including eight synthetic data sets and eight real data sets, are selected as shown in Table 4. Three commonly used clustering evaluation indicators are chosen: clustering accuracy (ACC), adjusted mutual information (AMI), and adjusted Rand index (ARI) [37], and simulation experiments are carried out on the three clustering algorithms. The experimental results on the synthetic data sets are shown in Table 5 and those on the real data sets in Table 6; the clustering effects on the synthetic data sets are shown in Fig. 6.

Table 4 Data set
Table 5 Comparison of composite data sets ACC, ARI, and AMI
Table 6 Comparison of real data sets ACC, ARI, and AMI
Fig. 6
figure 6

Clustering effect of two algorithms on synthetic data sets

From the three clustering evaluation indices in Table 5, compared with the DPC algorithm, the ACC value of the proposed algorithm increases by 0.4% on D31, so its accuracy is similar to that of the DPC algorithm, while its ARI and AMI values are slightly higher. Compared with the DBSCAN algorithm, the ACC and AMI values increase by 26% and 19%, and the ARI values are significantly higher. This shows that the algorithm can find the cluster centers on complex data sets and partition the data well. On the Aggregation and Spiral synthetic data sets, the clustering evaluation indexes of the three algorithms are equal. On the Flame synthetic data set, the ACC and ARI values are equal to those of the DPC algorithm, and the ACC, ARI, and AMI values are significantly higher than those of the DBSCAN algorithm. On the Pathbased, R15, and Jain synthetic data sets, the ACC, ARI, and AMI values are significantly higher than those of the other two algorithms. This shows that, compared with the DPC and DBSCAN algorithms, combining the improved swarm intelligence algorithm with the clustering algorithm can handle these data sets and achieve better clustering results.

From the three clustering evaluation indexes in Table 6, the ACC, ARI, and AMI values of the proposed algorithm are higher than those of the DPC and DBSCAN algorithms on the Iris and Wine real data sets. On the Wdbc real data set, the ACC and ARI values are higher than those of the other two algorithms, but the AMI value is lower than that of the DPC algorithm. On the Parkinsons, Seeds, Pima, Glass, and Segmention real data sets, the three clustering indexes are significantly higher than those of the DPC and DBSCAN algorithms. On the Glass real data set, the ACC and ARI values of each algorithm are smaller than the corresponding values on the Seeds, Pima, and Segmention real data sets. This also shows that the proposed algorithm achieves better clustering results on both synthetic and real data sets.

Figure 6 shows the clustering effect of the DPC algorithm and the proposed algorithm on eight synthetic data sets of different complexity. Compared with the DPC algorithm in Fig. 6b, the proposed algorithm in Fig. 6a can divide the D31, Aggregation, Jain, and Compound data sets into the true classes. The clustering effect of the two algorithms is similar on the Spiral, Flame, Pathbased, and R15 data sets.

Conclusion and Future Work

The cutoff distance dc determines the local density and high-density distance of the DPC algorithm, so the setting of the dc value is very important. The DPC algorithm requires the cutoff distance and the cluster centers to be set manually, and its distance calculation does not consider the impact of data attributes and neighborhoods on the clustering results. To address these problems, based on the DPC and TLBO algorithms, this paper proposes a clustering algorithm that combines the teaching–learning-based optimization algorithm with the density difference distance. The range of dc values is set as the population search interval, and the NSTLBO algorithm is used to find the optimal dc value. Second, considering the influence of data attributes and neighborhood factors, a weighted density difference distance is established to replace the Euclidean distance, and this weighted distance is used to calculate the local density of the algorithm. Because the DPC algorithm should select points with both a larger high-density distance and a larger local density as cluster centers, the cluster centers are selected using the standard deviation of the distance and the mean density to separate these data points, thereby improving the correctness of the DPC clustering. The simulation results show that the proposed algorithm can determine the number of classes of real data sets without manually setting the dc value and has good clustering quality and effect.

In future work, we will focus on an adaptive parameter selection method to avoid manually setting the parameters of the density peak clustering algorithm. In addition, as data sets grow, calculating the distances between data points incurs a large overhead. We will therefore also study how to improve the computing speed of the density peak clustering algorithm on large data sets; whether to use high-performance hardware or MapReduce to process big data in parallel is worth further study.