Introduction

In the area of Artificial Intelligence (AI) and numerous scientific fields, it is a perennial challenge to improve the cognitive skills of machines in order to automate various complex tasks [2]. One of the keys to this problem is to computationally assess semantic similarity/relatedness between things in ways human beings do. In fact, over the last decades, researchers from various fields have studied and developed many different approaches to estimating semantic similarity/relatedness between words. These approaches have been applied to facilitate various natural language processing (NLP) and information retrieval (IR) tasks [3] such as document categorization or clustering [4], word sense disambiguation [5], query interpretation [6, 7], information extraction [8], etc.

On some occasions, semantic similarity and semantic relatedness are used interchangeably, as both refer to semantic likeness. However, in most works in the literature, as well as in this work, the concept of semantic similarity is more specific than semantic relatedness [9]. Generally speaking, the semantic similarity between concepts only states how taxonomically (via the is-a relationship) near they are, while their semantic relatedness covers any other relation between them, such as meronymy, antonymy, cause-effect, etc., besides taxonomy. For example, “car” is similar to “truck”, but it is also highly related to “wheel” and “road”, which are not highly similar to “car”. Because the semantic relationships implied by semantic similarity and relatedness are not the same, the computational strategies used to assess them normally differ. Corpus-based semantic metrics mainly rely on the distributional hypothesis that the statistical co-occurrence between word contexts in a corpus reflects the degree of semantic relationship between words [2]. As co-occurrence can arise from various semantic relations, corpus-based semantic metrics are considered to risk confusing semantic similarity with relatedness [10] and are, to some extent, better suited to estimating semantic relatedness. On the other hand, ontology-based strategies have proved to be the most successful family of semantic similarity metrics [11], because they are built on “is-a” taxonomies which directly encode the degree of semantic similarity between concepts.

However, since semantic graphs or taxonomies can be constructed from public corpora (e.g., Strube and Ponzetto [12] introduced WikiRelate, a taxonomy built from the pages of Wikipedia), standard ontology-based semantic metrics can be combined with corpus-based ones to form hybrid models [13,14,15]. Furthermore, in recent years, deep learning architectures have promoted hybrid models [16, 17]. This work concentrates on ontology-based semantic metrics, not only to improve them but also to benefit the study of hybrid semantic metrics. With the emergence of BERT [17], semantic similarity research has made significant progress; however, there is still considerable room for improvement, judging by the results of improved BERT models on standard text datasets [18,19,20,21]. Since the semantic similarity between words is widely used in the construction of these models, this work will also benefit semantic similarity research on sentences, texts, etc.

A large number of ontology-based semantic similarity metrics between words have been proposed and applied in various domains in recent years, but robust semantic similarity assessment between words remains a challenging task. In most related works, ontology-based semantic similarity metrics are briefly divided into three main categories as follows:

  1. Feature-based metrics. The essential idea of these metrics, originating from the feature model proposed by Tversky [22], is to regard the feature set of a concept as the core semantic evidence. The features shared by two concepts reflect their degree of semantic similarity, while their differing features imply their degree of dissimilarity.

  2. Path-based or edge-counting metrics. The original model was proposed by Rada et al. [23] and developed in the later works of Wu and Palmer [24], Leacock and Chodorow [25], Li et al. [26], and so on. The key idea of this category is to take the length of the shortest path between concepts as the main semantic evidence.

  3. IC-based metrics. The classic models of this category came from [9, 27, 28]. Their essential idea is to take the information content (IC) of a concept as the main semantic evidence. Since the intrinsic IC (iIC) was proposed in the pioneering work of Seco et al. [29], the definition of IC has been the principal focus of IC-based metric research. (A minimal code illustration of the path-based and IC-based families follows this list.)
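
To make these families concrete, the following minimal sketch (illustrative only, not any specific metric proposed in this work) computes path-based and IC-based similarities over WordNet with NLTK; the synset names and the Brown-corpus IC file are NLTK conventions chosen for the example:

```python
# Illustrative path-based and IC-based similarities over WordNet (NLTK).
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

car, truck = wn.synset('car.n.01'), wn.synset('truck.n.01')

# Path-based family: Rada-style shortest path and the Wu-Palmer variant.
print(car.path_similarity(truck))  # 1 / (shortest_path_length + 1)
print(car.wup_similarity(truck))   # 2*depth(LCS) / (depth(a) + depth(b))

# IC-based family: Resnik and Lin with a corpus-derived IC; intrinsic
# ICs (iICs) would replace the corpus counts with taxonomy statistics.
brown_ic = wordnet_ic.ic('ic-brown.dat')
print(car.res_similarity(truck, brown_ic))  # IC(LCS)
print(car.lin_similarity(truck, brown_ic))  # 2*IC(LCS) / (IC(a) + IC(b))
```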

Apart from the aforementioned three categories of ontology-based semantic similarity metrics, many researchers nowadays try to take advantage of more than one of these types of semantic evidence to propose new hybrid semantic metrics. The shortest path and the IC are the semantic evidence most commonly combined [26, 30,31,32,33]. Furthermore, since the shortest path and the features are usually utilized in the definitions of almost all iICs, an iIC can itself be regarded as a hybrid metric model. In the work of Seco et al. [29], the number of descendants of a concept, which can be considered as its features, is utilized to define the iIC. In the later iICs proposed by Zhou et al. [34], Sebti and Barfroush [35], Sánchez et al. [36], Yuan et al. [37], Hadj Taieb [38] and Zhu and Iglesias [33], the taxonomic features of the concept, e.g., the descendant set, the ancestor set, the subsumed leaves, the entities, are normally used together with the concept’s depth. During the last decade, the iIC has become one of the essential topics of semantic metric research. This research tendency is in accordance with hybrid metrics having become one of the research mainstreams, because their core ideas, i.e., trying to take advantage of more types of semantic evidence, can be regarded as the same. Lastra-Díaz and García-Serrano [39] constructed a series of well-founded iICs by extracting different taxonomic features based on two axioms.
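
As an example of the iIC family, the following sketch reproduces the intrinsic IC of Seco et al. [29], which needs only taxonomy statistics (here, WordNet noun synsets via NLTK; the synset names are illustrative):

```python
# Intrinsic IC of Seco et al. [29]: iIC(c) = 1 - log(hypo(c)+1)/log(N),
# where hypo(c) counts c's transitive descendants and N is the number of
# concepts in the taxonomy.
import math
from nltk.corpus import wordnet as wn

N = len(list(wn.all_synsets('n')))

def seco_iic(synset):
    hypo = len(list(synset.closure(lambda s: s.hyponyms())))
    return 1.0 - math.log(hypo + 1) / math.log(N)

print(seco_iic(wn.synset('entity.n.01')))  # near 0: very abstract concept
print(seco_iic(wn.synset('car.n.01')))     # closer to 1: more specific
```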

During the last decade, hybrid ontology-based semantic similarity metrics (including iIC models) have become the mainstream in the area. Most state-of-the-art ontology-based metrics take the IC and the shortest path as the most influential semantic evidence. However, these metrics either extract taxonomic features to build new IC models or combine these two types of semantic evidence in complicated nonlinear ways to form the computational algorithm of the semantic similarity. The newly proposed metrics generally run into the problem that it is usually difficult, even impossible, to explain the proposed iICs and the complicated algorithms. Furthermore, the tuning parameters utilized in some of the metrics make the explanation problem worse; as Pirró [40] and Sánchez et al. [41] pointed out, tuning parameters can exceedingly influence the measurement results obtained from the methods proposed in [26]. Therefore, the main motive of this work is to construct an improved model of IC-related semantic similarity metrics that remains interpretable while keeping a simple structure. Moreover, most state-of-the-art ontology-based semantic similarity metrics in the literature just report their raw results without performing statistical tests. In the studies that do conduct statistical tests, a T-Test is commonly performed on a sample whose size is usually small. However, a T-Test on a small sample generally assumes that the sample is normally distributed, an assumption that is not tested in those studies. Therefore, the other motive of this work is to perform extensive statistical tests on the correlation coefficients of different IC-related metrics, including not only the T-Test but also a nonparametric test, the Wilcoxon Signed-Rank Test, which does not require the assumption of a normally distributed sample.
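
The normality assumption mentioned above can itself be checked before choosing a test; a Shapiro-Wilk test is one standard option (a minimal sketch with hypothetical numbers, not the procedure prescribed by this paper):

```python
# Check normality of paired differences before picking a paired test;
# the numbers below are hypothetical placeholders, not reported results.
from scipy import stats

diffs = [0.02, 0.03, 0.01, 0.03, -0.01]  # per-dataset correlation differences
stat, p = stats.shapiro(diffs)
# If normality is rejected, the Wilcoxon Signed-Rank Test is the safer choice.
paired_test = stats.ttest_rel if p > 0.05 else stats.wilcoxon
```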

In this work, I review in detail the classical and state-of-the-art ontology-based semantic similarity metrics of recent years. Based on the IC-related semantic similarity metrics, a hybrid ontology-based model denoted IC+SP is proposed, which linearly combines an IC-related semantic similarity metric with the shortest path. I developed the improved IC-related semantic similarity metrics based on IC+SP on the HESML platform [1]. WordNet 3.0 is used as the ontology for measuring the semantic similarity of concept pairs. Furthermore, 5 gold benchmarks (65 pairs of nouns from Rubenstein and Goodenough [42], 28 pairs of nouns from Miller and Charles [43], 201 pairs of nouns from WordSim353 [44], 65 pairs of nouns from Pirró [40] and 665 pairs of nouns from SimLex-999 [45]) are used as the experimental datasets to compare the performance of the improved IC-related metrics based on the IC+SP model against their corresponding original ones. Both Pearson’s and Spearman’s correlation coefficients between computed and human-judged semantic similarities are used in the comparison. There are four principal quantitative indicators for comparison in this work, i.e., the average and maximum values of the raw correlation coefficients as well as the p-value and confidence interval of the statistical tests on the raw correlations. The T-Test is commonly applied as a standard test in the related literature, so it is also used in this work. However, a set of correlation coefficients generally does not follow a normal distribution. Therefore, the Wilcoxon Signed-Rank Test is also introduced in this work, since it is a nonparametric test requiring no normality assumption. Based on these comparative indicators, it can be confirmed whether an improved IC-related metric based on the IC+SP model statistically significantly outperforms its original counterpart or not.
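
The evaluation just described can be summarized in a few lines of code; all numbers below are hypothetical placeholders, not this paper’s results:

```python
# Correlations between metric scores and human ratings, then paired
# significance tests over per-dataset correlation coefficients.
from scipy import stats

human    = [3.92, 3.84, 0.42, 2.37, 3.11]  # hypothetical gold ratings
original = [0.81, 0.76, 0.12, 0.44, 0.66]  # scores of an original IC metric
improved = [0.84, 0.79, 0.10, 0.47, 0.70]  # scores of its IC+SP variant

r, _   = stats.pearsonr(human, improved)   # Pearson's r
rho, _ = stats.spearmanr(human, improved)  # Spearman's rho

# Compare the two metrics across datasets: the paired T-Test assumes
# normally distributed differences; the Wilcoxon Signed-Rank Test does not.
corrs_orig = [0.85, 0.80, 0.74, 0.68, 0.62]  # hypothetical correlations
corrs_impr = [0.87, 0.83, 0.75, 0.71, 0.64]
print(stats.ttest_rel(corrs_impr, corrs_orig))
print(stats.wilcoxon(corrs_impr, corrs_orig))
```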

The remainder of the paper is organized as follows. The “Ontology-based semantic similarity metrics” section reviews the state-of-the-art ontology-based semantic similarity metrics, mainly focusing on the IC-related ones and the iICs. The “A hybrid IC-related model IC+SP” section presents the improved model, IC+SP. The “Evaluation” section introduces the experimental data and method, and reports and analyzes the experimental results. The final section summarizes the paper and outlines future work.

Ontology-based semantic similarity metrics

Ontology-based approaches to measuring semantic similarity between concepts are commonly divided into three categories: feature-based approaches, path-based or edge-counting approaches, and IC-based approaches. This classification is based on the different types of semantic evidence used. The state-of-the-art strategies in this research field generally combine different types of semantic evidence to build hybrid semantic similarity metrics, or design new iIC models of the concept for use in IC-related metrics. For the latest and most detailed survey, readers can refer to Chapter 3 of the survey book of Harispe et al. [2]. This work focuses on IC-related semantic similarity metrics; therefore, the feature-based and path-based families of approaches will be briefly introduced, while the IC-related category is reviewed in detail.

Feature-based and edge-counting approaches

Feature-based approaches have been part of the semantic similarity research community since the feature model was proposed by Tversky [22]. Generally, the strategy of the feature model is to represent a concept as the set of its features. Thus, the semantic similarity between a pair of concepts is assessed using a function of the sets of their common and non-common features. If A and B are the sets of features of the concepts a and b, then the semantic similarity of a and b, \(sim_{Tversky}\), can be defined as a function (F) of A and B as follows:

$$\begin{aligned} sim_{Tversky}(a,b) = F(A \cap B,A - B,B - A) \end{aligned}$$
(1)

In Eq. (1), the function F has two main concrete formulations, which are the contrast model (\(sim_{CM}\)) and the ratio model (\(sim_{RM}\)).

$$\begin{aligned} sim_{CM}(a,b) = \alpha f\left( {A \cap B} \right) - \beta f\left( {A - B} \right) - \gamma f(B - A) \end{aligned}$$
(2)
$$\begin{aligned} sim_{RM}(a,b) = \frac{{f\left( {A \cap B} \right) }}{{f\left( {A \cap B} \right) + \alpha f\left( {A - B} \right) + \beta f\left( {B - A} \right) }} \end{aligned}$$
(3)
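
As a concrete instance of Eq. (3), the sketch below takes set cardinality as f and a concept’s ancestors as its feature set (the popular paradigm discussed next); the symmetric choice alpha = beta = 0.5 and the NLTK WordNet interface are assumptions made for illustration:

```python
# Ratio model (Eq. 3) with f = set cardinality and features = ancestors.
from nltk.corpus import wordnet as wn

def ancestor_features(synset):
    # A concept's features here: its transitive hypernyms plus itself.
    return set(synset.closure(lambda s: s.hypernyms())) | {synset}

def sim_ratio(a, b, alpha=0.5, beta=0.5):
    A, B = ancestor_features(a), ancestor_features(b)
    common = len(A & B)
    return common / (common + alpha * len(A - B) + beta * len(B - A))

print(sim_ratio(wn.synset('car.n.01'), wn.synset('truck.n.01')))
```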

In an ontology, one popular paradigm is to take the ancestors of a concept as its features. Maedche and Staab [46] as well as Sánchez et al. [41] constructed their feature-based metrics on this intuitive strategy. Apart from ancestors serving as features, the synsets, attributes (features) and neighbor concepts of a concept can also be considered as its features. For example, Rodriguez and Egenhofer [47] computed the semantic similarity as the weighted sum of different types of feature similarities between two concepts, where each sort of the aforementioned features yields one feature similarity computed using the ratio model. In the work of Petrakis et al. [48], a piecewise function is proposed to assess the semantic similarity, in which three different representations of features, i.e., synsets, glosses and neighbor concepts, are utilized. In the work of D’Amato [49], the instances of a concept can be considered as its features, if they are available. Currently, pure feature-based approaches are rarely proposed, but the idea of the feature model can be found in IC-based and hybrid semantic similarity metrics. For instance, from the standpoint of the feature model, the metrics proposed by Lin [28] and Jiang and Conrath [

Table 1 IC-based semantic similarity metrics
Table 2 Hybrid IC-based semantic similarity metrics