Abstract
This research addresses interpretable taxonomy extraction from digital asset metadata. The proposed method is based on automated unsupervised decision tree learning. It has the following advantages: i) it extracts taxonomies from various data types, including numerical, categorical, and textual data; ii) because it uses a decision tree, it can classify digital assets into the extracted taxons; iii) the extracted taxonomy is interpretable in terms of the presence or absence of tokens in asset metadata; iv) it requires no extra data and no labeled assets; and v) it requires no manual hyperparameter optimization. Moreover, the extracted taxonomy and the classified assets are converted into a knowledge graph for further representation and use in digital asset management. The method has been prototyped and tested on several collections of digital assets, and its use is demonstrated on one of them.
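The general idea of a token-presence tree built without labels can be illustrated with a minimal sketch. It is not the method of the paper: the split criterion (the most balanced token split), the `min_size` stopping rule, the toy metadata, and the predicate names `subTaxonOf` and `memberOf` are all assumptions chosen for illustration.

```python
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenization (an assumed, simplistic choice)."""
    return set(text.lower().split())

def build_tree(assets, min_size=2):
    """assets: dict of asset id -> set of tokens. Returns a nested dict tree."""
    ids = sorted(assets)
    if len(ids) < 2 * min_size:
        return {"assets": ids}
    counts = Counter(tok for i in ids for tok in assets[i])
    # A candidate token must leave at least min_size assets on each side.
    candidates = [t for t, c in counts.items()
                  if min_size <= c <= len(ids) - min_size]
    if not candidates:
        return {"assets": ids}
    # Assumed criterion: the most balanced split, ties broken alphabetically.
    token = min(candidates, key=lambda t: (abs(2 * counts[t] - len(ids)), t))
    present = {i: assets[i] for i in ids if token in assets[i]}
    absent = {i: assets[i] for i in ids if token not in assets[i]}
    return {"token": token,
            "present": build_tree(present, min_size),
            "absent": build_tree(absent, min_size)}

def classify(tree, tokens, path=()):
    """Follow presence/absence tests; the returned path names the taxon."""
    if "token" not in tree:
        return path
    if tree["token"] in tokens:
        return classify(tree["present"], tokens, path + ("+" + tree["token"],))
    return classify(tree["absent"], tokens, path + ("-" + tree["token"],))

def to_triples(tree, node="root"):
    """Flatten the tree into (subject, predicate, object) triples.
    The predicate names are hypothetical placeholders."""
    if "token" not in tree:
        return [(a, "memberOf", node) for a in tree["assets"]]
    triples = []
    for branch, sign in (("present", "+"), ("absent", "-")):
        child = f"{node}/{sign}{tree['token']}"
        triples.append((child, "subTaxonOf", node))
        triples += to_triples(tree[branch], child)
    return triples

# Toy metadata, invented for this example.
assets = {
    "a1": tokenize("red car photo"),
    "a2": tokenize("blue car photo"),
    "a3": tokenize("red flower photo"),
    "a4": tokenize("yellow flower photo"),
}
tree = build_tree(assets)
new_asset_taxon = classify(tree, tokenize("green car video"))
triples = to_triples(tree)
```

Each taxon in this sketch is described by a path of signed tokens such as `+car`, which is the sense in which the result is interpretable: a taxon is defined by which tokens are present in or absent from the metadata of its assets.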
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022301002X/MediaObjects/12202_2023_7043_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022301002X/MediaObjects/12202_2023_7043_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022301002X/MediaObjects/12202_2023_7043_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022301002X/MediaObjects/12202_2023_7043_Fig4_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1134%2FS199508022301002X/MediaObjects/12202_2023_7043_Fig5_HTML.png)
Notes
https://picvario.com/
Funding
This research was supported by Picvario LLC (https://picvario.com/).
Additional information
(Submitted by E. K. Lipachev)
Cite this article
Baimuratov, I. Interpretable Taxonomy Extraction from Digital Assets Metadata Using Automated Unsupervised Decision Tree Learning. Lobachevskii J Math 44, 86–96 (2023). https://doi.org/10.1134/S199508022301002X
DOI: https://doi.org/10.1134/S199508022301002X