Interpretable Taxonomy Extraction from Digital Assets Metadata Using Automated Unsupervised Decision Tree Learning

Baimuratov, I.

doi:10.1134/S199508022301002X

Published: 17 May 2023

Volume 44, pages 86–96, (2023)

Lobachevskii Journal of Mathematics

I. Baimuratov¹

Abstract

This research addresses interpretable taxonomy extraction from digital assets metadata. The proposed method is based on automated unsupervised decision tree learning and has the following advantages: i) it extracts taxonomies from various data types, including numerical, categorical, and textual data; ii) because it uses a decision tree, it can classify digital assets into the extracted taxa; iii) the extracted taxonomy is interpretable in terms of the presence or absence of tokens in asset metadata; iv) it requires no extra data or labeled assets; and v) it requires no manual hyperparameter optimization. Moreover, the extracted taxonomy and the classified assets are converted into a knowledge graph for further representation and use in digital asset management. The method has been prototyped and tested on several collections of digital assets, and its use is demonstrated on one of them.
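To make the idea of a taxonomy that is "interpretable in terms of the presence or absence of tokens" concrete, the following minimal sketch builds a binary tree over asset metadata and classifies a new asset by walking it. It is an illustration only and not the method of the paper: the splitting criterion (choose the token whose presence divides a node most evenly), the `min_size` stopping parameter, the whitespace tokenizer, and the sample assets are all assumptions introduced here. The paper's method is described as needing no manual hyperparameter optimization, which this simplified sketch does not attempt to reproduce.

```python
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization into a set of tokens (an assumption)."""
    return set(text.lower().split())


def build_tree(assets, min_size=2, path=()):
    """Recursively split assets (dict: asset id -> token set) on token presence.

    Assumption: the split token is the one whose presence divides the node
    most evenly. This is a stand-in criterion, not the paper's.
    """
    n = len(assets)
    counts = Counter(tok for toks in assets.values() for tok in toks)
    # Keep only tokens that leave at least min_size assets on both sides.
    candidates = [
        (abs(2 * c - n), tok)
        for tok, c in counts.items()
        if min_size <= c <= n - min_size
    ]
    if not candidates:
        # Leaf: the taxon is named by the path of presence/absence tests.
        return {"taxon": path, "assets": sorted(assets)}
    _, token = min(candidates)  # ties are broken alphabetically
    present = {a: t for a, t in assets.items() if token in t}
    absent = {a: t for a, t in assets.items() if token not in t}
    return {
        "token": token,
        "present": build_tree(present, min_size, path + ("+" + token,)),
        "absent": build_tree(absent, min_size, path + ("-" + token,)),
    }


def classify(tree, tokens):
    """Walk the tree using token presence and return the taxon path."""
    while "token" in tree:
        tree = tree["present"] if tree["token"] in tokens else tree["absent"]
    return tree["taxon"]


# Hypothetical metadata strings, invented for this example.
raw = {
    "a1": "beach sunset photo",
    "a2": "beach waves photo",
    "a3": "city night photo",
    "a4": "city street photo",
}
assets = {asset_id: tokenize(text) for asset_id, text in raw.items()}
tree = build_tree(assets)
new_taxon = classify(tree, tokenize("Beach umbrella"))
```

Each leaf is labeled by the sequence of tests that leads to it, for example `("+beach",)` for assets whose metadata contains the token "beach", so the resulting taxa can be read directly as presence or absence rules.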

Notes

https://picvario.com/

Funding

This research was supported by Picvario LLC (https://picvario.com/).

Author information

Authors and Affiliations

St. Petersburg National Research University of Information Technologies, Mechanics and Optics, 197101, St. Petersburg, Russia
I. Baimuratov

Corresponding author

Correspondence to I. Baimuratov.

Additional information

(Submitted by E. K. Lipachev)

About this article

Cite this article

Baimuratov, I. Interpretable Taxonomy Extraction from Digital Assets Metadata Using Automated Unsupervised Decision Tree Learning. Lobachevskii J Math 44, 86–96 (2023). https://doi.org/10.1134/S199508022301002X

Received: 19 November 2022
Revised: 29 November 2022
Accepted: 10 December 2022
Published: 17 May 2023
Issue Date: January 2023
DOI: https://doi.org/10.1134/S199508022301002X
