
Interpretable Taxonomy Extraction from Digital Assets Metadata Using Automated Unsupervised Decision Tree Learning

Lobachevskii Journal of Mathematics

Abstract

This research aims at interpretable taxonomy extraction from digital asset metadata. The proposed method is based on automated unsupervised decision tree learning and has the following advantages: i) it extracts taxonomies from various data types, including numerical, categorical, and textual data; ii) because it uses a decision tree, it can classify digital assets into the extracted taxons; iii) the extracted taxonomy is interpretable in terms of the presence or absence of tokens in asset metadata; iv) it requires no extra data and no labeled assets; and v) it requires no manual hyperparameter optimization. Moreover, the extracted taxonomy and the classified assets are converted into a knowledge graph for further representation and use in digital asset management. The method is prototyped and tested on several collections of digital assets, and its usage is demonstrated on one of them.
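To make the idea of a taxonomy defined by the presence or absence of metadata tokens more concrete, here is a minimal sketch in Python. It is not the author's algorithm: the preview does not state the split criterion, so the sketch assumes a simple one (pick the token that divides the assets most evenly), and the names `build_tree`, `classify`, and `min_size` are hypothetical. The fixed `min_size` stopping rule is also a simplification, whereas the paper's method is described as needing no manual hyperparameter tuning.

```python
from collections import Counter

def build_tree(assets, min_size=2):
    """Build a binary tree over assets: dict of asset id -> set of tokens."""
    counts = Counter(tok for toks in assets.values() for tok in toks)
    n = len(assets)
    # Only tokens present in some, but not all, assets can split the node.
    candidates = [t for t, c in counts.items() if 0 < c < n]
    # Stop when the node is too small for a balanced split into two children
    # of at least min_size assets, or when no token separates the assets.
    if n < 2 * min_size or not candidates:
        return {"leaf": sorted(assets)}
    # Assumed criterion: the most balanced split; ties broken alphabetically.
    token = min(candidates, key=lambda t: (abs(2 * counts[t] - n), t))
    present = {a: toks for a, toks in assets.items() if token in toks}
    absent = {a: toks for a, toks in assets.items() if token not in toks}
    return {"token": token,
            "present": build_tree(present, min_size),
            "absent": build_tree(absent, min_size)}

def classify(tree, tokens, path=()):
    """Return the taxon of an asset as a tuple of +token / -token tests."""
    if "leaf" in tree:
        return path
    if tree["token"] in tokens:
        return classify(tree["present"], tokens, path + ("+" + tree["token"],))
    return classify(tree["absent"], tokens, path + ("-" + tree["token"],))

# Toy collection of assets with tokenized metadata (made-up data).
assets = {
    "a1": {"photo", "beach", "summer"},
    "a2": {"photo", "beach", "winter"},
    "a3": {"photo", "city"},
    "a4": {"video", "city"},
}
tree = build_tree(assets)
print(tree)
print(classify(tree, {"video", "city"}))
```

Each taxon in this sketch is identified by the sequence of token tests on the path from the root, which is what makes the result readable, and the same tree assigns unseen assets to taxons.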


Notes

  1. https://picvario.com/

Funding

This research was supported by Picvario LLC (see Note 1).

Author information

Correspondence to I. Baimuratov.

Additional information

(Submitted by E. K. Lipachev)

About this article

Cite this article

Baimuratov, I. Interpretable Taxonomy Extraction from Digital Assets Metadata Using Automated Unsupervised Decision Tree Learning. Lobachevskii J Math 44, 86–96 (2023). https://doi.org/10.1134/S199508022301002X

  • DOI: https://doi.org/10.1134/S199508022301002X
