Building hierarchical class structures for extreme multi-class learning

Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Class hierarchical structures play a significant role in large and complex machine learning tasks. Existing studies on the construction of such structures follow a two-stage strategy: category similarities are first computed under a certain assumption, and a group partition algorithm is then run with hyper-parameters that control the shape of the class hierarchy. Despite their effectiveness in many cases, these methods suffer from two problems: (1) optimizing the two-stage objective yields sub-optimal structures; (2) the hyper-parameters make the search space too large to find the optimal structure efficiently. In this paper, we propose a unified and dynamic framework that addresses these problems by (1) jointly optimizing the category similarity and the group partition and (2) obtaining the class hierarchical structure dynamically, without any hyper-parameters. The framework replaces the traditional category similarity with a sample similarity and constrains samples from the same atomic category to be partitioned into the same super-category. We theoretically prove that, within our framework, the sample similarity is equivalent to the category similarity and balances the partitions in terms of the number of samples. Further, we design a modularity-based partition optimization algorithm that automatically determines the number of partitions on each level. Extensive experiments on multiple image classification datasets show that the hierarchical structure constructed by the proposed method achieves better accuracy and efficiency than existing methods. Additionally, the obtained hierarchy benefits long-tail learning scenarios thanks to its sample-balanced partitions.
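The modularity-based, hyper-parameter-free partitioning described above can be illustrated with a minimal sketch. This is not the authors' implementation: the names `category_graph` and `greedy_super_categories` are hypothetical, and a simple greedy agglomerative merge stands in for their optimization algorithm. The sketch captures two ideas from the abstract: collapsing all samples of one atomic category into a single graph node enforces the same-super-category constraint, and merging communities only while modularity improves lets the number of super-categories on a level emerge automatically.

```python
import itertools
import numpy as np

def category_graph(S, cats):
    # Collapse a sample-similarity matrix S into a category-level graph by
    # summing similarities across samples of different atomic categories.
    # Hard-assigning every sample of a category to one node is what enforces
    # the "same super-category" constraint from the abstract.
    n = max(cats) + 1
    W = np.zeros((n, n))
    for i, ci in enumerate(cats):
        for j, cj in enumerate(cats):
            if ci != cj:
                W[ci, cj] += S[i, j]
    return W

def modularity(W, labels):
    # Newman's modularity Q for a weighted undirected graph given as a
    # symmetric matrix W and one community label per node.
    m2 = W.sum()          # equals 2m for an undirected graph
    k = W.sum(axis=1)     # weighted node degrees
    q = 0.0
    for i in range(len(labels)):
        for j in range(len(labels)):
            if labels[i] == labels[j]:
                q += W[i, j] - k[i] * k[j] / m2
    return q / m2

def greedy_super_categories(W):
    # Agglomerative modularity maximisation: repeatedly merge the pair of
    # communities with the largest positive modularity gain. Merging stops
    # when no gain remains, so the number of super-categories is determined
    # automatically, with no hyper-parameter.
    labels = list(range(len(W)))
    while True:
        base = modularity(W, labels)
        best_gain, best_pair = 0.0, None
        for a, b in itertools.combinations(sorted(set(labels)), 2):
            trial = [b if l == a else l for l in labels]
            gain = modularity(W, trial) - base
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return labels
        a, b = best_pair
        labels = [b if l == a else l for l in labels]

# Toy example: 8 samples, 4 atomic categories, two obvious groups.
S = np.zeros((8, 8))
cats = [0, 0, 1, 1, 2, 2, 3, 3]
for i, j, w in [(0, 2, 5.0), (1, 3, 5.0),   # categories 0 and 1 are close
                (4, 6, 5.0), (5, 7, 5.0),   # categories 2 and 3 are close
                (0, 4, 1.0), (3, 7, 1.0)]:  # weak cross-group similarity
    S[i, j] = S[j, i] = w
labels = greedy_super_categories(category_graph(S, cats))
```

On this toy graph the merging stops at two super-categories, grouping atomic categories {0, 1} and {2, 3}; trying to merge those two groups would drive modularity back down, so the level's partition count is found without any user-set threshold.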

Data availability

All data used in this paper are publicly available and can be found in the cited papers.

Code availability

Code is available at https://github.com/wangyuTJU/greedyIsolation.

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grants 62106174 and 61732011, and in part by the China Postdoctoral Science Foundation under Grants 2021TQ0242 and 2021M690118.

Author information

Corresponding author

Correspondence to Yu Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, H., Wang, Y. & Hu, Q. Building hierarchical class structures for extreme multi-class learning. Int. J. Mach. Learn. & Cyber. 14, 2575–2590 (2023). https://doi.org/10.1007/s13042-023-01783-z
