Subgraph generation applied in GraphSAGE deal with imbalanced node classification

  • Neural Networks
Soft Computing

Abstract

In graph neural network applications, GraphSAGE performs inductive learning and has been widely applied to important research topics such as node classification. Because GraphSAGE applies an aggregation function to obtain a node's embedding from its neighbors' features, the subgraph around each node directly affects classification performance. In many practical applications, the uneven class distribution of nodes makes it difficult for a graph neural network to fully learn the topology and attributes of the minority class, which limits classification performance. To address imbalanced node classification in GraphSAGE, we propose a new graph over-sampling algorithm called subgraph generation by conditional generative adversarial network (SG-CGAN). SG-CGAN learns the hidden-layer representations of different nodes through GraphSAGE and trains a conditional generative adversarial network (CGAN) on the nodes' hidden vectors and their related subgraphs. Synthetic hidden vectors are then generated and fed to the CGAN to produce minority-class subgraphs, and GraphSAGE is retrained with the synthetic subgraphs added. In experiments on five graph datasets with first-order neighbors, the average improvements in ACC, macro-F1, and micro-F1 were \(1.25\%\), \(4.44\%\), and \(1.59\%\), respectively, compared with not adding synthetic data. In the second-order neighbor experiments, the improvements were \(0.75\%\), \(3.58\%\), and \(2.1\%\), verifying the effectiveness of the data generated by SG-CGAN.
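To illustrate why a node's subgraph matters for GraphSAGE, the following is a minimal sketch of mean aggregation over first-order neighbors. It is not the authors' implementation: the function name `mean_aggregate`, the toy features, and the added neighbor are hypothetical, and the learned weight matrix and nonlinearity of a real GraphSAGE layer are omitted. It only shows that adding one neighbor to a node's subgraph changes the aggregated part of its representation.

```python
from typing import Dict, List


def mean_aggregate(features: Dict[int, List[float]],
                   neighbors: Dict[int, List[int]],
                   node: int) -> List[float]:
    """Concatenate a node's own features with the mean of its neighbors' features.

    Simplified stand-in for a GraphSAGE mean aggregator (no learned weights).
    """
    own = features[node]
    nbrs = neighbors.get(node, [])
    if nbrs:
        mean = [sum(features[n][d] for n in nbrs) / len(nbrs)
                for d in range(len(own))]
    else:
        # Isolated node: use a zero vector for the neighborhood part.
        mean = [0.0] * len(own)
    return own + mean


# Toy graph (hypothetical): node 0 has two neighbors with different features.
features = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.0, 1.0]}
neighbors = {0: [1, 2]}
before = mean_aggregate(features, neighbors, 0)

# Add a hypothetical synthetic neighbor whose features resemble node 0.
features[3] = [1.0, 0.0]
neighbors[0] = [1, 2, 3]
after = mean_aggregate(features, neighbors, 0)

print(before)
print(after)
```

In this toy case the neighborhood half of the vector shifts toward node 0's own features once the extra neighbor is attached, which is the kind of effect an over-sampling method that generates subgraphs relies on.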

Data availability

This publication is supported by multiple datasets, which are openly available at locations cited in the reference section.

Code availability

The source code used in this work is available on GitHub (https://github.com/KaiHuangMO/SGCGAN.git).

Notes

  1. http://www.blogcatalog.com.

References

  • Abedin MZ, Guotai C, Hajek P et al (2023) Combining weighted smote with ensemble learning for the class-imbalanced prediction of small business credit risk. Complex Intell Syst 9(4):3559–3579

  • Ando S, Huang CY (2017) Deep over-sampling framework for classifying imbalanced data. In: Ceci M, Hollmén J, Todorovski L et al (eds) Machine learning and knowledge discovery in databases. Springer International Publishing, Cham, pp 770–785

  • Bao Y, Yang S (2023) Two novel smote methods for solving imbalanced classification problems. IEEE Access 11:5816–5823

  • Barua S, Islam MM, Yao X et al (2012) Mwmote-majority weighted minority oversampling technique for imbalanced data set learning. IEEE Trans Knowl Data Eng 26(2):405–425

  • Chawla NV, Bowyer KW, Hall LO et al (2002) Smote: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357

  • Ding H, Sun Y, Wang Z et al (2023) A gan and ensemble learning-based hybrid approach for imbalanced data classification. Inform Process Manage 60(2):103–235

  • Dong Y, Xiao H, Dong Y (2022) Sa-cgan: an oversampling method based on single attribute guided conditional gan for multi-class imbalanced learning. Neurocomputing 472:326–337

  • Douzas G, Bacao F et al (2018) Effective data generation for imbalanced learning using conditional generative adversarial networks. Exp Syst Appl 91:464–71

  • El Alaoui D, Riffi J, Sabri A et al (2022) Deep graphsage-based recommendation system: jumping knowledge connections with ordinal aggregation network. Neural Comput Appl 4(14):11679–90

  • Elreedy D, Atiya AF, Kamalov F (2023) A theoretical distribution analysis of synthetic minority oversampling technique (smote) for imbalanced learning. Mach Learn. https://doi.org/10.1007/s10994-022-06296-4

  • Fan SKS, Tsai DM, Yeh PC (2023) Effective variational-autoencoder-based generative models for highly imbalanced fault detection data in semiconductor manufacturing. IEEE Trans Semicond Manuf 36(2):205–14

  • Fey M, Lenssen JE (2019) Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428

  • Fu S, Tian Y, Tang J et al (2023) Cost-sensitive learning with modified stein loss function. Neurocomputing 525:57–75

  • Goodfellow I, Pouget-Abadie J, Mirza M et al (2020) Generative adversarial networks. Commun ACM 63(11):139–144

  • Guan H, Zhao L, Dong X et al (2023) Extended natural neighborhood for smote and its variants in imbalanced classification. Eng Appl Artif Intell 124(106):570

  • Hamilton W, Ying Z, Leskovec J (2017) Inductive representation learning on large graphs. Adv Neural inform Process Syst 30

  • Han Q, Liu H, Huang M et al (2023) Heart disease prediction based on mwmote and res-bigru models. In: 2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), IEEE, pp 563–569

  • Hu Y, Qu A, Work D (2022) Detecting extreme traffic events via a context augmented graph autoencoder. ACM Trans Intell Syst Technol (TIST) 13(6):1–23

  • Huang G, Jafari AH (2023) Enhanced balancing gan: minority-class image generation. Neural Comput Appl 35(7):5145–5154

  • Huang K, Wang X (2022) Ada-incvae: improved data generation using variational autoencoder for imbalanced classification. Appl Intell 52(3):2838–2853

  • Isola P, Zhu JY, Zhou T et al (2016) Image-to-image translation with conditional adversarial networks. In: IEEE Conference on Computer Vision & Pattern Recognition

  • Juan X, Zhou F, Wang W et al (2023) Ins-gnn: Improving graph imbalance learning with self-supervision. Inf Sci 637(118):935

  • Lehne B, Schlitt T (2009) Protein-protein interaction databases: keeping up with growing interactomes. Hum Genom 3(3):1–7

  • Lo WW, Layeghy S, Sarhan M et al (2022) E-graphsage: A graph neural network based intrusion detection system for iot. In: NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, IEEE, pp 1–9

  • Lu C, Reddy CK, Wang P et al (2023) Multi-label clinical time-series generation via conditional gan. IEEE Trans Knowl Data Eng

  • Mernyei P, Cangea C (2020) Wiki-cs: A wikipedia-based benchmark for graph neural networks. arXiv preprint arXiv:2007.02901

  • Namata G, London B, Getoor L et al (2012) Query-driven active surveying for collective classification. In: 10th International Workshop on Mining and Learning with Graphs, p 1

  • Perozzi B, Al-Rfou R, Skiena S (2014) Deepwalk: Online learning of social representations. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pp 701–710

  • Qu L, Zhu H, Zheng R et al (2021) Imgagn: Imbalanced network embedding via generative adversarial graph networks. In: Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp 1390–1398

  • Rafatirad S, Homayoun H, Chen Z et al (2022) Graph learning. Machine learning for computer scientists and data analysts. Springer, pp 277–304

  • Ren Z, Zhu Y, Liu Z et al (2023) Few-shot gan: improving the performance of intelligent fault diagnosis in severe data imbalance. IEEE Trans Instrum Measure 72:1–4

  • Sen P, Namata G, Bilgic M et al (2008) Collective classification in network data. AI Mag 29(3):93–93

  • Shi M, Ding C, Wang R et al (2023) Graph embedding deep broad learning system for data imbalance fault diagnosis of rotating machinery. Reliab Eng Syst Saf 240(109):601

  • Sun Z, Zhang H, Bai J et al (2023) A discriminatively deep fusion approach with improved conditional gan (im-cgan) for facial expression recognition. Pattern Recogn 135(109):157

  • Thakur PS, Jadeja M, Chouhan SS (2024) Cbret: a cluster-based resampling technique for dealing with imbalanced data in code smell prediction. Knowl-Based Syst 286:111390

  • Tomek I (2007) An experiment with the edited nearest-neighbor rule. IEEE Trans Syst Man Cybern SMC 6(6):448–452

  • Velickovic P, Cucurull G, Casanova A et al (2017) Graph attention networks. Stat 1050:20

  • Wang H, Li P, Lang X et al (2023) Ftgan: a novel gan-based data augmentation method coupled time-frequency domain for imbalanced bearing fault diagnosis. IEEE Trans Instrum Meas 72:1–14

  • Welling M, Kipf TN (2016) Semi-supervised classification with graph convolutional networks. In: J. International Conference on Learning Representations (ICLR 2017)

  • Wu L, Lin H, Gao Z et al (2021) Graphmixup: Improving class-imbalanced node classification on graphs by self-supervised context prediction. arXiv preprint arXiv:2106.11133

  • Xia F, Sun K, Yu S et al (2021) Graph learning: a survey. IEEE Trans Artif Intell 2(2):109–127

  • Xie Liu H, Zeng S et al (2021) A novel progressively undersampling method based on the density peaks sequence for imbalanced data. Knowl-Based Syst 213(106):689

  • Yan M, Li N (2023) Borderline-margin loss based deep metric learning framework for imbalanced data. Appl Intell 53(2):1487–1504

  • Zhao T, Zhang X, Wang S (2021a) Graphsmote: Imbalanced node classification on graphs with graph neural networks. In: Proceedings of the 14th ACM international conference on web search and data mining, pp 833–841

  • Zhao Y, Hao K, Tang XS et al (2021) A conditional variational autoencoder based self-transferred algorithm for imbalanced classification. Knowl-Based Syst 218(106):756

  • Zhu Z, Xing H, Xu Y (2023) Balanced neighbor exploration for semi-supervised node classification on imbalanced graph data. Inf Sci 631:31–44

Funding

Kai Huang reports financial support from the Natural Science Foundation of Xiamen, China (3502Z202372018) and from the Department of Education of Fujian Province, China (JAT232012).

Author information

Corresponding author

Correspondence to Kai Huang.

Ethics declarations

Conflict of interest

The authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Huang, K., Chen, C. Subgraph generation applied in GraphSAGE deal with imbalanced node classification. Soft Comput (2024). https://doi.org/10.1007/s00500-024-09797-7

  • DOI: https://doi.org/10.1007/s00500-024-09797-7
