Abstract
In social network analysis and knowledge graphs, many semi-supervised learning algorithms based on graph convolutional networks (GCNs) are widely used. Most of these algorithms focus on improving the structure of the neural network or the sampling method at each layer, but pay little attention to data pre-processing. Analysis of the input data shows that words of differing quality are unevenly distributed in the original data, which may obscure some useful data while highlighting irrelevant data. To verify this hypothesis, this paper proposes a feature matrix compression (FMC) algorithm for pre-processing the input of GCN-based algorithms. The algorithm sorts the word columns of the input matrix (the graph's feature matrix) by word frequency and then merges the low-frequency words, emphasizing the role of these words in the graph and reducing the data scale. Experiments on four mainstream datasets in the field, using several representative and distinct algorithms, show that the FMC algorithm achieves better performance.
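The abstract describes sorting the feature matrix's word columns by frequency and merging the rarer ones. The paper's exact merging rule is not given here, so the sketch below is only one plausible reading: columns whose total frequency falls below a threshold are merged in fixed-size groups by summation, while frequent columns are kept unchanged. The names `compress_feature_matrix`, `freq_threshold`, and `group_size` are illustrative, not from the paper.

```python
import numpy as np

def compress_feature_matrix(X, freq_threshold=2, group_size=4):
    """Hypothetical sketch of frequency-based column compression.

    X: (n_nodes, n_words) count/binary feature matrix.
    Columns whose total frequency is below `freq_threshold` are
    merged in groups of `group_size` by summation; the remaining
    (frequent) columns are kept as-is.
    """
    freq = X.sum(axis=0)                 # per-word frequency over all nodes
    order = np.argsort(freq)             # rarest words first
    rare = order[freq[order] < freq_threshold]
    common = order[freq[order] >= freq_threshold]

    kept = X[:, common]                  # frequent columns, unchanged
    merged = [
        X[:, rare[i:i + group_size]].sum(axis=1, keepdims=True)
        for i in range(0, len(rare), group_size)
    ]
    return np.hstack([kept] + merged) if merged else kept
```

Because the merge is a plain column sum, total word counts per node are preserved while the matrix width shrinks, which matches the abstract's goal of optimizing data scale without discarding rare words outright.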
Acknowledgements
The authors wish to thank the anonymous reviewers for their detailed feedback and suggestions for improving this work.
Funding
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61272209, 61872164), in part by the Program of Science and Technology Development Plan of Jilin Province of China under Grant 20190302032GX, and in part by the Fundamental Research Funds for the Central Universities (Jilin University).
Ethics declarations
Conflict of interest
The authors have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Wang, H., Dong, LY., Ma, XT. et al. A Graph Attribute Aggregation Method based on Feature Engineering. J. Inst. Eng. India Ser. B 103, 711–719 (2022). https://doi.org/10.1007/s40031-021-00698-z