Abstract
With the rapid development of network technology and the spread of electronic documents, automatic proofreading of Chinese text has attracted increasing attention. Automatic proofreading of semantic errors is a key and difficult problem in Chinese information processing. To address it, we propose a semantic error proofreading method that combines dependency parsing with statistical theory, and we construct a two-layer semantic knowledge base to assist error detection and error correction. The knowledge base comprises (1) a knowledge base of word collocations, containing structured sentence information extracted from a large-scale corpus, and (2) a knowledge base of sememe collocations, obtained by sememe mapping through HowNet. On this basis, the cubic association ratio and the degree of polymerization are introduced to evaluate the proofreading results, reducing false positives and improving the accuracy of the correction suggestions. The experimental results show that our method is useful for constructing semantic proofreading knowledge bases and for the automatic proofreading of semantic errors.
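The abstract names the cubic association ratio but does not state its formula. The sketch below assumes one commonly cited formulation, MI3(x, y) = log2(P(x, y)^3 / (P(x) P(y))), with probabilities estimated from collocation counts. The function names and the toy (head, dependent) pairs are hypothetical illustrations, not the authors' implementation or data.

```python
import math
from collections import Counter


def cubic_association_ratio(pair_count, head_count, dep_count, total_pairs):
    """Assumed MI3 form: log2(P(x,y)^3 / (P(x) * P(y))), estimated from raw counts."""
    if pair_count == 0:
        # An unseen collocation gets the lowest possible score.
        return float("-inf")
    p_xy = pair_count / total_pairs
    p_x = head_count / total_pairs
    p_y = dep_count / total_pairs
    return math.log2(p_xy ** 3 / (p_x * p_y))


def score_collocation(pairs, head, dep):
    """Score one (head, dependent) collocation against a list of observed pairs."""
    pair_counts = Counter(pairs)
    head_counts = Counter(h for h, _ in pairs)
    dep_counts = Counter(d for _, d in pairs)
    return cubic_association_ratio(
        pair_counts[(head, dep)], head_counts[head], dep_counts[dep], len(pairs)
    )


# Hypothetical toy collocations, standing in for pairs extracted by a dependency parser.
toy_pairs = [
    ("drink", "water"), ("drink", "water"), ("drink", "tea"),
    ("eat", "rice"), ("eat", "rice"), ("eat", "water"),
]

for head, dep in [("drink", "water"), ("eat", "water"), ("eat", "tea")]:
    print(head, dep, score_collocation(toy_pairs, head, dep))
```

Under this assumed formulation, a frequently attested pair scores higher than a rare one, and an unseen pair receives negative infinity, so a low score can flag a candidate collocation error. How the score is thresholded, and how it is combined with the degree of polymerization, is not specified in the abstract.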
Supported by the National Natural Science Foundation of China (NSFC No. 61772081).
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, R., Zhang, Y., Huang, G., Chen, R. (2021). Research on Proofreading Method of Semantic Collocation Error in Chinese. In: Sun, X., Zhang, X., Xia, Z., Bertino, E. (eds) Advances in Artificial Intelligence and Security. ICAIS 2021. Communications in Computer and Information Science, vol 1422. Springer, Cham. https://doi.org/10.1007/978-3-030-78615-1_62
DOI: https://doi.org/10.1007/978-3-030-78615-1_62
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78614-4
Online ISBN: 978-3-030-78615-1
eBook Packages: Computer Science, Computer Science (R0)