Abstract
In this paper, we consider the problem of sentence difficulty analysis from several angles. Previous work has sought to design deterministic scoring algorithms that depend only on semantic and syntactic information. We propose instead to use not only a local feature space, which represents an individual sentence by its syntactic and semantic structure, but also the global distributional difference among corpora. For the local feature space, we select 28 linguistic features and transform them into conjoined and discretized form. Applying global score classification on top of these features yields much improved results. We test the proposed model on 1,000 sentences and obtain much higher accuracy than traditional learning models such as SVM and AdaBoost.
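As a rough illustration of the two ingredients named in the abstract, the sketch below discretizes continuous features into bins, conjoins the binned features pairwise, and measures a distributional difference between two word-count distributions. It is not the paper's implementation: the feature names, cut points, pairwise conjunction scheme, and the choice of smoothed Kullback-Leibler divergence as the difference measure are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def discretize(value, cut_points):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= c for c in cut_points)


def conjoin(bins):
    """Build pairwise conjunction features from discretized features."""
    names = sorted(bins)
    return {f"{a}={bins[a]}&{b}={bins[b]}" for a, b in combinations(names, 2)}


def kl_divergence(p_counts, q_counts, smoothing=1.0):
    """KL(P || Q) over the shared vocabulary, with additive smoothing."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + smoothing * len(vocab)
    q_total = sum(q_counts.values()) + smoothing * len(vocab)
    kl = 0.0
    for word in vocab:
        p = (p_counts.get(word, 0) + smoothing) / p_total
        q = (q_counts.get(word, 0) + smoothing) / q_total
        kl += p * math.log(p / q)
    return kl


# Hypothetical local features and cut points (not taken from the paper).
sentence = {"length": 23, "parse_depth": 6}
cuts = {"length": [10, 20, 30], "parse_depth": [3, 5]}
bins = {name: discretize(v, cuts[name]) for name, v in sentence.items()}
features = conjoin(bins)

# Hypothetical toy corpora standing in for easy and hard text.
easy = Counter("the cat sat on the mat".split())
hard = Counter("the ontological premise sat uneasily".split())
difference = kl_divergence(easy, hard)
```

Here `features` holds the conjoined, discretized representation of one sentence, and `difference` is a single corpus-level number that a classifier could use alongside the local features.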
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kim, YB., Kim, Y., Kim, YS. (2012). Sentence Difficulty Analysis with Local Feature Space and Global Distributional Difference. In: Lee, G., Howard, D., Kang, J.J., Ślęzak, D. (eds) Convergence and Hybrid Information Technology. ICHIT 2012. Lecture Notes in Computer Science, vol 7425. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32645-5_89
DOI: https://doi.org/10.1007/978-3-642-32645-5_89
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-32644-8
Online ISBN: 978-3-642-32645-5
eBook Packages: Computer Science (R0)