An approach to text data categorization based on the ideas of J.S. Mill

Lyfenko, N. D.

doi:10.3103/S0005105515060035

Published: 05 February 2016

Automatic Documentation and Mathematical Linguistics, Volume 49, pages 202–212 (2015)

Abstract

This paper considers the problem of automatically categorizing natural-language text documents. Categorization is performed by a method based on the ideas of J.S. Mill; it adopts the general principles, but not the technical details, of the JSM method for automatic hypothesis generation. Experiments are described, and the performance of the system built to implement the technique is assessed. With optimally chosen settings, the proposed approach achieves higher accuracy than other techniques.

Author information

Authors and Affiliations

Russian State Institute for the Humanities, pl. Miusskaya 6, GSP-3, Moscow, 125993, Russia
N. D. Lyfenko

Corresponding author

Correspondence to N. D. Lyfenko.

Additional information

Original Russian Text © N.D. Lyfenko, 2015, published in Nauchno-Tekhnicheskaya Informatsiya, Seriya 2, 2015, No. 11, pp. 12–23.

About this article

Cite this article

Lyfenko, N.D. An approach to text data categorization based on the ideas of J.S. Mill. Autom. Doc. Math. Linguist. 49, 202–212 (2015). https://doi.org/10.3103/S0005105515060035

Received: 12 August 2015
Published: 05 February 2016
Issue Date: November 2015
DOI: https://doi.org/10.3103/S0005105515060035
