-
Reference Work Entry At a glance
Correction to: Language, Script, and Font Recognition
Owing to an unfortunate oversight, the second author, Niladri Sekhar Dash, was missing from the initially published HTML version of this chapter. He has now been added.
-
Reference Work Entry In depth
Language, Script, and Font Recognition
Automatic identification of a language within a text document containing multiple scripts and fonts is a challenging task, as it is not only linked with the shape, size, and style of the characters and symbols...
-
Chapter and Conference Paper
A System for Recognition of Named Entities in Odia Text Corpus Using Machine Learning Algorithm
This paper presents a novel approach to recognize named entities in Odia corpus. The development of a NER system for Odia using Support Vector Machine is a challenging task in intelligent computing. NER aims a...
-
Chapter and Conference Paper
Development of Odia Language Corpus from Modern News Paper Texts: Some Problems and Issues
In this paper, we have tried to describe the details about the strategies and methods we have adopted to design and develop a digital Odia corpus of newspaper texts. We have also attempted to identify the scop...
-
Book
-
Chapter
Language-specific Synsets and Challenges in Synset Linkage in Urdu WordNet
The Urdu WordNet is being developed following the process used to develop the Hindi WordNet by using the Expansion Approach. This paper, in the first part, presents some of our experiences that we gathered in ...
-
Chapter
Problems in Translating Hindi Synsets into the Bangla WordNet
In this chapter, I have made an attempt to look into the problems and challenges I have faced in developing the Bangla synsets that will stand as conceptual equivalents for the Hindi synsets used in the IndoWo...
-
Chapter
Defining Language-Specific Synsets in IndoWordNet: Some Theoretical and Practical Issues
A WordNet is a digital network of semantically linked words, which are organized around the notion of synsets of a language. A synset is a set of synonyms with the same part-of-speech (mostly), which are potential...
-
Chapter and Conference Paper
Application of TF-IDF Feature for Categorizing Documents of Online Bangla Web Text Corpus
This paper explores the use of standard features as well as machine learning approaches for categorizing Bangla text documents of online Web corpus. The TF-IDF feature with dimensionality reduction technique (...
-
Chapter and Conference Paper
Categorization of Bangla Web Text Documents Based on TF-IDF-ICF Text Analysis Scheme
With the rapid growth and huge availability of digital text data, automatic text categorization or classification is a comparatively more effective solution in organizing and managing textual information. It i...
-
Chapter
Features of a Corpus
Defining the characteristic features of a corpus, in general, has been an issue of great debate for decades. Due to diversities involved in the types of text used for corpus generation, identification of featu...
-
Chapter
Pre-digital Corpora (Part 2)
Following in the footsteps of the previous chapter (Chap. 9), in this chapter, we have presented a short description of the process of corpus generation and utilization in ...
-
Chapter
Nature of Data
It is always difficult to define the nature of language data since language texts often possess multiple properties, due to which the nature of a particular text may overlap with that of another. However, sinc...
-
Chapter
Digital Text Corpora (Part 2)
The generation of text corpora is not confined to a few widely privileged languages such as English, French, German or Spanish. Many lesser-known and under-privileged languages are also emerging with corpora o...
-
Chapter
Nature of Text Application
In this chapter, we have sketched out how language corpora can be classified based on the nature of the application of texts at various domains of linguistics and language technology. We have argued that a ‘pa...
-
Chapter
Utilization of Language Corpora
Even after nearly 70 years, the staunch supporters of the generative genre still like to argue that linguistics is a branch of intuition and introspection where corpora, as a showcase of empirical language dat...
-
Chapter
Web Text Corpus
The World Wide Web is viewed as a useful linguistic resource since it is a unique linguistic world that is full of surprising linguistic data and information. It is the largest store of texts in existence that...
-
Chapter
Definition of ‘Corpus’
Understanding the concept of ‘corpus’ has been one of the challenging issues in corpus linguistics in recent times. Language users are often confused with the concept, and as a result of this, they sometimes c...
-
Chapter
Genre of Text
Classification of corpus based on genre is a difficult theoretical exercise which is carried out in this chapter. In this chapter, we have first justified why it is necessary to classify corpora based on certa...
-
Chapter
Digital Text Corpora (Part 1)
The history of digital text corpus generation and usage presents an interesting narrative. It shows how technology has brought about a resurgence in the discipline of linguistics, which was otherwise turning i...