
  1. Reference Work Entry At a glance

    Correction to: Language, Script, and Font Recognition

    Owing to an unfortunate oversight the second author Niladri Sekhar Dash was missing in the initially published html version of this chapter. He has now been added.

    Umapada Pal, Niladri Sekhar Dash in Handbook of Document Image Processing and Recognition (2014)

  2. No Access

    Reference Work Entry In depth

    Language, Script, and Font Recognition

    Automatic identification of a language within a text document containing multiple scripts and fonts is a challenging task, as it is not only linked with the shape, size, and style of the characters and symbols...

    Umapada Pal, Niladri Sekhar Dash in Handbook of Document Image Processing and Recognition (2014)

  3. No Access

    Chapter and Conference Paper

    A System for Recognition of Named Entities in Odia Text Corpus Using Machine Learning Algorithm

    This paper presents a novel approach to recognize named entities in Odia corpus. The development of a NER system for Odia using Support Vector Machine is a challenging task in intelligent computing. NER aims a...

    Bishwa Ranjan Das, Srikanta Patnaik in Computational Intelligence in Data Mining … (2015)

  4. No Access

    Chapter and Conference Paper

    Development of Odia Language Corpus from Modern News Paper Texts: Some Problems and Issues

    In this paper, we have tried to describe the details about the strategies and methods we have adapted to design and develop a digital Odia corpus of newspaper texts. We have also attempted to identify the scop...

    Bishwa Ranjan Das, Srikanta Patnaik in Intelligent Computing, Communication and D… (2015)

  5. No Access

    Book

  6. No Access

    Chapter

    Language-specific Synsets and Challenges in Synset Linkage in Urdu WordNet

    The Urdu WordNet is being developed following the process used to develop the Hindi WordNet by using the Expansion Approach. This paper, in the first part, presents some of our experiences that we gathered in ...

    Rizwanur Rahman, Mazhar Mehdi Hussain in The WordNet in Indian Languages (2017)

  7. No Access

    Chapter

    Problems in Translating Hindi Synsets into the Bangla WordNet

    In this chapter, I have made an attempt to look into the problems and challenges I have faced in developing the Bangla synsets that will stand as conceptual equivalents for the Hindi synsets used in the IndoWo...

    Niladri Sekhar Dash in The WordNet in Indian Languages (2017)

  8. No Access

    Chapter

    Defining Language-Specific Synsets in IndoWordNet: Some Theoretical and Practical Issues

    A WordNet is a digital network of semantically linked words, which are organized around the notion of synsets of a language. A synset is a set of synonyms with same part-of-speech (mostly), which are potential...

    Niladri Sekhar Dash in The WordNet in Indian Languages (2017)

  9. No Access

    Chapter and Conference Paper

    Application of TF-IDF Feature for Categorizing Documents of Online Bangla Web Text Corpus

    This paper explores the use of standard features as well as machine learning approaches for categorizing Bangla text documents of online Web corpus. The TF-IDF feature with dimensionality reduction technique (...

    Ankita Dhar, Niladri Sekhar Dash, Kaushik Roy in Intelligent Engineering Informatics (2018)

  10. No Access

    Chapter and Conference Paper

    Categorization of Bangla Web Text Documents Based on TF-IDF-ICF Text Analysis Scheme

    With the rapid growth and huge availability of digital text data, automatic text categorization or classification is a comparatively more effective solution in organizing and managing textual information. It i...

    Ankita Dhar, Niladri Sekhar Dash, Kaushik Roy in Social Transformation – Digital Way (2018)

  11. No Access

    Chapter

    Features of a Corpus

    Defining the characteristic features of a corpus, in general, has been an issue of great debate for decades. Due to diversities involved in the types of text used for corpus generation, identification of featu...

    Niladri Sekhar Dash, S. Arulmozi in History, Features, and Typology of Language Corpora (2018)

  12. No Access

    Chapter

    Pre-digital Corpora (Part 2)

    Following the footsteps of the previous chapter (Chap. 9), in this chapter, we have presented a short description of the process of corpus generation and utilization in ...

    Niladri Sekhar Dash, S. Arulmozi in History, Features, and Typology of Language Corpora (2018)

  13. No Access

    Chapter

    Nature of Data

    It is always difficult to define the nature of language data since language texts often possess multiple properties, due to which the nature of a particular text may overlap with that of another. However, sinc...

    Niladri Sekhar Dash, S. Arulmozi in History, Features, and Typology of Language Corpora (2018)

  14. No Access

    Chapter

    Digital Text Corpora (Part 2)

    The generation of text corpora is not confined to a few widely privileged languages such as English, French, German or Spanish. Many lesser-known and under-privileged languages are also emerging with corpora o...

    Niladri Sekhar Dash, S. Arulmozi in History, Features, and Typology of Language Corpora (2018)

  15. No Access

    Chapter

    Nature of Text Application

    In this chapter, we have sketched out how language corpora can be classified based on the nature of the application of texts at various domains of linguistics and language technology. We have argued that a ‘pa...

    Niladri Sekhar Dash, S. Arulmozi in History, Features, and Typology of Language Corpora (2018)

  16. No Access

    Chapter

    Utilization of Language Corpora

    Even after nearly 70 years, the staunch supporters of the generative genre still like to argue that linguistics is a branch of intuition and introspection where corpora, as a showcase of empirical language dat...

    Niladri Sekhar Dash, S. Arulmozi in History, Features, and Typology of Language Corpora (2018)

  17. No Access

    Chapter

    Web Text Corpus

    The World Wide Web is viewed as a useful linguistic resource since it is a unique linguistic world that is full of surprising linguistic data and information. It is the largest store of texts in existence that...

    Niladri Sekhar Dash, S. Arulmozi in History, Features, and Typology of Language Corpora (2018)

  18. No Access

    Chapter

    Definition of ‘Corpus’

    Understanding the concept of ‘corpus’ has been one of the challenging issues in corpus linguistics in recent times. Language users are often confused with the concept, and as a result of this, they sometimes c...

    Niladri Sekhar Dash, S. Arulmozi in History, Features, and Typology of Language Corpora (2018)

  19. No Access

    Chapter

    Genre of Text

    Classification of corpus based on genre is a difficult theoretical exercise which is carried out in this chapter. In this chapter, we have first justified why it is necessary to classify corpora based on certa...

    Niladri Sekhar Dash, S. Arulmozi in History, Features, and Typology of Language Corpora (2018)

  20. No Access

    Chapter

    Digital Text Corpora (Part 1)

    The history of digital text corpus generation and usage presents an interesting narrative. It shows how technology has brought about a resurgence in the discipline of linguistics, which was otherwise turning i...

    Niladri Sekhar Dash, S. Arulmozi in History, Features, and Typology of Language Corpora (2018)
