Abstract
Social media platforms such as Twitter and Facebook are sources of stream data. They generate unstructured formal text on various topics, containing emotions expressed about persons, organizations, locations, movies, etc. Such stream data is characterized by high velocity and volume, and is incomplete, often incorrect, cryptic and noisy. In our earlier work, we proposed a Hadoop framework for recognising and resolving entities within semi-structured data such as e-catalogs. This paper extends that framework to recognise and resolve entities in unstructured data such as tweets. Such a system can be used for data integration, de-duplication, event detection and sentiment analysis. The proposed framework recognizes pre-defined entities in streams, using Natural Language Processing (NLP) to extract local context features and MapReduce for entity resolution. Test results showed that the proposed entity recognition system could identify predefined entities such as location, organization and person entities with an accuracy of 72%.
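As a rough illustration of the resolution step mentioned in the abstract, the following minimal sketch shows how recognized mentions could be grouped in a map/shuffle/reduce style. It is not the authors' implementation: it runs in plain Python rather than on Hadoop, the sample mentions are invented, and the function names (`normalize`, `map_phase`, `shuffle`, `reduce_phase`) and the exact-match-on-normalized-text rule are assumptions made only for this example.

```python
from collections import defaultdict

# Hypothetical mentions, as if already produced by an upstream recognizer:
# (surface form, entity type) pairs. These values are made up for illustration.
MENTIONS = [
    ("New York", "LOCATION"),
    ("new york", "LOCATION"),
    ("Google", "ORGANIZATION"),
    ("GOOGLE ", "ORGANIZATION"),
    ("Obama", "PERSON"),
]


def normalize(surface):
    """Assumed normalization: lowercase and collapse whitespace."""
    return " ".join(surface.lower().split())


def map_phase(mentions):
    """Map: emit ((entity type, normalized form), surface form) pairs."""
    for surface, etype in mentions:
        yield (etype, normalize(surface)), surface


def shuffle(pairs):
    """Shuffle: group values by key, as a MapReduce runtime would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: collapse each group into one resolved entity record."""
    return {
        key: {"variants": sorted(set(values)), "count": len(values)}
        for key, values in groups.items()
    }


resolved = reduce_phase(shuffle(map_phase(MENTIONS)))
```

Here each distinct (type, normalized form) key stands for one resolved entity. A real resolver would likely need fuzzier matching than exact equality of normalized strings.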
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Vasavi, S., Prabhakar Benny, S. (2018). Hadoop Framework for Entity Recognition Within High Velocity Streams Using Deep Learning. In: Satapathy, S., Bhateja, V., Raju, K., Janakiramaiah, B. (eds) Data Engineering and Intelligent Computing. Advances in Intelligent Systems and Computing, vol 542. Springer, Singapore. https://doi.org/10.1007/978-981-10-3223-3_23
DOI: https://doi.org/10.1007/978-981-10-3223-3_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-3222-6
Online ISBN: 978-981-10-3223-3
eBook Packages: Engineering, Engineering (R0)