Abstract
Deep learning has become a popular approach to text classification in recent years, owing to its ability to improve on the accuracy of previous state-of-the-art methods across several benchmarks. However, these improvements have required hundreds of thousands to millions of labeled training examples, which in many cases are very time-consuming and/or expensive to acquire. This problem is especially significant in domain-specific text classification tasks, where pretrained embeddings and models are not optimal. To cope with this problem, we propose a novel learning framework, Ensembled Transferred Embeddings (ETE), which relies on two key ideas: (1) labeling a relatively small sample of the target dataset in a semi-automatic process, and (2) leveraging other datasets from related domains or related tasks that are large-scale and labeled, to extract “transferable embeddings.” Evaluation of ETE on a large-scale, real-world item categorization dataset provided to us by PayPal shows that it significantly outperforms traditional as well as state-of-the-art item categorization methods.
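At a high level, the ETE idea is to obtain several embeddings transferred from models trained on large labeled source datasets, combine them, and fit a classifier on the small labeled target sample. The sketch below is illustrative only and is not the authors' implementation: `embed_a`, `embed_b`, and the nearest-centroid classifier are hypothetical stand-ins (the chapter uses embeddings transferred from related large-scale datasets, not hashed features).

```python
import zlib

import numpy as np

def embed_a(text, dim=16):
    # Stand-in for an embedding transferred from a large labeled source
    # dataset: a deterministic hashed bag-of-words vector (illustrative only).
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[zlib.crc32(tok.encode()) % dim] += 1.0
    return v

def embed_b(text, dim=16):
    # Second stand-in "transferred" embedding, built from character trigrams.
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    return v

def ete_features(text):
    # Ensemble the transferred embeddings by concatenation.
    return np.concatenate([embed_a(text), embed_b(text)])

def fit_centroids(samples, labels):
    # Fit a simple classifier (here, nearest centroid) on the small
    # labeled target sample.
    X = np.stack([ete_features(s) for s in samples])
    y = np.array(labels)
    return {c: X[y == c].mean(axis=0) for c in set(labels)}

def predict(centroids, text):
    x = ete_features(text)
    return min(centroids, key=lambda c: float(np.linalg.norm(x - centroids[c])))

# A tiny labeled target sample, in the spirit of item categorization.
train = ["red cotton t-shirt", "blue denim jeans", "wireless mouse", "usb keyboard"]
labels = ["clothing", "clothing", "electronics", "electronics"]
model = fit_centroids(train, labels)
print(predict(model, "blue denim jacket"))
```

In the actual framework, each transferred embedding would come from a model pretrained on a related large-scale dataset, and the downstream classifier would be far stronger than a nearest-centroid rule; the sketch only shows how multiple embeddings can be ensembled into one feature space.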
Notes
- 1. We use the term noisy to describe user-generated text that typically contains grammatical errors, nonstandard spellings, abbreviations, etc., as previously done with tweets on Twitter (Baldwin et al., 2015).
- 2. The specific number of 1970 instances was chosen to fit our budget constraint of 200 USD.
- 3. Our goal here was to demonstrate the advantages of the ETE framework on a large-scale real-world problem, rather than pursuing the best possible accuracy.
- 4. The harmonic mean of precision and recall of each class, weighted by the class proportion in the data.
References
Baldwin, T., de Marneffe, M.-C., Han, B., Kim, Y.-B., Ritter, A., & Xu, W. (2015). Shared tasks of the 2015 workshop on noisy user-generated text: Twitter lexical normalization and named entity recognition. In Proceedings of the workshop on noisy user-generated text (pp. 126–135).
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
Erhan, D., Bengio, Y., Courville, A., Manzagol, P.-A., Vincent, P., & Bengio, S. (2010). Why does unsupervised pre-training help deep learning? Journal of Machine Learning Research, 11 (Feb), 625–660.
Hadar, Y., & Shmueli, E. (2021a). Categorizing items with short and noisy descriptions using ensembled transferred embeddings. Expert Systems with Applications.
Hadar, Y., & Shmueli, E. (2021b). Source code for ensembled transferred enbeddings. https://github.com/h-yonatan/Ensembled-Transferred-Enbeddings (Accessed: 2021-05-27).
Hedderich, M. A., Lange, L., Adel, H., Strötgen, J., & Klakow, D. (2020). A survey on recent approaches for natural language processing in low-resource scenarios. arXiv preprint arXiv:2010.12309.
Kiros, R., Zhu, Y., Salakhutdinov, R. R., Zemel, R., Urtasun, R., Torralba, A., et al. (2015). Skip-thought vectors. In Advances in neural information processing systems (pp. 3294–3302).
Kozareva, Z. (2015). Everyone likes shopping! Multi-class product categorization for e-commerce. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (pp. 1329–1333).
Krishnan, A., & Amarthaluri, A. (2019). Large scale product categorization using structured and unstructured attributes. arXiv preprint arXiv:1903.04254.
Li, M. Y., Kok, S., & Tan, L. (2018). Don’t classify, translate: Multi-level e-commerce product categorization via machine translation. arXiv preprint arXiv:1812.05774.
Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. arXiv preprint arXiv:1310.4546.
Ruder, S. (2019). Neural transfer learning for natural language processing (Unpublished doctoral dissertation). NUI Galway.
Sharif Razavian, A., Azizpour, H., Sullivan, J., & Carlsson, S. (2014). CNN features off-the-shelf: an astounding baseline for recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition workshops (pp. 806–813).
Werbin-Ofir, H., Dery, L., & Shmueli, E. (2019). Beyond majority: Label ranking ensembles based on voting rules. Expert Systems with Applications, 136, 50–61.
Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259.
Acknowledgements
This research was funded by PayPal. We would like to thank our colleagues from PayPal: Yaeli, Adam, Omer, and Avihay who provided meaningful insights and greatly assisted in improving this work.
Copyright information
© 2023 Springer Nature Switzerland AG
Cite this chapter
Hadar, Y., Shmueli, E. (2023). Ensembled Transferred Embeddings. In: Rokach, L., Maimon, O., Shmueli, E. (eds) Machine Learning for Data Science Handbook. Springer, Cham. https://doi.org/10.1007/978-3-031-24628-9_26
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-24627-2
Online ISBN: 978-3-031-24628-9
eBook Packages: Mathematics and Statistics (R0)