Abstract
The goal of this paper is to generate an effective summary of a given document under specific real-time requirements. We use the softplus function to enhance keyword rankings so that they favor important sentences, and on this basis we present a number of extractive summarization algorithms that use various keyword extraction and topic clustering methods. We show that our algorithms not only meet the real-time requirements but also yield the best ROUGE scores on DUC-02 among all previously known algorithms. We also evaluate our summarization methods on the SummBank dataset and other datasets to ensure that they are robust. Experiments show that summaries generated by our methods achieve ROUGE scores that are higher than or about the same as those of extractive summaries produced by human evaluators. Moreover, we define a semantic measure, based on word embeddings and Word Mover’s Distance, to evaluate the quality of summaries without human-generated benchmarks. We show that, for our algorithms, the orderings of the ROUGE scores and of the scores under the new measure are highly comparable, suggesting that this new measure may serve as a viable alternative for measuring the quality of a summary.
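As a rough illustration of how softplus-enhanced keyword scores could favor important sentences, the following minimal sketch ranks sentences by the sum of the softplus of the scores of the keywords they contain. The abstract does not give the scoring formula, so this aggregation rule is an assumption, and the function names `rank_sentences` and `summarize` and the keyword scores in the usage note are hypothetical; it should not be read as the algorithm evaluated in the paper.

```python
import math
import re


def softplus(x: float) -> float:
    """Numerically stable softplus: log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def rank_sentences(sentences, keyword_scores):
    """Return sentence indices sorted by descending score.

    Assumption (not taken from the paper): a sentence's score is the sum
    of softplus(score) over the keywords that occur in it.
    """
    scored = []
    for idx, sentence in enumerate(sentences):
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        score = sum(softplus(s) for kw, s in keyword_scores.items() if kw in words)
        scored.append((score, idx))
    # Highest score first; ties keep the original document order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


def summarize(sentences, keyword_scores, k=2):
    """Pick the k top-ranked sentences and return them in document order."""
    top = sorted(rank_sentences(sentences, keyword_scores)[:k])
    return [sentences[i] for i in top]
```

For example, with hypothetical keyword scores `{"markets": 2.0, "investors": 1.5, "cat": 0.1}`, a sentence that mentions both "markets" and "investors" outranks one that mentions only "markets", which in turn outranks one that mentions only "cat". In practice the keyword scores would come from a keyword extraction method such as those the abstract refers to.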
Acknowledgements
We thank Ming Jia, Jingwen Wang, Cheng Zhang, Wenjing Yang, and the other members of the Text Automation Lab at UMass Lowell for their support and fruitful discussions. We are grateful to Prof. Hong Yu for making the SummBank dataset available for this study.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Shao, L., Zhang, H., Wang, J. (2019). Robust Single-Document Summarizations and a Semantic Measurement of Quality. In: Fred, A., et al. Knowledge Discovery, Knowledge Engineering and Knowledge Management. IC3K 2017. Communications in Computer and Information Science, vol 976. Springer, Cham. https://doi.org/10.1007/978-3-030-15640-4_7
DOI: https://doi.org/10.1007/978-3-030-15640-4_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-15639-8
Online ISBN: 978-3-030-15640-4
eBook Packages: Computer Science, Computer Science (R0)