Understanding Points of Correspondence between Sentences for Abstractive Summarization

Abstract

Fusing sentences containing disparate content is a remarkable human ability that helps create informative and succinct summaries. Such a simple task for humans has remained challenging for modern abstractive summarizers, substantially restricting their applicability in real-world scenarios. In this paper, we present an investigation into fusing sentences drawn from a document by introducing the notion of points of correspondence, which are cohesive devices that tie any two sentences together into a coherent text. The types of points of correspondence are delineated by text cohesion theory, covering pronominal and nominal referencing, repetition and beyond. We create a dataset containing the documents, source and fusion sentences, and human annotations of points of correspondence between sentences. Our dataset bridges the gap between coreference resolution and summarization. It will be shared publicly to serve as a basis for future work to measure the success of sentence fusion systems.

Publication
Association for Computational Linguistics: Student Research Workshop
John Muchovej
Researcher & Data Scientist

My interests are in using tools from computer science to advance our understanding of cognition and development. I focus on the domains of social cognition, common sense, and linguistics.