Date of Award

Spring 2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Computer Science

First Advisor

Radev, Dragomir

Abstract

Natural language processing aims to build automated systems that can both understand and generate natural language textual data. As the amount of textual data available online has increased exponentially, so has the need for intelligence systems to comprehend and present it to the world. As a result, automatic text summarization, the process by which a text's salient content is automatically distilled into a concise form, has become a necessary tool. Automatic text summarization approaches and applications vary based on the input summarized, which may constitute single or multiple documents of different genres. Furthermore, the desired output style may consist of a sentence or sub-sentential units chosen directly from the input in extractive summarization or a fusion and paraphrase of the input document in abstractive summarization. Despite differences in the above use-cases, specific themes, such as the role of large-scale data for training these models, the application of summarization models in real-world scenarios, and the need for adequately evaluating and comparing summaries, are common across these settings. This dissertation presents novel data and modeling techniques for deep neural network-based summarization models trained across high-resource (thousands of supervised training examples) and low-resource (zero to hundreds of supervised training examples) data settings and a comprehensive evaluation of the model and metric progress in the field. We examine both Recurrent Neural Network (RNN)-based and Transformer-based models to extract and generate summaries from the input. To facilitate the training of large-scale networks, we introduce datasets applicable for multi-document summarization (MDS) for pedagogical applications and for news summarization. While the high-resource settings allow models to advance state-of-the-art performance, the failure of such models to adapt to settings outside of that in which it was initially trained requires smarter use of labeled data and motivates work in low-resource summarization. To this end, we propose unsupervised learning techniques for both extractive summarization in question answering, abstractive summarization on distantly-supervised data for summarization of community question answering forums, and abstractive zero and few-shot summarization across several domains. To measure the progress made along these axes, we revisit the evaluation of current summarization models. In particular, this dissertation addresses the following research objectives: 1) High-resource Summarization. We introduce datasets for multi-document summarization, focusing on pedagogical applications for NLP, news summarization, and Wikipedia topic summarization. Large-scale datasets allow models to achieve state-of-the-art performance on these tasks compared to prior modeling techniques, and we introduce a novel model to reduce redundancy. However, we also examine how models trained on these large-scale datasets fare when applied to new settings, showing the need for more generalizable models. 2) Low-resource Summarization. While high-resource summarization improves model performance, for practical applications, data-efficient models are necessary. We propose a pipeline for creating synthetic training data for training extractive question-answering models, a form of query-based extractive summarization with short-phrase summaries. In other work, we propose an automatic pipeline for training a multi-document summarizer in answer summarization on community question-answering forums without labeled data. Finally, we push the boundaries of abstractive summarization model performance when little or no training data is available across several domains. 3) Automatic Summarization Evaluation. To understand the extent of progress made across recent modeling techniques and better understand the current evaluation protocols, we examine the current metrics used to compare summarization output quality across 12 metrics across 23 deep neural network models and propose better-motivated summarization evaluation guidelines as well as point to open problems in summarization evaluation.

COinS