Tuesday, August 11, 2015

We briefly discuss SUMMONS, the first example of a multi-document summarization system, which illustrates one of the stages of text summarization: topic fusion. It tackles news events about a single topic and produces a briefing that merges relevant information about each event and shows how reports by different news agencies have evolved over time. Rather than working with raw text, SUMMONS reads a database previously built by a template-based message understanding system. There are therefore two stages: one that takes full text as input, fills template slots, and then synthesizes a summary, and another that takes summary operators and a set of rules and uses connective phrases to smooth out the summary. While SUMMONS works well on narrow-domain texts, it does not scale to internet documents such as those returned by a web search query.

This was later improved by the systems of McKeown (1999) and Barzilay (1999), which first identify the topics by clustering and then apply decision rules. These systems start by identifying themes: text units, or paragraphs, are clustered based on similarity measures. To compute the similarity between text units, each unit is mapped to a vector of features that includes single words weighted by their TF-IDF scores, noun phrases, proper nouns, synsets from the WordNet database, and a database of semantic classes of verbs. For each pair of paragraphs, a vector is computed that represents matches on the different features. Decision rules learned from data are then used to classify each pair of text units as either similar or dissimilar; this in turn feeds a subsequent algorithm that places the most related paragraphs in the same theme. Once themes are identified, the system enters an information fusion stage, which decides which sentences about a theme should be included in the summary.
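To make the theme identification step concrete, here is a minimal sketch in Python. It is not the original system: it uses only TF-IDF weighted words (no noun phrases, WordNet synsets, or verb classes), it replaces the learned decision rules with a fixed cosine threshold, and it groups similar units with a simple union-find. The function names and the threshold value of 0.3 are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(units):
    """Map each text unit to a sparse {word: tf-idf weight} vector."""
    tokenized = [unit.lower().split() for unit in units]
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    n = len(units)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed idf, so a word found in every unit keeps a small positive weight.
        vectors.append({w: c * (math.log((1 + n) / (1 + doc_freq[w])) + 1.0)
                        for w, c in counts.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_into_themes(units, threshold=0.3):
    """Return themes as sorted lists of unit indices."""
    vectors = tfidf_vectors(units)
    parent = list(range(len(units)))  # union-find forest over unit indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(units)), 2):
        # Assumption: a fixed threshold stands in for the learned decision rules.
        if cosine(vectors[i], vectors[j]) >= threshold:
            parent[find(i)] = find(j)

    themes = {}
    for i in range(len(units)):
        themes.setdefault(find(i), []).append(i)
    return sorted(themes.values())


units = [
    "the earthquake struck the city at dawn",
    "an earthquake struck the city early in the morning",
    "the stock market closed higher on friday",
]
print(group_into_themes(units))
```

With these three toy units, the two earthquake reports fall into one theme and the market report stays on its own. A pairwise classifier trained on labeled data would take the place of the threshold in a more faithful version.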
Instead of simply picking sentences from each theme, an algorithm is proposed that compares and intersects the predicate-argument structures of the phrases within each theme to determine which are repeated often enough to be included in the summary. The comparison algorithm traverses these dependency trees recursively, adding identical nodes to the output tree. Once the summary content is decided, grammatical text is generated by translating those structures into the arguments expected by the FUF/SURGE language generation system.
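The recursive comparison can be sketched as follows. This is only an illustration under simplifying assumptions: the `Node` class and the `intersect` function are hypothetical names, nodes count as identical when their labels match exactly, and the trees are built by hand rather than produced by a parser.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A dependency-tree node: a word label plus its dependents."""
    label: str
    children: List["Node"] = field(default_factory=list)


def intersect(a: Node, b: Node) -> Optional[Node]:
    """Return the tree of nodes shared by a and b, or None if the roots differ."""
    if a.label != b.label:
        return None
    out = Node(a.label)
    used = set()  # indices of b's children that are already matched
    for child_a in a.children:
        for idx, child_b in enumerate(b.children):
            if idx in used:
                continue
            common = intersect(child_a, child_b)
            if common is not None:
                # Identical nodes are added to the output tree.
                out.children.append(common)
                used.add(idx)
                break
    return out


# Two hand-built trees for phrases from the same theme.
t1 = Node("struck", [Node("earthquake"), Node("city", [Node("coastal")]), Node("dawn")])
t2 = Node("struck", [Node("earthquake", [Node("strong")]), Node("city"), Node("monday")])
print(intersect(t1, t2))
```

Here the shared structure is "struck" with the dependents "earthquake" and "city"; the modifiers that appear in only one phrase are dropped. Counting how often such shared structures recur across a theme would then guide what goes into the summary.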
Courtesy: Survey on Automatic Text Summarization by Das and Martins 


