Cluster computing

Friday, June 6, 2014

We read the paper "Automatic Extraction of Reasoning Chains from Textual Reports" by Sizov and Ozturk. The authors extract reasoning chains from documents such as medical patient reports and accident reports. Reasoning chains contain information that is useful for analyzing the problem at hand. The authors use a graph-based approach that makes the relations between textual units explicit. A reasoning chain is a sequence of transitions from one piece of information to another that connects the problem with its resolution, which makes such chains very valuable in future problem triage. Moreover, they help determine whether the arguments presented are valid or fallacious.

This paper decomposes a document into text units, discovers the connections between those text units and makes them explicit. The resulting graph-based text representation, which the paper calls a TRN, is acquired automatically from text using a syntactic parser and a semantic similarity measure. A reasoning chain is then extracted as a path from this graph. We now discuss the TRN in detail.

The TRN is a graph with two types of nodes (text nodes and section nodes) and three types of edges (structural, similarity and causal). The graph is constructed from the text in the following manner: 1) syntax trees obtained from a syntactic parser are added to the graph, 2) section nodes are attached to the sentence nodes, 3) similarity edges are added between similar text nodes and 4) cause relations identified by a discourse parser are added.

On the choice of text units: earlier graph-based approaches have used terms or short phrases as text units, which are too small for this purpose. Another popular choice is to use a whole sentence or a few sentences. However, such text units may contain several pieces of information of which only one is relevant for the reasoning chain. In this paper, the authors use the Stanford parser to extract the syntax tree and pick out the S (sentence, clause), NP (noun phrase) and VP (verb phrase) constituents, which are referred to as text nodes. Text nodes are identified by a set of stemmed words. Along with these nodes, structural relations such as Contains and PartOf are added as edges. Graphs extracted from different sentences in a document are combined into one.

To obtain similarity relations, a similarity value is computed for each pair of text nodes of the same category (S, VP, NP), and Similar edges are added to the graph for node pairs whose similarity is above a threshold. The similarity measure finds a one-to-one alignment of words from two text units. The LCH (Leacock-Chodorow) similarity for each pair of words is computed using the shortest path between the corresponding senses in WordNet. Then a complete bipartite graph is constructed. Nodes in the bipartite graph represent words from the two text units, while the edges have weights that correspond to the similarities between words. A maximum weighted bipartite matching finds the one-to-one alignment that maximizes the sum of similarities between aligned words. This sum is then normalized and compared against the threshold to decide whether to add a Similar edge to the graph.

Next, causal relations are added to the graph. Causal relations are found by a discourse parser. In this case, the authors used a PDTB-styled end-to-end discourse parser. Cause relations found by the parser are added to the TRN graph as Cause edges. This is how the TRN graph is built.
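
To make the similarity step concrete, here is a minimal sketch of the word alignment idea. It is not the authors' code. The word scores in TOY_SIM are made-up stand-ins for WordNet LCH scores, the matching is done by brute force over permutations (fine for short text units, whereas a real implementation would more likely use a polynomial-time assignment algorithm such as the Hungarian method), and dividing by the length of the longer unit is our assumption, since the notes above only say that the sum is normalized. The threshold of 0.5 is likewise hypothetical.

```python
from itertools import permutations

# Made-up word-to-word scores standing in for WordNet LCH similarities.
TOY_SIM = {
    frozenset(["engine", "motor"]): 0.9,
    frozenset(["fail", "stop"]): 0.7,
}


def word_sim(a, b):
    """Similarity between two stemmed words (toy lookup, assumption)."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset([a, b]), 0.0)


def unit_similarity(words_a, words_b):
    """Score of the best one-to-one word alignment between two text units.

    Every word of the shorter unit is matched to a distinct word of the
    longer unit; the best total is divided by the longer unit's length
    (the normalization is an assumption of this sketch).
    """
    if not words_a or not words_b:
        return 0.0
    short, long_ = sorted([words_a, words_b], key=len)
    best = 0.0
    # Brute-force maximum weighted bipartite matching.
    for perm in permutations(long_, len(short)):
        score = sum(word_sim(s, l) for s, l in zip(short, perm))
        best = max(best, score)
    return best / len(long_)


def is_similar(words_a, words_b, threshold=0.5):
    """Decide whether a Similar edge would be added (hypothetical threshold)."""
    return unit_similarity(words_a, words_b) >= threshold
```

With these toy scores, the units ["engine", "fail"] and ["motor", "stop"] align as engine-motor and fail-stop, so they clear the 0.5 threshold and would get a Similar edge, while two units with no related words score 0 and would not.
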
Once the TRN graph can be constructed, the reasoning chain is generated in a three-stage process:
1) a report is converted from text into a TRN graph,
2) given a start node and an end node, several paths are extracted from the graph, and
3) the paths are combined, post-processed and visualized.
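
The second stage can be sketched as a path search over a labeled graph. This is only an illustration under our own assumptions: the notes above do not say how the paper selects its paths, so the sketch simply enumerates simple paths up to a length limit with a depth-first search. The TRN class, its method names, the max_len parameter and the example node texts are all hypothetical.

```python
from collections import defaultdict


class TRN:
    """Tiny labeled graph: node -> list of (neighbor, edge_type)."""

    def __init__(self):
        self.edges = defaultdict(list)

    def add_edge(self, src, dst, kind, symmetric=False):
        self.edges[src].append((dst, kind))
        if symmetric:  # e.g. Similar edges can be walked in both directions
            self.edges[dst].append((src, kind))

    def paths(self, start, end, max_len=4):
        """Enumerate simple paths from start to end with at most max_len edges."""
        found = []

        def walk(node, visited, trail):
            if node == end:
                found.append(list(trail))
                return
            if len(trail) >= max_len:
                return
            for nxt, kind in self.edges.get(node, ()):
                if nxt not in visited:
                    visited.add(nxt)
                    trail.append((node, kind, nxt))
                    walk(nxt, visited, trail)
                    trail.pop()
                    visited.remove(nxt)

        walk(start, {start}, [])
        return found


# Made-up example: a problem node linked to a resolution node.
g = TRN()
g.add_edge("pump failed", "pressure dropped", "Cause")
g.add_edge("pressure dropped", "pressure fell", "Similar", symmetric=True)
g.add_edge("pressure fell", "engine stopped", "Cause")

paths = g.paths("pump failed", "engine stopped")
for step in paths[0]:
    print(step)
```

Each extracted path is a list of (source, edge type, target) steps, which is a convenient shape for the third stage, where paths are combined and post-processed.
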
