Friday, September 18, 2015

Today we discuss another Olympiad problem.
[IMO Shortlist 2009]
Five identical empty buckets of 2-liter capacity stand at the vertices of a regular pentagon. Cinderella and her wicked Stepmother go through a sequence of rounds: At the beginning of every round the Stepmother takes one liter of water from the nearby river and distributes it arbitrarily over the five buckets. Then Cinderella chooses a pair of neighboring buckets, empties them into the river, and puts them back. Then the next round begins. The Stepmother’s goal is to make one of these buckets overflow. Cinderella’s goal is to prevent this. Can the wicked Stepmother enforce a bucket overflow?

Let us take the volumes as V1, V2, V3, V4 and V5. Cinderella (C) can only empty adjacent buckets. If the alternate buckets are more than 1 liter, she will not be able to empty both. Therefore she prevents this condition. For each i = 1 to 5, C must ensure that Vi + Vi+2 is at most one on each of her turn. She maintains this 'good' state.

The good configuration enforces return to good state. For example, if two buckets are empty, V4=V5=0, Then V1 + V3 <= 1 and V2 <= 1 because V2 + V4 <= 1 After the step mothers turn,  V1  + V3 + V4 + V5 <= 2. Therefore either V5 V3 <=1 or V4 + V1 <=1  So C empties both V1 and V2 and the new configuration is still good V4 <=1 and V5 + V3 <=1

Since at the end of each turn for C, we have a good configuration, C has a winning strategy.

####################  Paper Review #####################

Today we discuss "Ontology based text summarization - The case of Texminer".
This paper talks about Texminer which does summarization by a combination of summarization techniques.Before delving into the techniques, it might be a good reminder to go over the summarization techniques.

One technique is to use the structure of the discourse to generate abstracts. The Rhetorical Structure Theory, for example, attempts to identify the internal structure of the text and the relations of the discourse formed within it, giving priority to the nuclear components of these relations.On the other hand, Marcu segments the text into small units of discourse. He then proceeds to build a rhetorical structure in the form of a tree by analyzing the set of relations that exist  between the units. Once the discourse structure has been created, he assigns weight and an order to each element of the structure. - the higher the element within the structure the greater its weight.

Different ways of using the text discourse structure has been shown. Some have used the rhetorical status of affirmations contained in documents to identify their internal structure. The main contribution of these authors lies in the algorithm that deals with the non-hierarchical structure : given seven fixed catagories (aim, textual, own, background, contrast, basis and other) it is capable of distributing the contents of the articles within each category.

Another way has been to use templates to generate summaries and retrieve information.  This technique can only be applied when the text is previously structured. Information retrieval systems include Fies that extracts financial information from digital articles.

Mateo combined superficial with deep structure analysis to enhance the coherence and cohesion of the abstract. Alonso and Fuentes for example combine lexical chains plus the rhetorical and argumentative structure, derived using discourse markers.

Using combination of complex linguistic techniques, Aretoulaki developed a model for automatic summarization that selects sentences using content features of a pragmatic and rhetorical nature, obtained by means of superficial linguistic analysis such as the Theory of Speech Acts, the Rhetorical structure Theory and the theories centered on cohesion and coherence.

Automatic production of abstracts was initially based on statistical methods. in the wake of research by Luhn and now uses diverse methodologies. With the extraction of terms or strings of significant words, systems build discourse models. These systems have been used to identify rhetorical structures highly related with the content of the documents and their organizational scheme.

The growth of cognitive science has allowed the incorporation of semantic-conceptual models.Together with the use of knowledge bases, these models vastly improve the process of summarizing texts on a specific topics.

The demand for domain knowledge has seen exceptional growth recently.  Models for extraction and disambiguation of text has changed recently. It may even have become harder to find tools to help with such narrow domain texts. The lack of specific dictionaries, the absence of a defined theory and the dire need for professionals in the sector to have summarization tools to carry out their work has become pervasive. Documentation, Terminology and Natural Language Processing (NLP) indicate that there is a demand here in this narrow-domain texts.

This is where TexMiner may help solve the problem. It is based on the conviction that summarizing texts within a specialized domain requires a model capable of processing its semantic and socio-cognitive components.

The socio-cognitive user paradigm aspect of TexMiner takes into account the historic, social and cultural factors. These techniques consider a domain as the mindshare in terms of concepts, terms and knowledge from a community.
It uses ontologies as algebraic descriptions of a conceptual network in which the elements are binary relations established through concepts.
Lastly it uses three agents for processing information.
The first is a reading agent that reads the documents and labels them in XML. Here the goal is to discern if a document belongs to a particular domain based on the count of words that fall in the domain.
The second is a summarization agent that carries out automatic tagging of the syntactic discourse structure of the text. The design of the tags is governed by
- the sequence in which the levels are tagged in practice
- the symbols used to denote cohesive elements of the text, and
- the presence in the text of geographic elements, verbs statistical data formulae and processes.
The last agent is the Information retrieval agent. Texminer allows for the searches to be made through the ontology, as well as the lexical databases developed to take advantage of the functionalities of summaries for the purpose of text retrieval.

No comments:

Post a Comment