Monday, January 12, 2015

Today we will continue our discussion of Text Knowledge Mining (TKM). We discussed a variety of techniques and how they differ from or relate to TKM. Some advanced data mining techniques that can be considered TKM are related to literature-based discovery. The idea behind literature-based discovery is this. Let A1, A2, ..., An, B1, B2, ..., C be a set of concepts, and suppose that we are interested in finding relations between C and some of the Ai. Suppose that no Ai appears together with C anywhere in the collection, but both can appear together with some of the Bi. If the number of such shared Bi is large enough, a transitive relation between Ai and C can be hypothesized. This can therefore be used to generate new hypotheses that can later be confirmed by experimental studies.
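
To make the idea concrete, here is a minimal Python sketch of the co-occurrence step, assuming each document has already been reduced to a list of concept names. The function names, the `min_shared` threshold, and the toy data are my own assumptions for illustration, not something taken from the literature.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence(docs):
    """Map each concept to the set of concepts it shares a document with."""
    neighbors = defaultdict(set)
    for concepts in docs:
        for x, y in combinations(set(concepts), 2):
            neighbors[x].add(y)
            neighbors[y].add(x)
    return neighbors


def hypothesize_links(docs, target, min_shared=2):
    """Return {A: shared Bs} for every concept A that never co-occurs with
    target but shares at least min_shared intermediate concepts with it."""
    neighbors = cooccurrence(docs)
    target_neighbors = neighbors[target]
    hypotheses = {}
    for concept, linked in neighbors.items():
        if concept == target or concept in target_neighbors:
            continue  # skip the target itself and anything directly linked to it
        shared = linked & target_neighbors
        if len(shared) >= min_shared:
            hypotheses[concept] = shared
    return hypotheses


# Toy collection: A1 never appears with C, but both appear with B1 and B2.
docs = [["A1", "B1"], ["A1", "B2"], ["B1", "C"], ["B2", "C"], ["A2", "B1"], ["A3", "C"]]
print(hypothesize_links(docs, "C"))
```

With this toy data only A1 is proposed as a candidate: A2 shares a single intermediate concept with C, and A3 already co-occurs with C directly.
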
In this technique, the identification of concepts to obtain intermediate forms of documents is a key point. The main problem is that this requires a lot of background knowledge about the text, and that knowledge varies from one domain to another. In addition, the authors state that the relation between concepts is based, in most cases, on their simultaneous appearance in the same title or abstract. Also, the generation of hypotheses does not follow a formal procedure. That said, some formal models for conjectures, consequences and hypotheses are available in the literature, and these could serve as tools for literature-based discovery and TKM.
A text mining technique that can be considered TKM is one involving the discovery of contradictions in a collection of texts. The starting point is a set of text documents T = {t1, ..., tn} of arbitrary length. A first-order logic representation of the content of each text, in clausal normal form, is employed as the intermediate form, and a deductive inference procedure based on resolution is performed. The intermediate form is obtained by a semi-automatic procedure in two steps. First, an interlingua representation is obtained from the text. Then the first-order clausal form is obtained by means of information extraction techniques, possibly involving background knowledge.
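
As a rough illustration of the inference step, the following Python sketch runs resolution over clauses until it either derives the empty clause or stops producing anything new. It is a propositional simplification: the technique described above works on first-order clauses, which would additionally need unification. The encoding (a clause as a set of string literals, with "~" marking negation) and the function names are assumptions made for this example.

```python
from itertools import combinations


def negate(literal):
    """Return the complementary literal: p <-> ~p."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def resolvents(c1, c2):
    """All clauses obtained by resolving c1 and c2 on one complementary pair."""
    out = set()
    for lit in c1:
        if negate(lit) in c2:
            out.add(frozenset((c1 - {lit}) | (c2 - {negate(lit)})))
    return out


def is_contradictory(clauses):
    """True if the empty clause can be derived from the clauses by resolution."""
    known = {frozenset(c) for c in clauses}
    if frozenset() in known:
        return True
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:
                    return True  # empty clause: the clause set is unsatisfiable
                new.add(r)
        if new <= known:
            return False  # nothing new can be derived, so no contradiction
        known |= new


# Hypothetical intermediate forms of two texts.
text1 = [{"p"}, {"~p", "q"}]   # p holds, and p implies q
text2 = [{"~q"}]               # q does not hold
print(is_contradictory(text1), is_contradictory(text1 + text2))
```

Here neither text is contradictory on its own, but the union of their clauses is, which is exactly the situation the mining step is meant to surface.
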
The mining process here is similar to the level-wise exploration with candidate generation used in frequent itemset discovery, but with the minimum support criterion replaced by non-contradiction in the knowledge base obtained from the union of the intermediate forms of the texts. First, the texts that do not contain a contradiction are identified. Texts containing a contradiction are not considered at the following level. Candidate generation for each successive level is performed in the same way as in frequent itemset mining. The final result is a collection of subsets of texts in the collection that generate a contradiction.
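
The level-wise search could be sketched in Python as below. This is my own reading of the procedure, not the authors' implementation: consistent sets of texts play the role of frequent itemsets, and a candidate is recorded as soon as it turns contradictory, so the output is the minimal contradictory subsets. To keep the example short, the contradiction test is a stand-in that only looks for a literal together with its negation; a resolution-based check could be passed in through the `contradictory` parameter instead. All names here are hypothetical.

```python
from itertools import combinations


def has_clash(literals):
    """Simplified stand-in check: some literal p occurs together with ~p."""
    return any(("~" + lit) in literals for lit in literals)


def minimal_contradictory_sets(texts, contradictory=has_clash):
    """texts maps a text id to its set of literals.
    Returns the minimal sets of text ids whose union is contradictory."""
    def union(ids):
        return set().union(*(texts[i] for i in ids))

    found = []   # contradictory sets discovered so far
    level = []   # consistent sets of the current size
    for tid in sorted(texts):
        ids = frozenset([tid])
        (found if contradictory(union(ids)) else level).append(ids)

    size = 1
    while level:
        consistent = set(level)
        # Join step: merge pairs of consistent sets that differ in one text.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size + 1}
        level = []
        for cand in sorted(candidates, key=sorted):
            # Prune step: every subset one size smaller must be consistent.
            if all(frozenset(s) in consistent for s in combinations(cand, size)):
                (found if contradictory(union(cand)) else level).append(cand)
        size += 1
    return found


texts = {"t1": {"p"}, "t2": {"q"}, "t3": {"~p"}, "t4": {"r", "~r"}}
print(minimal_contradictory_sets(texts))
```

In this toy run, t4 is dropped at the first level because it contradicts itself, the pair {t1, t3} is reported at the second level, and the triple {t1, t2, t3} is never tested because one of its subsets is already known to be contradictory.
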
#codingexercise
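double GetAlternateOddNumberRangeMedian(double[] A)
{
    // Guard against a missing array; 0 is the fallback value used here.
    if (A == null) return 0;
    // AlternateOddNumberRangeMedian() is assumed to be an extension method defined elsewhere.
    return A.AlternateOddNumberRangeMedian();
}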

One of the interesting things about the discussion of TKM is that it seems applicable to domains other than text.
