Cluster computing

Friday, January 16, 2015

#codingexercise
Double GetAlternateOddNumberRangeSqRtSumOfSquares(Double[] A)
{
    if (A == null) return 0; // treat a missing array as 0
    return A.AlternateOddNumberRangeSqRtSumOfSquares();
}

We continue our discussion on Text Knowledge Mining (TKM). We were discussing reasoning. Complexity is one of the important aspects of automatic inference in knowledge-based systems. Systems may have to trade off the expressiveness of the knowledge representation against the complexity of reasoning. While this is true for both text data mining (TDM) and TKM, TKM does not have as much difficulty. First, both TDM and TKM are hard problems: the algorithms are exponential, and there are many strategies for designing efficient mining algorithms. Second, TKM is not intended to be exhaustive, so it can limit the data sources.
Now let us look at how to assess the results. In data mining, there are two phases for evaluating the results. In the first phase, the statistical significance of the results is assessed. In the second phase, the novelty and usefulness of the patterns are assessed subjectively.
For TKM, we have the following:
The reliability of the results depends, first, on the validity and reliability of the text and, second, on the inference procedures applied.
The non-triviality of the results is evaluated by the expert user. Fortunately, the results are not expected to be trivial.
The novelty of the results is evaluated against the background knowledge (BK).
The usefulness is evaluated by the experts.
The authors conclude that TKM is a particular case of knowledge mining. Knowledge mining deals with knowledge while data mining deals with data in which the knowledge is implicit. The operations are therefore deductive and abductive inference.

Thursday, January 15, 2015

We continue with our discussion on Text Knowledge Mining. We were discussing knowledge representation. Text mining requires translating text into a computationally manageable intermediate form. This step is crucial and poses several challenges to TDM and TKM, such as obtaining intermediate forms, defining structures for information extraction, identifying semantic links between new concepts, and relating different identifiers. A key problem with obtaining intermediate forms is that the current methods require human interaction. The objective of an intermediate form is not to represent every possible aspect of the semantic content of a text, but only those related to the kind of inference we are interested in. That is why, even if we are not able to fully analyze the text, the only impact is some missing pieces of knowledge, and TKM is not expected to be exhaustive in obtaining new knowledge. That said, if the analyzer obtains an inexact representation of the text in an intermediate form, this can affect the knowledge discovered, even to the point of reporting false discoveries. The knowledge in a knowledge-based system is assumed to be reliable and consistent, but the same cannot be said of a collection of texts. This poses another challenge in TKM.
We now look at the role of background knowledge. The importance of including background knowledge in text mining applications is widely recognized. The same holds in TKM: text does not contain the common-sense knowledge and specific domain knowledge that is necessary in order to perform TKM. As an example, a text stating that A is the father of B and B is the father of A is not considered contradictory without background knowledge. This knowledge can contribute to TKM in the following ways. First, it allows us to create new knowledge that is not fully contained in the collection of texts but can be derived from a combination of text and background knowledge. This was a requirement of Intelligent Text Mining. Second, background knowledge can be used in the assessment of knowledge, in aspects like novelty and importance.
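The father example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the triple format, the `ASYMMETRIC_RELATIONS` set and the `find_contradictions` helper are hypothetical names introduced here, not something taken from the paper. The point it shows is that the same two facts pass unnoticed until a piece of background knowledge (fatherhood cannot hold in both directions) is supplied.

```python
# Hypothetical facts extracted from text, as (subject, relation, object) triples.
text_facts = {("A", "father_of", "B"), ("B", "father_of", "A")}

# Assumed background knowledge: relations that can never hold in both directions.
ASYMMETRIC_RELATIONS = {"father_of"}


def find_contradictions(facts, asymmetric_relations):
    """Return pairs of facts that violate an asymmetry constraint."""
    conflicts = []
    for subject, relation, obj in sorted(facts):
        reverse = (obj, relation, subject)
        # Requiring subject < obj reports each conflicting pair only once.
        if relation in asymmetric_relations and reverse in facts and subject < obj:
            conflicts.append(((subject, relation, obj), reverse))
    return conflicts


# Without background knowledge nothing is flagged; with it, the pair is.
without_bk = find_contradictions(text_facts, set())
with_bk = find_contradictions(text_facts, ASYMMETRIC_RELATIONS)
```

Here `without_bk` is empty, while `with_bk` holds the single conflicting pair of facts.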
Reasoning and complexity are also important aspects of automatic inference in knowledge based systems.
#codingexercise

Double GetAlternateOddNumberRangeSumOfSquares(Double[] A)
{
    if (A == null) return 0; // treat a missing array as 0
    return A.AlternateOddNumberRangeVarianceSumOfSquares();
}

Wednesday, January 14, 2015

#codingexercise
Double GetAlternateOddNumberRangeStdDev(Double[] A)
{
    if (A == null) return 0; // treat a missing array as 0
    return A.AlternateOddNumberRangeStdDev();
}

We continue our discussion on Text Knowledge Mining. We discussed the algorithm for finding contradictions. Looking for contradictions is very useful. It helps to check the consistency of a text collection, or to assess the validity of a new text to be incorporated, in terms of the knowledge already contained in the collection. In addition, we can use it to group texts, such as when we take a collection of papers expressing opinions about a topic and separate them into groups whose opinions differ. Finally, it also lowers the overhead of reasoning with ontologies, because we can check consistency by way of non-contradiction first. This check can then become a preliminary requirement for reasoning.
We also looked at the challenges of TKM. As in the case of data mining, there are many existing techniques that can be applied, but they have to be adapted. This may not always be easy. In addition, some areas require new research. The existing techniques come from areas like knowledge representation, reasoning algorithms for performing deductive and abductive inference, and knowledge-based systems, as well as natural language processing.
There are several differences between knowledge-based systems and TKM.
First, a knowledge-based system is built to contain as much knowledge as possible for a specific purpose. TKM instead collects its knowledge from texts, treating each one as a report that is a knowledge base in its own right rather than as part of a system built for one particular purpose.
Second, a knowledge-based system tries to answer every question posed to it, whereas TKM does not assume it holds all the relevant pieces of knowledge and instead looks for new hypotheses.
Third, a knowledge-based system performs reasoning as part of query processing. TKM performs it to find new knowledge without a query being specified, though it can also work from one.
We also look at knowledge representation. Text mining requires translating text into a computationally manageable intermediate form. This step of text mining is crucial and poses several challenges common to TDM and TKM. A key problem in obtaining intermediate forms for TKM is that the techniques currently used for translating texts to intermediate forms are mainly semi-automatic and involve human interaction. On the other hand, many domains are trying to express knowledge directly in knowledge representation models. In the Semantic Web, for example, not only are ontologies used to represent knowledge, but efficient deductive inference techniques like graph-based search are also available.

#codingexercise

Double GetAlternateOddNumberRangeVariance(Double[] A)
{
    if (A == null) return 0; // treat a missing array as 0
    return A.AlternateOddNumberRangeVariance();
}

Tuesday, January 13, 2015

Today we continue discussing TKM techniques. Let us review the discovery of contradictions. Let res(ti) be a procedure that performs resolution on the set of clauses obtained from text ti and returns false if it finds a contradiction. Let ti tj denote the concatenation of texts ti and tj (that is, of their corresponding clauses). Let PT be the set of all possible concatenations of subsets of T of any size, and let V be the set of concatenations that do not contain contradictions. Then the algorithm for discovering contradictions is as follows. We first find the candidates V1: the individual texts that contain no contradiction. We add them to V; this is our initial set. In each of the remaining iterations, up to n, we pick a text t' from the current candidates and a text t'' from the initial set such that the two are disjoint and their concatenation does not contain a contradiction. We check all the subsets of texts involving the candidate to rule out contradictions, and take the union into V when there are none. Finally, with PT, the set of all possible concatenations, and V, the set of those without contradictions, we obtain the concatenations that do contain contradictions by excluding V from PT.
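To make the role of res concrete, here is a minimal sketch of a resolution procedure. It is deliberately simplified and rests on assumptions that are not in the paper: it works on propositional clauses rather than first-order ones, literals are plain strings with a leading "~" for negation, and the helper name `negate` is introduced only for illustration. It returns False when the empty clause is derived, which matches the convention above.

```python
from itertools import combinations


def negate(literal):
    """Return the complementary literal; '~p' is the negation of 'p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def res(clauses):
    """Propositional resolution by saturation.

    Returns False if the empty clause is derived (a contradiction was found)
    and True otherwise.
    """
    known = {frozenset(clause) for clause in clauses}
    if frozenset() in known:
        return False
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for lit in c1:
                if negate(lit) in c2:
                    resolvent = (c1 - {lit}) | (c2 - {negate(lit)})
                    if not resolvent:
                        return False  # empty clause: contradiction
                    new.add(frozenset(resolvent))
        if new <= known:
            return True  # nothing new can be derived
        known |= new


# Two made-up texts, already translated to clauses (an assumption of this sketch).
t1 = [{"p"}, {"~p", "q"}]   # p holds, and p implies q
t2 = [{"~q"}]               # q does not hold
t1_ok = res(t1)
t2_ok = res(t2)
t1_t2_ok = res(t1 + t2)     # the concatenation of t1 and t2
```

Each text is consistent on its own, but their concatenation is not, which is exactly the kind of result the mining step is after.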
We now look at the challenges of Text Knowledge Mining. There is a good analogy with data mining: just as existing techniques could be put to work for data mining, there are existing techniques that can work for TKM. These include the ones for knowledge representation, reasoning algorithms for performing deductive and abductive inference, and knowledge-based systems, as well as natural language processing techniques. As in the case of data mining, these techniques must be adapted for TKM. In addition, new techniques may be needed. We look at some of these challenges as well.
Knowledge-based systems and TKM systems have very different objectives, and this affects the way techniques coming from knowledge representation and reasoning can be adapted to TKM. In a knowledge-based system, a knowledge base is built containing the knowledge needed by the system to solve a specific problem. In TKM, the knowledge managed by the system is collected from texts, each of which is treated as a knowledge base, as with historical or normative reports, rather than as something written in order to build a knowledge system. Another difference is that knowledge-based systems are intended to give an answer to every possible question, or to solve any possible problem posed to the system, while in TKM new hypotheses and potentially useful knowledge are derived from a collection of texts. In fact, TKM may exclude or not know about some pieces of knowledge.
#codingexercise
Double GetAlternateOddNumberRangeMode(Double[] A)
{
    if (A == null) return 0; // treat a missing array as 0
    return A.AlternateOddNumberRangeMode();
}

Monday, January 12, 2015

Today we will continue our discussion on Text Knowledge Mining. We discussed a variety of techniques and how they differ from or are related to TKM. Some advanced text mining techniques that can be considered TKM are related to literature-based discovery. The idea behind literature-based discovery is this. Let A1, A2, A3, ..., An, B1, B2, ..., C be a set of concepts, and suppose that we are interested in finding the relations between C and some of the Ai. Suppose that no Ai appears together with C in the collection, but both can appear together with some of the Bi. If the number of such Bi is large enough, then a transitive relation between Ai and C can be hypothesized. This can therefore be used to create new hypotheses that can later be confirmed by experimental studies.
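The A-B-C idea can be sketched in a few lines. This is a rough illustration rather than the procedure used by any particular system: documents are assumed to be already reduced to sets of concept names, "large enough" is interpreted as a hypothetical `min_shared` threshold on the number of shared intermediate concepts, and the function name `hypothesize_links` is made up for the example.

```python
from collections import defaultdict


def hypothesize_links(documents, target, min_shared=2):
    """Suggest concepts indirectly linked to `target` through shared intermediates.

    documents: iterable of collections of concept names, one per title or abstract.
    Returns a dict mapping each candidate Ai to the sorted list of linking Bi.
    """
    cooccurs = defaultdict(set)
    for doc in documents:
        concepts = set(doc)
        for concept in concepts:
            cooccurs[concept] |= concepts - {concept}

    intermediates = cooccurs[target]  # the B concepts seen alongside C
    hypotheses = {}
    for candidate, neighbours in cooccurs.items():
        if candidate == target or candidate in intermediates:
            continue  # skip C itself and anything that already co-occurs with C
        shared = neighbours & intermediates
        if len(shared) >= min_shared:
            hypotheses[candidate] = sorted(shared)
    return hypotheses


# Placeholder concept names taken from the notation above.
docs = [{"A1", "B1"}, {"A1", "B2"}, {"B1", "C"}, {"B2", "C"}, {"A2", "B1"}]
links = hypothesize_links(docs, "C", min_shared=2)
```

With these documents only A1 is proposed, because it shares both B1 and B2 with C, while A2 shares just one intermediate and falls below the threshold.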
In this technique, the identification of concepts to obtain intermediate forms of documents is a key point. The main problem here is that a lot of background knowledge about the text is necessary, and it varies from one domain to another. In addition, the authors state that the relation between concepts is based in most cases on their simultaneous appearance in the same title or abstract. Also, the generation of hypotheses does not follow a formal procedure. That said, some formal models for conjectures, consequences and hypotheses are available in the literature, and these can be a tool for literature-based discovery and TKM.
A text mining technique that can be considered TKM is one involving the discovery of contradictions in a collection of texts. The starting point is a set of text documents T = {t1, ..., tn} of arbitrary length. A first-order logic representation in clausal normal form of the content of the text is employed as the intermediate form, and a deductive inference procedure based on resolution is performed. The intermediate form is obtained by a semi-automatic procedure in two steps. First, an interlingua representation is obtained from the text. Then the first-order clausal form is obtained by means of information extraction techniques, possibly involving background knowledge.
The mining process here is similar to the level-wise exploration with candidate generation used in frequent itemset discovery, but with the minimum support criterion replaced by non-contradiction in the knowledge base obtained from the union of the intermediate forms of the texts. The texts that do not contain a contradiction are identified first. Texts containing a contradiction are not considered in the following level. Candidate generation for each successive level is performed in the same way as in frequent itemset mining. The final result is a collection of subsets of texts in the collection that generate a contradiction.
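The level-wise exploration can be sketched as follows. This is a simplified illustration under assumptions of my own: each text is reduced to a set of propositional literals, and the consistency check `is_consistent` is a stand-in that only looks for a literal together with its negation, where the paper uses resolution over first-order clauses. The names `mine_contradictions` and `negate` are likewise hypothetical.

```python
from itertools import combinations


def negate(literal):
    """Return the complementary literal; '~p' is the negation of 'p'."""
    return literal[1:] if literal.startswith("~") else "~" + literal


def is_consistent(texts, subset):
    """Stand-in check: a union of texts is contradictory if it holds p and ~p."""
    literals = set().union(*(texts[name] for name in subset))
    return not any(negate(lit) in literals for lit in literals)


def mine_contradictions(texts):
    """Level-wise search over subsets of texts.

    Returns (consistent subsets, minimal contradictory subsets).
    """
    level, contradictory = [], []
    for name in sorted(texts):
        single = frozenset([name])
        (level if is_consistent(texts, single) else contradictory).append(single)
    consistent = list(level)
    while level:
        current = set(level)
        size = len(level[0]) + 1
        # Join step: merge two subsets of the current level that differ in one text.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
        next_level = []
        for cand in sorted(candidates, key=sorted):
            # Prune step: every smaller subset must already be consistent.
            if any(frozenset(s) not in current for s in combinations(cand, size - 1)):
                continue
            if is_consistent(texts, cand):
                next_level.append(cand)
            else:
                contradictory.append(cand)
        consistent.extend(next_level)
        level = next_level
    return consistent, contradictory


# Three made-up texts; t1 and t3 disagree about p.
texts = {"t1": {"p"}, "t2": {"q"}, "t3": {"~p"}}
consistent, contradictory = mine_contradictions(texts)
```

In this example every single text is consistent, the pair of t1 and t3 is reported as contradictory, and the three-text candidate is pruned without being checked because one of its pairs already failed.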
#codingexercise
Double GetAlternateOddNumberRangeMedian(Double[] A)
{
    if (A == null) return 0; // treat a missing array as 0
    return A.AlternateOddNumberRangeMedian();
}

One of the interesting things about the discussion of TKM is that it seems applicable to domains other than text.

Sunday, January 11, 2015

We continue the discussion on Text Knowledge Mining. We now look at what knowledge mining (KM) does not mean. In other words, we look at the other ways in which people have used the term. For example, knowledge mining is sometimes used to indicate that the intermediate form contains knowledge richer than simple data structures, even though inductive data mining techniques are still applied to the intermediate forms and the final patterns contain only those that are interesting to the user. Another example is when knowledge mining is used to indicate that background knowledge and user knowledge are incorporated in the mining process in order to ensure that the intermediate form and the final patterns contain only those concepts that are interesting for the user. In yet another example, knowledge mining refers to using background knowledge to evaluate the novelty and interest of patterns after an inductive process.
Sometimes another term, deductive mining, is used, but it is vague and it too differs from the proposal made so far. Deductive mining is used for a group of text mining techniques of which the best-known example is said to be information extraction. Information extraction is described as the mapping of natural language texts into predefined, structured representations, or templates, which when filled represent an extract of key information from the original text, or alternatively as the process of filling the fields and records of a database from unstructured text. This implies that no new knowledge is found. It therefore falls in the deductive category based on its starting point, but the deductive inference performed is only a translation from one representation to another. One specific application, the automatic augmentation of ontology relations, is a good example of information extraction. TKM can use such techniques.
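As a toy illustration of template filling, the sketch below maps sentences onto a fixed set of fields with a regular expression. Everything in it is invented for the example: the "acquisition" template, its field names, the pattern, and the company names in the sample text. Real information extraction systems are far more elaborate; the point is only that the output restates what the text already says and adds no new knowledge.

```python
import re

# Hypothetical template for an "acquisition" event: buyer, target and year.
ACQUISITION_PATTERN = re.compile(
    r"(?P<buyer>[A-Z]\w+) acquired (?P<target>[A-Z]\w+) in (?P<year>\d{4})"
)


def fill_templates(text):
    """Return one filled template (a dict of fields) per matching sentence."""
    return [match.groupdict() for match in ACQUISITION_PATTERN.finditer(text)]


# Made-up sample text.
sample = "Acme acquired Globex in 2014. Later, Initech acquired Hooli in 2015."
records = fill_templates(sample)
```

Each record can then be stored as a row in a database table whose columns are the template fields.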
Some have proposed that deductive data mining be performed by adding deductive capabilities to mining tools in order to improve the assessment of the knowledge obtained. The knowledge is supposed to be derived mathematically, taking hidden data assumptions into account.
A deductive data mining paradigm has also been proposed in which the term refers to the fact that the discovery process is performed on the results of user queries, which limits the possibility of working on corrupt data. However, in these cases the generation of new knowledge is purely inductive.
Having discussed what TKM is different from, we now see what TKM is related to. Some have distinguished between non-novel investigation like information retrieval, semi-novel investigation like standard text mining and KDD for pattern discovery, and novel investigation or knowledge creation like Intelligent Text Mining, which involves interaction between users and text mining tools and/or artificial intelligence techniques. For this last category, there is no indication of the kind of inference used to obtain the new knowledge.
Some advanced text mining techniques that can be considered TKM are related to literature-based discovery. These techniques provide possible relations between concepts appearing in the literature about a specific topic, although some authors prefer not to speak of discovery and instead describe them as assisting human experts in formulating new hypotheses through an interactive process.
#codingexercise
Double GetAlternateOddNumberRangeMax(Double[] A)
{
    if (A == null) return 0; // treat a missing array as 0
    return A.AlternateOddNumberRangeMax();
}
#codingexercise

Double GetAlternateOddNumberRangeAvg(Double[] A)
{
    if (A == null) return 0; // treat a missing array as 0
    return A.AlternateOddNumberRangeAvg();
}

#codingexercise

Double GetAlternateOddNumberRangeSum(Double[] A)
{
    if (A == null) return 0; // treat a missing array as 0
    return A.AlternateOddNumberRangeSum();
}

Saturday, January 10, 2015

We continue the discussion of the paper "Text Knowledge Mining: An Alternative to Text Data Mining". We saw that the authors propose TKM as mining based on abduction, including techniques that generate new hypotheses. Data mining techniques are based on inductive inference. TDM can be both subjective and objective. When TDM is subjective, domain knowledge is typically involved. When TDM is objective, statistics are typically used. Together they represent simple facts in data structures from which new knowledge can be obtained. This is mining, and even though there is a strong association with inductive learning, the word induction is generally not used with reference to it. The prevalence of this approach implies that data mining has been useful. Data mining, however, is different from text mining in that the former depends on a data model while the latter depends on natural language. Data models have limited expressive power. Natural language is richer. It is able to represent not only simple facts but also general knowledge, from relations like rules to complex procedures. The disadvantages of natural language include structures that are computationally expensive to manage, vagueness and ambiguity. Nothing restricts mining to inductive techniques; deductive and/or abductive techniques can be used as well. By proposing text mining based on non-inductive inference, the authors propose a new direction. TKM is mining by reasoning using knowledge contained in text. TKM is a particular case of what the authors call knowledge mining. The latter is defined as obtaining non-trivial, previously unknown and potentially useful knowledge from knowledge repositories. The main difference between the definition of knowledge mining and the definition of data mining is that the former intends to discover knowledge from knowledge repositories while the latter attempts to discover knowledge from data repositories.
The difference is the starting point, the basic material from which new knowledge is to be obtained. The starting point also makes the approach clear, in that the former uses deductive or abductive inference while the latter uses inductive inference. Reading this, it might feel like we are taking existing techniques and applying them to a higher layer. In fact, there can be other layers as well. Knowledge and data are not the only layers; for example, actions are based on knowledge. Having actions recommended from knowledge is yet another layer that we haven't reached yet. Coming back to text mining, the phases of the TKM process are exactly the same as those of TDM: first, text refining to obtain a computationally manageable intermediate form representing the text; then a mining procedure to obtain new knowledge; and a final step for assessing and filtering the knowledge obtained.
#codingexercise
Double GetAlternateOddNumberRangeMin(Double[] A)
{
    if (A == null) return 0; // treat a missing array as 0
    return A.AlternateOddNumberRangeMin();
}