Friday, January 23, 2015

Today we discuss the paper on the Study of the Access Control Model in Information Security by Qing hai, Ying et al. This paper compares and contrasts different access control mechanisms, models and instruments, specifically access control lists, access control capabilities lists, the mandatory access control policy, the role-based access control model, and access control in a Grid environment, and it does so in the context of network security. Access control here is about the principals involved, their access, and the permissions associated with a resource. The paper talks about three different models: discretionary access control (DAC), mandatory access control (MAC) and the role-based access control model (RBAC).

DAC permits legal users to access the regulated objects based on the identity of the user or user group. In addition, users may delegate their authority to other users in a discretionary manner. This has long existed on all flavors of UNIX, NT/Server, etc.: the system identifies the user and then limits the user to the resources they are allowed to access, and the set of resources a user can access can be changed by any member of a privileged user group. DAC is implemented using an access control matrix, an access control list, or an access control capabilities list.

The access control matrix is a two-dimensional matrix of principals (users, programs and user-agents) against resources (documents and services), with the cells holding the authorization permissions. The matrix is a very flexible way to implement DAC. At the same time it suffers from the downsides that it cannot be transmitted easily and that it may affect performance if its size grows too big; both space and speed degrade as the matrix grows.

An access control list (ACL) is a linked list of permissions for each principal against a resource. This is probably the most prevalent mechanism, and it is simple, convenient and practical. An access control capabilities list (ACCL) is also a linked list, but it joins the users list with the objects list so that, for a given user, its ACCL describes the objects that user has permissions to. Note that an ACL is organized around an object and the users that have access to it, whereas an ACCL determines the capabilities of a user. Capabilities can be transferred, and the capabilities list is generally considered insecure because a transferred capability is honored without consulting the resource, which may lead to unauthorized access to that resource. Both lists suffer from the problem that they can grow arbitrarily large depending on the number of users and resources.
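As a rough illustration (not taken from the paper), the difference between an access control list and a capabilities list can be sketched as two indexes over the same authorization facts; the class and member names below are hypothetical, in the style of the C# exercises on this blog.

using System;
using System.Collections.Generic;

// Hypothetical sketch: the same (principal, resource, permission) facts
// indexed two ways -- per resource (ACL) and per principal (ACCL).
class AuthorizationStore
{
    // ACL view: for each resource, the principals allowed and their permissions.
    private readonly Dictionary<string, Dictionary<string, HashSet<string>>> acl =
        new Dictionary<string, Dictionary<string, HashSet<string>>>();

    // ACCL view: for each principal, the resources it can reach and the permissions.
    private readonly Dictionary<string, Dictionary<string, HashSet<string>>> capabilities =
        new Dictionary<string, Dictionary<string, HashSet<string>>>();

    public void Grant(string principal, string resource, string permission)
    {
        Index(acl, resource, principal, permission);
        Index(capabilities, principal, resource, permission);
    }

    // ACL-style check: the resource's own list is consulted at access time.
    public bool IsAllowed(string principal, string resource, string permission) =>
        acl.TryGetValue(resource, out var entries) &&
        entries.TryGetValue(principal, out var perms) &&
        perms.Contains(permission);

    // ACCL-style view: everything a principal may do, without consulting any resource.
    public IReadOnlyDictionary<string, HashSet<string>> CapabilitiesOf(string principal) =>
        capabilities.TryGetValue(principal, out var entries)
            ? entries
            : new Dictionary<string, HashSet<string>>();

    private static void Index(
        Dictionary<string, Dictionary<string, HashSet<string>>> index,
        string outerKey, string innerKey, string permission)
    {
        if (!index.TryGetValue(outerKey, out var inner))
            index[outerKey] = inner = new Dictionary<string, HashSet<string>>();
        if (!inner.TryGetValue(innerKey, out var perms))
            inner[innerKey] = perms = new HashSet<string>();
        perms.Add(permission);
    }
}

The sketch also hints at why transferring a capability is risky: handing a copy of a principal's entries to someone else bypasses the resource's own list entirely.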

Thursday, January 22, 2015

We were discussing the Hearst paper on Text Data Mining. In the section on the LINDI project, we talked about a facility for users to build and reuse sequences of query operations. In the gene example, this lets the user specify a sequence of operations for one co-expressed gene and then iterate that sequence over a list of other co-expressed genes. The interface allows the following operations: iteration, transformation, ranking, selection and reduction. The ability to record and modify sequences of actions is important because, this being a new area, there is no predetermined exploration strategy. When strategies are found to be successful, they can be considered for automation; before that, if there are enough strategies, they can be used in an advisory or assistant mode. The emphasis of this system is to help automate the tedious parts of the text manipulation process and to integrate the underlying computationally driven text analysis with human-guided decision making.
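As a small illustration of that facility (a sketch only, not the actual LINDI interface; the method and variable names are made up), a recorded sequence of operations can be replayed over each gene in a list of co-expressed genes:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sketch of reusable query sequences: a recorded pipeline of
// operations is replayed over each co-expressed gene.
class QuerySequenceSketch
{
    static void Main()
    {
        // A recorded sequence: each step transforms the working set of text snippets.
        var recordedSequence = new List<Func<IEnumerable<string>, IEnumerable<string>>>
        {
            results => results.Where(r => r.Contains("expression")),   // selection
            results => results.OrderByDescending(r => r.Length),       // ranking (placeholder criterion)
            results => results.Take(5)                                  // reduction to a short list
        };

        var coExpressedGenes = new[] { "geneA", "geneB", "geneC" };     // placeholder gene names

        foreach (var gene in coExpressedGenes)                          // iteration over the gene list
        {
            IEnumerable<string> working = Search(gene);
            foreach (var step in recordedSequence)
                working = step(working);
            Console.WriteLine($"{gene}: {working.Count()} candidate snippets");
        }
    }

    // Stand-in for a literature query; a real system would query a document collection.
    static IEnumerable<string> Search(string gene) =>
        new[] { $"{gene} expression in tissue", $"{gene} sequence report" };
}

Recording the sequence once and iterating it over the gene list is exactly the tedious part the system is meant to take off the user's hands.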
To summarize, large online text collections can be used to discover new facts and trends.
#codingexercise
Double GetAlternateEvenNumberRangeProductSqRtCubes(Double[] A)
{
    if (A == null) return 0;   // guard against a null input array
    return A.AlternateEvenNumberRangeProductSqRtCubes();
}
Tomorrow we discuss Two-layered Access control for Storage Area Network.

Wednesday, January 21, 2015

#codingexercise
Double GetAlternateOddNumberRangeProductSqRtCubes(Double[] A)
{
    if (A == null) return 0;   // guard against a null input array
    return A.AlternateOddNumberRangeProductSqRtCubes();
}
Today we continue our discussion on Hearst's paper on text data mining. We were discussing the LINDI project. The objectives of the LINDI project are to investigate how researchers can use large text collections in the discovery of new important information, and to build software systems to support this process. The main tools for discovering new information are of two types: 1) support for issuing sequences of queries and related operations across text collections, and 2) tightly coupled statistical and visualization tools for examining associations among concepts that co-occur within retrieved documents. Both sets of tools make use of attributes associated with text collections and their metadata, which is why integration between the tools is recommended.
The user and the system interact iteratively: the system proposes new hypotheses and strategies for investigating them, and the user either uses or ignores these suggestions and decides on the next move.
In this project, the setting is that newly sequenced genes are discovered by automated means. Human genome researchers perform experiments in which they analyze the co-expression of many new and known genes simultaneously. Given this, the goal is to determine which of the new genes are interesting, and the strategy is to explore the biomedical literature and come up with hypotheses about which genes are of interest.
In tasks like this, the user has to execute and keep track of tactical moves, repeating them often, which distracts from the reasoning. The project provides a facility for users to build and reuse sequences of query operations via a drag-and-drop interface, allowing the user to repeat the same sequence of actions for different queries.
The operations include (sketched as signatures after this list):
Iteration: applying an operation over the items in a set.
Transformation: applying an operation and returning a transformed item.
Ranking: applying an operation and returning an ordered set of items.
Selection: applying an operation and returning a selected set of items.
Reduction: applying an operation and returning a singleton result.
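The five kinds of operations can be rendered as signatures; this is a hypothetical sketch, not LINDI's actual API, and TItem stands for whatever unit the interface manipulates (a query, a document set, a term).

using System;
using System.Collections.Generic;

// Hypothetical signatures for the five operation kinds named in the list above.
static class OperationKinds<TItem>
{
    // Iteration: apply an operation to every item in a set.
    public delegate IEnumerable<TResult> Iteration<TResult>(
        IEnumerable<TItem> items, Func<TItem, TResult> operation);

    // Transformation: one item in, one transformed item out.
    public delegate TItem Transformation(TItem item);

    // Ranking: apply an operation and return an ordered set of items.
    public delegate IReadOnlyList<TItem> Ranking(TItem item);

    // Selection: apply an operation and return a selected subset of items.
    public delegate IEnumerable<TItem> Selection(TItem item);

    // Reduction: collapse a set of items into a singleton result.
    public delegate TItem Reduction(IEnumerable<TItem> items);
}

Composing values of these delegate types into a list is what gives the record-and-replay behavior described above.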

Tuesday, January 20, 2015

Today we continue our discussion on text data mining with another example of exploratory data analysis: using text to uncover social impact. Narin et al. studied the effects of publicly financed research on industrial advances. After years of preliminary studies and building special-purpose tools, the authors found that the technology industry relies more heavily than ever on government-sponsored research results. The relationship between patent text and the published research literature was explored using the following procedure. The text from the front pages of the patents issued in two consecutive years was analyzed. There were hundreds of thousands of references, from which those published in the last 11 years were selected. These references were linked to known journals and authors' addresses. Redundant citations were eliminated and articles with no American author were discarded. The study was left with a core collection of 45,000 papers, and for these papers the source of funding was found from the closing lines of the research paper. From this, an extensive reliance on publicly financed science was revealed.
Patents awarded to the industry were further filtered by excluding those awarded to schools and governments. From these, the study examined the peak year of literature references and found 5,217 citations to science papers; 73.3% of these were written at public institutions.
The example above illustrates a number of the steps needed to perform this kind of exploratory data analysis. Most of these steps at the time had to rely on data that was not available online and therefore had to be done by hand. Together with the earlier example on forming hypotheses, it shows exploratory data analysis using text in a direct manner.
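A minimal sketch of that kind of filtering pipeline, assuming invented field names, placeholder data and a placeholder year rather than the authors' actual records, might look like this:

using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record shape for one front-page reference; the fields are placeholders.
record CitationRecord(string Journal, int Year, bool HasAmericanAuthor, string FundingSource);

class CitationStudySketch
{
    static void Main()
    {
        var references = LoadFrontPageReferences();     // stand-in for parsing patent front pages
        int patentYear = 1994;                          // placeholder year

        var corePapers = references
            .Where(r => patentYear - r.Year <= 11)      // keep references from the last 11 years
            .Where(r => r.HasAmericanAuthor)            // discard articles with no American author
            .Distinct()                                 // drop redundant citations
            .ToList();

        // Share of the core collection funded by public institutions (placeholder label).
        double publicShare = 100.0 * corePapers.Count(r => r.FundingSource == "public")
                                   / Math.Max(1, corePapers.Count);
        Console.WriteLine($"{corePapers.Count} core papers, {publicShare:F1}% publicly funded");
    }

    static IEnumerable<CitationRecord> LoadFrontPageReferences() =>
        new[] { new CitationRecord("J. Example", 1990, true, "public") };  // placeholder data
}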
We next discuss the LINDI project to investigate how researchers can use large text collections in the discovery of new important information, and to build software systems to help support the process.
There are two ways to discover new information. Sequences of queries and related operations are issued across text collections, and concepts that co-occur within the retrieved documents are examined statistically and visually for associations.
There are tools for both, and these make use of attributes associated especially with text collections and their metadata. The statistical steps of the second kind should be tightly integrated with the query and analysis tools needed for the first.
#codingexercise
Double GetAlternateOddNumberRangeProductCubes(Double[] A)
{
    if (A == null) return 0;   // guard against a null input array
    return A.AlternateOddNumberRangeProductCubes();
}

Monday, January 19, 2015

#codingexercise
Double GetAlternateEvenNumberRangeProduct(Double[] A)
{
    if (A == null) return 0;   // guard against a null input array
    return A.AlternateEvenNumberRangeProduct();
}
Today we will continue to discuss Hearst's paper on untangling text data mining.
We had cited an example from the previous paper about using text to form hypotheses about disease. Today we look at this example straight from one of its sources. The fact that new hypotheses can be drawn from text has been alluring for a long time but virtually untapped: experts can only read a small subset of what is published in their fields and are often unaware of developments in related fields. Thus it should be possible to find useful linkages between information in related literatures, even though the authors of those literatures rarely refer to one another's work.
The example of tying migraine headache to magnesium deficiency comes from Swanson and Smalheiser's efforts. Swanson extracted various titles of articles in the biomedical literature and paraphrased them, as we had seen earlier:
stress is associated with migraines
stress can lead to loss of magnesium
calcium channel blockers prevent some migraines
magnesium is a natural calcium channel blocker
spreading cortical depression is implicated in some migraines
high levels of magnesium inhibit spreading cortical depression
migraine patients have high platelet aggregability
magnesium can suppress platelet aggregability

This led to the hypothesis and its confirmation by experimental means. The approach has been only partially automated; by that Hearst means that there are many more possibilities, as hinted at by the combinatorial explosion of candidate links. Beeferman explored certain links via lexical relations using WordNet. However, sophisticated new algorithms are needed to help with the pruning process, and such a process needs to take into account various kinds of semantic constraints; this therefore falls in the domain of computational linguistics.
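A minimal sketch of the transitive step behind such linking (not Swanson's actual system; the link list below just paraphrases the statements above): if the literature asserts that A leads to B and B leads to C, but nothing links A to C directly, then A-C becomes a candidate hypothesis.

using System;
using System.Collections.Generic;
using System.Linq;

// Minimal sketch of Swanson-style transitive linking over paraphrased assertions.
class LiteratureLinkingSketch
{
    static void Main()
    {
        // Paraphrased assertions as (subject, object) links, taken from the list above.
        var links = new List<(string From, string To)>
        {
            ("stress", "migraine"),
            ("stress", "magnesium loss"),
            ("magnesium", "calcium channel blocking"),
            ("calcium channel blocking", "migraine prevention"),
            ("magnesium", "spreading cortical depression inhibition"),
        };

        // Candidate hypotheses: A -> B and B -> C with no direct A -> C link yet.
        var candidates =
            from ab in links
            from bc in links
            where ab.To == bc.From && !links.Contains((ab.From, bc.To))
            select (ab.From, bc.To);

        foreach (var (a, c) in candidates.Distinct())
            Console.WriteLine($"Candidate hypothesis: {a} -> {c}");
    }
}

Even this handful of assertions yields a candidate (magnesium -> migraine prevention), and a real collection explodes combinatorially, which is why the pruning Hearst calls for is the hard, unautomated part.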

#codingexercise
Double GetAlternateOddNumberRangeProduct(Double[] A)
{
    if (A == null) return 0;   // guard against a null input array
    return A.AlternateOddNumberRangeProduct();
}

Sunday, January 18, 2015

Today we continue to read from Hearst: Untangling Text Data Mining. (TDM)
We saw the mention that TDM can yield tools that indirectly aid in the information access process. Aside from providing these tools, Hearst mentions that it can also provide tools for exploratory data analysis. Hearst compares TDM with computational linguistics: empirical computational linguistics computes statistics over large text collections in order to discover useful patterns, and these patterns then inform the algorithms for various subproblems within natural language processing. For example, words such as "prices", "prescription" and "patent" are highly likely to co-occur with the medical sense of "drug", while "abuse", "paraphernalia" and "illicit" are likely to co-occur with the sense of "drug" related to illegal use. This kind of information can also be used to improve information retrieval algorithms, but it is different from TDM. TDM can instead help with efforts to automatically augment existing lexical structures such as the WordNet relations mentioned by Fellbaum; Hearst gives an example of identifying lexicosyntactic patterns that help with WordNet relations, and Manning gives an example of automatically acquiring subcategorization data from large text corpora.
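A toy sketch of the co-occurrence idea follows; the cue words come from the example above, while the snippets, the tokenization and the tie-breaking rule are made up for illustration.

using System;
using System.Collections.Generic;
using System.Linq;

// Toy sketch: guess the sense of "drug" from which cue words co-occur in a snippet.
class CooccurrenceSketch
{
    static readonly string[] MedicalCues = { "prices", "prescription", "patent" };
    static readonly string[] IllicitCues = { "abuse", "paraphernalia", "illicit" };

    static string GuessSense(string snippet)
    {
        var tokens = snippet.ToLowerInvariant()
                            .Split(' ', ',', '.', ';')
                            .ToHashSet();
        int medical = MedicalCues.Count(tokens.Contains);   // cue words seen for the medical sense
        int illicit = IllicitCues.Count(tokens.Contains);   // cue words seen for the illegal-use sense
        return medical >= illicit ? "medical sense of 'drug'" : "illegal-use sense of 'drug'";
    }

    static void Main()
    {
        Console.WriteLine(GuessSense("prescription drug prices rose after the patent expired"));
        Console.WriteLine(GuessSense("police found drug paraphernalia linked to abuse"));
    }
}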
We now review TDM and category metadata. Hearst notes that text categorization is not TDM: text categorization reduces the document to a set of labels but does not add new data. However, an approach that compares distributions of category assignments within subsets of the document collection to find interesting or unexpected trends can be considered TDM; for example, the distribution of commodities in country C1 can be compared with those of C2, the two together constituting an economic unit. Another effort that can be considered TDM is the DARPA Topic Detection and Tracking initiative, in which the system receives a stream of news stories in chronological order and marks, on arrival, a yes or no for whether a story is the first reference to an event.
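To make the category-comparison idea concrete, here is a toy sketch (the categories, counts and threshold are invented, not from the paper) that flags categories whose share of assignments differs markedly between two subsets of a collection:

using System;
using System.Collections.Generic;
using System.Linq;

// Toy sketch: compare category-assignment distributions between two subsets
// (e.g., commodity categories for country C1 vs country C2). Data is invented.
class CategoryDistributionSketch
{
    static void Main()
    {
        var c1 = new Dictionary<string, int> { ["textiles"] = 40, ["machinery"] = 10, ["grain"] = 50 };
        var c2 = new Dictionary<string, int> { ["textiles"] = 15, ["machinery"] = 55, ["grain"] = 30 };

        foreach (var category in c1.Keys.Union(c2.Keys))
        {
            double p1 = Share(c1, category);
            double p2 = Share(c2, category);
            // A large gap in shares flags a category worth a closer look.
            if (Math.Abs(p1 - p2) > 0.15)
                Console.WriteLine($"{category}: {p1:P0} in C1 vs {p2:P0} in C2");
        }
    }

    static double Share(Dictionary<string, int> counts, string category) =>
        counts.TryGetValue(category, out var n) ? (double)n / counts.Values.Sum() : 0.0;
}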
By citing these approaches, Hearst holds that TDM says something about the world outside the text collection.
TDM can be considered a process of exploratory data analysis that results in new, yet undiscovered information, or in answers to questions for which the answer is not currently known. The effort here is to use text for discovery in a more direct manner than manual inferencing. Two examples are provided: the first is using text to form hypotheses about disease, and the second is using text to uncover social impact.
#codingexercise
Double GetAlternateEvenNumberRangeProductOfSquares(Double[] A)
{
    if (A == null) return 0;   // guard against a null input array
    return A.AlternateEvenNumberRangeProductOfSquares();
}
Double GetAlternateEvenNumberRangeSqRtProductOfSquares(Double[] A)
{
    if (A == null) return 0;   // guard against a null input array
    return A.AlternateEvenNumberRangeSqRtProductOfSquares();
}

Saturday, January 17, 2015

#codingexercise
Double GetAlternateEvenNumberRangeSqRtSumOfSquares(Double[] A)
{
    if (A == null) return 0;   // guard against a null input array
    return A.AlternateEvenNumberRangeSqRtSumOfSquares();
}
Today we read a paper from Hearst: Untangling Text Data Mining. Hearst reminds us that the possibilities of extracting information from text are virtually untapped, because text expresses a vast range of information but encodes it in a way that is difficult to decipher automatically. We recently reviewed the difference between text knowledge mining and text data mining; in this paper, the focus is on text data mining. The paper calls out some of the new problems encountered in computational linguistics and outlines ideas about how to pursue exploratory data analysis over text.
Hearst differentiates between TDM and information access. The goal of information access is to help users find documents that satisfy their information needs. The standard procedure is akin to looking for needles in a needlestack: the information the user wants coexists with many other valid pieces of information, and the problem is homing in on what the user is actually interested in. As per Hearst, the goal of data mining, on the other hand, is to derive new information from data, find patterns across datasets, and/or separate signal from noise. The fact that an information retrieval system can return a document containing the information the user requested implies that no new discovery is made.
Hearst points out that text data mining is sometimes discussed together with search on the web. For example, the KDD-97 panel on data mining stated that the two predominant challenges for data mining are finding useful information on the web and discovering knowledge about a domain represented by a collection of web documents, as well as analyzing the transactions run in a web-based system. This search-centric view misses the point that the web can be considered a knowledge base from which new, never before encountered information can be extracted.
The results of certain types of text processing can yield tools that indirectly aid in the information access process. Examples include text clustering to create thematic overviews of a text collection.
#codingexercise
Double GetAlternateEvenNumberRangeSumOfSquares(Double[] A)
{
    if (A == null) return 0;   // guard against a null input array
    return A.AlternateEvenNumberRangeSumOfSquares();
}