Cluster computing

Tuesday, January 20, 2015

Today we continue our discussion of text data mining with another example of exploratory data analysis: using text to uncover social impact. Narin et al. studied the effects of publicly financed research on industrial advances. After years of preliminary studies and building special-purpose tools, the authors found that the technology industry relies more heavily than ever on government-sponsored research results. The relationship between patent text and the published research literature was explored using the following procedure. The text from the front pages of the patents issued in two consecutive years was analyzed. These pages contained hundreds of thousands of references, of which only those published in the last 11 years were kept. These references were linked to known journals and authors' addresses. Redundant citations were eliminated, and articles with no American author were discarded. This left the study with a core collection of 45,000 papers. For these papers, the source of funding was determined from the closing lines of each research paper. The results revealed an extensive reliance on publicly financed science.
Patents awarded to industry were isolated by excluding those awarded to schools and governments. For these, the study examined the peak year of literature references and found 5217 citations to science papers. 73.3% of these were found to have been written at public institutions.
The example above illustrates a number of the steps needed to perform exploratory data analysis. At the time, most of these steps relied on data that was not available online and therefore had to be done by hand. Taken together with the previous example, which used text to form hypotheses, this shows that exploratory data analysis can use text in a direct manner.
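As a rough illustration of the reference-filtering steps in the citation study, here is a minimal sketch in Python. The `Reference` record, its fields, the `filter_references` function, the duplicate key, and the exact interpretation of the 11-year window are all assumptions made for this example; the study itself used special-purpose tools and a good deal of manual work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    # Hypothetical record for one reference cited on a patent front page.
    title: str
    journal: str
    year: int
    author_countries: tuple  # e.g. ("US", "DE"), taken from author addresses


def filter_references(references, patent_year, window_years=11, country="US"):
    """Keep recent, de-duplicated references with at least one author from `country`."""
    seen = set()
    kept = []
    for ref in references:
        # Step 1: keep only references inside the recency window
        # (assumed here to be the `window_years` years ending in `patent_year`).
        if not (patent_year - window_years < ref.year <= patent_year):
            continue
        # Step 2: eliminate redundant citations of the same paper
        # (assumed key: case-insensitive title and journal, plus year).
        key = (ref.title.lower(), ref.journal.lower(), ref.year)
        if key in seen:
            continue
        seen.add(key)
        # Step 3: discard papers with no author from the target country.
        if country not in ref.author_countries:
            continue
        kept.append(ref)
    return kept
```

Calling `filter_references(refs, patent_year=1994)` on a list of `Reference` objects returns the surviving core collection in input order; determining the funding source for each paper would be a separate step on top of this.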
We next discuss the LINDI project, which investigates how researchers can use large text collections to discover important new information, and which builds software systems to support that process.
There are two ways to discover new information. In the first, sequences of queries and related operations are issued across text collections. In the second, concepts that co-occur within the retrieved documents are examined statistically and visually for associations.
There are tools for both, and these make use of attributes associated specifically with text collections and their metadata. The steps in the second example should be tightly integrated with the analysis and integration tools needed in the first example.
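To make the idea of examining co-occurring concepts more concrete, here is a minimal sketch that counts how many retrieved documents mention each pair of concepts together. The function name `cooccurrence_counts`, the whitespace tokenization, and the restriction to single-word concepts are assumptions for illustration only; they are not taken from the LINDI tools.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, concepts):
    """Count the documents in which each pair of concepts appears together."""
    # Normalize concepts to lowercase; a set removes accidental duplicates.
    concepts = {c.lower() for c in concepts}
    pair_counts = Counter()
    for text in documents:
        # Naive tokenization: lowercase and split on whitespace.
        tokens = set(text.lower().split())
        # Concepts present in this document, sorted so each pair has one canonical order.
        present = sorted(concepts & tokens)
        # Every unordered pair of present concepts co-occurs once in this document.
        pair_counts.update(combinations(present, 2))
    return pair_counts
```

The resulting counts could then be ranked with `most_common()` or passed to a statistical test or a visualization to look for unexpectedly strong associations.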
#codingexercise
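// Assumes AlternateOddNumberRangeProductCubes() is an extension method on Double[] defined elsewhere.
Double GetAlternateOddNumberRangeProductCubes(Double[] A)
{
    // Guard against a null input before delegating to the extension method.
    if (A == null) return 0;
    return A.AlternateOddNumberRangeProductCubes();
}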
