Today we conclude reading the Text Summarization with
multi-document summarization and evaluating summaries:
When summarizing multiple documents, one has to identify and
locate thematic overlaps. One also has to decide what to include of the
remainder to overcome potential inconsistencies between documents, and when
necessary to arrange events from various sources along a single timeline.
Assuming that all input documents are parsed into templates
(whose standardization makes comparison easier), systems like SUMMONs clusters
the templates according to their contents and then applies rules to extract
items of major import.
To determine what additional material should be included,
first identify the units most relevant to the user’s query and then estimate the
marginal relevance of all remaining units.
Lastly, some domains lend themselves to simple rules that can
even get near perfect results. For examples, news articles covering the same
event can be summarized by picking the most recent one in the timeline and
taking the first two or three paragraphs.
How can you evaluate the quality of a summary ? This can be
both intrinsic and extrinsic – the former determines output quality and the latter
measures user assistance in task performance. The intrinsic ones use sentence
or phrase recall and precision. A second intrinsic method is to require evaluators
who judge based on readability, informative-ness, fluency and coverage.
Extrinsic evaluation is easy to motivate.IR methods produce the best query
based extracts. Much of the scoring process becomes simplified to IR like
recall and precision measures against human extracts. That said the biggest
challenge here is that summaries receive
different scores with different measures.
There are two basic measures to capture the extent to which
a summary meet s the requirements to be smaller than the input text and contain
the most important information of the original. These are:
1)
Compression ratio : CR = (length S)/(length T)
2)
Retention ratio : RR = info in S / info in T,
where T is the text and S is the summary.
The latter can be judged in four different ways:
1)
The expert game: Ask experts to underline or
extract the most interesting or informative fragments of the text.
2)
The classification game: This is an extrinsic measure
3)
The Shannon game: This is an information theory
method that relies on the probability of the reader guessing the message, as
say p and the measure is –p log p.
4)
The question game: This measure approximates the
information content of S by allowing how well it allows readers to answer
questions drawn up about T
Big numbers can be formed if numbers in an array are
concatenated together. How do you print the minimum concatenated number of a
given array? For example, if the input array is {3, 32, 321}, there are six
concatenated numbers and the minimum one is 321323.
There are n! permutations for an array of size n. So the
brute force approach would be O(n!)
The key observation here is that sorting doesn’t work but if
we change the sort rule in such a way that when sorted this special way, their
concatenation yields the smallest number.
Void PrintConcatenations(int number[])
{
String strNumber = new String[number.length];
For (int I = 0; I < number.length; i++)
strNumber[I]
= number[i].ToString();
Array.Sort(strNumber, new NumericComparer());
For (int I = 0; I < number.length; i++)
Console.WriteLine(strNumber[I]);
}
Class NumericComparer implements IComparer<String> {
Public int Compare(String num1, String num2) {
String str1 = num1 +
num2;
String str2 = num2 + num1;
Return str1.CompareTo(str2);
}
}
#codingexercise
If two English words have the same characters and the occurrence number of each character is also identical respectively, they are anagrams. The only difference between a pair of anagrams is the order of characters. For example, “silent” and “listen”, “evil” and “live” are two pairs of anagrams. Please implement a function to verify whether two words are a pair of anagrams.
bool IsAnagram(string one, string two)
{
if (one.length == two.length) return false;
var h = new HashTable();
for (int i =0; i<one.length;i++)
InsertHash(ref h, one[i]);
for (int i = 0; i < two.length; i++)
if (DecrementHash(ref h, two[i]) == false)
return false;
return h.IsEmpty();
}
{
if (one.length == two.length) return false;
var h = new HashTable();
for (int i =0; i<one.length;i++)
InsertHash(ref h, one[i]);
for (int i = 0; i < two.length; i++)
if (DecrementHash(ref h, two[i]) == false)
return false;
return h.IsEmpty();
}
No comments:
Post a Comment