Cluster computing

Saturday, November 9, 2013

Both the search log and the case log mentioned in the previous post are required to perform the search. The search log keeps track of the search strings that customers formulate, and the case log maintains the history of actions, events and dialogues while the case is open. The implementation of the approach involves searches in both of these logs. The search logs often contain noise, which comes with the nature of the web browsing required for matches of interest and availability to take place. Hence the search logs are subjected to both a pre-processing and a post-filtering technique. Further, the same content is viewed differently for different topics, i.e. there are search views and content views of the documents in a hot topic, and these are used to identify the extraneous ones. This identification of extraneous documents is beneficial not only for obtaining higher-quality topics but also for pinpointing documents that are being returned as noise for certain queries.
The case logs are different from the search logs, and hence their processing is somewhat indirect. Excerpts are generated from the case log and then mined. Note that the case documents are in general very long because they capture all the information on the actions taken on the case. In addition, there may be input from more than one party, and hence there is a lot of noise to deal with. In the author's approach there is both a pre-processing and a clean-up of the actions involved. Here the noise filtering is done by normalizing typos, misspellings and abbreviations. Words are also normalized to a known jargon with the help of a thesaurus. Excerpt generation and summarization are composed of a variety of techniques, ranging from dealing with the characteristics of the text to making use of domain knowledge. Sentences are identified regardless of the tables and cryptic text that they wrap around. Sentences are ranked primarily on their technical content rather than on logistics. Each technique can be enabled or disabled independently.
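The normalization and sentence ranking described above can be sketched roughly as follows. This is only an illustration and not the author's implementation: the thesaurus entries, the list of technical terms, the scoring rule (fraction of technical tokens in a sentence) and the function names are all assumptions made up for the example.

```python
import re

# Hypothetical thesaurus: maps typos, misspellings and abbreviations
# to a canonical jargon term. Real entries would come from the domain.
THESAURUS = {
    "pwd": "password",
    "passwrd": "password",
    "svr": "server",
    "reboot": "restart",
}

# Hypothetical vocabulary of technical terms used to score sentences.
TECHNICAL_TERMS = {"password", "server", "restart", "driver", "error"}


def normalize(text):
    """Lowercase the text, tokenize it, and map known variants to canonical terms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [THESAURUS.get(tok, tok) for tok in tokens]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def technical_score(sentence):
    """Assumed scoring rule: fraction of normalized tokens that are technical terms."""
    tokens = normalize(sentence)
    if not tokens:
        return 0.0
    return sum(tok in TECHNICAL_TERMS for tok in tokens) / len(tokens)


def excerpt(case_text, k=2):
    """Return the k highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(case_text)
    by_score = sorted(
        range(len(sentences)),
        key=lambda i: technical_score(sentences[i]),
        reverse=True,
    )[:k]
    return [sentences[i] for i in sorted(by_score)]


if __name__ == "__main__":
    case = (
        "Customer called at 9am. Svr shows error after reboot. "
        "Will call back tomorrow. Reset the pwd on the server."
    )
    for line in excerpt(case):
        print(line)
```

In this sketch the logistics sentences ("Customer called at 9am.", "Will call back tomorrow.") score zero and drop out of the excerpt, while the sentences carrying technical content are kept. A real system would need a far better sentence splitter to cope with the tables and cryptic text mentioned above.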
The author proposes a novel approach to finding hot topics in the search logs. Here the search view and the content view are combined to get high-quality topics that match the user's perspective more closely.
