Monday, September 23, 2013

In the previous posts, we looked at possible solutions for keyword detection. Most of those relied on a large corpus for statistical modeling. Here I want to look at solutions that run on the client side, or that at least rely only on web requests to a central server. Corpus-based keyword detection requires parsing a large volume of text and gathering statistics, which is practical for server-side computing but rarely feasible in client tools such as a Word document plugin, unless they delegate the work over web requests and responses. If the solution runs on the server, the client can send its text to the server for processing, and if the server exposes this functionality as APIs, then several different clients can connect and process text through it.
This also means we could build a website that serves a variety of handheld devices and kinds of text. Since text can be extracted from many document types, the proposed service can work with any of them.
The APIs can be simple: they take input text and return the keywords found in it as word offsets, leaving pagination and rendering to the client. This works well because any text, viewed as a collection of words, can be addressed by relative offsets, which gives a consistent and uniform way to refer back into the incoming text.
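To make that API shape concrete, here is a minimal sketch of such a service. It assumes a Flask endpoint named /keywords and a naive frequency-based extractor; these names and the stopword list are illustrative only, and a real service could substitute a corpus-backed statistical model behind the same interface.

from collections import Counter
from flask import Flask, jsonify, request

app = Flask(__name__)

# Small illustrative stopword list; a real extractor would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
             "that", "this", "for", "on", "with", "as", "are", "be"}

def keyword_offsets(text, top_n=10):
    """Return word offsets (indices into the whitespace-tokenized text)
    of the most frequent non-stopword tokens."""
    tokens = text.split()
    normalized = [t.strip(".,;:!?\"'()").lower() for t in tokens]
    counts = Counter(t for t in normalized if t and t not in STOPWORDS)
    top = {word for word, _ in counts.most_common(top_n)}
    return [i for i, t in enumerate(normalized) if t in top]

@app.route("/keywords", methods=["POST"])
def keywords():
    # The client posts JSON such as {"text": "..."} and gets back
    # {"offsets": [3, 17, ...]}; it maps those offsets onto its own
    # pagination and rendering.
    text = request.get_json(force=True).get("text", "")
    return jsonify({"offsets": keyword_offsets(text)})

if __name__ == "__main__":
    app.run()

Note that the offsets here are relative to whitespace tokenization, so the client must tokenize the same way; that shared convention is what makes the offsets a consistent and uniform handle on any incoming text.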
Another thing server-side processing enables is building up our own corpus from the incoming text, if that is permissible. The text that clients submit is exactly the kind of representative text this strategy needs. Representative text is valuable not only for growing the collection but also for revealing which keywords are most prevalent, and identifying that subset alone can support purely client-side checks and processing.
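Assuming that retaining submitted text is permissible, the corpus accumulation described above might look something like the following sketch. The class and method names are hypothetical, not part of any existing library; the idea is simply to keep term statistics across requests and to export the most prevalent keywords as a small set that clients could cache.

from collections import Counter

class CorpusStats:
    def __init__(self):
        self.term_counts = Counter()
        self.documents = 0

    def add_document(self, text):
        # Uses the same whitespace tokenization as the API sketch above
        # so that server statistics and client offsets stay consistent.
        tokens = [t.strip(".,;:!?\"'()").lower() for t in text.split()]
        self.term_counts.update(t for t in tokens if t)
        self.documents += 1

    def prevalent_keywords(self, top_n=100):
        # The most frequent terms across all submitted text. Shipping this
        # small subset to a client (e.g. a Word plugin) lets common keywords
        # be detected without a round trip to the server.
        return [word for word, _ in self.term_counts.most_common(top_n)]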
