Saturday, June 15, 2013

Finding the convex hull

The convex hull of a set Q of points is the smallest convex polygon for which each point in Q is either on the boundary of the polygon or in its interior. We discuss two techniques for computing it: Graham's scan and Jarvis's march.
Graham's approach is based on the following steps:
1) Choose the point with the lowest y-coordinate (the leftmost such point in case of a tie) as the starting point.
2) Sort the remaining points by polar angle around the starting point, then push the starting point and the first two sorted points onto a stack S.
3) For each remaining point in sorted order, while the next-to-top of the stack, the top of the stack, and the candidate point make a non-left turn, pop the stack; then push the candidate point and proceed.
When all points have been processed, the stack contains the convex hull vertices in counter-clockwise order.
The complexity is O(n log n), dominated by the sort.
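A minimal Python sketch of these steps (function names are my own; collinear points on the hull boundary are dropped):

```python
import math

def cross(o, a, b):
    # z-component of (a - o) x (b - o); > 0 means o -> a -> b is a left turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def graham_scan(points):
    # Start at the lowest point (leftmost on a tie); sort the rest by polar
    # angle around it, breaking ties by distance.
    start = min(points, key=lambda p: (p[1], p[0]))
    rest = sorted((p for p in points if p != start),
                  key=lambda p: (math.atan2(p[1] - start[1], p[0] - start[0]),
                                 (p[0] - start[0]) ** 2 + (p[1] - start[1]) ** 2))
    hull = [start]
    for p in rest:
        # Pop while the next-to-top, top, and candidate make a non-left turn.
        while len(hull) > 1 and cross(hull[-2], hull[-1], p) <= 0:
            hull.pop()
        hull.append(p)
    return hull
```

For a square with an interior point, `graham_scan([(0,0), (2,0), (2,2), (0,2), (1,1)])` returns the four corners in counter-clockwise order.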
Jarvis's approach, also known as gift wrapping or package wrapping, is based on the following idea:
We start with the lowest point and go around the set building a sequence in which the next hull vertex is the point with the smallest polar angle with respect to the previous vertex; in case of a tie we pick the point farthest from the previous vertex. We continue wrapping in this way until we return to the starting point.
The complexity is O(nh), where h is the number of hull vertices.
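A minimal Python sketch of the march (function names are my own; the input must contain at least two distinct points):

```python
def jarvis_march(points):
    # Gift wrapping: repeatedly pick the point such that every other point
    # lies to the left of the edge from the current hull vertex to it.
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    start = min(points, key=lambda p: (p[1], p[0]))
    hull, current = [], start
    while True:
        hull.append(current)
        candidate = points[0] if points[0] != current else points[1]
        for p in points:
            if p == current:
                continue
            turn = cross(current, candidate, p)
            # p is clockwise of the current candidate, or collinear but farther.
            if turn < 0 or (turn == 0 and dist2(current, p) > dist2(current, candidate)):
                candidate = p
        current = candidate
        if current == start:
            break
    return hull
```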

Friday, June 14, 2013

Computational geometry is the branch of computer science that studies algorithms for solving geometric problems. The input to a problem in this area is typically a set of geometric objects, such as a set of points, a set of line segments, or the vertices of a polygon in counter-clockwise order.
To determine whether a set of n line segments contains any intersections, we review a technique called sweeping.
Before we discuss sweeping, let us use the following:
Two consecutive segments p0p1 and p1p2 turn left or right at p1 according to whether the cross product (p1 - p0) x (p2 - p0) is positive or negative.
Two line segments intersect each other if and only if each segment straddles the line containing the other, or an endpoint of one segment lies on the other segment.
segments-intersect (p1, p2, p3, p4)
We determine whether the segments straddle each other by computing the four orientations (one for each endpoint relative to the other segment) and checking that the signs in each pair are opposite. If any orientation is zero, that endpoint is collinear with the line of the other segment, and we additionally check whether it lies on that segment.
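The straddle test can be sketched in Python as follows (points are tuples; function names are my own):

```python
def direction(p, q, r):
    # Cross product (q - p) x (r - p); its sign gives r's side of line pq.
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def on_segment(p, q, r):
    # Assuming r is collinear with pq, check that r lies within pq's bounding box.
    return (min(p[0], q[0]) <= r[0] <= max(p[0], q[0]) and
            min(p[1], q[1]) <= r[1] <= max(p[1], q[1]))

def segments_intersect(p1, p2, p3, p4):
    d1 = direction(p3, p4, p1)
    d2 = direction(p3, p4, p2)
    d3 = direction(p1, p2, p3)
    d4 = direction(p1, p2, p4)
    # Proper intersection: each segment straddles the other's line.
    if ((d1 > 0) != (d2 > 0)) and d1 != 0 and d2 != 0 and \
       ((d3 > 0) != (d4 > 0)) and d3 != 0 and d4 != 0:
        return True
    # Boundary cases: a zero direction means an endpoint may lie on a segment.
    if d1 == 0 and on_segment(p3, p4, p1): return True
    if d2 == 0 and on_segment(p3, p4, p2): return True
    if d3 == 0 and on_segment(p1, p2, p3): return True
    if d4 == 0 and on_segment(p1, p2, p4): return True
    return False
```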
Now we look at the sweeping technique that describes whether any two line segments in a set of segments intersect.
In sweeping, an imaginary vertical sweep line passes through the given set of geometric objects, usually from the left to the right.  This technique provides a way of ordering the geometric objects by placing them in a dynamic data structure. We further assume that no input segment is vertical and that no three input segments intersect at a single point.
The first assumption tells us that any segment crossing a vertical sweep line intersects it at only one point.
Where line segments cross the sweep line, the crossing points are comparable and are ordered by increasing y-coordinate. Where two segments e and f intersect each other, this order is reversed: a sweep line taken just before the intersection point sees e below f, while one taken just after sees f below e.
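The ordering the sweep line induces can be sketched as follows (segments are endpoint pairs and names are my own; the real algorithm maintains this order in a balanced tree rather than re-sorting):

```python
def y_at(segment, x):
    # y-coordinate where a non-vertical segment crosses the sweep line at x.
    (x1, y1), (x2, y2) = segment
    return y1 + (y2 - y1) * (x - x1) / (x2 - x1)

def order_at(segments, x):
    # Segments crossing the sweep line at x, ordered by increasing y.
    return sorted(segments, key=lambda s: y_at(s, x))
```

For e = ((0,0),(4,4)) and f = ((0,3),(4,1)), which intersect at (2, 2), the order at x = 1 is [e, f] and at x = 3 it is [f, e]: the intersection reverses their order.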
 
Review of Text analytics 2011 talk by Seth Grimes
Text analytics adds value where transactional information stops. From the information retrieval perspective, people want to publish, manage, archive, index and search, categorize and classify, and extract metadata. Text analytics adds semantic understanding of named entities, pattern-based entities, concepts, facts and relationships, concrete and abstract attributes, subjectivity, etc. Text analytics applications let users search terms, retrieve material from large-scale stores, search features such as entities or topics, retrieve materials such as facts and relationships, group results based on topics, and visually explore information. Some examples are SiloBreaker, FirstRain, Bing, Google, etc. Text analytics includes metadata and metadata population. Search results are measured based on precision and recall. Accuracy is measured with a combination of the two in a term called the f-score, which is defined as 2 * (precision * recall) / (precision + recall). Typical steps in text analytics include: identify and retrieve documents for analysis; apply techniques to discern, tag, and extract entities; and apply techniques to classify documents and organize extracted features. BeyeNetwork and Ranks.NL are some examples of these. Applications such as Connexor and VisuWords display part-of-speech tagging and ontology. Search logs suggest that
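The f-score formula above is simple enough to state directly as code:

```python
def f_score(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

For example, with perfect precision but 50% recall, the f-score is only 2/3, which is why the harmonic mean is preferred over a simple average.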

Thursday, June 13, 2013

Slide review of text analytics user perspectives on solutions and providers by Seth Grimes (continued)
Text analysis involves statistical methods for a relative measure of the significance of words, first for individual words and then for sentences. The vector space model is used to represent documents for information retrieval, classification, and other tasks. The text content of a document is viewed as an unordered bag of words, and measures such as TF-IDF (term frequency - inverse document frequency) weight the terms so that documents can be compared in the vector space. Additional analytic techniques for grouping the text are used to identify the salient topics.
However, the limitation of statistical methods is that they have a hard time making sense of nuanced human language. Hence natural language processing is used, in which one step or a sequence (pipeline) of resolving steps is applied to the text. These include:
Tokenization - identification of distinct elements
Stemming - identification of variants of word bases
Lemmatization - reduction of words to dictionary base forms, using stemming together with analysis of context and parts of speech
Entity recognition - lookup in lexicons and gazetteers, and use of pattern matching
Tagging - XML markup of distinct elements
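The first two stages can be sketched as follows (a toy tokenizer and a crude suffix-stripping stemmer of my own; real pipelines use proper stemmers such as Porter's):

```python
import re

def tokenize(text):
    # Split text into lowercase word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

def crude_stem(token):
    # Naive suffix stripping; a stand-in for a real stemming algorithm.
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token
```

For example, `tokenize("Analyzing texts, quickly!")` yields `["analyzing", "texts", "quickly"]`, and `crude_stem("texts")` yields `"text"`.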
Software using the above approaches has found applications in business, scientific, and research problems. The application domains include: brand management, competitive intelligence, content management, customer service, e-discovery, financial services, compliance, insurance, law enforcement, life sciences, product/service design, research, voice of the customer, etc. Text analytics solution providers include young and mature software vendors as well as software giants. Early adopters have very high expectations for the return on investment from text analytics. Among the findings of the survey of adopters are the following:
The bulk of text analytics users have been using it for four years or more.
Primary uses include brand management, competitive intelligence, customer experience, voice of the customer, and research; together these represent more than 50% of all applications.
Textual information sources are primarily blogs, news articles, e-mails, forums, surveys, and technical literature.
Return on investment is measured by increased sales to existing customers, higher satisfaction ratings, new-customer acquisition, and higher customer retention.
A third of all spenders had budget below $50,000 and a quarter used open-source.
Software users' likes and dislikes centered primarily on flexibility, effectiveness, accuracy, and ease of use.
More than 70% of the users wanted the ability to extend the analytics to named entities such as people, companies, geographic locations, brands, ticker symbols, etc.
The ability to use specialized dictionaries, taxonomies, or extraction rules was considered more important than other capabilities.
This study was conducted in 2009.
 

A moving point within a bounded square

Q: Given a bounded square with Cartesian coordinates of 100 * 100 and a point within the square that moves in straight lines and rebounds off the edges, write a function that gives the new position of the point on each update.
A: Here's the code for the update method, where previous and current are the two most recent positions of the point on the board.

Point Update(Point previous, Point current)
{
    var next = new Point();

    // A stationary point has no direction; restart it at a random position.
    if (previous.X == current.X && previous.Y == current.Y)
    {
        var random = new Random();
        next.X = random.Next(100);
        next.Y = random.Next(100);
        return next;
    }

    // Advance by the current velocity.
    next.X = current.X + (current.X - previous.X);
    next.Y = current.Y + (current.Y - previous.Y);

    // Reflect off the walls at 0 and 99.
    if (next.X > 99) next.X = 2 * 99 - next.X;
    if (next.X < 0) next.X = -next.X;
    if (next.Y > 99) next.Y = 2 * 99 - next.Y;
    if (next.Y < 0) next.Y = -next.Y;

    return next;
}
 

Wednesday, June 12, 2013

Slide review of text analytics user perspectives on solution and providers by Seth Grimes:
Text analytics is seeing active market growth. The technology - text mining and related visualization and analytical software - continues to deliver unmatched capabilities, fueling growth. There are applications in media and publishing, financial services and insurance, travel and hospitality, and consumer products and retail. The slides claim that no single organization or approach dominates the market. The findings reported are:
1) The top business applications of text analysis for respondents are a) brand management, b) competitive intelligence, and c) voice of the customer.
2) These applications take data from online sources such as blogs, articles, and forums.
3) Experienced users of these applications prefer specialized dictionaries, taxonomies, or extraction rules, and they often like open source.
Text Analytics describes software and transformational steps that discover business value in "unstructured" text.
These steps are:
Compose, publish, manage, and archive
Index and search
Categorize and classify according to metadata and contents
Summarize and extract information


 
A quick recap of design patterns:
Creational patterns:
Abstract Factory: Provide an interface for creating families of related or dependent objects without specifying their concrete classes.
Builder: Separate the construction of a complex object from its representation so that the same construction process can create different representations.
Factory Method: Define an interface for creating an object, but let subclasses decide which class to instantiate.
Prototype: Specify the kinds of objects to create using a prototypical instance, and create new objects by copying this prototype.
Singleton: Ensure a class has only one instance and provide a global point of access to it.
Structural Patterns:
Adapter: Convert the interface of a class into another interface clients expect.
Bridge: Decouple an abstraction from its implementation so that the two can vary independently.
Composite: Compose objects into tree structures to represent part-whole hierarchies.
Decorator: Attach additional responsibilities to an object dynamically.
Facade: Provide a unified interface to a set of interfaces in a subsystem.
Flyweight: Use sharing to support large numbers of fine-grained objects efficiently.
Proxy: Provide a placeholder for another object to control access to it.
Behavioural Patterns:
Chain of Responsibility: Avoid coupling the sender of a request to its receiver by giving more than one object a chance to handle the request.
Command: Encapsulate a request as an object, so that requests can be queued, logged, and undone.
Interpreter: Given a language, define a representation for its grammar and use it to interpret sentences in the language.
Iterator: Provide a way to access the elements of an aggregate object sequentially without exposing its underlying representation.
Mediator: Define an object that encapsulates how a set of objects interact.
Memento: Without violating encapsulation, capture and externalize an object's state so that the object can be restored to that state later.
Observer: Define a one-to-many dependency between objects so that when one object changes state, all its dependents are notified and updated automatically.
State: Allow an object to alter its behavior when its internal state changes.
Strategy: Define a family of algorithms, encapsulate each one, and make them interchangeable.
Template Method: Define the skeleton of an algorithm in an operation, deferring some steps to subclasses.
Visitor: Represent an operation to be performed on the elements of an object structure; it lets you define a new operation without changing the classes of the elements on which it operates.
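As a concrete illustration of one of these, here is a minimal sketch of the Observer pattern in Python (class and method names are my own):

```python
class Subject:
    # Maintains a list of observers and notifies each one on a state change.
    def __init__(self):
        self._observers = []
        self._state = None

    def attach(self, observer):
        self._observers.append(observer)

    def set_state(self, state):
        self._state = state
        for observer in self._observers:
            observer.update(state)

class LoggingObserver:
    # A concrete observer that records every state it is notified about.
    def __init__(self):
        self.seen = []

    def update(self, state):
        self.seen.append(state)
```

Attaching a LoggingObserver to a Subject and calling set_state twice leaves both states recorded in the observer, without the Subject knowing anything about the observer beyond its update method.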