Monday, May 26, 2014

In this post, I want to talk about a coloring problem of graph theory. Let us take the problem of coloring a grid. No grid cell may touch another cell of the same color. Each color when used has a cost. The goal is to color the map as cheaply as possible. If we take the cells to be represented by vertices, no adjacent vertices can have the same color. Such a coloring is known as minimum vertex coloring A vertex coloring of a graph k or fewer colors  is said to be  a k-colorable graph. The chromatic number is the number of colors used. There's said to be a theorem called the four color problem that states that any planar graph can have a chromatic number of at most 4. This was first conjectured in 1852 but its difficult to prove manually. Microsoft Research folks announced in 2004 that they had verified a proof by Robertson et al by formulating the problem in equational logic and conforming the validity of each step.
Some more interesting facts about a simple planar graph.
The number of edges <= 3V - 6
The degree of any vertex <= 6
The set E of the edges of a graph without any loops represents the adjacency relation on V. By definition such a relation is irreflexive ( not onto to the same vertex ) and symmetric.
Edges can be traversed in paths, cycles, trails and circuits.
A walk is an alternating sequence of vertices and connecting edges.
A path is a walk that does not included any vertex twice except that the first might be the last as well.
A cycle is a path that begins and ends on the same vertex.
A trail is a walk that does not pass over the same edge twice though it can cross the same vertex again.
A circuit is a trail that begins and ends on the same vertex.
Alongwith spanning trees, other graph traversal methods include Eulerian and Hamiltonian cycles, network flows from source to sink, and graph colorings where edges or vertices are colored. The network flow problem can be shown to be solved in O(n^3)
An Eulerian cycle or cirucit is a trail that starts and ends at the same vertex. It is a cycle which uses each graph edge exactly once.
Euler showed that a connected graph has an Eulerian cycle iff it has no graph vertices of odd degree.
A Hamiltonian cycle or circuit is a graph cycle that visits each node exactly once. A connected graph with two vertices does not have a Hamiltonian cycle. A graph with three vertices has two Hamiltonian cycles. A graph with four vertices has ten Hamiltonian cycles
The problem of finding a Hamiltonian cycle is NP-complete. If a general graph is given, then it is exhaustively searched to find the Hamiltonian cycle. Rubin's method describes an efficient search procedure that reduces backtracking and guesswork in finding some or all Hamiltonian paths. All Platonic solids also called regular solids which are composed of congruent convex regular polygonal faces have Hamiltonian paths
With the above description on graphs, the coloring of the vertexes and the different ways to traverse the graphs, we wanted to find the keywords in a text and to list them such that we can summarize the text.
However the path to the vertices and the traversal seem to be less useful than finding the keywords themselves either with coloring or by some other ways of costing. When we used the ontology of words such that we could find similarity measures, it considered only pair wise words along their semantical relationship but it did not consider the co-occurrences as it is available in the given text. If we find that the words are similar because they appear in a thesaurus or can be categorized in a hierarchical classification, it is less useful than when they appear with other words as the author intended. If the text is not short, then these co-occurrences matter more and could be considered more reliable than the graph method. So at the very least, we should not consider that the graph based traversal of words as a substitute for the higher dimension based clustering methods. That said graph based traversals need not be with ontologies only.

No comments:

Post a Comment