Cluster computing

In this post, we will cover some graph theory problems from the book by Douglas B. West and another by Reinhard Diestel.
A graph consists of a vertex set V(G), an edge set E(G), and a relation that associates with each edge two vertices (not necessarily distinct) called its endpoints.
Suppose we have five vertices 1, 2, 3, 4 and 5. Many edges can be associated with these vertices, some redundant (multiple edges joining the same pair) and some forming loops (an edge whose two endpoints are the same vertex). A graph is complete when every pair of distinct vertices is joined by an edge. A simple graph is one that contains no loops and no multiple edges. A simple graph may be either connected or disconnected. A connected graph is one in which there is a path from every vertex to every other vertex. A graph that is not connected is a disconnected graph.
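The definitions above can be sketched in a few lines of Python. This is only an illustration, not code from either book: the edge-list representation and the helper names is_simple, is_complete and is_connected are assumptions made for the example.

```python
from collections import deque
from itertools import combinations

def is_simple(edges):
    """True if the edge list has no loops and no multiple edges."""
    seen = set()
    for u, v in edges:
        if u == v:
            return False          # a loop: both endpoints are the same vertex
        key = frozenset((u, v))   # undirected, so (u, v) and (v, u) are one pair
        if key in seen:
            return False          # a multiple edge
        seen.add(key)
    return True

def is_complete(vertices, edges):
    """True if every pair of distinct vertices is joined by an edge."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(p) in present for p in combinations(vertices, 2))

def is_connected(vertices, edges):
    """True if a path exists between every pair of vertices (BFS from one vertex)."""
    vertices = list(vertices)
    if not vertices:
        return True  # the empty graph is treated as connected here, by convention
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = {vertices[0]}
    queue = deque([vertices[0]])
    while queue:
        u = queue.popleft()
        for w in adj[u] - seen:
            seen.add(w)
            queue.append(w)
    return len(seen) == len(vertices)

V = [1, 2, 3, 4, 5]
path_edges = [(1, 2), (2, 3), (3, 4), (4, 5)]  # a path through all five vertices
```

With these definitions, the path on the five vertices is simple and connected but not complete, since for instance vertices 1 and 3 are not joined by an edge.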
Graphs give rise to an interesting set of problems in which vertices are colored. Problems that involve assigning a color to each vertex are collectively called coloring problems.
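In a proper coloring, adjacent vertices receive different colors, and the chromatic number of a graph is the smallest number of colors that such a coloring needs. If two colors suffice (and the graph has at least one edge), its chromatic number is 2. Why is coloring important? Because it lets us distinguish some vertices from others, so that we can arrange or treat them differently. For example, a bipartite graph, also called a bigraph, is a graph whose vertices can be decomposed into two disjoint sets such that no two vertices within the same set are adjacent. By adjacent, we mean joined by an edge. The bipartite graphs are exactly the graphs that can be properly colored with two colors.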
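One way to test whether a graph is bipartite is to try to color it with two colors using a breadth-first search. The sketch below assumes the graph is given as a dictionary mapping each vertex to a list of its neighbors; the function name two_color and the two sample graphs are made up for illustration.

```python
from collections import deque

def two_color(adj):
    """Return a dict vertex -> 0 or 1 if the graph is bipartite, else None."""
    color = {}
    for start in adj:                 # loop over vertices to cover every component
        if start in color:
            continue
        color[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in color:
                    color[w] = 1 - color[u]   # neighbors get the opposite color
                    queue.append(w)
                elif color[w] == color[u]:
                    return None               # two adjacent vertices share a color
    return color

square = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [3, 1]}   # a 4-cycle
triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}            # a 3-cycle
```

The square splits into the two sets {1, 3} and {2, 4}, while the triangle cannot be colored with two colors and so is not bipartite.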
Returning to why we wanted to visit graph theory for keyword extraction or text summarization: we started with the assumption that every keyword contributes a meaning to the text while holding some relation with the others. This relation can take the form of competition, weighted competition, or cooperation, or the keywords can be isolated and independent of each other. In other words, there are linkages between the keywords that we want to exploit with the help of graphs. Moreover, the graph for a given text can be partitioned based on the eigenvalues and eigenvectors of its adjacency matrix or of the Laplacian derived from it. To compute the eigenvalues we start with the weighted Laplacian matrix. In a Laplacian matrix the diagonal elements are the degrees of the nodes. For example, if node 1 has 4 edges, then the degree of node 1 is four. When node i is adjacent to node j, the entry in row i, column j is -1 (or the negative of the edge weight in the weighted case); otherwise the off-diagonal entries are zero. We could also normalize the Laplacian matrix. This matrix can serve as input to an algorithm that partitions the graph. These ideas come from similar studies in biodiversity.
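As a rough sketch of how the Laplacian can drive a partition, the code below builds the matrix from a list of weighted edges and then splits the vertices by the sign of the eigenvector belonging to the second-smallest eigenvalue (often called the Fiedler vector). This is one common spectral partitioning approach, not necessarily the one used in the biodiversity studies. The function names, the plain power iteration, and the sample graph are all assumptions, and the code expects a connected graph with at least two vertices numbered 0 to n-1.

```python
import math

def laplacian(n, weighted_edges):
    """Build L = D - W for vertices 0..n-1 from (i, j, weight) triples."""
    L = [[0.0] * n for _ in range(n)]
    for i, j, w in weighted_edges:
        L[i][j] -= w      # off-diagonal entry: minus the edge weight
        L[j][i] -= w
        L[i][i] += w      # diagonal entry: (weighted) degree of the node
        L[j][j] += w
    return L

def fiedler_vector(L, iterations=500):
    """Approximate the eigenvector of the second-smallest eigenvalue of L.

    Runs power iteration on (c*I - L) while projecting out the constant
    vector at every step; c is chosen to be at least the largest eigenvalue.
    """
    n = len(L)
    c = 2.0 * max(L[i][i] for i in range(n))
    v = [float(i) for i in range(n)]          # arbitrary deterministic start
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]             # remove the constant component
        v = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

# Two triangles (0,1,2) and (3,4,5) joined by the single edge 2-3, all weights 1.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0), (2, 3, 1.0)]
L = laplacian(6, edges)
v = fiedler_vector(L)
part_a = {i for i in range(6) if v[i] < 0}
part_b = set(range(6)) - part_a
```

On this small graph the two parts are the two triangles, so the cut falls on the single bridging edge. With all weights set to 1, laplacian reproduces the unweighted matrix described above: degrees on the diagonal and -1 for adjacent pairs.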
We will also attempt to explore directed graphs in the text.
For the undirected graphs, I wanted to establish relations between terms in a text using similarity or distance measures, for example from a thesaurus or WordNet. These relations become the edges of the graph, and the higher the degree of a vertex, the stronger its candidacy as a keyword.
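A toy version of this degree-based ranking might look as follows. The RELATED table is a hypothetical, hand-written stand-in for a thesaurus or WordNet lookup, and the similarity function, the threshold value, and the sample terms are likewise invented for illustration.

```python
from itertools import combinations

# Hypothetical stand-in for a thesaurus or WordNet lookup: each term maps to
# a hand-written set of related words. Real data would come from a lexical resource.
RELATED = {
    "graph": {"network", "vertex", "edge"},
    "network": {"graph", "edge"},
    "vertex": {"graph", "edge"},
    "edge": {"graph", "vertex", "network"},
    "banana": set(),
}

def similarity(a, b):
    """Toy similarity: 1.0 if either term lists the other as related, else 0.0."""
    related = b in RELATED.get(a, set()) or a in RELATED.get(b, set())
    return 1.0 if related else 0.0

def keyword_candidates(terms, threshold=0.5):
    """Link terms whose similarity meets the threshold; rank terms by degree."""
    degree = {t: 0 for t in terms}
    for a, b in combinations(terms, 2):
        if similarity(a, b) >= threshold:   # this pair becomes an edge
            degree[a] += 1
            degree[b] += 1
    # Highest degree first; ties are broken alphabetically.
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))

ranking = keyword_candidates(["graph", "network", "vertex", "edge", "banana"])
```

Here "edge" and "graph" end up with the highest degree and would be the strongest keyword candidates, while "banana" is isolated. A real similarity measure would return graded scores, so the threshold would need tuning.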
Once we establish the set of keywords we want to work with, we could then work with that graph or partition it further.

Saturday, May 24, 2014
