Cluster computing

Thursday, July 13, 2017

Domain Specific Text Classification:
The writeup here introduced domain-specific text summarization which, unlike the general approach to text summarization, uses the desired outcome as a vector in itself. Specifically, we took the example of matching a resume to the skill set required for a job. We said that the role can be described as a vector of features based on skills and that, for a given candidate, we can compute a match score between the candidate’s skill set and that of the role.
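As a minimal sketch of the match score, assuming a small hypothetical skill vocabulary (SKILLS) and made-up weights for the role and the candidate, the cosine similarity between the two vectors could be computed like this:

```python
import math

# Hypothetical skill vocabulary; a real one would come from the job domain.
SKILLS = ["python", "sql", "spark", "java"]

def to_vector(skill_weights):
    """Map a {skill: weight} dict onto the fixed SKILLS ordering."""
    return [skill_weights.get(skill, 0.0) for skill in SKILLS]

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty skill set matches nothing
    return dot / (norm_a * norm_b)

# Illustrative role and candidate, not taken from any real data.
role = to_vector({"python": 1.0, "sql": 1.0, "spark": 1.0})
candidate = to_vector({"python": 1.0, "sql": 1.0, "java": 1.0})
score = cosine_similarity(role, candidate)
print(round(score, 3))
```

With non-negative skill weights the score stays between 0 and 1, which is what lets us compare candidates on a common scale.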
We could also extend this reasoning to group the resumes of several candidates as potential matches for the role. Since we compute a similarity score between vectors, we can treat all the resumes as row vectors in a skills matrix. Then we can use a range-based separation: divide the scores between 0 and 1 into k ranges and place each resume in the range its similarity score falls into. The resumes in the highest range are the closest matches.
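One way to picture the range-based separation is the sketch below. It assumes the similarity scores are already computed and lie between 0 and 1, and that the k ranges are of equal width; the candidate names and scores are invented for illustration.

```python
def bucket_by_score(scores, k):
    """Group names into k equal-width score ranges over [0, 1].

    buckets[0] holds the weakest matches, buckets[k - 1] the strongest.
    """
    buckets = [[] for _ in range(k)]
    for name, score in scores.items():
        # A score of exactly 1.0 belongs in the top range, hence the clamp.
        index = max(0, min(int(score * k), k - 1))
        buckets[index].append(name)
    return buckets

# Hypothetical similarity scores against a single role.
scores = {"alice": 0.92, "bob": 0.41, "carol": 0.78, "dave": 0.05}
buckets = bucket_by_score(scores, k=4)
closest = buckets[-1]
print(closest)
```

The top bucket is the shortlist; widening or narrowing it is just a matter of choosing a different k.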
We could also extend this technique to many resumes and many positions. For example, we can match candidates to roles with a k-means-style clustering. Here, the roles serve as the centroids of the clusters we want to form. Each resume is compared against all the centroids to determine the one closest to it. All resumes are thereby separated into clusters surrounding the available roles.
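Because the roles are fixed as centroids, what this amounts to is the assignment step of k-means, without the step that recomputes centroids. A rough sketch, assuming cosine similarity as the closeness measure and using invented role and resume vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def assign_to_roles(roles, resumes):
    """Place each resume in the cluster of the role it is most similar to."""
    clusters = {role_name: [] for role_name in roles}
    for resume_name, vector in resumes.items():
        best_role = max(roles, key=lambda r: cosine_similarity(roles[r], vector))
        clusters[best_role].append(resume_name)
    return clusters

# Hypothetical vectors over the skills (python, sql, spark, java).
roles = {
    "data_engineer": [1, 1, 1, 0],
    "backend_developer": [0, 1, 0, 1],
}
resumes = {
    "alice": [1, 1, 0, 0],
    "bob": [0, 0, 0, 1],
    "carol": [1, 0, 1, 0],
}
clusters = assign_to_roles(roles, resumes)
print(clusters)
```

If we did want full k-means, we would follow each assignment pass by moving every centroid to the mean of its members, but then the centroids would drift away from the actual role descriptions.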
By representing the skills as a graph whose edges are weighted by similarity, we can even run PageRank on those skills. Typically we need training data and test data for this. The training data helps build the skills weight matrix.
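To make the PageRank idea concrete, here is a small power-iteration sketch. The weight matrix below is made up for illustration; in practice it would be built from the training data. The damping factor of 0.85 and the fixed iteration count are conventional assumptions, not values taken from the writeup.

```python
def pagerank(weights, damping=0.85, iterations=50):
    """Rank nodes of a weighted graph by power iteration.

    weights[i][j] is the similarity weight on the edge from skill i to skill j.
    """
    n = len(weights)
    ranks = [1.0 / n] * n
    out_totals = [sum(row) for row in weights]
    for _ in range(iterations):
        new_ranks = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                if out_totals[i] == 0:
                    # A skill with no edges spreads its rank evenly.
                    share = 1.0 / n
                else:
                    share = weights[i][j] / out_totals[i]
                new_ranks[j] += damping * ranks[i] * share
        ranks = new_ranks
    return ranks

# Hypothetical symmetric similarity weights between three skills.
skills = ["python", "sql", "spark"]
weights = [
    [0.0, 0.8, 0.6],
    [0.8, 0.0, 0.2],
    [0.6, 0.2, 0.0],
]
ranks = pagerank(weights)
for skill, rank in sorted(zip(skills, ranks), key=lambda p: -p[1]):
    print(skill, round(rank, 3))
```

Skills that are strongly similar to many other skills end up with the highest rank, which gives us one way to weight the features of the role vector.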
Conclusion: With the help of a skill set match and a similarity score, it is possible to match jobs and candidates as a narrow-domain text classification. The similarity measure here is the cosine similarity between the vectors.
#exercise
Yesterday we were discussing how to find the length of the longest substring of one string that is also a subsequence of another. We could do this with dynamic programming in a bottom-up approach: if the current characters of the two strings match, the required length is one more than the value computed for the two preceding prefixes; otherwise it is the same as the value computed for the shorter prefix of the second string.
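A sketch of that bottom-up approach follows. It reads the problem as "the longest substring of x that is a subsequence of y" and keeps, for every pair of prefixes, the length of the longest suffix of the x prefix that fits as a subsequence of the y prefix. The function and variable names are our own.

```python
def longest_substring_as_subsequence(x, y):
    """Length of the longest substring of x that is a subsequence of y."""
    n, m = len(x), len(y)
    # dp[i][j]: length of the longest suffix of x[:i]
    # that is a subsequence of y[:j].
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                # Characters match: extend the previous match by one.
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # No match: y[j - 1] cannot help, so carry over the
                # value for the shorter prefix of y.
                dp[i][j] = dp[i][j - 1]
    # The best substring may end at any position of x.
    return max(dp[i][m] for i in range(n + 1))

print(longest_substring_as_subsequence("abcd", "axbyd"))  # prints 2 ("ab")
```

The table has (n + 1) x (m + 1) entries and each is filled in constant time, so the whole computation takes O(n * m) time and space.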
