Cluster computing

Wednesday, July 12, 2017

Domain-Specific Text Summarization:

An earlier writeup introduced text summarization, a general approach to condensing content so that we can get the gist of a text by reading fewer sentences. It would be even more useful if it carried over to narrow domains, where jargon and format also matter and the text is not necessarily arranged as a discourse. For example, software engineering recruiters find resumes hard to read through because they do not necessarily consist of sentences. Further, even the savviest technical recruiter may find that the candidate meant one thing while the resume says another. This is especially true when technical terms or buzzwords are missing from the resume. At the same time, recruiters want to sort candidates into roles. Given a resume, the goal is to label it with one of the labels the recruiter has defined for the jobs she has available. If this were possible, she would no longer have to read each resume and translate it to see whether it fits a role. Such translations are not obvious even after a conversation with the candidate.

How do we propose to solve this? When we have predetermined labels for the open positions, we can automate the steps a recruiter takes to decide whether a label fits a candidate. These steps are comprehensive: they rely on a semantic network to correlate the resume text with an available vocabulary and compute a score for how well the candidate's resume matches the role requirements behind the label. If the score exceeds a threshold, the candidate is assigned the label and given the green signal to take the screening test. Both general text summarization and narrow-domain resume matching treat the document as a bag of words. The key difference lies in the features extracted from the documents. Here, the features include skill sets in terms of technologies, languages and domain-specific terms. By translating the words in a resume into vectors of these features, we can better classify the resume into a role, because the role is also described in terms of the features required to do the job. This greatly improves the reliability of a match, and it all works behind the scenes.
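The scoring and thresholding step can be sketched in C#. This is a minimal sketch under stated assumptions, not the author's implementation: it replaces the semantic network with a plain keyword vocabulary, represents both the resume and each role as term-count vectors over that vocabulary (a bag of words restricted to skill features), scores them with cosine similarity, and assigns the best label whose score exceeds the threshold. The `Vocabulary` terms, the `ResumeMatcher` class, its method names and the role descriptions are all hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class ResumeMatcher
{
    // Hypothetical feature vocabulary (technologies, languages, domain terms).
    // A real system would derive features through a semantic network instead.
    public static readonly string[] Vocabulary =
        { "c#", "java", "python", "sql", "azure", "kubernetes", "spark", "hadoop", "react", "rest" };

    // Bag of words -> count vector over the vocabulary; other words are ignored.
    public static double[] ToFeatureVector(string text)
    {
        var counts = new double[Vocabulary.Length];
        var tokens = text.ToLowerInvariant().Split(
            new[] { ' ', ',', ';', '.', '\n', '\t', '(', ')', '/' },
            StringSplitOptions.RemoveEmptyEntries);
        foreach (var token in tokens)
        {
            int idx = Array.IndexOf(Vocabulary, token);
            if (idx >= 0) counts[idx]++;
        }
        return counts;
    }

    // Cosine similarity of two feature vectors; 0 when either vector is all zeros.
    public static double Cosine(double[] a, double[] b)
    {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.Sqrt(na) * Math.Sqrt(nb));
    }

    // Returns the best-scoring role label whose score exceeds the threshold, or null.
    public static string AssignLabel(string resume, IDictionary<string, string> roles, double threshold)
    {
        var resumeVector = ToFeatureVector(resume);
        string best = null;
        double bestScore = double.NegativeInfinity;
        foreach (var role in roles)
        {
            double score = Cosine(resumeVector, ToFeatureVector(role.Value));
            if (score > threshold && score > bestScore)
            {
                best = role.Key;
                bestScore = score;
            }
        }
        return best;
    }
}
```

For example, with the hypothetical roles "backend" described as "java sql rest kubernetes" and "data" described as "python spark hadoop sql", the resume text "Built REST services in Java with SQL; deployed on Kubernetes." scores 1.0 against "backend" and 0.25 against "data", so `AssignLabel` returns "backend" at a threshold of 0.5. A resume that mentions none of the vocabulary terms scores 0 everywhere and receives no label.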

Conclusion: By combining text summarization with a predetermined word-vector space for a narrow domain, it is possible to save recruiters a lot of work while relying on latent knowledge about what the resume mentions.
#word vector match http://ideone.com/2NUwvu
#codingexercise
Find the maximum length of a subsequence of string X that is also a substring of another string Y.
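using System;

static class Solution
{
    // True if s can be formed from t by deleting characters without reordering.
    static bool IsSubsequenceOf(string s, string t)
    {
        int k = 0;
        for (int i = 0; i < t.Length && k < s.Length; i++)
            if (t[i] == s[k]) k++;
        return k == s.Length;
    }

    // Length of the longest substring of Y that is also a subsequence of X.
    public static int GetMax(string X, string Y)
    {
        int max = 0;
        for (int i = 0; i < Y.Length; i++)
        {
            // j starts at i, so single characters are checked too
            for (int j = i; j < Y.Length; j++)
            {
                var sub = Y.Substring(i, j - i + 1);
                // if Y[i..j] is not a subsequence of X, no longer substring starting at i is
                if (!IsSubsequenceOf(sub, X)) break;
                max = Math.Max(max, sub.Length);
            }
        }
        return max;
    }
}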
http://ideone.com/RNERDx
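The approach above enumerates every substring of Y and tests each one against X, which takes roughly cubic time in the input lengths. A standard dynamic-programming formulation, swapped in here rather than taken from the post, reduces this to O(|X|·|Y|). In this sketch, dp[i, j] holds the length of the longest suffix of the first j characters of Y that is a subsequence of the first i characters of X; the answer is the largest value in the last row. The class name `SubsequenceSubstringDp` is hypothetical.

```csharp
using System;

public static class SubsequenceSubstringDp
{
    // dp[i, j] = length of the longest suffix of Y[0..j) that is a
    // subsequence of X[0..i). Row 0 and column 0 stay 0 (empty prefixes).
    public static int GetMax(string X, string Y)
    {
        int n = X.Length, m = Y.Length;
        var dp = new int[n + 1, m + 1];
        for (int i = 1; i <= n; i++)
        {
            for (int j = 1; j <= m; j++)
            {
                if (X[i - 1] == Y[j - 1])
                    dp[i, j] = dp[i - 1, j - 1] + 1; // match Y[j-1] with X[i-1]
                else
                    dp[i, j] = dp[i - 1, j];         // X[i-1] cannot end the match
            }
        }
        // The best substring of Y can end at any position j.
        int max = 0;
        for (int j = 1; j <= m; j++)
            max = Math.Max(max, dp[n, j]);
        return max;
    }
}
```

On X = "ABCD" and Y = "BACDBDCD" it returns 3, for the substring "ACD". Taking dp[i - 1, j - 1] + 1 whenever X[i - 1] == Y[j - 1] is safe because matching the last character of the Y suffix against the latest possible character of X never shortens the suffix that can be matched.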
