Domain Specific Text Summarization:
An earlier writeup introduced text summarization, a general approach to
reducing content so that we can get the gist of a text by reading fewer
sentences.
This would be even more useful if it carried over to narrow domains, where the
jargon and the format also matter and the text is not necessarily arranged as
a discourse. For example, software engineering recruiters find resumes hard to
read through because the content does not necessarily appear as sentences.
Further, even the savviest technical recruiter may find that the candidate
meant one thing while the resume says another. This is especially true when
technical terms or buzzwords are missing from the resume. On the other hand,
recruiters want to sort candidates into roles. Given a resume, the goal is to
label it with one of the labels that the recruiter has come up with for the
jobs she has available. If that were possible, it would spare her the task of
reading and translating a resume to see whether it is a fit for a role. Such
translations are not obvious even after a conversation with the candidate.
How do we propose to solve this? When we have predetermined labels for the
open positions, we can automate the steps a recruiter takes to decide whether
a label fits the candidate. The steps rely on a semantic network to correlate
the resume text with an available vocabulary and to compute a score for the
match between the candidate’s resume and the role requirements for the label.
If the score exceeds a threshold, the candidate is assigned the label and given
the green signal to take the screening test. Both general text summarization
and narrow-domain resume matching treat the document as a bag of words. The key
difference is the features used with the documents. For example, we use
features that include skill sets in terms of technologies, languages and
domain-specific terms. By translating the words in the resume into vectors of
features, we are better able to match a resume to a role, where the role is
also described in terms of the features required to do the job. This improves
the reliability of a match and works behind the scenes.
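The scoring step described above can be sketched roughly as follows. This is a
minimal illustration in Python, not the implementation behind the linked code:
it swaps the semantic network for a plain cosine similarity over bag-of-words
counts, and the vocabulary, the role descriptions, the function names and the
0.5 threshold are all hypothetical values chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical narrow-domain vocabulary (assumption); in practice the
# recruiter or a curated skills list would supply it.
VOCABULARY = {"python", "java", "sql", "kubernetes", "docker", "linux"}


def to_feature_vector(text, vocabulary=VOCABULARY):
    """Bag of words restricted to the domain vocabulary: term -> count."""
    tokens = re.findall(r"[a-z0-9+#]+", text.lower())
    return Counter(t for t in tokens if t in vocabulary)


def cosine_score(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign_labels(resume_text, roles, threshold=0.5):
    """Return every role label whose score against the resume exceeds the threshold.

    `roles` maps a label to a free-text description of its required skills;
    the default threshold is an arbitrary value for illustration only.
    """
    resume_vec = to_feature_vector(resume_text)
    return [
        label
        for label, requirements in roles.items()
        if cosine_score(resume_vec, to_feature_vector(requirements)) > threshold
    ]


if __name__ == "__main__":
    roles = {
        "backend": "java sql linux",
        "devops": "docker kubernetes linux",
    }
    resume = "Built Java services on Linux, tuned SQL queries."
    print(assign_labels(resume, roles))
```

Restricting the vector to vocabulary terms is what makes the match domain
specific: words outside the vocabulary contribute nothing to the score, so a
resume does not need to read as prose to be matched.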
Conclusion: With the help of text summarization and a predetermined
word-vector space for narrow domains, it is possible to save recruiters a lot
of work while relying on latent knowledge about what is mentioned in the
resume.
#word vector match http://ideone.com/2NUwvu
#codingexercise
Find the maximum length of the subsequence of string X which is a substring of another string Y
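using System;

static class SubsequenceExtensions
{
    // Helper assumed by the original snippet: true if every character of s
    // appears in text in the same order (not necessarily contiguously).
    public static bool IsSubsequence(this string s, string text)
    {
        int k = 0;
        foreach (char c in text)
            if (k < s.Length && c == s[k]) k++;
        return k == s.Length;
    }

    // Length of the longest substring of Y that is also a subsequence of X.
    public static int GetMax(string X, string Y)
    {
        int max = 0;
        for (int i = 0; i < Y.Length; i++)
        {
            // Try every substring starting at i; j == i covers the single-character case.
            for (int j = i; j < Y.Length; j++)
            {
                var sub = Y.Substring(i, j - i + 1);
                if (sub.Length > max && sub.IsSubsequence(X))
                    max = sub.Length;
            }
        }
        return max;
    }
}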
http://ideone.com/RNERDx