Cluster computing

Friday, July 14, 2017

We discussed the similarity measure between the skills vector from a resume and that of a role to be matched. We could also consider using an ontology of skills for measuring similarity. For example, we can list all the skills a software engineer must have and connect the skills that have some degree of similarity, either using domain knowledge and human interpretation or using a weighted skills collocation matrix derived from a variety of resumes in a training set. With the help of this skills graph, we can determine similarity as a measure of the distance between vertices. This enables translation of skills into semantics-based similarity.
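As a rough illustration of distance-based similarity on a skills graph, here is a minimal sketch. The skill names, the edge costs, and the 1 / (1 + distance) conversion are all assumptions made for the example; they are not taken from any real ontology. A smaller edge cost stands for a closer relationship between two skills.

```python
import heapq

# Hypothetical skills graph: {skill: {neighbor: cost}}; lower cost = more similar.
SKILLS = {
    "python": {"django": 1.0, "java": 2.0},
    "django": {"python": 1.0, "rest": 1.0},
    "java": {"python": 2.0, "spring": 1.0},
    "spring": {"java": 1.0, "rest": 1.0},
    "rest": {"django": 1.0, "spring": 1.0},
    "photoshop": {},  # isolated vertex, unrelated to the others
}


def shortest_distance(graph, source, target):
    """Dijkstra's shortest path cost between two vertices (inf if unreachable)."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbor, cost in graph.get(node, {}).items():
            candidate = dist + cost
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return float("inf")


def skill_similarity(graph, a, b):
    """Map graph distance to a score in [0, 1]; 1 means the same skill."""
    return 1.0 / (1.0 + shortest_distance(graph, a, b))


print(skill_similarity(SKILLS, "python", "django"))     # 0.5
print(skill_similarity(SKILLS, "python", "spring"))     # 0.25
print(skill_similarity(SKILLS, "python", "photoshop"))  # 0.0
```

A resume-to-role score could then be built by aggregating these pairwise scores, for instance by taking the best match for each required skill, though the right aggregation depends on the data.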
The collocation-based weights matrix we have come up with so far can also be represented as a graph, on which we can run PageRank to determine the most important features.
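To make the PageRank idea concrete, here is a minimal power-iteration sketch that treats the collocation matrix as a weighted adjacency matrix. The three skills and their weights are made up for illustration, and the damping factor of 0.85 is the commonly used default rather than a tuned value.

```python
def pagerank(weights, damping=0.85, iterations=100, tol=1e-9):
    """Power iteration over a square matrix where weights[i][j] is the edge weight i -> j."""
    n = len(weights)
    rank = [1.0 / n] * n
    out_totals = [sum(row) for row in weights]
    for _ in range(iterations):
        new_rank = [(1.0 - damping) / n] * n
        for i, row in enumerate(weights):
            if out_totals[i] == 0:
                # Dangling vertex: spread its rank evenly over all vertices.
                share = damping * rank[i] / n
                for j in range(n):
                    new_rank[j] += share
            else:
                for j, w in enumerate(row):
                    if w:
                        new_rank[j] += damping * rank[i] * w / out_totals[i]
        delta = sum(abs(a - b) for a, b in zip(new_rank, rank))
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical collocation counts between three skills.
skills = ["python", "sql", "java"]
collocation = [
    [0, 3, 1],
    [3, 0, 1],
    [1, 1, 0],
]

ranks = pagerank(collocation)
for name, score in sorted(zip(skills, ranks), key=lambda pair: -pair[1]):
    print(name, round(score, 3))
```

In this toy matrix, "python" and "sql" collocate heavily with each other and so end up ranked above "java".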
This concludes the text analysis as a service discussion, and we now look into storage for text content. In this regard, we briefly mentioned content libraries such as SharePoint, but we are going to discuss their cloud-based versions. A systems design for cloud-based text analysis as a service can make use of such document libraries as an alternative to using S3. We discussed cloud native technologies. Let us now take a look at cloud versions of document libraries.
SharePoint is an implementation based on content databases. OneDrive is also a document library. In fact, it is one of the earliest file hosting services and is operated by Microsoft. Every user gets a quota which can be increased with subscriptions. The service was initially named SkyDrive and was made available in many countries. Later, photos and videos could be stored on SkyDrive via Windows Live Photos, which allowed users to access their photos and videos stored on SkyDrive. It was thereafter expanded to include Office Live Workspace. Files and folders became accessible to Windows Live users and groups, which made sharing and file management easier. Subsequently, SkyDrive began to be used with the App Store and Windows Phone Store via the applications released. APIs are also available for OneDrive.
#codingexercise
We discussed the methods of finding the length of the longest subsequence of one string that is a substring of another.
Let us compare the performance:
1) The iterative brute-force approach is O(N^2) and works well when the substring is small and the subsequence is large.
2) The dynamic programming approach based on incremental matches is also O(N^2), but it is meant to be more efficient because it reuses overlapping subproblems. Note, however, that the DP solution is defined over the substring from index 0 to the current index.
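The two approaches compared above can be sketched as follows. This is one possible reading of each approach, not necessarily the exact code from the earlier posts: `x` is the string that supplies the subsequence and `y` is the string that supplies the substring, and both function names are chosen for this example.

```python
def longest_brute_force(x, y):
    """For each start in y, greedily match y[start:] against x in a single pass."""
    best = 0
    for start in range(len(y)):
        matched = 0
        for ch in x:
            if start + matched < len(y) and ch == y[start + matched]:
                matched += 1
        best = max(best, matched)
    return best


def longest_dp(x, y):
    """curr[j] = longest suffix of y[:i] that is a subsequence of x[:j]."""
    prev = [0] * (len(x) + 1)
    best = 0
    for i in range(1, len(y) + 1):
        curr = [0] * (len(x) + 1)
        for j in range(1, len(x) + 1):
            if x[j - 1] == y[i - 1]:
                # Extend the match that ended one character earlier in both strings.
                curr[j] = prev[j - 1] + 1
            else:
                # x[j-1] cannot be the last matched character; drop it.
                curr[j] = curr[j - 1]
        best = max(best, curr[len(x)])
        prev = curr
    return best


print(longest_brute_force("ABCD", "BACDBDCD"))  # 3, from "ACD"
print(longest_dp("ABCD", "BACDBDCD"))           # 3
```

Both versions do work proportional to len(x) * len(y). The DP keeps only the previous row, so its extra memory is proportional to len(x).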
