The following are some use cases for building smart self-service portal
for data science in a health-care organization.
Use case 1) Making recommendations.
Users often have a variety of choices when they come to request resources
or infrastructure from public cloud. One way to help them would be to tell them
what others like them have been leasing from the cloud provider. If we search a
large group of people and smaller set with tastes like that of customers in
terms of what they have been liking or borrowing, then we can make a ranked
list of suggestions by finding the people who are similar and combining their
choices to make a list.
Different people and their preferences can be represented as a nested
dictionary. To find similarity among people, we could use some sort of
similarity measures based on Euclidean distance or Pearson correlation. We can
then score the items by producing a weighted score that ranks the
participants.
Use case 2) Discovering groups.
Users can be pigeonholed by clustering and categorizing them based
on their requests. By determining the frequency of a certain type of
resource or infrastructure units in the requests made by them, we may be able
to find those who are similar in their needs or have similar planning. Such a
result could be very useful in searching, listing and discovering the themes in
the vast number of ad hoc requests made over time.
We can employ hierarchical clustering or K-means clustering to categorize
the requests and relate personas to this leasing pattern.
Use case 3) Searching and Ranking
Searching and Ranking form the core of analytics for a user who wants
to filter through support documents and literature. Very often one will
get a buzzword to search for and help oneself, but s/he might not
know the site layout or the categories to walk down to filter it.
Such a user can be helped with a page ranking that gives a
weighted score based on the fraction of in-references to out references. Other
ways of arranging them or finding results based on clustering can also
help.
Use case 4) Optimization.
Users may want the self-service capabilities to search
for the lowest cost set of resources when there are many choices for their
solution when there are many different solutions. Optimization finds the best
solution to a problem by trying many different infrastructure solutions and
workload combinations and scoring them to determine their best fit and optimum
use. Right sizing the resources and determining the scope where the
improvements or upgrades need to be made or the scale out of units to serve
peak usage are some of the variations that can be optimized.
A cost function can be used to determine the costs for
different combinations. Then hill climbing or simulated annealing can be used to
reduce the cost towards the optimum.
Use case 5) Document Filtering
This use case is about training a classifier to classify
documents that are published as resource usages and logs. A classifier for
this purpose starts off very uncertain and increases in certainty as it learns
which features are important for making a distinction.
By classifying the documents, we expect to find resources that require
attention either immediately or soon.
This will require a training set as well as a test set. The
training set will be used to tune the classifier and the classifier will then
be run on the test sample usually with a 80/20 split. If the features are
independent of each other, they can be used in a naïve Bayes
classifier.
Use Case 6) Determine users that might benefit from premium
infrastructure and service levels.
To predict the likelihood that a user will invest more than
others, we collect information and put it in say a table with the
attributes that we pick from server logs, such as location, help articles read,
pages viewed, and whether premium SKUs were chosen.
Then, decision tree classifier can be used to construct a
tree from real data that is easy to understand where each node tells how
important it is and how it will impact the outcome.
Use case 7) Building price models for premium service
levels and their users.
A model for this purpose will predict a price. Initially,
the prices will be manually computed in a training set but it will be used to
predict once the model has learned from the training set and is applied to a
testing set. By finding the items that are like the item that interests the
customer, the algorithm can average their prices and make a guess at what the
price should be for this item.
No comments:
Post a Comment