Cluster computing: Providing smart self-service capabilities:

Sunday, July 23, 2023

Providing smart self-service capabilities:

The following are some use cases for building a smart self-service portal for data science in a healthcare organization.

Use case 1) Making recommendations.

Users often face a variety of choices when they request resources or infrastructure from a public cloud. One way to help them is to show what others like them have been leasing from the cloud provider. If we search a large group of users and find the smaller set whose past choices resemble those of the current customer, we can combine the choices of those similar users into a ranked list of suggestions.

The users and their preferences can be represented as a nested dictionary. To find similarity among users, we can use a similarity measure based on Euclidean distance or Pearson correlation. We can then rank the items by a weighted score, where each similar user's rating of an item is weighted by that user's similarity to the customer.
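A minimal sketch of this idea follows, assuming a Euclidean-distance similarity and a similarity-weighted average. The user names, SKU names, and scores in `prefs` are hypothetical and only illustrate the nested-dictionary layout.

```python
from math import sqrt

# Hypothetical data: user -> {resource SKU -> preference score from 1 to 5}.
prefs = {
    "alice": {"gpu-vm": 5.0, "blob-storage": 3.0, "sql-db": 4.0},
    "bob": {"gpu-vm": 4.5, "blob-storage": 3.5, "sql-db": 4.0, "k8s-cluster": 5.0},
    "carol": {"gpu-vm": 1.0, "blob-storage": 5.0, "k8s-cluster": 2.0},
}


def euclidean_similarity(prefs, a, b):
    """Return a score in (0, 1]; 1 means identical scores on shared items."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    dist_sq = sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared)
    return 1.0 / (1.0 + sqrt(dist_sq))


def recommend(prefs, user, similarity=euclidean_similarity):
    """Rank items the user has not chosen by similarity-weighted score."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == user:
            continue
        sim = similarity(prefs, user, other)
        if sim <= 0:
            continue
        for item, score in prefs[other].items():
            if item not in prefs[user]:
                totals[item] = totals.get(item, 0.0) + score * sim
                sim_sums[item] = sim_sums.get(item, 0.0) + sim
    # Divide by the summed similarity so popular items are not over-counted.
    ranked = [(totals[item] / sim_sums[item], item) for item in totals]
    return sorted(ranked, reverse=True)


suggestions = recommend(prefs, "alice")
print(suggestions)
```

A Pearson-correlation function with the same signature could be passed as `similarity` instead; it tends to cope better with users who score consistently high or low.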

Use case 2) Discovering groups.

Users can be grouped by clustering and categorizing them based on their requests. By determining how often a certain type of resource or infrastructure unit appears in their requests, we may be able to find users who have similar needs or similar plans. Such a result can be very useful for searching, listing, and discovering themes in the vast number of ad hoc requests made over time.

We can employ hierarchical clustering or K-means clustering to categorize the requests and relate personas to these leasing patterns.
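As an illustration, here is a small K-means sketch in plain Python. It assumes each user is described by a vector of request counts per resource type; the user IDs, the three resource types, and the counts are hypothetical.

```python
import random

# Hypothetical request counts per user: [compute VMs, storage buckets, GPUs].
requests = {
    "u1": [9, 1, 0],
    "u2": [8, 2, 1],
    "u3": [1, 9, 0],
    "u4": [0, 8, 1],
}


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=20, seed=0):
    """Return (assignment, centroids) after a fixed number of iterations."""
    rng = random.Random(seed)
    # Start from k distinct points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for idx, p in enumerate(points):
            assignment[idx] = min(
                range(k), key=lambda c: squared_distance(p, centroids[c])
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assignment[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


users = list(requests)
assignment, centroids = kmeans([requests[u] for u in users], k=2)
for user, cluster in zip(users, assignment):
    print(user, "-> cluster", cluster)
```

The number of clusters `k` has to be chosen up front here; hierarchical clustering avoids that choice at the cost of more computation.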

Use case 3) Searching and ranking.

Searching and ranking form the core of analytics for a user who wants to filter through support documents and literature. Very often users have only a buzzword to search for and want to help themselves, but they might not know the site layout or the categories to walk through to narrow the results.

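Such a user can be helped with a page ranking that gives each document a weighted score based on the references pointing to it, where a reference counts for more when it comes from a highly ranked document that has few outgoing references. Other ways of arranging the results, such as grouping them by cluster, can also help.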
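One common way to compute such a score is the PageRank iteration, sketched below under the assumption that support documents reference each other by name. The document names in `links` and the damping factor of 0.85 are illustrative choices, not values from the post.

```python
def pagerank(links, damping=0.85, iterations=50):
    """links maps each page to the list of pages it references."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page keeps a small base score regardless of references.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among its out-references.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no out-references spreads its score evenly.
                share = damping * rank[page] / n
                for t in pages:
                    new_rank[t] += share
        rank = new_rank
    return rank


# Hypothetical support documents and the documents they reference.
links = {
    "faq": ["setup-guide"],
    "billing": ["setup-guide"],
    "setup-guide": ["faq"],
}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {score:.3f}")
```

In a search feature, this score would typically be combined with a text-match score so that relevant but rarely referenced documents still surface.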

Use case 4) Optimization.

Users may want the self-service portal to search for the lowest-cost set of resources when many different solutions are possible. Optimization finds the best solution to a problem by trying many different infrastructure and workload combinations and scoring each one for fit and cost. Right-sizing the resources, determining where improvements or upgrades need to be made, and scaling out units to serve peak usage are some of the variations that can be optimized.

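A cost function can be used to determine the cost of each combination. Hill climbing or simulated annealing can then be used to move the solution step by step toward the lowest cost; simulated annealing occasionally accepts a worse solution so that it is less likely to get stuck in a local minimum.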
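The sketch below shows simulated annealing over a toy sizing problem. The catalog of SKUs, their prices and capacities, the shortfall penalty, and the cooling schedule are all assumptions made for illustration.

```python
import math
import random

# Hypothetical catalog: SKU -> (hourly price, capacity units provided).
CATALOG = {"small": (1.0, 2), "medium": (1.8, 4), "large": (3.2, 8)}
SKUS = list(CATALOG)


def cost(solution, demand):
    """Total price plus a heavy penalty for each unit of unmet demand."""
    price = sum(CATALOG[s][0] * n for s, n in zip(SKUS, solution))
    capacity = sum(CATALOG[s][1] * n for s, n in zip(SKUS, solution))
    shortfall = max(0, demand - capacity)
    return price + 100.0 * shortfall


def anneal(demand, max_units=10, temp=100.0, cooling=0.95, seed=0):
    """Search for a low-cost count of each SKU that covers the demand."""
    rng = random.Random(seed)
    current = [rng.randint(0, max_units) for _ in SKUS]
    best = list(current)
    while temp > 0.1:
        # Propose a neighbor: add or remove one unit of a random SKU.
        i = rng.randrange(len(SKUS))
        candidate = list(current)
        candidate[i] = min(max_units, max(0, candidate[i] + rng.choice([-1, 1])))
        delta = cost(candidate, demand) - cost(current, demand)
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops.
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current = candidate
            if cost(current, demand) < cost(best, demand):
                best = list(current)
        temp *= cooling
    return best, cost(best, demand)


best, best_cost = anneal(demand=20)
print(dict(zip(SKUS, best)), best_cost)
```

Setting the acceptance probability for worse moves to zero turns the same loop into plain hill climbing. A single short run like this is not guaranteed to reach the true optimum, so in practice the search is usually repeated from several starting points.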

Use case 5) Document filtering.

This use case is about training a classifier to classify documents that are published as resource usage reports and logs. A classifier for this purpose starts off very uncertain and increases in certainty as it learns which features are important for making a distinction. By classifying the documents, we expect to find resources that require attention either immediately or soon.

This will require a training set as well as a test set, usually obtained with an 80/20 split. The training set is used to fit the classifier, and the classifier is then evaluated on the test set. If the features can be treated as independent of each other, a naïve Bayes classifier can be used.
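A minimal naïve Bayes sketch is shown below, assuming the words of each log line are the features and Laplace (add-one) smoothing handles unseen words. The log lines and the two labels, "attention" and "normal", are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Word-count naive Bayes with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Work in log space: log prior plus log likelihood of each word.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled log lines.
documents = [
    ("disk usage critical on node", "attention"),
    ("memory usage critical alert", "attention"),
    ("cpu throttled critical load", "attention"),
    ("disk failure alert on volume", "attention"),
    ("nightly backup completed ok", "normal"),
    ("cpu usage normal on node", "normal"),
    ("health check passed ok", "normal"),
    ("memory usage normal", "normal"),
    ("critical alert on disk", "attention"),
    ("backup passed ok", "normal"),
]

split = int(0.8 * len(documents))  # 80/20 split
train_set, test_set = documents[:split], documents[split:]

classifier = NaiveBayes()
for text, label in train_set:
    classifier.train(text, label)

correct = sum(classifier.classify(text) == label for text, label in test_set)
accuracy = correct / len(test_set)
print("test accuracy:", accuracy)
```

With real data the documents should be shuffled before splitting, and a test set of two lines is far too small to say anything about accuracy; the split is kept tiny here only to show the mechanics.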

Use case 6) Determining users who might benefit from premium infrastructure and service levels.

To predict the likelihood that a user will invest more than others, we collect information into a table with attributes that we pick from server logs, such as location, help articles read, pages viewed, and whether premium SKUs were chosen.

Then a decision tree classifier can be used to construct a tree from real data. The tree is easy to understand because each node shows which attribute is tested and how it affects the outcome.
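The following sketch builds such a tree by picking, at each node, the attribute with the highest information gain. It assumes categorical attributes; the rows, the attribute values, and the "premium"/"basic" labels are hypothetical.

```python
import math
from collections import Counter

# Hypothetical rows: (location, read help articles, pages viewed, label).
COLUMNS = ["location", "read_help", "pages_viewed"]
rows = [
    ("us", "yes", "high", "premium"),
    ("us", "no", "high", "premium"),
    ("eu", "yes", "high", "premium"),
    ("eu", "no", "high", "basic"),
    ("us", "yes", "low", "basic"),
    ("eu", "yes", "low", "basic"),
    ("us", "no", "low", "basic"),
    ("eu", "no", "low", "basic"),
]


def entropy(rows):
    """Shannon entropy of the labels (last element of each row)."""
    counts = Counter(row[-1] for row in rows)
    return -sum((n / len(rows)) * math.log2(n / len(rows)) for n in counts.values())


def split_by(rows, col):
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row)
    return groups


def build_tree(rows, columns):
    """Return a label (leaf) or a (column index, {value: subtree}) tuple."""
    majority = Counter(row[-1] for row in rows).most_common(1)[0][0]
    if entropy(rows) == 0 or not columns:
        return majority
    best_gain, best_col = 0.0, None
    for col in columns:
        groups = split_by(rows, col)
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        gain = entropy(rows) - remainder
        if gain > best_gain:
            best_gain, best_col = gain, col
    if best_col is None:  # no attribute reduces uncertainty any further
        return majority
    remaining = [c for c in columns if c != best_col]
    branches = split_by(rows, best_col)
    return (best_col, {v: build_tree(g, remaining) for v, g in branches.items()})


def predict(tree, row):
    """Walk the tree; returns None for an attribute value never seen."""
    while isinstance(tree, tuple):
        col, branches = tree
        tree = branches.get(row[col])
    return tree


tree = build_tree(rows, list(range(len(COLUMNS))))
print("root attribute:", COLUMNS[tree[0]])
print(predict(tree, ("us", "no", "high")))
```

The attribute chosen at the root is the one that separates premium and basic users best, which is what makes the resulting tree easy to read.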

Use case 7) Building price models for premium service levels and their users.

A model for this purpose predicts a price. Initially, the prices are computed manually for the items in a training set. Once the model has learned from the training set, it is validated on a test set and then used to predict prices for new items. By finding the items that are most like the item that interests the customer, the algorithm can average their prices to estimate what the price should be for this item.
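Averaging the prices of the most similar items is the k-nearest-neighbors approach, sketched below. The two features (vCPUs and support hours per month), the prices, and the choice of `k` are hypothetical.

```python
from math import sqrt

# Hypothetical training set: ((vCPUs, support hours per month), monthly price).
training = [
    ((2, 1), 50.0),
    ((4, 2), 90.0),
    ((4, 4), 110.0),
    ((8, 8), 200.0),
    ((16, 8), 320.0),
]


def euclidean(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_price(training, query, k=3):
    """Estimate a price as the mean price of the k most similar items."""
    by_distance = sorted(training, key=lambda item: euclidean(item[0], query))
    nearest = by_distance[:k]
    return sum(price for _, price in nearest) / len(nearest)


estimate = knn_price(training, query=(4, 3), k=2)
print("estimated price:", estimate)
```

When the features are on very different scales, they would normally be rescaled first so that no single feature dominates the distance. Weighting each neighbor by its closeness, instead of taking a plain average, is a common refinement.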
