Cluster computing

Saturday, April 3, 2021

Applications of Data Mining to Reward points collection service

Machine learning techniques form a separate category of their own. Traditional data mining methods use clustering and statistics, which are also relevant to machine learning, but neural networks were not included under data mining, so we call them out here along with the other techniques in this category. Machine learning can inform users about which of their activities generate the most appreciation and how those activities change with the audience. It can also detect fraud in employee appreciations, which may be of interest to employers. For example, Feedzai uses real-time behavioral profiling as well as historical profiling that has been proven to detect 61% more fraud than before. Discovering groups, searching, and ranking are a few more examples.

Regions of interest are used to determine the space and time focus of appreciation activity. This helps detect events that would otherwise have gone unnoticed as a flurry of activity on the reward points table. Together with a classifier and a regressor, the latent events and their awardees can be detected, which eliminates the need to hold formal events and determine winners.

One benefit of using neural networks with employee appreciation data is that management can gain insights that would not have been possible through formal interactions. By classifying reward points based on feature vectors and using softmax classification, neural networks can detect hidden appreciation. Each neuron assigns a weight to each feature, and the outputs are normalized so that they can be read as probabilities, resulting in a weight matrix that captures the underlying model in the training dataset. The model can then be used with a test dataset to predict outcome probabilities. Neurons are organized in layers; each layer is independent of the others, and layers can be stacked so that one takes the output of another as its input. This technique has found applications in a variety of domains, starting with natural language processing.
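The softmax step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two features, the three classes, and the weight values are hypothetical and are not learned from any real reward points data.

```python
import math

def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases):
    """One fully connected layer followed by softmax."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)

# Hypothetical feature vector for one grant and hand-picked weights
# for three hypothetical appreciation classes.
features = [3.0, 1.0]
weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2]]
biases = [0.0, 0.0, 0.0]
probabilities = classify(features, weights, biases)
```

In a trained network the weights would come from fitting the training dataset rather than being written by hand.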

Neural networks can be applied in layers, and they can be combined with regressors, so the technique suits a variety of use cases. There are four common types of layers. The fully connected layer connects every neuron in one layer to every neuron in another layer. This is great for rigorous encoding, but it becomes expensive for large inputs and does not scale well. The convolutional layer is mostly used as a filter that brings out salient features from the input set. The filter, sometimes called a kernel, is represented by a set of n-dimensional weights and describes the probabilities that a given pattern of input values represents a feature. A deconvolutional layer comes from a transposed convolutional process in which the data is enhanced to increase resolution or to transform it. A recurrent layer includes a looping capability such that its input consists of both the data to analyze and the output from a previous calculation performed by that layer. This helps maintain state across iterations and transform one sequence into another.
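As a rough illustration of two of these layer types, the sketch below applies a one-dimensional filter to a series and then runs a simplified recurrent update. The daily grant counts, the kernel, and the decay factor are hypothetical, and the recurrent step omits the learned weights and nonlinearity that a real recurrent layer would have.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal (no padding) and sum the products."""
    width = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(width))
        for i in range(len(signal) - width + 1)
    ]

def recurrent(inputs, decay=0.5):
    """Feed each output back in as state for the next input."""
    state = 0.0
    outputs = []
    for x in inputs:
        state = decay * state + x
        outputs.append(state)
    return outputs

# Hypothetical daily counts of reward point grants.
daily_grants = [0, 1, 0, 5, 6, 0, 1]
burst_signal = conv1d(daily_grants, [1, 1])   # highlights two-day bursts
memory = recurrent([1, 0, 0])                 # state fades across steps
```

The filter output peaks where two busy days are adjacent, which is the kind of salient feature a convolutional layer is meant to surface.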

The choice to apply machine learning techniques depends on both the applicability of the algorithm and the data.

Friday, April 2, 2021

Applications of Data Mining to Reward points collection service

Continuation of use cases:

Collaborative filtering can be applied via item-based filtering as well. This use case differs from the user-based filtering cited earlier in that item-based filtering avoids divulging the users in the participant group and instead focuses on item similarity from a lookup table, which makes it fast, albeit expensive in storage. In both cases similarity scores are computed, but this approach lets us answer whether a set of grants is like others, which helps us rank them. This is useful for sparse datasets, which is typical of the matrix of appreciation across users.
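A minimal sketch of such an item similarity lookup table follows. It assumes cosine similarity as the score, which the text does not specify, and the grant categories, user names, and point values are hypothetical.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two sparse {user: points} vectors."""
    shared = set(a) & set(b)
    dot = sum(a[u] * b[u] for u in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def build_similarity_table(items):
    """Precompute item-to-item scores once; lookups are then fast."""
    table = {name: {} for name in items}
    for x, y in combinations(items, 2):
        score = cosine(items[x], items[y])
        table[x][y] = score
        table[y][x] = score
    return table

def most_similar(table, item):
    """Rank the other items by similarity to the given item."""
    return sorted(table[item].items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical grant categories with the points each user gave.
items = {
    "mentoring": {"u1": 5, "u2": 3},
    "code_review": {"u1": 4, "u2": 2, "u3": 1},
    "on_call": {"u3": 5},
}
table = build_similarity_table(items)
ranking = most_similar(table, "mentoring")
```

The table stores a score for every pair of items, which is the storage cost noted above, but the ranking step never needs to expose individual users.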

Sequence clustering provides insights into the activities that generated the appreciation because it determines patterns across users and grants by finding paths in sequences. A sequence is a series of events, such as a set of appreciations in the form of reward point grants. This kind of sequence analysis helps us understand which activities were most popular for appreciation between employees and target those activities on other forums. Sequence clustering is a data-driven approach. It helps determine sequences from existing appreciation activities.

Conclusion: There are several algorithms in data mining that are applicable to the Reward points repository.

Thursday, April 1, 2021

Applications of Data Mining to Reward points collection service

Continuation of use cases:

Collaborative filtering is another option for the use cases where binary conditions apply. It is particularly useful when there are multiple participants in a group whose opinions determine the best grant of reward points. In the earlier approaches, the algorithms articulated conditions. In this algorithm, we avoid conditions and replace them with ratings. The participants in the group can be selected to form a diverse set or a cohesive set, depending on the purpose. The calculation of grants based on existing reward points can be determined with the help of this opinion group, which avoids many of the pitfalls of condition-based logic. These pitfalls include the disclosure of rules, taking advantage of the rules, and circumventing them.

Hierarchical clustering is helpful when we want to cluster the reward points to match the organizational hierarchy, for example to give credit to a manager when their reports do well. This is standard practice in many companies. It may not be evident from the flat, independent grants assigned to individuals that reward points can be grouped based on the hierarchy to which the user belongs. The distance between members in the organizational hierarchy can also be used as a metric for the hierarchical clustering of reward point grants.
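One way to read the distance metric above is as the number of reporting edges between two employees through their closest common manager. The sketch below assumes the hierarchy is available as a hypothetical employee-to-manager map with no cycles; the names and point values are made up for illustration.

```python
def chain_to_root(employee, manager_of):
    """Return the employee followed by each manager up to the top."""
    chain = [employee]
    while chain[-1] in manager_of:
        chain.append(manager_of[chain[-1]])
    return chain

def org_distance(a, b, manager_of):
    """Count reporting edges between a and b via their closest common manager."""
    depth_from_b = {
        person: depth for depth, person in enumerate(chain_to_root(b, manager_of))
    }
    for depth_from_a, person in enumerate(chain_to_root(a, manager_of)):
        if person in depth_from_b:
            return depth_from_a + depth_from_b[person]
    raise ValueError("employees do not share a common manager")

def rollup_points(points, manager_of):
    """Credit each grant to the recipient and to every manager above them."""
    totals = {}
    for employee, granted in points.items():
        for person in chain_to_root(employee, manager_of):
            totals[person] = totals.get(person, 0) + granted
    return totals

# Hypothetical reporting structure and individual grants.
manager_of = {
    "alice": "carol", "bob": "carol", "dave": "erin",
    "carol": "frank", "erin": "frank",
}
points = {"alice": 10, "bob": 5, "dave": 7}
totals = rollup_points(points, manager_of)
```

A clustering routine could use `org_distance` as its pairwise metric, while `rollup_points` shows the grouping of flat grants by hierarchy.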

Conclusion: There are several algorithms in data mining that are applicable to the Reward points repository.

Wednesday, March 31, 2021

Applications of Data Mining to Reward points collection service

Continuation of use cases: Outliers can also be detected by data mining algorithms, where the choices for similarity measures between rows include distance functions such as Euclidean (L2) distance, Manhattan (L1) distance, and graph distance. The choices for aggregate dissimilarity measures are the distance to the K nearest neighbors, a neighborhood density outside the expected range, and the attribute differences with nearby neighbors. Outliers matter because discovering them prompts new strategies to encompass them. If outliers are numerous, they significantly increase organizational costs. If they are not, the remaining patterns help identify efficiencies. A decision tree algorithm uses the attributes of the service requests to make a prediction, such as the relief time on a case resolution. The ease of visualizing the split at each level throws light on the importance of those attributes. This information is useful both for pruning the tree and for drawing it. Logistic regression helps determine user appreciations based on demographics, and it can be used for finding repetitions in requests. Neural networks can be used with a softmax classifier to classify appreciation terms in chat text on channels.
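The K-nearest-neighbor dissimilarity measure mentioned for outliers can be sketched as follows. It uses Euclidean distance and scores each row by the distance to its k-th nearest neighbor; the two-attribute rows and the choice of k are hypothetical.

```python
import math

def knn_distance(point, others, k):
    """Distance from point to its k-th nearest neighbor among others."""
    distances = sorted(math.dist(point, other) for other in others)
    return distances[k - 1]

def outlier_scores(points, k=2):
    """Score every point; larger scores mean more isolated rows."""
    scores = []
    for i, point in enumerate(points):
        others = points[:i] + points[i + 1:]
        scores.append(knn_distance(point, others, k))
    return scores

# Hypothetical rows with two numeric attributes each.
rows = [(1, 1), (1, 2), (2, 1), (2, 2), (10, 10)]
scores = outlier_scores(rows, k=2)
most_isolated = rows[scores.index(max(scores))]
```

Swapping `math.dist` for a Manhattan or graph distance changes the similarity measure without changing the scoring logic.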

The Naive Bayes algorithm can be used for use cases where binary conditions apply. The set of policies that organizations define for reward point grants as employee appreciation is usually authored with the help of some conditions. These conditions can be maintained in the service. When the conditions pertain to attributes from the source where the reward points are published, their probabilities become relevant to this algorithm, especially when the input states are taken on a with-or-without basis and the input variables are independent. The simplicity of counting or summing reward points that meet the binary condition, together with the ease of visualizing, debugging, and using it as a predictor, makes this algorithm quite popular. For example, the reward points can be counted based on whether or not the appreciation came from a specific person.
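The counting described above can be sketched as a small Naive Bayes classifier over binary conditions. The two conditions (whether the grant came from a manager, whether it came from the same team), the labels, and the add-one smoothing are assumptions made for illustration.

```python
import math
from collections import Counter

def train_naive_bayes(rows, labels):
    """Count, per label, how often each binary condition is true."""
    n_features = len(rows[0])
    label_counts = Counter(labels)
    true_counts = {label: [0] * n_features for label in label_counts}
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            if value:
                true_counts[label][i] += 1
    return label_counts, true_counts

def predict(model, row):
    """Pick the label with the highest log-probability for the row."""
    label_counts, true_counts = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in label_counts.items():
        score = math.log(count / total)
        for i, value in enumerate(row):
            # Add-one smoothing keeps probabilities away from 0 and 1.
            p_true = (true_counts[label][i] + 1) / (count + 2)
            score += math.log(p_true if value else 1 - p_true)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical rows: (from_manager, same_team) with a grant-size label.
rows = [(1, 1), (1, 0), (0, 1), (0, 0), (1, 1), (0, 0)]
labels = ["large", "large", "small", "small", "large", "small"]
model = train_naive_bayes(rows, labels)
```

Because the model is nothing more than counts per condition, it is easy to inspect and debug by printing `model`.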

Conclusion: There are several algorithms in data mining that are applicable to the Reward points repository.

Tuesday, March 30, 2021

Applications of Data Mining to Reward points collection service

Continuation of use cases: Some use cases that stand out from the others for the reward points collection service include the following. Classification algorithms are useful for categorizing reward point assignments based on source attributes. The primary use case is to see clusters of appreciation patterns that match on attributes. By translating grants to a vector space and assessing the quality of a cluster with the sum of squared errors, it is easy to analyze a large number of grants as belonging to specific clusters, which can give management insight into group dynamics. Reward point grants for a user show elongated scatter plots in specific categories. Even when the grants come from varying contexts, the time to the next appreciation can be plotted along the timeline. One of the main advantages of linear regression is prediction with time as the independent variable. When a data point has many factors contributing to its occurrence, a linear regression gives an immediate ability to predict where the next occurrence may happen. This is far easier than coming up with a model that fits all the data points well. Customer segmentation based on reward points is a very common application of this algorithm. It helps prioritize the response to certain customers. Association data mining allows the management of an organization to see helpful patterns such as “employees who appreciated this user also appreciated this other user”. Sequence clustering can be used for patterns of appreciation from the same user. With these examples, organizations can understand and appreciate what may be missing from mere performance evaluation through the organizational hierarchy.
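The linear regression use case can be sketched with an ordinary least-squares fit. Here the day of each grant is the independent variable and the gap until the next grant is the predicted value; the grant days are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical days on which one user received appreciation.
grant_days = [2, 9, 15, 23, 30]
days = grant_days[:-1]
gaps = [later - earlier for earlier, later in zip(grant_days, grant_days[1:])]

slope, intercept = fit_line(days, gaps)
last_day = grant_days[-1]
predicted_gap = slope * last_day + intercept
predicted_next_day = last_day + predicted_gap
```

The fit needs at least two distinct days; with a single grant there is no gap to learn from.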

Conclusion: The reward points service is an investment for the organization in which both costs and benefits improve through the promotion of organizational health, satisfaction, and productivity. The use of data mining algorithms on this data also gives management better knowledge of group dynamics.

Monday, March 29, 2021

Applications of Data Mining to Reward points collection service

Problem statement: As with any data collected by a web service, analysis is not restricted to mere queries on the accumulated data. Deep learning techniques and data mining provide insights that can empower organizations beyond the employee appreciation for which the service is used. We review some of these use cases in this article.

Solution: Data mining is a tried and tested method for gaining insights into relational data. The reward points are collected as a relation between users and their accumulated peer appreciation. Standard mining techniques such as clustering, sequence mining, decision trees, regression, segmentation, and association algorithms provide many insights. When many data mining algorithms can be applied to a given dataset, choosing among them might require some exploration of the data.

If the use case is well articulated, the choice of data mining algorithm becomes immediately clear. The use case becomes clear only when the data is well understood and the business objective is known. Usually, only the latter is stated, such as the prediction of an attribute associated with the data. For example, a dataset suitable for supervised learning could have labels that are best determined with some exploration of the training data. Exploration techniques are required to determine the rules with which to assign labels to the raw data. If the rules are already available from the business, then assigning labels is merely an automation task that helps prepare the training set.

In the absence of business rules to assign labels, the dataset for data mining is usually large and cannot be assessed by mere inspection. Some visualization tools are necessary. In this regard, two algorithms stand out for making this task easier. First, the decision tree algorithm can be used to find the relationships between the rows and to visualize which attributes are significant to the outcome. The tree can be pruned to see which attributes matter and which do not. The split of the nodes on each level helps visualize the relative strength of those attributes across rows. This is very helpful when the tree is generated without supervision.

The other algorithm is the Naive Bayes classifier. This classifier is helpful for exploring data, finding relationships between input columns and predictable columns, and then using the initial exploration to create additional models. Since it compares across columns for a given row, it evaluates the binary probabilities with and without that attribute in each column.

Together these algorithms can help with the initial exploration of data to choose the right algorithm for a given purpose. A common split between training data and test data for prediction is 70% for training and 30% for testing.
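A 70/30 split can be sketched with the standard library alone. The fixed seed and the stand-in rows are assumptions made so that the example is repeatable.

```python
import random

def train_test_split(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and cut it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical dataset of ten grant records, represented by their ids.
grant_ids = list(range(10))
train_rows, test_rows = train_test_split(grant_ids)
```

Shuffling before cutting avoids a split that simply follows the order in which the grants were recorded.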

Sunday, March 28, 2021

Hashes

A play on hash

The idea behind hashing is that an entity’s representation can be mapped to a value in a set, with the nice side effect that those values have a uniform random distribution even when the entity’s representations do not. This allows us to count the number of distinct elements by finding prefixes of leading zeros in a binary representation long enough to capture the number of possible values. For example, if k is the length of the longest run of leading zeros observed in a binary representation of length n, then Math.pow(2, k) estimates the number of unique elements, because on average a prefix of k zeros occurs once in every 2^k elements. We size the count of distinct elements with the help of the prefix. We ignore the outliers and the bias here because they are known and can be corrected by a harmonic mean and a constant correction factor, respectively.
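The leading-zero estimate can be sketched as below. This is a single-register illustration without the harmonic mean or the correction factor, so the result is only a coarse power-of-two estimate; the use of SHA-256 truncated to 32 bits is an assumption, and the input must not be empty.

```python
import hashlib

HASH_BITS = 32

def hash32(item):
    """Map any item to a roughly uniform 32-bit integer."""
    digest = hashlib.sha256(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

def leading_zeros(value, bits=HASH_BITS):
    """Number of zero bits before the first 1 in a fixed-width value."""
    return bits - value.bit_length()

def estimate_distinct(items):
    """Estimate the distinct count as 2 ** (longest run of leading zeros)."""
    longest = max(leading_zeros(hash32(item)) for item in items)
    return 2 ** longest

# Repeated items hash to the same value, so duplicates never change the estimate.
estimate = estimate_distinct(["alice", "bob", "alice", "carol", "bob"])
```

Splitting the hashes across many registers and combining them with a harmonic mean is what reduces the variance of this estimate in practice.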

If we use m hash functions to generate m hashes for the same entity’s representation, and each hash maps to a bit position in an array that starts with all bits set to zero, then adding an entity sets its m positions to 1. Finding even a single zero at any of the hashed positions of an entity’s representation immediately rules that entity out as a member of the set.
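This membership test is commonly known as a Bloom filter, and a minimal sketch follows. The array size, the number of hash functions, and the way m hashes are derived from SHA-256 with a seed prefix are all assumptions for illustration.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = [0] * size_bits  # starts as an all-zero bit array

    def _positions(self, item):
        """Derive one bit position per hash function by varying a seed."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = 1

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[position] for position in self._positions(item))

members = BloomFilter()
members.add("alice")
```

A single zero bit is conclusive, but all ones is not: different entities can share positions, so a positive answer may be a false positive.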

If we could go a step further to find all bit positions where the hashes come out to be zero, then we can identify at least one unique hash in Math.pow(2,m) hashes, where the corresponding element is not part of the set by inverting those positions to 1 bit and keeping the others as 0 bit.