Saturday, January 2, 2021

How to perform text summarization with sequence-to-sequence RNNs

Recurrent Neural Networks (RNNs) are a special kind of neural network that work with sequences rather than with the individual symbols that constitute them. In fact, this technique does not need to know what the parts of the sequence represent, whether they are words or video frames; it can infer the meaning of those symbols. When raw data is shredded into sequences, the RNN keeps state information per sequence that it infers from that sequence. This state is the essence of the sequence. Using this state, the RNN can simply translate input sequences (text) to output sequences (summary). It can also be used to interpret the input sequence and generate a new output sequence (like a chatbot). The attentional RNN encoder-decoder model was proposed by Bahdanau et al. in 2014, and its decoder can be adapted to generate many kinds of custom output. Text summarization can be seen as a restricted form of this machine-translation approach, with the restriction imposed through its use of the decoder. 

There are a few differences between machine translation and sequence-to-sequence summarization. Summarization is a lossy conversion in which only the key concepts are retained, whereas machine translation aims to preserve the content in full and places no restriction on size. Summarization restricts the size of the output regardless of the size of the input. Rush et al., in 2015, proposed a convolutional model that encodes the source and uses a context-sensitive attentional feed-forward neural network to generate the summary.

The annotated Gigaword corpus has been popular for training the models used in both the 2014 and 2015 work. Mikolov's 2013 word2vec model uses a different dataset to create its word-embeddings matrix, but these 100-dimensional word embeddings can still be updated further by training them on the Gigaword corpus. This was the approach taken by Nallapati et al. in 2016 for a similar task, with the deviation that the input is not restricted to one or two sentences from each sample. The RNN itself uses a 200-dimensional hidden state, with the encoder being bidirectional and the decoder uni-directional. The vocabularies for the source and target sequences can be kept separate, although the words from the source, along with some frequent words, may reappear in the target vocabulary. Reusing the words from the source cuts down on the number of training epochs considerably.
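
As a concrete illustration (a minimal sketch, not the code from any of these papers), such a model can be assembled with the TensorFlow.js layers API. The vocabulary size, input length, and the pre-trained 100-dimensional word2vec matrix embeddingMatrix are assumptions here; the published models also add attention and a large-vocabulary softmax, which are omitted for brevity.

const tf = require('@tensorflow/tfjs');

const vocabSize = 5000;      // assumed vocabulary size
const embeddingDim = 100;    // word2vec feature size mentioned above
const hiddenDim = 200;       // RNN hidden-state size mentioned above
const maxInputLen = 50;      // assumed cap on source words
const maxSummaryLen = 30;    // summary capped at about 30 words

function buildSummarizer(embeddingMatrix) {
  const model = tf.sequential();
  // Embedding layer seeded with word2vec weights and kept trainable so it
  // can be updated further on the Gigaword corpus.
  model.add(tf.layers.embedding({
    inputDim: vocabSize,
    outputDim: embeddingDim,
    inputLength: maxInputLen,
    weights: [embeddingMatrix],
    trainable: true,
  }));
  // Bidirectional encoder that condenses the source into a single state.
  model.add(tf.layers.bidirectional({
    layer: tf.layers.lstm({units: hiddenDim}),
  }));
  // Repeat the encoded state once per output timestep and decode
  // uni-directionally into a distribution over the target vocabulary.
  model.add(tf.layers.repeatVector({n: maxSummaryLen}));
  model.add(tf.layers.lstm({units: hiddenDim, returnSequences: true}));
  model.add(tf.layers.timeDistributed({
    layer: tf.layers.dense({units: vocabSize, activation: 'softmax'}),
  }));
  return model;
}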

The summary size is usually capped at about 30 words, while the input size may vary. The encoder itself can be hierarchical, with a second bidirectional RNN layer running at the sentence level. The use of pseudo-words and of sentences as sequences is left outside the scope of this article.

The sequence length for this model is recommended to be in the 10 to 20 range; since the timesteps are per word, it is best to sample the input from a few sentences.
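
In code, this amounts to truncating or padding each word-level input to a fixed number of timesteps. The sketch below assumes a hypothetical wordToId lookup built from the source vocabulary, with 0 serving as the padding/unknown id.

function encodeSentence(sentence, wordToId, maxLen = 20) {
  const ids = sentence
    .toLowerCase()
    .split(/\s+/)
    .slice(0, maxLen)                        // truncate long inputs
    .map(w => wordToId[w] || 0);             // unknown words map to 0
  while (ids.length < maxLen) ids.push(0);   // pad short inputs
  return ids;
}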

 

 

Friday, January 1, 2021

Introduction: TensorFlow is a machine learning framework for JavaScript applications. It helps us build models that can be used directly in the browser or on a Node.js server. We use this framework to build an application that can find similar requests so that they can be used for prediction. 

Description: The JavaScript application uses data from a CSV that has categorizations of requests and their resolution times. The attributes of each request include a category_id, a customer id, a pseudo-parameter attribute, and the execution time. The data used in this sample has 1200 records, but the attributes are kept to a minimum to keep the application simple. 

As with any machine learning example, the data is split into a 70% training set and a 30% test set. There is no order to the data, and the split is taken over a randomly shuffled set.
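
A minimal sketch of that preparation step in TensorFlow.js is shown below; the file path and column names (category_id, parameter, execution_time) are assumptions for illustration.

const tf = require('@tensorflow/tfjs');

async function loadAndSplit() {
  const dataset = tf.data.csv('file://./requests.csv', {
    columnConfigs: {execution_time: {isLabel: true}},
  });
  const records = await dataset.toArray();      // about 1200 rows in this sample
  tf.util.shuffle(records);                     // randomize the order before splitting
  const cut = Math.floor(records.length * 0.7); // 70% training, 30% test
  return {train: records.slice(0, cut), test: records.slice(cut)};
}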

The model chosen is a Recurrent Neural Network model, which is used for finding groups via paths in sequences. A sequence clustering algorithm is like a conventional clustering algorithm, but instead of finding groups based on similar attributes, it finds groups based on similar paths in a sequence. A sequence is a series of events; for example, a series of web clicks by a user is a sequence. A sequence can also be compared to the IDs of any sortable data maintained in a separate table. Usually, there is support for a sequence column, and the sequence data has a nested table that contains a sequence ID, which can be any sortable data type.
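
A rough sketch of such a model in TensorFlow.js might look like the following; the number of categories and the window size of each request sequence are assumed values, not taken from the application.

const tf = require('@tensorflow/tfjs');

const numCategories = 20;  // assumed number of request categories
const windowSize = 5;      // assumed length of each request sequence

const model = tf.sequential();
// Embed each category id before feeding the sequence to the LSTM.
model.add(tf.layers.embedding({
  inputDim: numCategories,
  outputDim: 8,
  inputLength: windowSize,
}));
model.add(tf.layers.lstm({units: 16}));  // keeps per-sequence state
model.add(tf.layers.dense({units: 1}));  // predicted resolution time
model.summary();                         // prints each layer's output shape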

This is very useful for finding sequences of service requests opened across customers. Generally, a network failure could result in a database connection failure, which could in turn lead to an application failure. This sort of data-driven sequence determination helps find new sequences and target them proactively, even suggesting them to the customers who open the requests so that they can be better prepared.

Recurrent Neural Networks (RNNs) are a special kind of neural network that work with sequences rather than with the individual symbols that constitute them. In fact, this technique does not need to know what the parts of the sequence represent, whether they are words or video frames; it can infer the meaning of those symbols. When raw data is shredded into sequences, the RNN keeps state information per sequence that it infers from that sequence. This state is the essence of the sequence. Using this state, the RNN can simply translate input sequences (text) to output sequences (summary). It can also be used to interpret the input sequence and generate a new output sequence (like a chatbot). The attentional RNN encoder-decoder model was proposed by Bahdanau et al. in 2014, and its decoder can be adapted to generate many kinds of custom output.

TensorFlow makes it easy to construct this model using its layers API. It can only present the output after the model is executed; in this case, the model must be run before the weights are available. The output shape of each layer can be printed using the summary() method.

With the model and the training/test sets defined, it is now easy to evaluate the model and run the inference. The model can also be saved and restored, and it executes faster when a GPU is added to the computation.
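
Saving and restoring can be done in a couple of calls; the storage key below is illustrative, and a file:// destination can be used instead under Node.js.

async function persistAndRestore(model) {
  await model.save('localstorage://request-model');
  return tf.loadLayersModel('localstorage://request-model');
}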

The features are available with the feature_extractor. The model is trained on the training set using model.compile() and model.fit(), and it can then be called on a test input. Additionally, if a specific layer is to be evaluated, we can call just that layer on the test input.
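
Put together, the training and inference calls look roughly like this (run inside an async function); trainXs and trainYs are assumed to be tensors built from the training split, and model and windowSize come from the sketch above.

model.compile({optimizer: 'adam', loss: 'meanSquaredError'});

await model.fit(trainXs, trainYs, {epochs: 20, batchSize: 32});

// Predict the resolution time for one test sequence of category ids.
const prediction = model.predict(tf.tensor2d([[3, 1, 4, 1, 5]], [1, windowSize]));
prediction.print();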

When the model is tested, it predicts the resolution time for the given category_id and parameter attributes.

Conclusion: TensorFlow.js is becoming a standard for implementing machine learning models. Its usage is simple, but the choice of model and the preparation of data take significantly more time than setting the model up, evaluating it, and using it.
https://1drv.ms/w/s!Ashlm-Nw-wnWw1gSFq5VLqlNswb5?e=dyLu7I 

 

Thursday, December 31, 2020

Applying the Naïve Bayes data mining technique to IT service requests

The Naïve Bayes algorithm is a statistical, probability-based data mining algorithm and is considered somewhat easier to understand and visualize than others in its family.

A probability is merely the fraction of interesting cases out of total cases. Bayes probability is a conditional probability that adjusts the probability based on a premise. If the premise takes a factor into account, we get one conditional probability, and if we do not take the factor into account, we get another. Naïve Bayes builds on conditional states across attributes and is easy to visualize. This allows experts to show the reasoning process, and it allows users to judge the quality of the prediction. All these algorithms need training data in our use case, but Naïve Bayes uses it for explorations and predictions based on earlier requests, such as determining whether the self-help was useful or not by evaluating both probabilities conditionally.
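
For instance, the conditional probability that self-help was useful given that a knowledge base article was present can be written, by Bayes' rule, as P(self-help useful | KB present) = P(KB present | self-help useful) × P(self-help useful) / P(KB present); computing the same quantity without conditioning on the KB article gives the second probability to compare against. The attribute names here are only illustrative.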

This is widely used for cases where conditions apply, especially binary conditions such as with or without. If the input variables are independent, if their states can be calculated as probabilities, and if there is at least one predictable output, this algorithm can be applied. The states are computed simply by counting occurrences of each input variable per class and then displaying those states against the variables for a given value, which makes this algorithm easy to visualize, debug, and use as a predictor.
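
The counting itself is straightforward; the sketch below (not the SQL Server implementation, and with illustrative column names) estimates one such conditional probability from training rows held in memory.

function conditionalProbability(rows, inputColumn, state, classColumn, classValue) {
  const inClass = rows.filter(r => r[classColumn] === classValue);
  const matching = inClass.filter(r => r[inputColumn] === state);
  return matching.length / inClass.length;  // fraction of interesting cases to total cases
}

// e.g. P(KB_PRESENT = 'yes' | SELF_HELP_USEFUL = 'yes') over the training rows:
// const p = conditionalProbability(trainingRows, 'KB_PRESENT', 'yes', 'SELF_HELP_USEFUL', 'yes');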

The conditional probability can be used both for exploration and for prediction. Each input column in the dataset has a state calculated by this algorithm, which is then used to assign a state to the predictable column. For example, the availability of a Knowledge Base article might show a distribution of input values significantly different from the others, which indicates that it is a potential predictor.

The viewer also provides values for the distribution, so that KB articles that suggest opening service requests with specific attributes are easier to follow and act upon, and the requests easier to resolve. The algorithm can then compute the probability both with and without that criterion.

All that the algorithm requires is a single key column, input columns that act as independent variables, and at least one predictable column. A sample query for viewing the information maintained by the algorithm for a particular attribute would look like this:

SELECT NODE_TYPE, NODE_CAPTION, NODE_PROBABILITY, NODE_SUPPORT, NODE_SCORE FROM NaiveBayes.CONTENT WHERE ATTRIBUTE_NAME = 'KB_PRESENT';

The use of Bayesian conditional probability is not restricted just to this classifier. It can be used in Association data mining as well.

Implementation:

https://jsfiddle.net/za52wjkv/

Wednesday, December 30, 2020

Building a k-nearest neighbors tensorflow.js application:

Introduction: TensorFlow is a machine learning framework for JavaScript applications. It helps us build models that can be used directly in the browser or on a Node.js server. We use this framework to build an application that can find similar requests so that they can be used for prediction. 

Description: The JavaScript application uses data from a CSV that has categorizations of requests and their resolution times. The attributes of each request include a category_id, a pseudo-parameter attribute, and the execution time. The data used in this sample has 1200 records, but the attributes are kept to a minimum to keep the application simple. 

As with any machine learning example, the data is split into a 70% training set and a 30% test set. There is no order to the data, and the split is taken over a randomly shuffled set.

The model chosen is a KNN model, which is appropriate for finding the k nearest neighbors to the examples it was previously shown. The default number of neighbors is 3. This model is suitable for one input and one output, where the tensors are distinct and do not affect each other. The output consists of the label with the most confidence, which is a statistical parameter based on the support for the label, along with a class index and a score set for the confidence associated with each label.
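
A small sketch using the @tensorflow-models/knn-classifier package illustrates this; the feature values and resolution-time labels below are made up for illustration.

const tf = require('@tensorflow/tfjs');
const knnClassifier = require('@tensorflow-models/knn-classifier');

async function demo() {
  const classifier = knnClassifier.create();

  // Each training example is a 1-D tensor of request features with a label.
  classifier.addExample(tf.tensor1d([2, 0.5]), 'under-1-day');
  classifier.addExample(tf.tensor1d([7, 0.9]), 'over-1-day');

  // predictClass returns the winning label, its class index, and a confidence
  // score for each label; k defaults to 3 neighbors.
  const result = await classifier.predictClass(tf.tensor1d([6, 0.8]), 3);
  console.log(result.label, result.classIndex, result.confidences);
}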

TensorFlow makes it easy to construct this model using its layers API. It can only present the output after the model is executed; in this case, the model must be run before the weights are available. The output shape of each layer can be printed using the summary() method.

With the model and the training/test sets defined, it is now easy to evaluate the model and run the inference. The model can also be saved and restored, and it executes faster when a GPU is added to the computation.

The features are available with the feature_extractor. The model is trained on the training set using model.compile() and model.fit(), and it can then be called on a test input. Additionally, if a specific layer is to be evaluated, we can call just that layer on the test input.

The model can be trained in batches of a predefined size. The number of passes over the entire training dataset, called epochs, can also be set upfront. It is helpful to visualize the training with a chart (for example, a Highcharts chart) that is updated with the loss after each epoch.
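
The KNN classifier above is fitted simply by adding examples, but when a layers model is trained instead, the batch size, epoch count, and per-epoch loss callback can be set as in this sketch (run inside an async function; trainXs, trainYs, and the chart update are assumed to exist elsewhere).

await model.fit(trainXs, trainYs, {
  batchSize: 32,  // predefined batch size
  epochs: 50,     // passes over the entire training dataset
  callbacks: {
    onEpochEnd: (epoch, logs) => {
      // push logs.loss into the chart series so the curve updates after each epoch
      console.log(`epoch ${epoch}: loss=${logs.loss}`);
    },
  },
});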

When the model is tested, it predicts the resolution time for the given category_id and parameter attributes.

Conclusion: TensorFlow.js is becoming a standard for implementing machine learning models. Its usage is fairly simple, but the choice of model and the preparation of data take significantly more time than setting the model up, evaluating it, and using it.