Wednesday, February 10, 2021

How to write a chatbot?


Problem statement: Many websites provide a chatbot experience that lets users ask questions and receive answers relevant to the business. A chatbot can also be hosted in a mobile application so that it responds only to the device owner. In that setting, the chatbot can be trained to be a translator, a movie-based responder, a nurse, a sentiment analyzer, or a humor bot. The implementation remains largely the same across these uses; only the training datasets differ.

Solution: Writing a chatbot starts with a deep learning model, which is easiest to build on a well-known machine learning platform. The model must be trained on a relevant dataset and tuned until it serves satisfying responses. Treating model evaluation as a black box makes it hard to diagnose why the model performs poorly, which is why this article describes how to build and train such a model from scratch.

A chatbot is well served by a sequence-to-sequence model. More information about this type of model can be found in the documents listed in the references section, but at a high level, it works with sequences rather than with the individual symbols that make them up. It therefore does not need to know what the parts of a sequence represent, whether they are words or video frames, and it can even infer the meaning of those symbols from context. When raw data is shredded into sequences, the model keeps per-sequence state that it infers from the sequence itself. This state is the essence of the sequence. Using it, the model can translate or interpret input sequences (text) into output sequences (responses). A popular way to build a sequence-to-sequence model is with a Recurrent Neural Network, or RNN for short. The RNN encoder-decoder was proposed by Cho et al. in 2014 and extended with attention by Bahdanau et al.; the decoder can be written to generate any kind of custom output, which makes the architecture suitable for a wide variety of uses.
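The encoder-decoder shape described above can be sketched in Keras. This is a minimal illustration, not the article's accompanying code: the vocabulary size, embedding width, and hidden-state size below are assumed placeholders that would come from the actual training corpus.

```python
# Minimal sketch of an RNN encoder-decoder in Keras.
# VOCAB_SIZE, EMBED_DIM, and HIDDEN_DIM are assumed values for illustration.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

VOCAB_SIZE = 1000   # assumed vocabulary size
EMBED_DIM = 64      # assumed embedding width
HIDDEN_DIM = 128    # assumed LSTM state size

# Encoder: reads the input sequence and keeps only its final state --
# the "essence of the sequence" the model carries forward.
enc_inputs = layers.Input(shape=(None,), name="encoder_tokens")
enc_embed = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(enc_inputs)
_, state_h, state_c = layers.LSTM(HIDDEN_DIM, return_state=True)(enc_embed)

# Decoder: generates the output sequence conditioned on the encoder state.
dec_inputs = layers.Input(shape=(None,), name="decoder_tokens")
dec_embed = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(dec_inputs)
dec_out, _, _ = layers.LSTM(
    HIDDEN_DIM, return_sequences=True, return_state=True
)(dec_embed, initial_state=[state_h, state_c])
logits = layers.Dense(VOCAB_SIZE, activation="softmax")(dec_out)

model = Model([enc_inputs, dec_inputs], logits)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# One forward pass on dummy token ids to confirm the shapes line up.
src = np.random.randint(0, VOCAB_SIZE, size=(2, 7))   # batch of 2, length 7
tgt = np.random.randint(0, VOCAB_SIZE, size=(2, 5))   # batch of 2, length 5
probs = model.predict([src, tgt], verbose=0)
print(probs.shape)  # (2, 5, 1000): a distribution over the vocabulary per output timestep
```

Training would feed the target sequence shifted by one position as the decoder input (teacher forcing), which is the standard recipe for this architecture.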

This model is built on the following premise. It dissects text into timesteps and encodes internal state for those timesteps. The context, which carries the semantic content and the basis for any follow-up, is learnt from the sequence. Neurons remember information and expose just enough of it to build that context. A sequence database helps stash the ordered slices of sentences as sequences. Given a support threshold, the complete set of frequent subsequences can be found, using the property that if adding an element to a sequence does not keep it frequent, then none of its super-sequences will be frequent either. With such a sequence in hand, the model follows up with an interpretation by generating the corresponding output sequence: the state is decoded and a new sequence is produced, which forms the response of the chatbot.
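The pruning rule above, that a subsequence can only be frequent if its shorter parts are frequent, can be shown with a small, self-contained sketch. This is an illustration of the mining idea, not the article's exact algorithm; the toy sentences and function names are made up, and "subsequence" here means a contiguous run of tokens.

```python
# Illustrative mining of frequent contiguous subsequences with a
# minimum-support threshold. Support counts the number of sentences
# that contain the subsequence at least once.
from collections import Counter

def subseqs_of_len(sentence, k):
    # All distinct contiguous subsequences of length k in one sentence.
    return {tuple(sentence[i:i + k]) for i in range(len(sentence) - k + 1)}

def frequent_subsequences(sentences, min_support):
    frequent = {}
    prev = None   # frequent subsequences of the previous length
    k = 1
    while True:
        counts = Counter()
        for s in sentences:
            for sub in subseqs_of_len(s, k):
                # Pruning: a length-k candidate can only be frequent if
                # both its length-(k-1) parts were frequent, so skip the rest.
                if prev is not None and (sub[:-1] not in prev or sub[1:] not in prev):
                    continue
                counts[sub] += 1
        cur = {sub: c for sub, c in counts.items() if c >= min_support}
        if not cur:
            break
        frequent.update(cur)
        prev = cur
        k += 1
    return frequent

sentences = [["how", "are", "you"],
             ["how", "are", "things"],
             ["are", "you", "ok"]]
result = frequent_subsequences(sentences, min_support=2)
print(result)
# ('how', 'are') and ('are', 'you') survive; ('are', 'things') is pruned
# because ('things',) never reaches the support threshold.
```

In a real chatbot pipeline these frequent patterns would come from the training corpus and inform which follow-ups the decoder should favor.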

Code for this sequence-to-sequence analysis accompanies this article. Machine learning frontends such as TensorFlow make it easy to load a saved model and use it from any client, while Keras in a Colab-like environment can train the model independently and save it for future use.
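The train-then-serve split described above can be sketched with the standard Keras save/load calls. The tiny stand-in model and the file path below are placeholders; in practice the saved artifact would be the trained chatbot model.

```python
# Sketch of saving a model on the training side (e.g., in Colab) and
# reloading it on the serving side. The model and path are stand-ins.
import numpy as np
import tensorflow as tf

# Training side: build and save a (placeholder) model.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
model.save("chatbot_model.keras")   # hypothetical path

# Serving side: any client process can reload the model and run inference.
restored = tf.keras.models.load_model("chatbot_model.keras")
preds = restored.predict(np.zeros((1, 4)), verbose=0)
print(preds.shape)  # (1, 2)
```

Because the saved file carries the architecture and weights together, the serving client needs no access to the training code.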
