Monday, February 14, 2022

Continuous Encoder

BERT is a natural language processing model that interprets search queries much as humans do: it tries to understand the context of the words that make up the query, so the results match better than they would without it. It was proposed by Google and stands for Bidirectional Encoder Representations from Transformers. To understand BERT, we must first understand the terms Encoder and Bidirectional. These terms come from neural network techniques, where encoding and decoding refer to the state captured between words in a sequence. Briefly, a neural network consists of layers of units that compute weighted combinations of their inputs, which in this case are words described by a chosen set of other inputs called features. Each feature receives a set of weights that reflect how likely a word is to appear together with the other words chosen as features. A bag of words from the text is run through the network and transformed into an output that resembles a set of word associations; in the process, the network computes a weighted matrix of words against their features, and these vectors are called embeddings. Embeddings are immensely useful because they represent words and their context in terms of the features that frequently co-occur with them, bringing out the latent meanings of the words. With this additional information from their embeddings, it is possible to measure how similar two words are, or which topics a set of keywords represents, especially when a word has multiple meanings.
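As a minimal sketch of how embeddings make similarity computable, the snippet below compares hypothetical embedding vectors with cosine similarity. The vectors and the lookup table are made-up placeholders, not the output of any particular model, and stand in for embeddings produced by word2vec, GloVe, or BERT.

```python
import numpy as np

# Hypothetical, hand-made embedding table: in practice these vectors
# come out of a trained model rather than being written by hand.
embeddings = {
    "bank_finance": np.array([0.8, 0.1, 0.3]),
    "bank_river":   np.array([0.1, 0.9, 0.2]),
    "money":        np.array([0.7, 0.2, 0.4]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# A polysemous word like "bank" resolves differently depending on which
# sense (feature context) its embedding encodes.
print(cosine_similarity(embeddings["bank_finance"], embeddings["money"]))
print(cosine_similarity(embeddings["bank_river"], embeddings["money"]))
```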

In the example above, the transformation was forward only, with associations drawn from the left-to-right context within a layer, but the calculations in a layer can jointly use the context from both sides. This is called a bidirectional transformation, and since a neural network can have multiple layers, with the output of one layer serving as the input to the next, BERT performs these bidirectional transformations across all layers. When the input is not just individual words but an ordered set of words, such as a sentence, it is called a sequence; search terms form a sequence. BERT can unambiguously represent a sentence or a pair of sentences in question/answer form. The state between the constituents of a sequence is encoded in a form that helps interpret the sequence or generate a response sequence with the help of a decoder. This relationship between input and output sequences, captured as encodings and decodings, enhances language modeling and improves search results.
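A sketch of encoding a sentence pair with a pretrained BERT model is shown below, assuming the Hugging Face transformers library and PyTorch are installed; the model name and the example question/passage are illustrative.

```python
from transformers import AutoTokenizer, AutoModel
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

question = "where can I exchange currency"
passage = "The bank on Main Street handles foreign exchange."

# The tokenizer joins the pair as [CLS] question [SEP] passage [SEP],
# which is the question/answer input form mentioned above.
inputs = tokenizer(question, passage, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual (bidirectional) vector per token in the joined sequence.
token_states = outputs.last_hidden_state   # shape: [1, seq_len, hidden_size]
# The [CLS] vector is commonly taken as a single encoding of the whole pair.
pair_encoding = token_states[:, 0, :]
print(pair_encoding.shape)
```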

Natural language processing relies on encoding and decoding to capture and replay state from text. This state is discrete and changes from one set of tokenized input texts to another. Once the text is transformed into vectors of a predefined feature length, it can undergo regression and classification. The state representation remains immutable and is decoded to generate new text. If, instead, the encoded state could be accumulated with the subsequent text, the accumulation would likely bring out the topic of the text, provided it is progressive. A progress indicator could be the mutual information value of the resulting state. If there is information gain, the state continues to aggregate and is stored in memory; otherwise, the pairing state is discarded. The result is a final state aggregation that becomes increasingly inclusive of the topic in the text.
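The sketch below illustrates the progressive aggregation described above. It uses a simple novelty measure (one minus cosine similarity against the running state) as a stand-in for the mutual information criterion; the threshold, function names, and the random vectors standing in for per-chunk encodings are all illustrative assumptions.

```python
import numpy as np

def novelty(state, candidate):
    """Proxy for information gain: distance of the candidate encoding
    from the current aggregated state (1 - cosine similarity)."""
    if state is None:
        return 1.0
    cos = np.dot(state, candidate) / (np.linalg.norm(state) * np.linalg.norm(candidate))
    return 1.0 - float(cos)

def aggregate_states(encoded_chunks, threshold=0.1):
    """Fold a stream of chunk encodings into one running state, keeping a
    chunk only when it adds enough new information."""
    state, count = None, 0
    for vec in encoded_chunks:
        if novelty(state, vec) >= threshold:
            # Information gain: fold the chunk into the running average.
            state = vec.copy() if state is None else (state * count + vec) / (count + 1)
            count += 1
        # Otherwise the pairing state is discarded.
    return state

# Illustrative usage with random stand-ins for per-chunk encodings.
rng = np.random.default_rng(0)
chunks = [rng.normal(size=768) for _ in range(5)]
topic_state = aggregate_states(chunks)
```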

State aggregation is independent of BERT but not divorced from it. It is optional and useful for topic detection. It can also improve the precision and relevance of the generated response text by ensuring that its F-score remains high when compared against the aggregated state. Without the aggregated state, the scores for the response are harder to evaluate.
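One way to read the F-score comparison above is as token overlap between a generated response and topic terms drawn from the aggregated state. The snippet below is a rough sketch of that idea; the overlap-based F1 is a common stand-in, and the inputs are illustrative rather than taken from any particular system.

```python
from collections import Counter

def overlap_f1(response_tokens, topic_tokens):
    """Token-level F1 between a response and a set of topic terms."""
    common = Counter(response_tokens) & Counter(topic_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(response_tokens)
    recall = overlap / len(topic_tokens)
    return 2 * precision * recall / (precision + recall)

print(overlap_f1("the bank handles foreign exchange".split(),
                 "bank currency exchange finance".split()))
```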
