Continuous Encoder
BERT is a natural language processing model that interprets search queries much the way humans do: it tries to understand the context of the words that make up the query, so the results match better than they would without it. It was proposed by Google and stands for Bidirectional Encoder Representations from Transformers. To understand BERT, we must first understand the terms Encoder and Bidirectional. These terms come from neural network techniques in machine learning, where encoding and decoding refer to the state captured between words in a sequence. Briefly, a neural network comprises layers of neurons that compute weighted values for their inputs, in this case words, against a chosen set of other inputs, also called features. Each feature gets a set of weights reflecting how likely it is to appear together with the other words chosen as features. A bag of words from the text is run through the neural network and transformed into outputs that resemble word associations; in the process, the network computes a weighted matrix of words against their features, and the resulting vectors are called embeddings. Embeddings are immensely useful because they represent words and their context in terms of the features that frequently co-occur with those words, bringing out their latent meanings. With this additional information, it is possible to measure how similar two words are or which topics a set of keywords represents, especially when a word has multiple meanings.
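As a concrete illustration of what an embedding lets us do, the sketch below compares word vectors with cosine similarity. The vectors here are made-up placeholders; in practice they would come from a trained model, and the specific values and variable names are assumptions for illustration only.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Similarity of two embedding vectors, in the range [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder 4-dimensional embeddings; real embeddings are typically hundreds of
# dimensions and are produced by a trained model rather than written by hand.
bank_finance = np.array([0.80, 0.10, 0.30, 0.05])  # "bank" used in a financial context
bank_river   = np.array([0.10, 0.70, 0.20, 0.60])  # "bank" used in a river context
money        = np.array([0.90, 0.05, 0.25, 0.10])

print(cosine_similarity(bank_finance, money))  # higher: shared financial features
print(cosine_similarity(bank_river, money))    # lower: a different latent meaning
```

The financial sense of "bank" scores closer to "money" than the river sense does, which is the kind of disambiguation embeddings make possible.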
In the example above, the transformation was forward only, associating each word with its left-to-right context within a layer, but the calculations in a layer can jointly use context from both directions. This is called bidirectional transformation, and since a neural network can have multiple layers, with the output of one layer serving as the input to the next, the algorithm can perform bidirectional transformations in all layers. When the input is not a single word but a set of words, such as a sentence, it is called a sequence; search terms form a sequence. BERT can unambiguously represent a sentence or a pair of sentences, as in question/answer form. The state between the constituents of a sequence is encoded in a form that helps to interpret the sequence or, with the help of decoding, to generate a response sequence. This relationship captured between input and output sequences in the form of encodings and decodings enhances language modeling and improves search results.
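For readers who want to see what encoding a sentence pair looks like in code, here is a minimal sketch using a pretrained BERT model, assuming the Hugging Face transformers library and PyTorch are available; the model name and example sentences are illustrative choices, not anything prescribed by the discussion above.

```python
# Minimal sketch: encode a question/answer pair with a pretrained BERT encoder.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

question = "where can I exchange currency"
answer = "the branch on main street exchanges foreign currency"

# The tokenizer packs both sentences into one sequence:
# [CLS] question tokens [SEP] answer tokens [SEP]
inputs = tokenizer(question, answer, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual embedding per token, conditioned on both left and right context.
token_embeddings = outputs.last_hidden_state            # shape: (1, sequence_length, 768)
# A single vector for the whole pair, taken from the [CLS] position.
sequence_embedding = outputs.last_hidden_state[:, 0, :]
print(token_embeddings.shape, sequence_embedding.shape)
```

Each token's vector reflects the entire pair, which is what the bidirectional encoding buys over a forward-only pass.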
Natural language processing relies on encoding and decoding to capture and replay state from text. This state is discrete and changes from one set of tokenized input texts to another. Once the text is transformed into vectors of a predefined feature length, it can undergo regression and classification. Ordinarily the state representation remains immutable and is only decoded to generate new text. Instead, if the encoded state could be accumulated with the subsequent text, it would likely bring out the topic of the text, provided the accumulation is progressive. A progress indicator could be the mutual information of the resulting state: if there is information gain, the state continues to aggregate and is stored in memory; otherwise, the pairing state is discarded. The result is a final aggregated state that becomes progressively more inclusive of the topic in the text.
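A rough sketch of that accumulation loop is shown below. Everything in it is an assumption made for illustration: the encoder is a deterministic stand-in for a real sequence embedding, the information-gain test is a simple overlap proxy rather than a true mutual-information estimate, and the threshold is arbitrary.

```python
import hashlib
import numpy as np

def encode(text: str) -> np.ndarray:
    """Stand-in encoder; in practice this would be a BERT sequence embedding."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2 ** 32)
    vec = np.random.default_rng(seed).normal(size=768)
    return vec / np.linalg.norm(vec)

def information_gain(state: np.ndarray, candidate: np.ndarray) -> float:
    """Proxy for the mutual-information check: how much of the candidate
    is not already explained by the current state."""
    return 1.0 - abs(float(np.dot(state, candidate)))

def aggregate(chunks, threshold=0.5):
    """Progressively fold encoded text chunks into one topic state,
    keeping a pairing only when it adds information."""
    state = encode(chunks[0])
    for chunk in chunks[1:]:
        candidate = encode(chunk)
        if information_gain(state, candidate) > threshold:
            state = state + candidate              # accumulate and keep in memory
            state = state / np.linalg.norm(state)
        # otherwise the pairing state is discarded
    return state

topic_state = aggregate([
    "the bank approved the loan",
    "interest rates rose this quarter",
    "the bank approved the loan",   # repeated content contributes little new information
])
print(topic_state.shape)
```

The final state accumulates only the chunks that pass the gain test, which is what makes it progressively more representative of the topic.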
State aggregation is independent of BERT but not removed from it. It is optional and useful for topic detection. It can also improve the precision and relevance of the text generated in response by ensuring that its F-score against the aggregated state remains high. Without the aggregated state, the scores for the response would be harder to evaluate.
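As a last illustration, and only as an assumed sketch of how such an evaluation might look, a generated response could be scored by the F-score of its tokens against the tokens retained in the aggregated state; comparing the encoded states directly would be the more faithful version.

```python
def f_score(response_tokens, aggregated_tokens):
    """Token-overlap F1 of a generated response against the aggregated state,
    used here as a simplified stand-in for comparing encoded states."""
    response, aggregated = set(response_tokens), set(aggregated_tokens)
    if not response or not aggregated:
        return 0.0
    overlap = response & aggregated
    precision = len(overlap) / len(response)
    recall = len(overlap) / len(aggregated)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_score(["bank", "loan", "approved"],
              ["bank", "loan", "interest", "rates"]))   # about 0.57
```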