Transformers changed that; in fact, they were originally developed for translation. Unlike RNNs, they can be parallelized, which means they can be trained on very large data sets. GPT-3, which writes poetry, code, and conversations, was trained on roughly 45 terabytes of text data, including a crawl of much of the public web. Transformers scale remarkably well with huge data sets.
Transformers work so well because of three components: 1. Positional Encoding, 2. Attention, and 3. Self-Attention. Positional encoding enhances the input data with positional information rather than encoding word order in the structure of the network itself. As the network trains on lots of text data, it learns to interpret these positional encodings. This makes transformers much easier to train than RNNs.
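To make this concrete, here is a minimal NumPy sketch of the sinusoidal positional encoding described in the original Transformer paper; the sequence length and model dimension below are arbitrary example values, not anything prescribed by the paper.

    import numpy as np

    def positional_encoding(seq_len, d_model):
        """Sinusoidal positional encoding from "Attention Is All You Need"."""
        positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
        dims = np.arange(d_model)[np.newaxis, :]        # (1, d_model)
        angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
        angles = positions * angle_rates
        encoding = np.zeros((seq_len, d_model))
        encoding[:, 0::2] = np.sin(angles[:, 0::2])     # even dimensions: sine
        encoding[:, 1::2] = np.cos(angles[:, 1::2])     # odd dimensions: cosine
        return encoding

    # These vectors are simply added to the word embeddings before the first layer.
    pe = positional_encoding(seq_len=50, d_model=512)
    print(pe.shape)  # (50, 512)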
Attention is a mechanism popularized by the 2017 paper aptly titled "Attention Is All You Need". It is a structure that allows a text model to look at every single word in the original sentence when deciding how to translate a word in the output. Visualizing the attention weights as a heat map shows which input words the model consults for each output word, revealing the grammatical alignments it has learned.
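The mechanism itself is compact. The following is a minimal NumPy sketch of scaled dot-product attention, with toy random matrices standing in for a trained model; the returned weight matrix is exactly what those heat maps visualize.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Return the weighted sum of values and the attention weights."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V, weights

    # Toy example: 3 output positions attending over 4 input words, dimension 8.
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    output, heat_map = scaled_dot_product_attention(Q, K, V)
    print(heat_map.shape)  # (3, 4): each row shows where one output position looks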
While attention is about the alignment between words, self-attention is about the underlying meaning of a word, so that it can be disambiguated from other usages. This involves an internal representation of the word, also referred to as its state. When attention is directed at the input text itself, the model can tell the difference between, say, "Server, can I have the check?" and "I crashed the server", interpreting the reference as a human server in the first and a machine in the second. The context supplied by the surrounding words shapes this state.
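One way to see this state directly is to compare the contextual vectors a model produces for the same word in the two sentences. A sketch using the Hugging Face Transformers library, assuming the bert-base-uncased checkpoint (where "server" happens to be a single vocabulary token):

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def server_embedding(sentence):
        """Return the model's contextual vector for the token "server"."""
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]
        server_id = tokenizer.convert_tokens_to_ids("server")
        idx = inputs["input_ids"][0].tolist().index(server_id)
        return hidden[idx]

    a = server_embedding("Server, can I have the check?")
    b = server_embedding("I crashed the server.")
    # Same word, different contexts -> noticeably different vectors.
    print(torch.cosine_similarity(a, b, dim=0).item())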
BERT, an NLP model built on attention, can be used for a variety of purposes such as text summarization, question answering, classification, and finding similar sentences. BERT also powers Google Search and Google Cloud AutoML Natural Language. Google has made BERT available for download via TensorFlow, while the company Hugging Face offers its Transformers library in Python.
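As a quick illustration of the Python side, here is a question-answering call through the Transformers pipeline API; the model name is one commonly used BERT-family checkpoint fine-tuned on SQuAD, chosen here purely as an example.

    from transformers import pipeline

    # Question answering with a BERT-family model fine-tuned on SQuAD.
    qa = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")
    result = qa(question="What does BERT make use of?",
                context="BERT is an NLP model that makes use of attention.")
    print(result["answer"])  # e.g. "attention"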