Transformers changed that; in fact, they were originally developed for translation. Unlike RNNs, they can be parallelized, which means they can be trained on very large data sets. GPT-3, which writes poetry, code, and conversations, was trained on roughly 45 terabytes of text data, including a crawl of much of the public web. Transformers scale remarkably well with huge data sets.
Transformers work so well because of three components: 1. Positional Encoding, 2. Attention, and 3. Self-Attention. Positional encoding enhances the input data with positional information rather than encoding word order in the structure of the network itself. As the network trains on lots of text data, it learns to interpret these positional encodings. This makes transformers much easier to train than RNNs.
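To make this concrete, here is a minimal NumPy sketch of the sinusoidal positional encoding described in the original Transformer paper; the sequence length and model dimension below are arbitrary example values, not anything prescribed by the paper.

    import numpy as np

    def positional_encoding(seq_len, d_model):
        """Sinusoidal positional encoding from "Attention Is All You Need"."""
        positions = np.arange(seq_len)[:, np.newaxis]   # (seq_len, 1)
        dims = np.arange(d_model)[np.newaxis, :]        # (1, d_model)
        angle_rates = 1 / np.power(10000, (2 * (dims // 2)) / d_model)
        angles = positions * angle_rates
        encoding = np.zeros((seq_len, d_model))
        encoding[:, 0::2] = np.sin(angles[:, 0::2])     # even dimensions: sine
        encoding[:, 1::2] = np.cos(angles[:, 1::2])     # odd dimensions: cosine
        return encoding

    # These vectors are simply added to the word embeddings before the first layer.
    pe = positional_encoding(seq_len=50, d_model=512)
    print(pe.shape)  # (50, 512)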
Attention is a mechanism popularized by the 2017 paper aptly titled "Attention Is All You Need". It is a structure that allows a text model to look at every single word in the original sentence when deciding how to translate a word in the output. Visualizing the attention weights as a heat map shows which input words the model consults for each output word, revealing the grammatical alignments it has learned.
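The mechanism itself is compact. The following is a minimal NumPy sketch of scaled dot-product attention, with toy random matrices standing in for a trained model; the returned weight matrix is exactly what those heat maps visualize.

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        """Return the weighted sum of values and the attention weights."""
        d_k = Q.shape[-1]
        scores = Q @ K.T / np.sqrt(d_k)  # similarity of each query to each key
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights = weights / weights.sum(axis=-1, keepdims=True)  # row-wise softmax
        return weights @ V, weights

    # Toy example: 3 output positions attending over 4 input words, dimension 8.
    rng = np.random.default_rng(0)
    Q, K, V = rng.normal(size=(3, 8)), rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
    output, heat_map = scaled_dot_product_attention(Q, K, V)
    print(heat_map.shape)  # (3, 4): each row shows where one output position looks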
While attention is about the alignment between words, self-attention is about the underlying meaning of a word, so that it can be disambiguated from other usages. This involves an internal representation of the word, also referred to as its state. When attention is directed at the input text itself, the model can tell the difference between, say, "Server, can I have the check?" and "I crashed the server", interpreting the reference as a human server in the first and a machine in the second. The context supplied by the surrounding words shapes this state.
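One way to see this state directly is to compare the contextual vectors a model produces for the same word in the two sentences. A sketch using the Hugging Face Transformers library, assuming the bert-base-uncased checkpoint (where "server" happens to be a single vocabulary token):

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModel.from_pretrained("bert-base-uncased")

    def server_embedding(sentence):
        """Return the model's contextual vector for the token "server"."""
        inputs = tokenizer(sentence, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]
        server_id = tokenizer.convert_tokens_to_ids("server")
        idx = inputs["input_ids"][0].tolist().index(server_id)
        return hidden[idx]

    a = server_embedding("Server, can I have the check?")
    b = server_embedding("I crashed the server.")
    # Same word, different contexts -> noticeably different vectors.
    print(torch.cosine_similarity(a, b, dim=0).item())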
BERT, an NLP model built on attention, can be used for a variety of purposes such as text summarization, question answering, classification, and finding similar sentences. BERT also powers Google Search and Google Cloud AutoML Natural Language. Google has made BERT available for download via TensorFlow, while the company Hugging Face offers its Transformers library in Python.
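As a quick illustration of the Python side, here is a question-answering call through the Transformers pipeline API; the model name is one commonly used BERT-family checkpoint fine-tuned on SQuAD, chosen here purely as an example.

    from transformers import pipeline

    # Question answering with a BERT-family model fine-tuned on SQuAD.
    qa = pipeline("question-answering",
                  model="distilbert-base-cased-distilled-squad")
    result = qa(question="What does BERT make use of?",
                context="BERT is an NLP model that makes use of attention.")
    print(result["answer"])  # e.g. "attention"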