Wednesday, December 27, 2023


Transformers work very well because of three components: 1. Positional Encoding, 2. Attention and 3. Self-Attention. Positional encoding is about enhancing the data with positional information rather than encoding word order in the structure of the network. As we train the network on lots of text data, the transformer learns to interpret those positional encodings. Because the words no longer have to be processed one at a time, this made transformers easier to train than RNNs. Attention is the concept at the heart of the paper aptly titled “Attention Is All You Need”, which introduced the transformer. It is a structure that allows a text model to look at every single word in the original sentence when deciding how to translate a word in the output. A heat map of the attention weights shows which input words the model relied on for each output word, which helps with understanding how it handles a word and its grammar. While attention is for understanding the alignment of words, self-attention is for understanding the underlying meaning of a word so as to disambiguate it from its other usages. This often involves an internal representation of the word, also referred to as its state. When attention is directed towards the input text itself, the model can tell the difference between, say, “Server, can I have the check?” and “I crashed the server”, interpreting the first as a reference to a human and the second as a reference to a machine. The context of the surrounding words is what shapes this state.
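To make positional encoding and self-attention concrete, here is a minimal sketch in plain Python. It assumes the sinusoidal encoding scheme and the scaled dot-product form of attention; the toy embeddings are made-up numbers, the function names are illustrative, and the learned query, key and value projections that a real transformer uses are left out so that each token's vector serves as all three.

```python
import math


def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding: one d_model-sized vector per position."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Each pair of dimensions (2k, 2k + 1) shares one wavelength.
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    Simplifying assumption: no learned projection matrices.
    Returns the new token vectors and the attention weights.
    """
    d = len(x[0])
    outputs, weights = [], []
    for query in x:
        # Score this token against every token in the sentence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in x
        ]
        w = softmax(scores)
        weights.append(w)
        # The new state is a weighted mix of all token vectors.
        outputs.append(
            [sum(w[j] * x[j][c] for j in range(len(x))) for c in range(d)]
        )
    return outputs, weights


# Made-up 4-dimensional embeddings for a three-token sentence.
embeddings = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
]

# Positional information is added to the data, not built into the network.
pe = positional_encoding(len(embeddings), 4)
x = [[e + p for e, p in zip(e_row, p_row)] for e_row, p_row in zip(embeddings, pe)]

outputs, weights = self_attention(x)
for row in weights:
    print([round(w, 3) for w in row])
```

Each printed row is one row of the attention heat map: it shows how much one token draws on every token in the sentence, itself included, and the weights in a row sum to 1.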

BERT, an NLP model, makes use of attention and can be used for a variety of purposes such as text summarization, question answering, classification and finding similar sentences. BERT also helps with Google Search and Google Cloud AutoML Natural Language. Google has made BERT available for download via the TensorFlow library, while the company Hugging Face has made its Transformers library available in Python.

A recent study on Copilot by Gartner found that the most successful pilots focus on demonstrating business potential, not on technical feasibility. The difference between the two is the realization of the transformative potential of this technology. Since the technology is still broad and emerging, IT leaders find it hard to prioritize generative AI use cases. Mature AI partners involve business partners and software engineers as key members of their AI projects. Generative AI allows for a faster development cycle than traditional AI projects. As always, but even more so with shorter development cycles, success is realized via rapid testing, refinement, and the elimination of low-priority, low-severity use cases.
