A conjecture about Artificial Intelligence:
ChatGPT is increasingly in the news for its ability to mimic
a human conversationalist and for being versatile. It introduces a significant
improvement over the sequential encoding with the GPT-3 family of
parallelizable learning. This article wonders if the state considered to be in
summation form so that as the text continues to be encoded, the overall state
is continuously accumulated in a streaming manner. But first a general
introduction to the subject.
ChatGPT is based on a type of neural network called a transformer.
These are models that can translate text, write poems and op-ed and even
generate code. Newer natural language processing (NLP) models like BERT, GPT-3
or T5 are all based on transformers. Transformers are incredibly impactful as
compared to their predecessors that were also based on neural networks. Just to
recap, neural networks are models for analyzing complicated data by discovering
hidden layers that represent the lateny semantics in a given data and often
referred to as the hidden layer between an input layer of data and an output
layer of encodings. Neural networks can handle a variety of data including
images, video, audio and text. There are different types of neural networks
optimized for different type of data. If we are analyzing images, we would
typically use a convolutional neural network so called because it often begins
with embedding the original data into a common shared space before it undergoes
synthesis, training and testing. The embedding space is constructed from an
initial collection of representative data. For example, it could refer to a
collection of 3D shapes drawn from real world images. The space organizes
latent objects between images and shapes which are complete representation of
objects. The use of CNN is a technique that focuses on the salient invariant
embedded objects rather than the noise.
CNNs worked great for detecting objects in images but did not do as well for languages or tasks such as summarizing text or generating text. The Recurrent Neural Network was introduced to adjust this for language tasks such as translation where an RNN would take a sentence from the source language and translate it to a destination language one word at a time and then sequentially generate the translations. Sequential is important because the words order matter for the meaning of a sentence as in “The dog chased the cat” versus “The cat chased the dog”. RNNs had a few problems. They could not handle large sequences of text, like long paragraphs. They were also not fast enough to handle large data because they were sequential. The ability to train on large data is considered a competitive advantage when it comes to NLP tasks because the models become more tuned.
(to be continued...)