Friday, August 25, 2023

Sequences are an excellent source of information that is usually not contained within the discrete units of an input stream, such as words in a text, symbols in a language, or images in a video. Yet they are under-utilized in many machine learning scenarios that have done so much to enrich the information within each unit by means of features, relative distance metrics, or co-occurrence similarities used for classification. This article explores conventional and future usages of sequences.

The inherent benefit of a sequence is that it can be captured in the form of a state that is independent of the units themselves. This powerful concept lets us work with all kinds of input units, be they words, symbols, images, or any other code. The conventional way to work with sequences belongs to a family of neural networks that encodes the input sequence and later decodes it to form a different output sequence. These recurrent neural networks, aka RNNs, use this state as the essence of the sequence, which is almost independent of the forms of the units comprising it, and infer the meaning of those units without knowing what they are. The encoder-decoder RNN described by Bahdanau et al. in 2014 could be paired with different kinds of decoders that produced different outputs, but the input was still consumed as fixed-size sequences and the state was accrued in a batch manner. If, in the future, it becomes possible to build one state in an aggregated manner that continuously evolves over a growing input stream from start to finish, that state is likely to be a better representation of the overall import than ever before. The difference is between building sequences as distinct records in a table versus enriching the state in a streaming manner, where the same state is updated for each unit, one at a time, as sketched below.
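A minimal sketch of that idea, written with numpy and hypothetical weight matrices W_h and W_x (the dimensions are purely illustrative, not from any particular model), shows how one state vector can be updated one unit at a time:

import numpy as np

# Illustrative sizes: each unit is a 10-dim vector, the running state has 16 dims.
n_unit, n_state = 10, 16
W_x = np.random.randn(n_state, n_unit) * 0.1   # input-to-state weights (assumed)
W_h = np.random.randn(n_state, n_state) * 0.1  # state-to-state weights (assumed)

def update_state(h, x):
    # The same function is applied to every unit in the stream;
    # the state h carries everything accumulated from the units seen so far.
    return np.tanh(W_h @ h + W_x @ x)

h = np.zeros(n_state)
for x in np.random.randn(20, n_unit):  # stand-in for a stream of embedded units
    h = update_state(h, x)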

TensorFlow is a convenient library for writing an RNN. As with most machine learning models, roughly 80% of the data is used for training and the remaining 20% is held out for testing and prediction. The model can be developed on high-performance computing servers and later exported for use on low-resource devices and clients. The model can be tuned with continuous feedback and its releases versioned.
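A hold-out split of that kind could be done with a small helper like the following sketch, where data is assumed to be a list of training examples:

import random

def train_test_split(data, train_fraction=0.8, seed=42):
    # Shuffle a copy so the split is not biased by the original ordering.
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

train_set, test_set = train_test_split(list(range(100)))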

Let us take the example of predicting the next word in a passage. This goal is particularly suited to conventional RNNs, because feeding a sequence of three words at a time together with one labeled symbol teaches the neural network to predict the next symbol. The model can only understand real numbers, so one way to convert a symbol to a number is to assign a unique integer to each symbol based on its frequency of occurrence. The frequency table and a reverse dictionary then help to articulate the next symbol.
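A sketch of building such a vocabulary, assuming words is the tokenized passage, might look like this; more frequent words receive lower integer ids:

import collections

def build_dataset(words):
    # Count word frequencies and assign smaller integer ids to more frequent words.
    counts = collections.Counter(words).most_common()
    dictionary = {word: i for i, (word, _) in enumerate(counts)}
    reverse_dictionary = {i: word for word, i in dictionary.items()}
    return dictionary, reverse_dictionary

dictionary, reverse_dictionary = build_dataset(
    "the quick brown fox jumps over the lazy dog".split())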

As with any softmax classifier used with neural networks, each symbol is associated with a vector of probabilities. The index of the highest probability can then be looked up in the reverse dictionary to determine the predicted symbol.
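Concretely, if pred holds the softmax probabilities over the vocabulary, the prediction can be recovered as in this small sketch (the vocabulary and probabilities here are made up for illustration):

import numpy as np

# Hypothetical softmax output over a 4-symbol vocabulary and its reverse dictionary.
reverse_dictionary = {0: "the", 1: "quick", 2: "brown", 3: "fox"}
pred = np.array([0.1, 0.2, 0.6, 0.1])
predicted_word = reverse_dictionary[int(np.argmax(pred))]  # -> "brown"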

Using TensorFlow's 1.x API, this can be written as:

import tensorflow as tf
from tensorflow.contrib import rnn  # TF 1.x contrib RNN cells

def RNN(x, weights, biases):
    # Reshape to a flat batch, then split into n_input per-time-step tensors
    x = tf.reshape(x, [-1, n_input])
    x = tf.split(x, n_input, 1)
    # A single LSTM cell with n_hidden units carries the state across steps
    rnn_cell = rnn.BasicLSTMCell(n_hidden)
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)
    # Project the output of the last step onto the vocabulary to score the next symbol
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
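
Under typical assumptions for this kind of example (three time steps, an illustrative vocabulary size, and n_hidden LSTM units, none of which are fixed by the original text), the placeholders and parameters that feed this function might be set up as follows:

# Illustrative hyperparameters, chosen only for the sketch
n_input, n_hidden, vocab_size = 3, 512, 112

x = tf.placeholder(tf.float32, [None, n_input, 1])
y = tf.placeholder(tf.float32, [None, vocab_size])
weights = {'out': tf.Variable(tf.random_normal([n_hidden, vocab_size]))}
biases = {'out': tf.Variable(tf.random_normal([vocab_size]))}

pred = RNN(x, weights, biases)  # logits; apply tf.nn.softmax for probabilities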

 

The streaming form of the RNN would instead use a summation to continuously update the state as each new unit arrives.
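As a rough sketch of that idea, and not a definitive formulation, a running state could be accumulated as a decayed sum of per-unit contributions, where the decay factor alpha is an assumed parameter:

import numpy as np

def streaming_state(stream, n_state=16, alpha=0.9):
    # Maintain one state vector and fold each incoming unit into it;
    # alpha controls how much of the accumulated state is retained per step.
    h = np.zeros(n_state)
    for x in stream:
        h = alpha * h + (1.0 - alpha) * x
    return h

state = streaming_state(np.random.randn(100, 16))  # stand-in for a stream of unit embeddings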
