Sequences are an excellent source of information that is not self-contained in the discrete units of an input stream, such as words in a text, symbols in a language, or images in a video. Yet they remain under-utilized in many machine learning scenarios that have done so much to enrich the individual unit by engineering features, devising relative distance metrics, or finding co-occurrence similarities for classification. This article explores conventional and futuristic usages of sequences.
The inherent benefit of the sequence is that it is captured in the form of a state that is independent of the units themselves. This powerful concept allows us to work with all kinds of input units, be they words, symbols, images, or any other code. The conventional way to work with sequences belongs to a family of neural networks that slices the data into fixed-size chunks. It encodes a sequence and later decodes it to form a different output sequence. These recurrent neural networks, aka RNNs, use this state as the essence of the sequence; the state is almost independent of the forms of the units comprising the sequence, so the network can infer the meaning of those units without knowing what they are. The encoder-decoder RNN described by Bahdanau et al. in 2014 could be paired with different kinds of decoders that produced different outputs, but the sequences remained fixed in size and the state was accrued in a batch manner. If, in the future, it becomes possible to build one state in an aggregated manner that continuously evolves by leveraging the growing input stream from start to finish, that state is likely to be a better representation of the overall import than ever before. The difference is between building sequences as distinct records in a table versus enriching the state in a streaming manner, where the same state continually updates for each unit, one at a time.
TensorFlow is a convenient library for writing RNNs. As with most machine learning models, typically about 80% of the data is used for training and the remaining 20% is held out to test and predict. The model can be developed on high-performance computing servers and later exported for use on low-resource devices and clients. The model can be tuned with continuous feedback and its releases versioned.
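As a minimal illustration of that split, assuming the training data has already been gathered into a list (the name examples is a placeholder, not part of any library):

# Hypothetical 80/20 split of a list of (input, label) pairs
split = int(0.8 * len(examples))
train_set, test_set = examples[:split], examples[split:]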
Let us take the example of predicting the next word from a passage. This goal is particularly suited to conventional RNNs: the network is fed a sequence of three words at a time together with the following word as the label, and learns to predict the next symbol. The model can only work with real numbers, so one way to convert a symbol to a number is to assign each symbol a unique integer based on its frequency of occurrence. The frequency-based dictionary and a reverse dictionary are then used to translate between symbols and indices when articulating the next symbol, as in the sketch below.
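A minimal sketch of building such a mapping, assuming the passage has already been tokenized into a list called words (the names build_dataset, dictionary, and reverse_dictionary are illustrative):

import collections

def build_dataset(words):
    # Rank symbols by frequency; the most frequent word gets the smallest index
    count = collections.Counter(words).most_common()
    dictionary = {}
    for word, _ in count:
        dictionary[word] = len(dictionary)
    # The reverse dictionary maps an integer index back to its symbol
    reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
    return dictionary, reverse_dictionary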
As with any softmax classifier used with neural networks, the output is a vector of probabilities, one per symbol in the vocabulary. The index of the highest probability is then looked up in the reverse dictionary to determine the predicted symbol.
Using TensorFlow, this is written as:

import tensorflow as tf
from tensorflow.contrib import rnn

def RNN(x, weights, biases):
    # Reshape to [batch_size, n_input] and split into n_input time-step tensors
    x = tf.reshape(x, [-1, n_input])
    x = tf.split(x, n_input, 1)
    # A single LSTM cell of n_hidden units carries the state of the sequence
    rnn_cell = rnn.BasicLSTMCell(n_hidden)
    outputs, states = rnn.static_rnn(rnn_cell, x, dtype=tf.float32)
    # Project the final output onto the vocabulary to get per-symbol logits
    return tf.matmul(outputs[-1], weights['out']) + biases['out']
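A hedged usage sketch of the above, assuming pred = RNN(x, weights, biases) has been defined, a TensorFlow session is open, x is a placeholder of shape [None, n_input, 1], and the dictionary and reverse_dictionary from earlier are available (the three context words are arbitrary):

import numpy as np

# Convert three context words to their integer indices and shape them for the model
symbols_in = [dictionary[w] for w in ["the", "quick", "brown"]]
keys = np.reshape(np.array(symbols_in), [-1, n_input, 1])

# Run the network and pick the most probable next symbol
onehot_pred = session.run(pred, feed_dict={x: keys})
pred_index = int(np.argmax(onehot_pred, axis=1)[0])
print(reverse_dictionary[pred_index])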
The streaming form of the RNN would instead use a summation to continuously update the state, along the lines of the sketch below.
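A minimal sketch of that idea, in plain NumPy rather than TensorFlow; the embed function and the W and U matrices are assumptions standing in for whatever unit encoding and learned parameters such a model would use:

import numpy as np

def streaming_state(stream, embed, W, U, state_dim):
    # One state vector is kept for the entire stream and enriched
    # additively, one unit at a time, rather than per fixed-size sequence
    state = np.zeros(state_dim)
    for unit in stream:
        state = state + np.tanh(W @ embed(unit) + U @ state)
    return state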