1. Speech recognition - This is used to convert voice to text. It requires an analog-to-digital converter. The conversion can be visualized in a graph known as a spectrogram. A sound wave is captured and its amplitude is plotted over time, with intensity expressed in decibels. The wave is then divided into fixed-duration slices and stored in quantitative form. Frequency, intensity, and time are the axes of the spectrogram, which is computed over audio frames of 20 to 40 milliseconds. Phonemes are used to disambiguate the sounds, and variations due to accent are modeled as allophones.
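As a rough illustration of how such framing might be computed, here is a minimal sketch using 25 ms frames (within the 20-40 ms range mentioned above); the sample rate, signal, and parameters are illustrative, not prescribed by any particular recognizer:

```python
import numpy as np
from scipy.signal import spectrogram

# Illustrative parameters: 16 kHz sample rate, 25 ms frames with a 10 ms hop.
fs = 16000
frame_len = int(0.025 * fs)   # 400 samples per 25 ms frame
hop_len = int(0.010 * fs)     # 160-sample hop between frames

# Synthetic one-second signal standing in for captured audio.
t = np.arange(fs) / fs
signal = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)

# Frequency x time matrix of spectral power, one column per frame.
freqs, times, Sxx = spectrogram(signal, fs=fs, nperseg=frame_len,
                                noverlap=frame_len - hop_len)

# Convert power to decibels, the intensity axis of the spectrogram.
Sxx_db = 10 * np.log10(Sxx + 1e-10)
print(Sxx_db.shape)  # (frequency bins, frames)
```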
The Hidden Markov Model (HMM), often combined with neural network models, is also used in speech recognition. It comprises several layers. The first layer assigns the probability that each detected phoneme is the correct one. The second layer checks the co-occurrence of phonemes and assigns probabilities to those sequences. The third layer does the same at the word level, assigning probabilities to word co-occurrences. The model checks and rechecks all of these probabilities to arrive at the most likely text that was spoken.
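A common way to carry out that search for the most likely sequence is the Viterbi algorithm. The sketch below is a minimal, illustrative version over two invented phoneme-like states and made-up probabilities, not a production decoder:

```python
import numpy as np

# Toy HMM: two hypothetical phoneme states, two observed acoustic symbols.
states = ["ah", "eh"]
start_p = np.array([0.6, 0.4])                 # initial state distribution
trans_p = np.array([[0.7, 0.3],                # transition distribution
                    [0.4, 0.6]])
emit_p = np.array([[0.9, 0.1],                 # observation distribution
                   [0.2, 0.8]])

def viterbi(obs):
    """Return the most likely state sequence for a list of observation indices."""
    n, m = len(obs), len(states)
    prob = np.zeros((n, m))
    back = np.zeros((n, m), dtype=int)
    prob[0] = start_p * emit_p[:, obs[0]]
    for t in range(1, n):
        for j in range(m):
            scores = prob[t - 1] * trans_p[:, j] * emit_p[j, obs[t]]
            back[t, j] = np.argmax(scores)
            prob[t, j] = np.max(scores)
    # Trace the highest-probability path backwards.
    path = [int(np.argmax(prob[-1]))]
    for t in range(n - 1, 0, -1):
        path.append(back[t, path[-1]])
    return [states[s] for s in reversed(path)]

print(viterbi([0, 1, 1]))  # e.g. ['ah', 'eh', 'eh']
```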
The advantage of using neural networks is that they learn by training on data; they are flexible and can change over time. A neural network keeps improving because it compares the desired output with the actual output and corrects the error. It can grasp a wide variety of phonemes and detect the uniqueness of sounds that originate from accents and emotions, and it can be combined with a Hidden Markov Model, each improving the other. When the output variables are arranged in a sequence or a linear chain, we get a sequence model. This is the approach taken by a Hidden Markov Model. An HMM models a sequence of observations by assuming that there is an underlying sequence of states. Each state depends only on the previous state and is independent of all earlier ancestors. An HMM further assumes that each observation variable depends only on the current state. The model is therefore specified by three probability distributions: the distribution p(y) over initial states, the transition distribution from one state to the next, and the observation distribution of each observation given the current state.
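To see that these three distributions fully specify the model, the following minimal sketch generates a state and observation sequence using nothing else; the probability values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# The three distributions that specify an HMM (illustrative values).
p_init = np.array([0.5, 0.5])            # p(y): distribution over initial states
p_trans = np.array([[0.8, 0.2],          # p(y_t | y_{t-1}): transition distribution
                    [0.3, 0.7]])
p_obs = np.array([[0.6, 0.4],            # p(x_t | y_t): observation distribution
                  [0.1, 0.9]])

def sample_sequence(length):
    """Generate (states, observations) using only the three distributions."""
    states, obs = [], []
    y = rng.choice(2, p=p_init)                 # draw the initial state
    for _ in range(length):
        states.append(int(y))
        obs.append(int(rng.choice(2, p=p_obs[y])))   # observation depends only on y
        y = rng.choice(2, p=p_trans[y])              # next state depends only on y
    return states, obs

print(sample_sequence(5))
```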
When we include interdependence between features, we use a generative model. This is usually done in one of two ways: enhance the model to represent dependencies among the inputs, or make simplifying independence assumptions. The first approach is difficult because we must maintain tractability. The second approach can hurt performance. The difference in behavior between the two kinds of model is large, since one is generative and the other is discriminative. Generative means the model is based on the joint distribution of the state and the observation. Discriminative means the model is based on the conditional distribution of the state given the observation. We can also form models based on generative-discriminative pairs.
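A classic generative-discriminative pair is naive Bayes and logistic regression. Here is a minimal sketch comparing the two on synthetic data; the dataset and parameters are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Illustrative synthetic dataset with 20 features.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Generative: naive Bayes models the joint p(x, y) via p(x | y) p(y).
gen = GaussianNB().fit(X_tr, y_tr)

# Discriminative: logistic regression models p(y | x) directly.
disc = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

print("naive Bayes accuracy:       ", gen.score(X_te, y_te))
print("logistic regression accuracy:", disc.score(X_te, y_te))
```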
2. Image recognition – Some image recognition algorithms make use of pattern recognition. Pattern recognition refers to the classification or description of objects or patterns. The patterns themselves can range from characters in an image of printed text to biological waveforms. Recognition involves identifying the patterns and assigning labels for categories. We start with a set of training patterns. The main difference between pattern recognition and cluster analysis is the role of pattern class labels. In pattern recognition, we use the labels to formulate decision rules; in cluster analysis, we use them only to verify the results. Pattern recognition therefore requires extrinsic information, while cluster analysis uses only the data.
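To make the contrast concrete, here is a minimal sketch on an illustrative synthetic dataset: the classifier consumes the labels to build its decision rule, while the clustering algorithm never sees them and the labels are used only afterward to verify the grouping:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.neighbors import KNeighborsClassifier

# Illustrative data: 3 pattern classes in 2 dimensions.
X, labels = make_blobs(n_samples=300, centers=3, random_state=0)

# Pattern recognition: labels are used to formulate the decision rule.
clf = KNeighborsClassifier().fit(X, labels)
print("classifier accuracy:", clf.score(X, labels))

# Cluster analysis: only the data is used; labels verify the result afterward.
clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print("cluster agreement (ARI):", adjusted_rand_score(labels, clusters))
```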
There are two basic paradigms for classifying a pattern into one of K different classes. The first is the geometric or statistical approach. In this approach, a pattern is represented in terms of d features, chosen so that the features are as independent of one another as possible. Then, given training patterns for each pattern class, the objective is to separate the patterns belonging to different classes.
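As one hypothetical instance of this geometric view (a simple nearest-mean rule over d-dimensional feature vectors; it is by no means the only geometric classifier, and the numbers are invented for illustration):

```python
import numpy as np

# Illustrative training patterns: d = 2 features, K = 2 classes.
train = {
    0: np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9]]),
    1: np.array([[3.0, 3.1], [2.9, 3.3], [3.2, 2.8]]),
}

# Geometric decision rule: represent each class by its mean feature vector
# and assign a new pattern to the class with the nearest mean.
means = {k: pts.mean(axis=0) for k, pts in train.items()}

def classify(x):
    return min(means, key=lambda k: np.linalg.norm(x - means[k]))

print(classify(np.array([1.0, 1.0])))  # -> 0
print(classify(np.array([3.0, 3.0])))  # -> 1
```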
In statistical pattern recognition, the features are assumed to have a probability density function conditioned on the pattern class. A pattern vector x belonging to class wj is a data point drawn from the class-conditional probability distribution P(x|wj), where j is one of the K different classes. Concepts from statistical decision theory and discriminant analysis are used to establish decision boundaries between the pattern classes. If the class-conditional densities are known, Bayes decision theory gives the optimal decision rule. Since they are generally not known, a classifier is chosen based on the nature of the information available. A classifier requires either supervised or unsupervised learning. In supervised learning, we use parametric or non-parametric decision rules, depending on whether the form of the class-conditional densities is known; the density functions are estimated from training samples, where the label on each training pattern represents the category to which the pattern belongs. In unsupervised learning, the categories may be known beforehand, or they may be unknown; when the number of pattern classes is unknown, the method tries to find natural groupings in the data.
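Here is a minimal sketch of the Bayes decision rule, assuming (purely for illustration) Gaussian class-conditional densities whose parameters are estimated from labeled training samples:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Illustrative labeled training samples for K = 2 classes, d = 2 features.
rng = np.random.default_rng(0)
X0 = rng.normal([0, 0], 1.0, size=(100, 2))
X1 = rng.normal([3, 3], 1.0, size=(100, 2))

# Parametric estimates of the class-conditional densities P(x | wj)
# and the class priors P(wj).
dens, priors = [], []
for Xj in (X0, X1):
    dens.append(multivariate_normal(Xj.mean(axis=0), np.cov(Xj.T)))
    priors.append(len(Xj) / (len(X0) + len(X1)))

def bayes_decision(x):
    """Assign x to the class with the largest posterior P(wj | x)."""
    posteriors = [p * d.pdf(x) for p, d in zip(priors, dens)]
    return int(np.argmax(posteriors))

print(bayes_decision([0.5, 0.2]))  # -> 0
print(bayes_decision([2.8, 3.1]))  # -> 1
```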