Sunday, May 2, 2021

Special purpose algorithms


1.       Speech recognition - This converts voice to text. It requires an analog-to-digital converter, and the conversion can be visualized in a graph known as a spectrogram. A sound wave is captured, and its amplitude over time is plotted in decibels. The wave is then cut into short time slices and stored in quantitative form. Frequency, intensity, and time are required to plot the spectrogram, which divides the audio into frames of 20 to 40 milliseconds. Phonemes are used to disambiguate the sounds, and variations due to accent are treated as allophones.
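As a concrete illustration, here is a minimal sketch of the framing step, assuming a mono 16 kHz WAV file at the hypothetical path "speech.wav" and using SciPy's spectrogram routine with 25 ms frames:

```python
# A minimal sketch of spectrogram framing, assuming a mono WAV file at
# "speech.wav" (hypothetical path) sampled at 16 kHz.
import numpy as np
from scipy.io import wavfile
from scipy.signal import spectrogram

rate, samples = wavfile.read("speech.wav")      # analog-to-digital samples

# Frame the audio into 25 ms windows (within the 20-40 ms range above),
# with a 10 ms hop between successive frames.
frame_len = int(0.025 * rate)                   # 400 samples at 16 kHz
hop_len = int(0.010 * rate)                     # 160 samples at 16 kHz

freqs, times, power = spectrogram(
    samples, fs=rate,
    nperseg=frame_len,
    noverlap=frame_len - hop_len,
)

# Convert intensity to decibels, the unit used on the amplitude plot.
power_db = 10 * np.log10(power + 1e-10)
print(power_db.shape)                           # (frequency bins, time frames)
```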

The Hidden Markov Model, often combined with neural network models, is also used in speech recognition. It comprises several layers. The first layer assigns the probability that each detected phoneme is the correct one. The second layer checks the co-occurrence of phonemes and assigns probabilities to those pairings. The third layer does the same as the second but at the word level, assigning probabilities to word co-occurrences.

The model checks and rechecks all the probabilities to come up with the most likely text that was spoken.

The advantage of using neural networks is that they learn by training on data, so they are flexible and can change over time. A neural network keeps improving as long as it can compare the desired output with the actual output and correct the error. It grasps a wide variety of phonemes and can detect the distinctive sounds that arise from accents and emotions, which complements the Hidden Markov Model.

When the output variables are arranged in a sequence or a linear chain, we get a sequence model. This is the approach taken by a hidden Markov Model. An HMM models a sequence of observations by assuming that there is an underlying sequence of states. Each state depends only on the previous state and is independent of all its earlier ancestors. An HMM also assumes that each observation variable depends only on the current state. This model is therefore specified by three probability distributions: the distribution p(y1) over initial states, the transition distribution p(yt | yt-1) from one state to the next, and the observation distribution p(xt | yt) of each observation given the current state.
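To make the three distributions concrete, here is a toy decoding sketch in NumPy. The states, observations, and all probability values are made up for illustration; the Viterbi algorithm recovers the most likely state sequence, which is the "check and recheck" step described above:

```python
# A toy sketch of HMM decoding with the three distributions named above:
# initial p(y1), transition p(yt | yt-1), and observation p(xt | yt).
# States and observations here are illustrative, not a real acoustic model.
import numpy as np

initial = np.array([0.6, 0.4])                  # p(y1) over 2 states
transition = np.array([[0.7, 0.3],
                       [0.2, 0.8]])             # p(yt | yt-1)
emission = np.array([[0.9, 0.1],
                     [0.3, 0.7]])               # p(xt | yt) over 2 symbols

def viterbi(obs):
    """Return the most likely state sequence for an observation sequence."""
    n_states = len(initial)
    T = len(obs)
    score = np.zeros((T, n_states))             # best log-prob ending in state
    back = np.zeros((T, n_states), dtype=int)   # backpointers

    score[0] = np.log(initial) + np.log(emission[:, obs[0]])
    for t in range(1, T):
        for s in range(n_states):
            cand = score[t - 1] + np.log(transition[:, s])
            back[t, s] = np.argmax(cand)
            score[t, s] = cand[back[t, s]] + np.log(emission[s, obs[t]])

    # Trace back the highest-probability path.
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

print(viterbi([0, 0, 1, 1, 1]))                 # e.g. [0, 0, 1, 1, 1]
```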

When we want to include interdependence among the features, we can use a generative model. This is usually done in one of two ways: enhance the model to represent dependencies among the inputs, or make simplifying independence assumptions. The first approach is difficult because the model must remain tractable. The second approach can hurt performance. The difference in behavior between the two model families is large because one is generative and the other is discriminative.

Generative means the model is based on the joint distribution of the states and the observations. Discriminative means the model is based on the conditional distribution of the states given the observations. We can also compare models as generative-discriminative pairs.
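As a small illustration of such a pair, assuming scikit-learn is available: Gaussian naive Bayes is a generative model of the joint distribution, while logistic regression is its discriminative counterpart, modeling the conditional distribution directly:

```python
# A small sketch of a generative-discriminative pair on synthetic data:
# GaussianNB models the joint p(x, y); LogisticRegression models the
# conditional p(y | x) directly.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

generative = GaussianNB().fit(X_train, y_train)
discriminative = LogisticRegression().fit(X_train, y_train)

print("generative accuracy:    ", generative.score(X_test, y_test))
print("discriminative accuracy:", discriminative.score(X_test, y_test))
```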

2.       Image recognition – Some image recognition algorithms make use of pattern recognition. Pattern recognition refers to the classification or description of objects or patterns, which can range from characters in an image of printed text to biological waveforms. The recognition involves identifying the patterns and assigning labels for categories. We start with a set of training patterns. The main difference between pattern recognition and cluster analysis is the role of pattern class labels: in pattern recognition, we use the labels to formulate decision rules; in cluster analysis, we use them only to verify the results. Pattern recognition therefore requires extrinsic information, while cluster analysis uses only the data.

There are two basic paradigms for classifying a pattern into one of K different classes. The first is a geometric or statistical approach. In this approach, a pattern is represented as a vector of d features, chosen so that the features are as independent of one another as possible. Then, given training patterns for each pattern class, the objective is to separate the patterns belonging to different classes.
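A minimal sketch of this geometric view, with made-up two-dimensional data: each pattern is a point in feature space, and a nearest-mean decision rule (an illustrative choice, not prescribed by the text) separates the two classes:

```python
# A minimal sketch of the geometric view: each pattern is a point in
# d-dimensional feature space, and classes are separated by distance to
# their class means (a nearest-mean decision rule; data is made up).
import numpy as np

train = {                                       # K = 2 classes, d = 2 features
    "w1": np.array([[1.0, 1.2], [0.8, 1.0], [1.1, 0.9]]),
    "w2": np.array([[3.0, 3.1], [2.9, 3.3], [3.2, 2.8]]),
}
means = {label: pts.mean(axis=0) for label, pts in train.items()}

def classify(x):
    """Assign x to the class whose mean is nearest in feature space."""
    return min(means, key=lambda label: np.linalg.norm(x - means[label]))

print(classify(np.array([1.0, 1.1])))           # -> "w1"
print(classify(np.array([3.0, 3.0])))           # -> "w2"
```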

In statistical pattern recognition, the features are assumed to have a probability density function conditioned on the pattern class. A pattern vector x belonging to class wj is a data point drawn from the class-conditional probability distribution P(x | wj), where j is one of the K different classes. Concepts from statistical decision theory and discriminant analysis are utilized to establish decision boundaries between the pattern classes. If the class-conditional densities are known, Bayes decision theory gives the optimal decision rule. Since they are generally not known, a classifier is designed based on the nature of the information available, using either supervised or unsupervised learning. In supervised learning, the label on each training pattern identifies the category to which the pattern belongs; if the form of the class-conditional densities is known, we use parametric decision rules, and otherwise non-parametric ones, with the density functions estimated from the training samples. In unsupervised learning, no labels are available, and the categories may be known beforehand or they may be unknown.
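Here is a sketch of the Bayes decision rule for the case where the class-conditional densities P(x | wj) are known: one-dimensional Gaussians with assumed parameters and equal priors, all numbers illustrative:

```python
# A sketch of the Bayes decision rule when the class-conditional densities
# P(x | wj) are known: here, one-dimensional Gaussians with assumed
# parameters and equal priors (all numbers are illustrative).
from scipy.stats import norm

priors = {"w1": 0.5, "w2": 0.5}
densities = {                                   # known P(x | wj)
    "w1": norm(loc=0.0, scale=1.0),
    "w2": norm(loc=2.0, scale=1.0),
}

def bayes_classify(x):
    """Pick the class maximizing the posterior, i.e. P(x | wj) * P(wj)."""
    return max(priors, key=lambda w: densities[w].pdf(x) * priors[w])

print(bayes_classify(0.3))                      # -> "w1"
print(bayes_classify(1.7))                      # -> "w2"
```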

When the number of pattern classes is unknown, clustering tries to find natural groupings in the data.
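One way to look for such groupings, sketched here with scikit-learn (an illustrative choice, not prescribed by the text): run k-means for several candidate numbers of clusters and keep the value with the best silhouette score:

```python
# A sketch of finding natural groupings when the number of classes is
# unknown: run k-means for several candidate k and keep the one with the
# best silhouette score (the true labels are never used).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # labels unused

best_k, best_score = None, -1.0
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    score = silhouette_score(X, labels)
    if score > best_score:
        best_k, best_score = k, score

print("chosen k:", best_k)                      # likely 3 for this data
```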
