Multi-dimensional optimization:
Introduction: This is a continuation of the previous article
on the dynamic walk for optimization. Here we extend the same approach to
stochastic optimization over more than two random variables, represented by
higher-dimensional equations. Instead of treating the random variables
pair-wise, we now study optimization over a vector space, and in particular
over a Hilbert space.
Description:
The equation representing a pair of random variables is always
in quadratic form:
f(x) = (1/2) x^T A x - b^T x + c
where A is a matrix, x and b are vectors, and c is a scalar constant.
The function is minimized by the solution of A x = b, so the optimization
over this pair of random variables reduces to the linear equation Ax = b.
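As a quick sanity check (a minimal sketch; the specific matrix A, vector b, and constant c below are made-up illustrations), the following Python snippet minimizes the quadratic form numerically and confirms that the minimizer agrees with the solution of Ax = b:
import numpy as np
from scipy import optimize
# Illustrative symmetric positive-definite matrix and vectors (made-up values)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 4.0])
c = 5.0
# The quadratic form f(x) = (1/2) x^T A x - b^T x + c
def f(x):
    return 0.5 * x @ A @ x - b @ x + c
# Minimize f numerically starting from the origin
result = optimize.minimize(f, x0=np.zeros(2))
# Direct solution of the linear system A x = b
x_direct = np.linalg.solve(A, b)
print(result.x)   # numerical minimizer
print(x_direct)   # agrees with the minimizer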
When we have several random variables, each random
variable contributes a dimension, and the simple two-dimensional contour map
becomes an n-dimensional Euclidean space.
Just as the shortest distance from a point to a line is along the
perpendicular, and the shortest distance from a point to a plane is along the
normal, the shortest distance from a point to a subspace is along a direction
orthogonal to that subspace. This is the basis of an optimization principle
for n-dimensional spaces called the projection theorem. The
theorem does not necessarily hold in a general normed space, but it does hold in a
Hilbert space. A normed linear space is a vector space with a measure of
distance or length defined on it. A
Hilbert space is a special normed space that also has an inner product,
analogous to the dot product of two vectors in analytic
geometry. Two vectors are orthogonal if their inner product is zero. Using
orthogonality to characterize the minimum, it is now possible to pose
the optimum as a minimization problem. Least-squares minimization is the
technique that carries out this minimization in a Hilbert space.
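To make the orthogonality idea concrete, here is a small sketch (the matrix V and the point y are arbitrary illustrations) that projects a point onto the subspace spanned by the columns of V and checks that the residual is orthogonal to that subspace:
import numpy as np
# Columns of V span a two-dimensional subspace of R^3 (arbitrary example)
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
y = np.array([2.0, 3.0, 7.0])   # point to project
# Orthogonal projection of y onto the column space of V:
# p = V (V^T V)^{-1} V^T y
p = V @ np.linalg.solve(V.T @ V, V.T @ y)
residual = y - p
print(p)                 # closest point in the subspace
print(V.T @ residual)    # ~ [0, 0]: residual is orthogonal to the subspace
The printed inner products are zero, which is exactly how the projection theorem characterizes the closest point in the subspace.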
Least squares regression estimates a value at each data
point, which gives a set of equations. If the data had no noise, the
resulting set of equations could be written exactly as a matrix equation,
and solving a matrix equation is well understood. Since the data generally do not
fit a matrix equation perfectly, least squares regression instead finds the
estimate that comes as close to satisfying the matrix equation as possible, so
a single estimation function can pass near all the data points. The least
squares solution is written as Beta = (A^T A)^{-1} A^T Y.
An example of a least squares program in Python looks like this:
import numpy as np
import matplotlib.pyplot as plt
# Synthetic data: a noisy linear trend
x = np.linspace(0, 1, 101)
y = 1 + x + x * np.random.random(len(x))
# Design matrix with a column for the slope and a column for the intercept
A = np.vstack([x, np.ones(len(x))]).T
y = y[:, np.newaxis]
# Least squares solution: alpha = (A^T A)^{-1} A^T y
alpha = np.dot(np.dot(np.linalg.inv(np.dot(A.T, A)), A.T), y)
print(alpha)
# Plot the data and the fitted line
plt.figure(figsize=(10, 8))
plt.plot(x, y, 'b.')
plt.plot(x, alpha[0] * x + alpha[1], 'r')
plt.show()
Layering in neural networks applies the same technique at a higher level of
abstraction, but it does not transform the problem from one space to another.