Today I want to review a chapter from the book Programming Collective Intelligence. In particular, let us look at a few numerical methods:
There are a number of numerical functions available:
1) Trigonometric functions like sine, cosine, and tangent.
2) Other mathematical functions like power, square root, and absolute value.
3) Statistical distributions, such as a Gaussian.
4) Distance metrics, like Euclidean and Tanimoto distances (see the sketch after this list).
5) A three-parameter function that returns -1 if the first parameter lies between the second and third.
function (a, b, c){
    if (b <= a && a <= c) return -1;
    return 0;
}
6) A three-parameter function that returns 1 if the difference between the first two parameters is less than the third (both taken as absolute values).
function (a, b, c){
    if (Math.abs(a - b) < Math.abs(c)) return 1;
    return 0;
}
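As a quick illustration of the distance metrics in item 4, here is a minimal Python sketch. The function names euclidean and tanimoto are my own, and the Tanimoto version assumes binary vectors:

import math

def euclidean(v1, v2):
    # straight-line distance: square root of the summed squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))

def tanimoto(v1, v2):
    # Tanimoto distance on binary vectors: one minus shared items over distinct items
    c1, c2 = sum(v1), sum(v2)
    shared = sum(a * b for a, b in zip(v1, v2))
    return 1.0 - float(shared) / (c1 + c2 - shared)

print(euclidean([0, 0], [3, 4]))              # 5.0
print(tanimoto([1, 1, 0, 1], [1, 0, 0, 1]))   # 1 - 2/3 ~ 0.333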
Numerical methods are reactive: their output depends only on the input. This is the right approach for mathematical functions; however, for efficiency, programs often have to store information for the next round.
One way to do this is to create tree nodes that can store and retrieve values from predefined slots. A store node has a single child and the index of a memory slot; it gets the result from its child, stores it in the memory slot, and then passes it along to its parent. A recall node has no children and simply returns the value in the appropriate slot. If a store node is at the top of a tree, the final result is available to any part of the tree that has the appropriate recall node.
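A minimal sketch of how such nodes might look; the class names, the evaluate interface, and the fixed-size memory list are illustrative assumptions, not the book's exact code:

MEMORY = [0.0] * 5   # illustrative: five predefined individual memory slots

class StoreNode:
    # Single child plus a slot index: evaluate the child, save the result
    # in the slot, and pass the value along to the parent.
    def __init__(self, child, slot):
        self.child, self.slot = child, slot
    def evaluate(self, inputs):
        result = self.child.evaluate(inputs)
        MEMORY[self.slot] = result
        return result

class RecallNode:
    # No children: simply return whatever is currently in the slot.
    def __init__(self, slot):
        self.slot = slot
    def evaluate(self, inputs):
        return MEMORY[self.slot]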
In addition to the individual memory, there can be a shared memory set up, where programs share a set of slots that all of them can read and write. This is reminiscent of parallel computing, with higher levels of cooperation and competition.
Some other functions include:
Gini impurity: This is a measure of how impure a set is. If you have a set of items such as [A, A, B, B, B, C, C], then the Gini impurity tells you the probability that you would be wrong if you picked one item and randomly guessed its label. If the set were all A instead and your guess was A, then you would never be wrong, so you would have a pure set.
$I_G(i) = 1 - \sum_{j=1}^{m} f(i,j)^2 = \sum_{j \neq k} f(i,j)\,f(i,k)$
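A small Python sketch of the first form of the formula, one minus the sum of squared label frequencies (the function name gini_impurity is my own):

def gini_impurity(items):
    total = float(len(items))
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    # one minus the sum of squared label frequencies
    return 1.0 - sum((c / total) ** 2 for c in counts.values())

print(gini_impurity(['A', 'A', 'B', 'B', 'B', 'C', 'C']))  # ~0.653
print(gini_impurity(['A', 'A', 'A']))                      # 0.0 for a pure set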
Entropy is another way to see how mixed a set is: it measures how surprising a randomly selected item from the set is. For a pure set, the entropy is zero.
$H(X) = \sum_{i=1}^{n} p(x_i)\log_2\frac{1}{p(x_i)} = -\sum_{i=1}^{n} p(x_i)\log_2 p(x_i)$
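A matching sketch for entropy, again assuming plain label lists (the function name entropy is my own):

import math

def entropy(items):
    total = float(len(items))
    counts = {}
    for item in items:
        counts[item] = counts.get(item, 0) + 1
    # minus the sum of p(x) * log2 p(x) over the distinct labels
    return -sum((c / total) * math.log(c / total, 2) for c in counts.values())

print(entropy(['A', 'A', 'B', 'B', 'B', 'C', 'C']))  # ~1.557
print(entropy(['A', 'A', 'A']))                      # 0.0 for a pure set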
Variance measures how much a list of numbers varies from the average value. It is found by averaging the squared difference of every number from the mean:
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \bar{x})^2$
def variance(vals):
    # average of the squared differences from the mean
    mean = float(sum(vals)) / len(vals)
    s = sum([(v - mean) ** 2 for v in vals])
    return s / len(vals)
The Gaussian function is the probability density function of the normal distribution:
$G(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2 / (2\sigma^2)}$
where $\mu$ is the mean and $\sigma$ is the standard deviation.
import math

def Gaussian(dist, sigma=10.0):
    # note the parentheses: the exponent is -(dist**2) / (2 * sigma**2)
    exp = math.e ** (-dist**2 / (2 * sigma**2))
    return (1 / (sigma * (2 * math.pi) ** .5)) * exp
We discussed the non-negative matrix factorization (NMF) method earlier. It requires you to call the factorize function with a list of observations and the number of underlying features, and it returns the weights and the features. It performs the factorization based on the following four update matrices:
hn: the transposed weights matrix multiplied by the data matrix.
hd: the transposed weights matrix multiplied by the weights matrix multiplied by the features matrix.
wn: the data matrix multiplied by the transposed features matrix.
wd: the weights matrix multiplied by the features matrix multiplied by the transposed features matrix.
Because the factorization starts from random initial matrices, the computed weights and features may not be the same from run to run on small data, but on larger data they are usually consistent.
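As a condensed sketch of those multiplicative updates, here is one way to write them with NumPy and random starting matrices; it paraphrases the idea rather than reproducing the book's exact factorize function, and the small epsilon guarding the divisions is my own addition:

import numpy as np

def factorize(v, pc=10, iterations=50):
    rows, cols = v.shape
    w = np.random.rand(rows, pc)   # weights: one column per hidden feature
    h = np.random.rand(pc, cols)   # features: one row per hidden feature
    eps = 1e-9                     # guards against division by zero
    for _ in range(iterations):
        hn = w.T @ v               # transposed weights x data
        hd = w.T @ w @ h           # transposed weights x weights x features
        h = h * hn / (hd + eps)    # element-wise update of the features
        wn = v @ h.T               # data x transposed features
        wd = w @ h @ h.T           # weights x features x transposed features
        w = w * wn / (wd + eps)    # element-wise update of the weights
    return w, h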