Tuesday, March 16, 2021

Limited dimensions and sequences in the discovery of semantic embeddings:


Words are all we have to describe a person's thoughts, yet their relative significance is hard to pin down because everyone uses them in their own style. Finding a formula that brings out the salient topics or keywords in a text has been an exciting journey, especially over the last decade.

 

Mikolov’s 2013 word2vec model introduced a neural-network technique that trains a shallow classifier to predict which words occur together, and in doing so brings out the latent meaning in the text. Similar words end up with similar vectors. The hidden layer of feature weights assigned to the words is what came to be called word embeddings. The matrix has to stay limited: if each word were represented as a vector over an arbitrarily large set of other words as features, the sparse matrix would become even more unwieldy, and the iterations even in a single layer of hundreds of neurons would become expensive. A large corpus of text also dramatically increases the input, adding further to the computation. Fortunately, the curse of dimensionality can be kept in check by choosing the number of training samples per class to be at least five to ten times the number of features.
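
As a rough illustration of this idea (not Mikolov's original implementation), a skip-gram model along these lines can be trained with the gensim library. The toy corpus and parameter values below are only placeholders, and the keyword names shown are those of gensim 4.x (older releases use size and iter instead of vector_size and epochs):

from gensim.models import Word2Vec

# Toy corpus: each sentence is a list of tokens (placeholder data).
sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# sg=1 selects the skip-gram objective; vector_size is the embedding width,
# i.e. the number of hidden-layer neurons discussed above.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

vec = model.wv["cat"]                  # the learned embedding for "cat"
print(model.wv.most_similar("cat"))    # nearby words in the embedding space

The point of the sketch is only that the learned weight matrix, one row per vocabulary word, is the embedding table, and that words appearing in similar contexts land near each other in it.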

 
 

Dimensionality reduction is a technique that eliminates noise by keeping only the important features, lowering cost without significant loss of information. It can use either linear or non-linear transformations. Given a data set X of n data points that needs to be reduced to d dimensions, a linear transformation proceeds by selecting a projection matrix V whose d columns are direction vectors in the original space and multiplying its transpose with each point x in X to get y = V^T x. Since V has d columns, each resulting point y has d dimensions. Non-linear dimensionality reduction techniques may even learn an internal model of the data. This is the case with a family of techniques called manifold learning, in which a high-dimensional data set is projected onto a smaller number of dimensions while trying to preserve the structure of inter-point distances from the high-dimensional space in the lower-dimensional projection. It is called non-linear because the mapping cannot be represented as a linear combination of the original variables.
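
A minimal sketch of both approaches follows; the particular estimators (PCA for the linear case, Isomap for manifold learning), the random data, and the parameter values are choices made here purely for illustration:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import Isomap

# Toy data set X: n = 200 points in 50 dimensions (placeholder data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
d = 2

# Linear reduction: PCA learns a projection matrix V with d columns and
# computes y = V^T x for each (centered) point, as described above.
pca = PCA(n_components=d)
Y_linear = pca.fit_transform(X)      # shape (200, 2)
V = pca.components_.T                # the d projection directions, shape (50, 2)

# Non-linear reduction: Isomap is one manifold-learning technique; it builds a
# neighborhood graph and tries to preserve inter-point (geodesic) distances
# from the high-dimensional space in the low-dimensional embedding.
iso = Isomap(n_neighbors=10, n_components=d)
Y_manifold = iso.fit_transform(X)    # shape (200, 2)

Either way, the output Y has one row per original point but only d columns, which is the whole appeal: the structure of the data is kept while the cost of everything downstream drops.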


(...to be continued)
