Tuesday, March 20, 2018

Today we continue discussing the use of a Convolutional Neural Network (CNN) for embedding images and objects in a shared space, shape signatures, and image retrieval.
The CNN approach consists of four major components: embedding space construction, training image synthesis, CNN training, and a final testing phase. In the first phase, a collection of 3D shapes is embedded into a common space. In the second phase, the training data is synthesized by rendering the 3D shapes, which yields annotations as well. In the third phase, a network is trained to learn the mapping from images to the 3D-shape-induced embedding space. Lastly, the trained network is applied to new images to obtain their embeddings in the shared space, which facilitates image and shape retrieval.
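As a rough sketch of the third phase, a CNN can be trained to regress an image to the embedding coordinates of the 3D shape it was rendered from. The backbone, embedding dimensionality, and loss below are assumptions chosen for illustration, not the exact setup of the approach:

import torch
import torch.nn as nn
import torchvision.models as models

embedding_dim = 64          # assumed dimensionality of the shared shape/image space

# Backbone CNN with a regression head that outputs embedding coordinates.
cnn = models.resnet18()     # randomly initialized backbone
cnn.fc = nn.Linear(cnn.fc.in_features, embedding_dim)

criterion = nn.MSELoss()    # Euclidean-style loss pulls the image toward its shape's coordinates
optimizer = torch.optim.Adam(cnn.parameters(), lr=1e-4)

def train_step(images, target_coords):
    # images: rendered training images (phase 2);
    # target_coords: embedding coordinates of the 3D shape each image was synthesized from.
    optimizer.zero_grad()
    pred = cnn(images)                     # phase 3: image -> point in the shared space
    loss = criterion(pred, target_coords)
    loss.backward()
    optimizer.step()
    return loss.item()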
The embedding space is where both real-world images and shapes co-exist. The space organizes the latent objects shared between images and shapes. To do this, the objects are initialized from a set of 3D models, which are pure and complete representations of objects: they do not suffer from the noise present in images, and the distances between 3D models are both informative and precise. With the help of 3D models, the embedding space becomes robust.
The shape distance metric computes the similarity between two shapes as an aggregate of the similarities among corresponding views; this method is known as the light field descriptor. The input is a set of 3D shapes, although two would do. The shapes are first aligned by applying a transformation consisting of a rotation matrix and a translation vector, and then projected from k viewpoints to generate projection images.
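A simplified sketch of such a view-based distance follows. The per-view descriptors are assumed to be precomputed from the projection images, and only cyclic re-orderings of the k viewpoints are searched rather than the full set of camera-system rotations used by the actual light field descriptor:

import numpy as np

def view_set_distance(desc_a, desc_b):
    # desc_a, desc_b: (k, d) arrays of per-view descriptors for two aligned shapes
    # rendered from the same k viewpoints. The distance is the smallest aggregate
    # of per-view distances over the tried view orderings.
    k = desc_a.shape[0]
    best = np.inf
    for shift in range(k):                       # try cyclic re-orderings of the viewpoints
        rolled = np.roll(desc_b, shift, axis=0)
        total = np.linalg.norm(desc_a - rolled, axis=1).sum()
        best = min(best, total)
    return best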
The pipeline uses this distance metric to form pairwise comparisons between the 3D models. Since the metric is informative and accurate, the resulting pairwise distance matrix can be used to organize the models as points in an embedding space of the desired dimensionality.
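For illustration, given the pairwise distance matrix D the models could be placed in a space of a chosen dimensionality with classical multidimensional scaling; this is only a stand-in for whatever embedding algorithm the approach actually uses:

import numpy as np

def mds_embed(D, dim=2):
    # D: (n, n) pairwise shape distance matrix; returns (n, dim) coordinates.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    order = np.argsort(vals)[::-1][:dim]          # keep the largest eigenvalues
    L = np.sqrt(np.maximum(vals[order], 0))
    return vecs[:, order] * L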
The 2D distance matrix formed from word embeddings in text documents, which is dimensionality-reduced and classified using the softmax function, is put to similar use as the distance matrix between 3D models, although the feature vectors, distance calculations, algorithms, and error functions differ. Neural networks are applied to embedding in both text documents and images.
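A toy sketch of that word-embedding style workflow, with made-up data: a distance matrix is dimensionality-reduced and its rows are classified with a softmax (multinomial logistic regression):

import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
D = rng.random((100, 100)); D = (D + D.T) / 2      # toy symmetric distance matrix
labels = rng.integers(0, 3, size=100)              # toy class labels

features = PCA(n_components=10).fit_transform(D)   # dimensionality reduction
clf = LogisticRegression(max_iter=1000).fit(features, labels)   # softmax classifier
print(clf.predict(features[:5]))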
#proposal for login screens:  https://1drv.ms/w/s!Ashlm-Nw-wnWtWiX5uxOG6zc4a8K
