Cluster computing

Saturday, March 3, 2018

We were discussing Signature verification methods. We reviewed the stages involved with Signature verification. Then we also enumerated the feature extraction techniques. After that, we compared online and offline verification techniques. Yesterday we discussed the limitations of image processing and the adaptations for video processing. Today we continue with the discussion on relevant improvements for signature processing.

Image processing was once confined to research labs and industrial automation, but software libraries, packages and applications are now widely available. Moreover, services for image processing are no longer constrained in compute and storage because they can be hosted in the cloud. Many cloud providers also offer libraries for image processing; Microsoft and Google both do, for example. Clarifai appears to have a dedicated offering in this discipline.

The reason I bring up these companies is that this area of study also benefits from a multidisciplinary approach. For example, Microsoft's machine learning algorithms and R package covered earlier may also be relevant to image processing once images are transformed into a vector space model. Similarly, Google's application of word2vec to perform word embeddings may provide insight into object embedding in images. Clarifai provides an API library and makes image processing as practical to develop commercially as it is fun to experiment with in Matlab.

Signature processing benefits greatly from the right choice of algorithms. We may not need a separate edge segmentation step since the data may already be smoothed and made clear in the pre-processing step. Gaussian smoothing helps in this regard because it adjusts the value of the current pixel based on the values of the surrounding pixels. After pre-processing, offline verification of the signature becomes straightforward: we rely on a choice of algorithms from the previously covered list to perform the verification. If we have the luxury of performing these comparisons simultaneously, we can combine their verdicts, in the manner of collaborative filtering, to label the given sample as valid or invalid in a serverless computing paradigm. This is a break from the previously mentioned software for signature verification.
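
To make the smoothing step concrete, here is a minimal sketch in plain Python. It assumes the image is a list of rows of gray values and uses the common 3x3 binomial approximation of a Gaussian kernel. The function name gaussian_smooth and the border handling (renormalizing over the neighbors that exist) are my own choices for illustration, not something prescribed by any particular library.

```python
def gaussian_smooth(image):
    """Smooth a 2-D grayscale image (list of rows) with a 3x3 Gaussian-like kernel."""
    # Binomial approximation of a Gaussian; the center pixel carries the most weight.
    kernel = [[1, 2, 1],
              [2, 4, 2],
              [1, 2, 1]]
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = 0.0
            weight = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    # Skip neighbors that fall outside the image.
                    if 0 <= ny < height and 0 <= nx < width:
                        k = kernel[dy + 1][dx + 1]
                        total += k * image[ny][nx]
                        weight += k
            # Divide by the weights actually used so borders are not darkened.
            result[y][x] = total / weight
    return result
```

A single bright pixel gets spread over its neighbors, while a uniformly gray image comes out unchanged, which is the behavior we want before thresholding.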

Technically this does not seem impossible, but how well we fine-tune the algorithm and how users accept it may determine the success of such a venture. Signatures, unlike passwords, are handwriting. They are susceptible to mood and circumstance. Since the input may change each time, the verification has to give the user that latitude.

Friday, March 2, 2018

We were discussing Signature verification methods. We reviewed the stages involved with Signature verification yesterday. We also enumerated the feature extraction techniques. Then we compared online and offline verification techniques.

One of the reasons offline image processing is preferred is that good image processing algorithms are often computationally expensive and take more time than, say, a network round trip for packets. This makes them costly to include in an interactive web page widget. Some image processing algorithms have taken as long as eight seconds to execute. That is why image processing finds it difficult to keep up with the frame rate of a video. However, significant advances have been made that improve the processing of streamed images. For example, an active contour model can help track the movement of an object across images at a rate that matches the frame rate used for video. Signatures are considered a lot simpler to work with in image processing. They are generally small, binary in color and easy to capture and process. As long as the image processing can tell a real signature apart from forged specimens, an image processor can work in the backend for a signature pad widget in the front-end.

We talked about the acceptance criteria for an image processing technique, which are largely measured by precision and recall. By training the processor on a signature dataset, these processors become highly effective at telling forged specimens from real ones. Today we will take a closer look at how this verification is done. Since we read how classifiers work in text processing, converting the document into a vector space model and then classifying it based on the Euclidean distance between feature vectors, signature verification should also be familiar. The features extracted from the image as described in the previous posts are transformed into the vector space and then compared with the master. If the Euclidean distance is within the tolerance threshold, the signature is accepted. Since the image processor is already trained and tested on a variety of images and measured with precision and recall, it can be relied upon to convert the given specimen into a representative feature vector. This concludes the signature verification technique.

#codingexercise

We were discussing combinations with duplicates, built in a greedy manner. Instead of enumerating combinations to the whole length, we can leverage the stars and bars theorem to be more efficient. With this theorem, we already know the number of combinations that can exist with duplicates and therefore do not enumerate them but directly count them towards the goal, such as the price of the accessories shopped. The theorem uses a binomial coefficient.
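
As a small illustration of the counting argument: stars and bars says that the number of ways to pick k items from n kinds with repetition allowed is the binomial coefficient C(n + k - 1, k). The sketch below, with function names I made up for this post, computes that count directly and also enumerates the same combinations so the two can be compared. It assumes Python 3.8 or later for math.comb.

```python
from itertools import combinations_with_replacement
from math import comb


def count_multisets(kinds, picks):
    """Count the ways to pick `picks` items from `kinds` kinds, repetition allowed."""
    if kinds == 0:
        # With nothing to choose from, only the empty selection is possible.
        return 1 if picks == 0 else 0
    # Stars and bars: C(kinds + picks - 1, picks).
    return comb(kinds + picks - 1, picks)


def enumerate_multisets(kinds, picks):
    """Enumerate the same selections explicitly, using prices 1..kinds as labels."""
    return list(combinations_with_replacement(range(1, kinds + 1), picks))
```

For 3 kinds and 2 picks, both approaches agree on 6 combinations, but the count comes from a single formula while the enumeration grows quickly with the number of picks.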

Thursday, March 1, 2018

We were discussing Signature verification methods. We reviewed the stages involved with Signature verification yesterday. We also enumerated the feature extraction techniques. Now let us proceed to comparing online and offline verification techniques.

An offline signature processing algorithm requires all the information before the algorithm starts. This gives us the opportunity to perform all the pre-processing required to normalize the dataset for the algorithm to work effectively. An online algorithm may work on the data while it is being made available. The processor may reside as close to the sensing device as necessary to make this happen. In the offline case, the processor may even be in a backend system of the office. Image recognition for handwritten signatures has traditionally been offline processing. Even so, it has been more optical than magnetic. Comparing the lists of features used by online and offline systems shows the difference in what can be used online. Online techniques have been said to be more accurate because the system gets the data as the user feeds it. Offline comparison can eliminate the quirks of the device on which the data is submitted and can work effectively across a variety of devices and vendors. Online processing suits standalone processors that can be mobile and may have their own local database.

The acceptance criteria for an image processing technique are largely measured by precision and recall. Precision in this case is the fraction of selected items that are relevant. It is the ratio of the true positives to all that were selected by the image processor for this image. A true positive is one that improves the feature matching. A false positive doesn’t, but still shows up within the match threshold. Recall, on the other hand, is the fraction of relevant items that are selected. It is the ratio of the true positives to all those that would have improved the feature matching from the global set of feature matches, including ones that the processor did not select. Together, precision and recall yield a combined metric called the F-score, which gives the effectiveness of retrieval with respect to a given image. By training the processor on a signature dataset, these processors become highly effective at telling forged specimens from real ones.
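
The ratios above can be written down in a few lines. This is only a sketch: the function name is hypothetical, and for the F-score I am assuming the balanced F1 variant, the harmonic mean of precision and recall, since the discussion does not say which weighting is intended.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Return (precision, recall, F1) from raw match counts."""
    selected = true_positives + false_positives   # everything the processor picked
    relevant = true_positives + false_negatives   # everything it should have picked
    precision = true_positives / selected if selected else 0.0
    recall = true_positives / relevant if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-score: harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, if the processor selects 10 feature matches of which 8 are true positives, and it misses 8 others that would have helped, precision is 0.8 and recall is 0.5.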

#codingexercise

We were discussing combinations with duplicates, built in a greedy manner. Instead of enumerating combinations to the whole length, we can leverage the stars and bars theorem to be more efficient.

Wednesday, February 28, 2018

We were discussing Signature verification methods. We reviewed the stages involved with Signature verification yesterday. Let us continue to list and compare online and offline verification techniques.

The feature extraction techniques involved include:
1) using an SVM classifier to extract random transform and fractal dimension
2) using a neural network to extract curvelet transform, Hough transform
3) using Euclidean distance and least square error classifier for point density and spatial frequency
4) using statistical analysis techniques and chi-square test
5) using feature vector correlation for projection and local point density
6) using SVM for Radon transformation
7) using learning techniques
8) using neural network for directional features

Online signature feature extraction also includes:
1) signing time
2) signature width and height
3) number of pen-ups and pen-downs
4) total signature length and
5) velocity of pen
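
To show how the online features listed above might be derived, here is a rough sketch. It assumes the signature pad reports samples as (t, x, y, pen_down) tuples ordered by time; that sample format, the function name and the dictionary keys are assumptions of mine and not the output of any specific device or SDK.

```python
from math import hypot


def online_features(samples):
    """Compute simple online signature features from (t, x, y, pen_down) samples."""
    pen_ups = pen_downs = 0
    length = 0.0
    was_down = False
    last = None
    xs, ys = [], []
    for t, x, y, pen_down in samples:
        if pen_down and not was_down:
            pen_downs += 1                      # a new stroke begins
        elif was_down and not pen_down:
            pen_ups += 1                        # the current stroke ends
        elif pen_down and was_down:
            length += hypot(x - last[0], y - last[1])  # ink drawn since last sample
        if pen_down:
            xs.append(x)
            ys.append(y)
        last = (x, y)
        was_down = pen_down
    signing_time = samples[-1][0] - samples[0][0] if samples else 0.0
    return {
        "signing_time": signing_time,
        "width": max(xs) - min(xs) if xs else 0,
        "height": max(ys) - min(ys) if ys else 0,
        "pen_ups": pen_ups,
        "pen_downs": pen_downs,
        "length": length,
        # Average pen velocity over the whole signing time.
        "velocity": length / signing_time if signing_time > 0 else 0.0,
    }
```

The resulting values can be placed into a feature vector alongside the image-based features, which is how the later verification stage would consume them.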

Feature extraction depends on pre-processing. Images may need to be loaded, resized, thinned, rotated and cropped.

The grayscale image is made into a binary image with the use of a threshold:
(mu1 + mu2) / 2

#codingexercise
We were discussing a coding exercise as shown below:
A person wants to buy L items from her favorite store such that a subset of N items must contain D distinct items. The items range from 1 to A in price. Determine the maximum amount of money the person can spend.

We discussed a technique for building the combinations in a greedy manner by choosing the highest priced items first. We also discussed an alternate way to enumerate all possible combinations and select only the ones that match the criteria and return the one that has the maximum purchase.
Another way to reduce enumerations of unnecessary combinations would be to use the enumerations only from combinations with repetitions instead of exhaustive combinations.

Tuesday, February 27, 2018

We were discussing Signature verification methods. Let us review the stages involved with Signature verification. We will also compare online and offline verification techniques afterwards.
The first stage of the image processing is image acquisition. This is a crucial stage of any recognition system as the quality of the image may considerably affect the subsequent stages. Moreover, the devices capturing the image may wear over time since this is a touch-based technique. Therefore, the consistency of image quality over time is also an important factor.
The second stage of the image processing is the pre-processing that removes noise and may even introduce normalization. Some pre-processing steps may also involve resizing, binary color conversion and cleaning, rotation, thinning and cropping. A binary image that highlights only the signature may be achieved by determining the extremes of the gray values and using the midpoint between them as the threshold. For example, if mu1 and mu2 are the gray values for the two groups of pixels, the threshold may be set as (mu1 + mu2) / 2.
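
The midpoint threshold can be sketched as follows. I am assuming here that mu1 and mu2 are simply the darkest and lightest gray values in the image and that the ink is darker than the background; both are assumptions for illustration, and the function name binarize is hypothetical.

```python
def binarize(image):
    """Convert a grayscale image (list of rows) to binary using the midpoint threshold."""
    pixels = [p for row in image for p in row]
    mu1, mu2 = min(pixels), max(pixels)      # extremes of the gray values
    threshold = (mu1 + mu2) / 2
    # Assumption: ink is dark, so values below the threshold become 1 (signature).
    binary = [[1 if p < threshold else 0 for p in row] for row in image]
    return binary, threshold
```

With gray values ranging from 20 to 255, the threshold comes out as 137.5 and only the dark pixels survive as signature pixels.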
The third stage of the image processing is the feature extraction. This is a critical stage for the signature verification because the type and quality of feature may make the verification accurate, predictable and consistent. While both online and offline verification techniques may vary in feature extraction, both may also involve common techniques. Feature extraction is generally termed global or local depending on the features extracted.
The last stage of the image processing is the signature verification. This may be the Euclidean distance computed in the feature space. If the distance is less than a threshold, the signature may be considered as verified.
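
The verification stage reduces to a distance check, which might look like the sketch below. The feature vectors are assumed to be plain lists of numbers of equal length, the names are hypothetical, and the threshold is something that would have to be tuned on a signature dataset. It relies on math.dist, available from Python 3.8.

```python
from math import dist


def verify_signature(features, reference, threshold):
    """Accept the specimen if its feature vector is close enough to the reference."""
    if len(features) != len(reference):
        raise ValueError("feature vectors must have the same length")
    distance = dist(features, reference)     # Euclidean distance in feature space
    return distance < threshold, distance
```

For instance, the vectors [1.0, 2.0, 2.0] and [1.0, 5.0, 6.0] are 5.0 apart, so the specimen is accepted with a threshold of 6.0 and rejected with a threshold of 4.0.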

#codingexercise
We were discussing a coding exercise as shown below:
A person wants to buy L items from her favorite store such that a subset of N items must contain D distinct items. The items range from 1 to A in price. Determine the maximum amount of money the person can spend.

Since the price has to be maximized, the algorithm has to be greedy in its strategy to select the next item. When we can no longer purchase the highest priced item because it violates the given restriction, we make the subsequent selection from the next lower priced item. We determine the threshold from the range 1 to n/d. The rest is recursive combination as shown earlier.
Another way to do this would be to enumerate all possible combinations and select only the ones that match the criteria and return the one that has the maximum purchase.

Monday, February 26, 2018

Signature detection and segmentation is a known field of study, and the techniques involve shape matching. While some of this processing involves offline techniques, there are online techniques also mentioned in the associated literature. Moreover, the MCYT-Signature corpus, the SUSIG database and GPDS-960 provide well-known databases for evaluating algorithms. For example, one method of non-rigid shape matching involves a spatial histogram, also known as a shape context, computed for each point, which describes the distribution of the relative positions of all remaining points. The correspondences between points are solved through weighted bipartite graph matching before the signatures are matched. Another method of non-rigid shape matching formulates it as an optimization problem that preserves local neighborhood structure. This method has an intuitive graph matching interpretation where each point represents a vertex and two vertices are considered connected in the graph if they are neighbors. The problem of finding the optimal match between shapes is therefore equivalent to maximizing the number of matched edges between their corresponding graphs under a one-to-one matching constraint. In this optimization approach, an iterative framework is used to estimate the correspondences and the transformation. In each iteration, graph matching is initialized using shape context distance and subsequently updated through relaxation labeling, which is a well-known formal method of expressing low-level contextual information and applying it to complete the extraction of image features.
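
To give a feel for the shape context mentioned above, here is a simplified sketch of the histogram for a single point. It bins the other points by log-distance and angle relative to the chosen point. The bin counts, the scaling between the nearest and farthest neighbor, and the function name are my own simplifications; published formulations normalize distances differently, and the bipartite matching step is not shown. The sketch assumes at least two points and that all points are distinct.

```python
from math import atan2, hypot, log, pi


def shape_context(points, index, radial_bins=3, angular_bins=4):
    """Log-polar histogram of where the other points lie relative to points[index]."""
    px, py = points[index]
    offsets = [(x - px, y - py) for i, (x, y) in enumerate(points) if i != index]
    distances = [hypot(dx, dy) for dx, dy in offsets]
    nearest, farthest = min(distances), max(distances)
    histogram = [[0] * angular_bins for _ in range(radial_bins)]
    for (dx, dy), d in zip(offsets, distances):
        if farthest > nearest:
            # Position of d on a log scale between nearest and farthest, in [0, 1].
            scale = log(d / nearest) / log(farthest / nearest)
            r_bin = min(int(scale * radial_bins), radial_bins - 1)
        else:
            r_bin = 0  # all neighbors are equally far away
        angle = atan2(dy, dx) % (2 * pi)
        a_bin = min(int(angle / (2 * pi) * angular_bins), angular_bins - 1)
        histogram[r_bin][a_bin] += 1
    return histogram
```

Two points on different signatures with similar histograms are candidates for correspondence, and the cost of pairing them can feed the weighted bipartite matching.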
Image processing generally involves multiple successive stages of processing the images. Signatures have the nice property that they resemble the results of Sobel edge detection, and the edges are expected to be more continuous in their formation. Moreover, signature pad images are small, with similar curves and accents, and purely black and white, so they are nearly consistent, and this helps with their processing.
#codingexercise
A person wants to form teams by selecting as many participants from a list as possible. The participants have skills represented by an integer. The skills selected must be distinct and contiguous, even if they are negative. By making the team as large as possible, more problems can be solved. What is the size of the largest team he can form?
One way to do this would be to sort the skills and find the longest run of distinct values that increase by one.
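
The sorting approach can be sketched in a few lines. This is my reading of the exercise, namely that the team is the longest run of consecutive distinct skill values; the function name is made up for illustration.

```python
def largest_team(skills):
    """Size of the longest run of consecutive distinct skill values."""
    distinct = sorted(set(skills))   # drop duplicates, then order the skills
    best = current = 0
    previous = None
    for skill in distinct:
        if previous is not None and skill == previous + 1:
            current += 1             # extends the current contiguous run
        else:
            current = 1              # a gap starts a new run
        best = max(best, current)
        previous = skill
    return best
```

For the skills 4, 2, -1, 3, 0, 1, 9, 10 the values -1 through 4 form the longest contiguous run, so the team size is 6.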

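Another way to do this is with a longest increasing subsequence.
// Requires: using System; using System.Collections.Generic; using System.Linq;
// Sort A first when the order of the participants does not matter.
static int GetLongestIncreasingSubsequence(List<int> A)
{
    if (A.Count == 0) return 0;
    // best[i] is the length of the longest unit-increment subsequence ending at A[i]
    var best = new int[A.Count];
    for (int i = 0; i < best.Length; i++)
        best[i] = 1;
    for (int i = 1; i < A.Count; i++)
        for (int j = 0; j < i; j++)
            if (A[i] == A[j] + 1)
            {
                best[i] = Math.Max(best[i], best[j] + 1);
            }
    return best.Max();
}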

The above assumes distinct elements.

Another exercise:

A person wants to buy L items from her favorite store such that a subset of N items must contain D distinct items. The items range from 1 to A in price. Determine the maximum amount of money the person can spend.

Since the price has to be maximized, the algorithm has to be greedy in its strategy to select the next item. When we can no longer purchase the highest priced item because it violates the given restriction, we make the subsequent selection from the next lower priced item. We determine the threshold from the range 1 to n/d. The rest is recursive combination as shown earlier.

Courtesy: hackerrank

Sunday, February 25, 2018

Yesterday we were discussing how to enable user logins with something that users draw, such as their signature on a signature pad. Efficient image processing algorithms can then compare signatures. Moreover, what people draw on the signature pad is completely their call, and they can even handwrite passwords instead of a signature. Since the data is private both at rest and in transit, it cannot be divulged to anybody else and provides a layer of security on top of the known passwords. Signature detection and segmentation is a known field of study, and the techniques involve shape matching. While some of this processing involves offline techniques, there are online techniques also mentioned in the associated literature. Moreover, the MCYT-Signature corpus, the SUSIG database and GPDS-960 provide well-known databases for evaluating algorithms. For example, one method of non-rigid shape matching involves a spatial histogram, also known as a shape context, computed for each point, which describes the distribution of the relative positions of all remaining points. The correspondences between points are solved through weighted bipartite graph matching before the signatures are matched. Another method of non-rigid shape matching formulates it as an optimization problem that preserves local neighborhood structure. This method has an intuitive graph matching interpretation where each point represents a vertex and two vertices are considered connected in the graph if they are neighbors. The problem of finding the optimal match between shapes is therefore equivalent to maximizing the number of matched edges between their corresponding graphs under a one-to-one matching constraint. In this optimization approach, an iterative framework is used to estimate the correspondences and the transformation. In each iteration, graph matching is initialized using shape context distance and subsequently updated through relaxation labeling, which is a well-known formal method of expressing low-level contextual information and applying it to complete the extraction of image features.
Image processing generally involves multiple successive stages of processing the images. Signatures have the nice property that they resemble the results of Sobel edge detection, and the edges are expected to be more continuous in their formation. Moreover, signature pad images are small, with similar curves and accents, and purely black and white, so they are nearly consistent, and this helps with their processing.