Thursday, October 9, 2014

I came across a method implemented in a library that gives a programmatic way to establish SSH sessions (Python pexpect library - pxssh.py). This library had a method to compute the Levenshtein distance as follows:


        '''This calculates the Levenshtein distance between a and b.

        '''


        n, m = len(a), len(b)

        if n > m:

            a,b = b,a

            n,m = m,n

        current = range(n+1)

        for i in range(1,m+1):

            previous, current = current, [i]+[0]*n

            for j in range(1,n+1):

                add, delete = previous[j]+1, current[j-1]+1

                change = previous[j-1]

                if a[j-1] != b[i-1]:

                    change = change + 1

                current[j] = min(add, delete, change)

        return current[n]

As we may know already, Levenshein distance is a metric for measuring the difference between two string sequences. The distance is computed in terms of single-character edits specifically (insertions, deletions and substitutions). The method takes a shorter string and transforms it to the longer string. The delete is calculated from the position to the left in the sequence. The change is to the position on the left in the candidate. The add is at the immediate position in the candidate. The positions are iterated for both sequences. Match gets a value zero and a mismatch costs 1 by way of transformation at the specified positions.

No comments:

Post a Comment