Showing posts with label distance. Show all posts
Showing posts with label distance. Show all posts
Thursday, March 2, 2017
Edit Distance Java Similar String
Edit Distance Java Similar String
The edit distance of two strings commonly refers to the Levenshtein Distance of two strings- the amount of work it takes to transform one string to the other by either inserting, deleting, or changing one character at a time.
For example, changing sitting -> kitten takes 3
moves: sitting -> kitting, kitting -> kittin, and finally kittin -> kitten.
The algorithm to solve this in reasonable time takes dynamic programming, and has many applications such as word checking, and typing speed tests. Algorithm below:
public static int editDistance(String s, String t){
int m=s.length();
int n=t.length();
int[][]d=new int[m+1][n+1];
for(int i=0;i<=m;i++){
d[i][0]=i;
}
for(int j=0;j<=n;j++){
d[0][j]=j;
}
for(int j=1;j<=n;j++){
for(int i=1;i<=m;i++){
if(s.charAt(i-1)==t.charAt(j-1)){
d[i][j]=d[i-1][j-1];
}
else{
d[i][j]=min((d[i-1][j]+1),(d[i][j-1]+1),(d[i-1][j-1]+1));
}
}
}
return(d[m][n]);
}
public static int min(int a,int b,int c){
return(Math.min(Math.min(a,b),c));
}
It is important to note that there are only three possibilities one string can get to another: either deleting, inserting, or changing.
int m=s.length();
int n=t.length();
int[][]d=new int[m+1][n+1];
for(int i=0;i<=m;i++){
d[i][0]=i;
}
for(int j=0;j<=n;j++){
d[0][j]=j;
}
for(int j=1;j<=n;j++){
for(int i=1;i<=m;i++){
if(s.charAt(i-1)==t.charAt(j-1)){
d[i][j]=d[i-1][j-1];
}
else{
d[i][j]=min((d[i-1][j]+1),(d[i][j-1]+1),(d[i-1][j-1]+1));
}
}
}
return(d[m][n]);
}
public static int min(int a,int b,int c){
return(Math.min(Math.min(a,b),c));
}
It is important to note that there are only three possibilities one string can get to another: either deleting, inserting, or changing.
Thus, if we let L(i,j) indicate the cost it takes from changing i to j, if s and t are the original strings, the cost is:
Math.min(1+L(i-1,j-1),1+L(i,j-1),1+L(i-1,j)) unless the last characters of the strings are the same, then the minimum is L(i-1,j-1). This solves the problem, but note that it is very slow because of all the recursive calls, which repeat many similiar problems.
Thus, we use dynamic programming: Construct a matrix d[m][n] with sizes s.length() and t.length(). instead of recursive calls, we will use a matrix lookup which will store the smallest current cost to change a smaller i into j.
Thus, with he java algorithm code for edit distance above, the speed is O(mn) where m and n are the sizes of the strings.
Available link for download
Subscribe to:
Posts (Atom)