Fuzzy Bitap Algorithm
This is the fuzzy (approximate) string matching version of the bitap algorithm. The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) tells whether a given text contains a substring that is "approximately equal" to a given pattern. In the general algorithm, approximate equality is defined in terms of Levenshtein distance: if the substring and the pattern are within a given distance k of each other, the algorithm considers them equal. The implementation below is a simplified variant that counts only substitutions, so it reports the first window of the pattern's length that differs from the pattern in at most k positions. The algorithm begins by precomputing a set of bitmasks, one per character value, each containing one bit for each element of the pattern. After that, each text character is handled with a few bitwise operations per error level, which is why the pattern must fit in a machine word (at most 31 characters here).
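To make the bitmask precomputation concrete, here is a minimal sketch of the exact-match case (k = 0). The name ExactBitapSearch is hypothetical and is not part of the listing on this page; the sketch assumes a pattern of at most 31 characters and treats each char as an unsigned byte.

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: exact-match shift-or (the k = 0 case).
// Returns the index of the first occurrence of pattern in text, or -1.
int ExactBitapSearch(const std::string& text, const std::string& pattern)
{
    const std::size_t m = pattern.size();
    if (m == 0) return 0;
    assert(m <= 31); // precondition: bit m must fit in an unsigned long

    // One mask per byte value; bit i is 0 iff pattern[i] equals that byte.
    std::array<unsigned long, UCHAR_MAX + 1> mask;
    mask.fill(~0UL);
    for (std::size_t i = 0; i < m; ++i)
        mask[static_cast<unsigned char>(pattern[i])] &= ~(1UL << i);

    // After each step, bit j + 1 of R is 0 iff pattern[0..j] matches the
    // text ending at the current character. Bit 0 starts cleared as a seed.
    unsigned long R = ~1UL;
    for (std::size_t i = 0; i < text.size(); ++i) {
        R |= mask[static_cast<unsigned char>(text[i])];
        R <<= 1;
        if ((R & (1UL << m)) == 0)
            return static_cast<int>(i - m + 1); // start of the match
    }
    return -1;
}
```

The fuzzy version keeps one such state word per error level instead of a single R.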
/*****Please include the following header files*****/
// string
// limits.h
/***************************************************/
/*****Please use the following namespaces*****/
// std
/*********************************************/
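static int SearchString(string text, string pattern, int k)
{
    int result = -1;
    int m = pattern.size();
    unsigned long *R;
    unsigned long patternMask[UCHAR_MAX + 1];
    int i, d;

    if (pattern.empty()) return 0;
    if (m > 31) return -1; // Error: the pattern is too long!
    if (k < 0) return -1;  // Error: k must be non-negative!

    // One state word per error level 0..k; only bit 0 starts cleared.
    R = new unsigned long[k + 1];
    for (i = 0; i <= k; ++i)
        R[i] = ~1UL;

    // patternMask[c] has bit i cleared wherever pattern[i] == c.
    for (i = 0; i <= UCHAR_MAX; ++i)
        patternMask[i] = ~0UL;
    for (i = 0; i < m; ++i)
        patternMask[(unsigned char)pattern[i]] &= ~(1UL << i);

    for (i = 0; i < (int)text.size(); ++i)
    {
        // Index by unsigned char so bytes above 127 cannot go negative.
        unsigned long charMask = patternMask[(unsigned char)text[i]];

        // Level 0: exact-match update.
        unsigned long oldRd1 = R[0];
        R[0] |= charMask;
        R[0] <<= 1;

        // Level d: match this character, or substitute it on top of the
        // previous state of level d - 1.
        for (d = 1; d <= k; ++d)
        {
            unsigned long tmp = R[d];
            R[d] = (oldRd1 & (R[d] | charMask)) << 1;
            oldRd1 = tmp;
        }

        // Bit m cleared: the whole pattern matched with at most k errors.
        if (0 == (R[k] & (1UL << m)))
        {
            result = (i - m) + 1;
            break;
        }
    }

    delete[] R; // allocated with new[], so release with delete[]
    return result;
}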
Example
int index = SearchString("The quick brown foax jumps over the lazy dog", "fox", 1);
Output
index: 16
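One way to sanity-check this result is a brute-force scan that counts mismatches in every window of the pattern's length. The sketch below is a hypothetical reference checker (BruteForceSearch is not part of the listing); it assumes the substitution-only notion of "approximately equal" that the SearchString update rule implements.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical reference checker: returns the first index where a window of
// pattern.size() characters differs from pattern in at most k positions,
// or -1 if no such window exists.
int BruteForceSearch(const std::string& text, const std::string& pattern, int k)
{
    assert(k >= 0); // precondition: the error budget cannot be negative
    const std::size_t m = pattern.size();
    if (m == 0) return 0;

    for (std::size_t start = 0; start + m <= text.size(); ++start) {
        int mismatches = 0;
        // Stop counting early once the budget k is exceeded.
        for (std::size_t j = 0; j < m && mismatches <= k; ++j)
            if (text[start + j] != pattern[j]) ++mismatches;
        if (mismatches <= k) return static_cast<int>(start);
    }
    return -1;
}
```

For the example text, the window "foa" at index 16 differs from "fox" in one position, so a call with k = 1 returns 16, while k = 0 finds no match.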