Bitap Algorithm
This is an exact string matching version of the bitap algorithm. The bitap algorithm (also known as the shift-or, shift-and or Baeza-Yates–Gonnet algorithm) is an approximate string matching algorithm. It tells whether a given text contains a substring that is "approximately equal" to a given pattern, where approximate equality is defined in terms of Levenshtein distance: if the substring and pattern are within a given distance k of each other, the algorithm considers them equal. The version given here is the special case k = 0, so it reports only exact matches. The algorithm begins by precomputing a set of bitmasks containing one bit for each element of the pattern. It can then do most of the work with bitwise operations, which are fast as long as the whole pattern fits in a single machine word (here, at most 31 characters).
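To make the precomputation step concrete, here is a minimal sketch of how the bitmasks could be built on their own. The helper name BuildPatternMasks is hypothetical and not part of the listing on this page; the sketch assumes the shift-or convention, in which a 0 bit means "this pattern position matches the character".

```cpp
#include <array>
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the original listing): builds the
// shift-or bitmask table for a pattern of at most 31 characters.
// Bit i of masks[c] is 0 if pattern[i] == c, and 1 otherwise.
std::array<unsigned long, UCHAR_MAX + 1> BuildPatternMasks(const std::string& pattern)
{
    assert(pattern.size() <= 31);
    std::array<unsigned long, UCHAR_MAX + 1> masks;
    masks.fill(~0UL); // Start with "no position matches" for every character.
    for (std::size_t i = 0; i < pattern.size(); ++i)
        masks[static_cast<unsigned char>(pattern[i])] &= ~(1UL << i);
    return masks;
}
```

For the pattern "fox", the three low bits of the mask for 'f' are 110, for 'o' they are 101, and for 'x' they are 011; every character that does not occur in the pattern keeps an all-ones mask.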
/*****Please include the following header files*****/
// string
// limits.h
/***************************************************/
/*****Please use the following namespaces*****/
// std
/*********************************************/
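static int SearchString(const string& text, const string& pattern)
{
    const int m = (int)pattern.size();
    unsigned long R;
    // One mask per possible unsigned char value, so that characters with a
    // negative char value cannot index outside the table.
    unsigned long patternMask[UCHAR_MAX + 1];
    int i;
    if (m == 0) return 0; // An empty pattern matches at index 0.
    if (m > 31) return -1; // Error: The pattern is too long!
    // Initialize the bit array R: bit 0 is clear, all other bits are set.
    R = ~1UL;
    // Initialize the pattern bitmasks: bit i of patternMask[c] is 0 only if pattern[i] == c.
    for (i = 0; i <= UCHAR_MAX; ++i)
        patternMask[i] = ~0UL;
    for (i = 0; i < m; ++i)
        patternMask[(unsigned char)pattern[i]] &= ~(1UL << i);
    for (i = 0; i < (int)text.size(); ++i)
    {
        // Update the bit array with the current text character.
        R |= patternMask[(unsigned char)text[i]];
        R <<= 1;
        // Bit m is clear exactly when the whole pattern ends at text[i].
        if (0 == (R & (1UL << m)))
            return (i - m) + 1; // Index of the first character of the match.
    }
    return -1; // No match found.
}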
Example
int index = SearchString("The quick brown fox jumps over the lazy dog", "fox");
Output
index: 16
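Fuzzy variant (sketch)
The description at the top of this page also covers approximate matching, which the exact version does not show. The following is a minimal sketch of one way the same bit-parallel idea can be extended to tolerate up to k errors. It is an illustration under stated assumptions, not a definitive implementation: the function name FuzzySearchString is hypothetical, and the sketch counts character substitutions only (Hamming distance), which is a simpler technique than the full Levenshtein-distance matching described above, because insertions and deletions are not handled.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical function (not part of the original listing).
// Returns the index of the first substring of text that differs from pattern
// in at most k character substitutions, or -1 if there is none.
// Insertions and deletions are NOT handled in this sketch.
int FuzzySearchString(const std::string& text, const std::string& pattern, int k)
{
    const int m = static_cast<int>(pattern.size());
    assert(k >= 0);
    if (m == 0) return 0;  // An empty pattern matches at index 0.
    if (m > 31) return -1; // Pattern too long for this sketch's bit width.

    // Same bitmask table as in the exact version.
    unsigned long patternMask[UCHAR_MAX + 1];
    for (int c = 0; c <= UCHAR_MAX; ++c)
        patternMask[c] = ~0UL;
    for (int i = 0; i < m; ++i)
        patternMask[static_cast<unsigned char>(pattern[i])] &= ~(1UL << i);

    // R[d] tracks the pattern prefixes matched with at most d substitutions.
    std::vector<unsigned long> R(static_cast<std::size_t>(k) + 1, ~1UL);

    for (int i = 0; i < static_cast<int>(text.size()); ++i)
    {
        const unsigned long mask = patternMask[static_cast<unsigned char>(text[i])];
        unsigned long prev = R[0]; // Value of R[d - 1] before this character.
        R[0] = (R[0] | mask) << 1;
        for (int d = 1; d <= k; ++d)
        {
            const unsigned long cur = R[d];
            // Either extend a d-error match with a matching character, or
            // spend one substitution on top of a (d - 1)-error match.
            R[d] = (prev & (cur | mask)) << 1;
            prev = cur;
        }
        if (0 == (R[static_cast<std::size_t>(k)] & (1UL << m)))
            return (i - m) + 1;
    }
    return -1;
}
```

With the example sentence used above, searching for "fax" with k = 1 returns 16 (one substitution away from "fox"), while the same search with k = 0 returns -1. With k = 0 the function behaves like the exact version.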