Sørensen–Dice Coefficient
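The Sørensen–Dice coefficient, also known as the Sørensen–Dice index, Sørensen index, Dice's coefficient or Soerensen index, is a simple and elegant way to measure the similarity of two strings. The values it produces range from zero, when the strings share no character pairs, to one, when their character pairs match exactly (as they do for identical strings). The algorithm splits each string into its adjacent character pairs (bigrams), counts how many pairs the two strings have in common regardless of where they occur, and divides twice that count by the total number of pairs in both strings.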
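In symbols, this is the usual Sørensen–Dice formula with X and Y taken as the multisets of adjacent character pairs of the two strings. A string of length n contributes n − 1 pairs, and a repeated pair counts only as often as it occurs in both strings. The words "night" and "nacht" below are an illustrative pair, not taken from the original:

```latex
\mathrm{DSC}(X, Y) = \frac{2\,|X \cap Y|}{|X| + |Y|}

X = \{\texttt{ni}, \texttt{ig}, \texttt{gh}, \texttt{ht}\}, \quad
Y = \{\texttt{na}, \texttt{ac}, \texttt{ch}, \texttt{ht}\}, \quad
X \cap Y = \{\texttt{ht}\}

\mathrm{DSC}(X, Y) = \frac{2 \cdot 1}{4 + 4} = 0.25
```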
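public static double DiceMatch(string string1, string string2)
{
    if (string.IsNullOrEmpty(string1) || string.IsNullOrEmpty(string2))
        return 0;
    if (string1 == string2)
        return 1;
    // A string shorter than two characters has no character pairs.
    if (string1.Length < 2 || string2.Length < 2)
        return 0;

    // Number of bigrams (adjacent character pairs) in each string.
    int length1 = string1.Length - 1;
    int length2 = string2.Length - 1;

    // Multiset of the bigrams of string1: pair -> number of occurrences.
    // The default string comparer is ordinal (case-sensitive).
    var pairs = new System.Collections.Generic.Dictionary<string, int>();
    for (int i = 0; i < length1; i++)
    {
        string a = string1.Substring(i, 2);
        pairs.TryGetValue(a, out int count);
        pairs[a] = count + 1;
    }

    // A bigram of string2 matches wherever it occurs in string1, not only at
    // the same position; each occurrence in string1 can be matched once.
    double matches = 0;
    for (int j = 0; j < length2; j++)
    {
        string b = string2.Substring(j, 2);
        if (pairs.TryGetValue(b, out int count) && count > 0)
        {
            pairs[b] = count - 1;
            matches += 2;
        }
    }

    // 2 * (shared bigrams) / (total bigrams in both strings)
    return matches / (length1 + length2);
}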
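As a variant sketch of the bigram-multiset computation described above, the class below packs each character pair into a single int key instead of allocating a two-character substring per pair. The names BigramDice, Similarity and Key are hypothetical, and comparison is assumed to be ordinal (case-sensitive); a surrogate pair is treated as two separate UTF-16 characters.

```csharp
using System.Collections.Generic;

// Hypothetical variant: Sørensen–Dice similarity over character bigrams,
// with each bigram packed into one int instead of a two-character string.
public static class BigramDice
{
    // Packs the pair (s[i], s[i + 1]) into one 32-bit key; every pair of
    // UTF-16 chars maps to a distinct key.
    private static int Key(string s, int i) => (s[i] << 16) | s[i + 1];

    public static double Similarity(string first, string second)
    {
        if (string.IsNullOrEmpty(first) || string.IsNullOrEmpty(second))
            return 0;
        if (first == second)
            return 1;
        if (first.Length < 2 || second.Length < 2)
            return 0;

        // Multiset of the first string's bigrams: key -> number of occurrences.
        var counts = new Dictionary<int, int>();
        for (int i = 0; i < first.Length - 1; i++)
        {
            int key = Key(first, i);
            counts.TryGetValue(key, out int n);
            counts[key] = n + 1;
        }

        // Consume one occurrence per matching bigram of the second string,
        // so a repeated pair counts only as often as it occurs in both.
        int shared = 0;
        for (int j = 0; j < second.Length - 1; j++)
        {
            int key = Key(second, j);
            if (counts.TryGetValue(key, out int n) && n > 0)
            {
                counts[key] = n - 1;
                shared++;
            }
        }

        int total = (first.Length - 1) + (second.Length - 1);
        return 2.0 * shared / total;
    }
}
```

For example, BigramDice.Similarity("abcd", "xabcd") returns 6/7: the pairs ab, bc and cd all match even though they sit one position later in the second string.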
Example
double result = DiceMatch("algorithms are fun", "logarithms are not");
Output
result: 0.58823529411764708
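The reported value can be checked by hand. Both strings have 18 characters and therefore 17 bigrams each. They share 10: ri, it, th, hm, ms, ar, re, and the three pairs containing a space ("s ", " a" and "e "). The second ar, from "logarithms", has no partner in the first string, so it is not counted.

```latex
\mathrm{DSC} = \frac{2 \cdot 10}{17 + 17} = \frac{20}{34} = \frac{10}{17} \approx 0.5882
```

The printed digits are the 17-significant-digit form of the double nearest 10/17; on .NET Core 3.0 and later, result.ToString() gives the shortest round-trip form, 0.5882352941176471.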