Sørensen–Dice Coefficient
The Sørensen–Dice coefficient, also known as the Sørensen–Dice index, Sørensen index, Dice's coefficient or Soerenson index, is a simple way to measure the similarity of two strings. The values produced are bounded between zero and one: 0 means the strings share no character pairs, and 1 means they are identical. The algorithm works by counting the pairs of adjacent characters (bigrams) that the two strings have in common and relating that count to the total number of pairs in both strings.
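As a sketch of the measure in symbols, let X and Y stand for the collections of adjacent character pairs taken from the first and second string. The names X, Y and s are notation assumed here for illustration, not taken from the text above.

```latex
s = \frac{2\,\lvert X \cap Y \rvert}{\lvert X \rvert + \lvert Y \rvert},
\qquad 0 \le s \le 1
```

A string of n characters yields n - 1 adjacent pairs, which is why the function below works with the string lengths minus one.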
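' Uses Dictionary(Of TKey, TValue) from System.Collections.Generic.
Public Shared Function DiceMatch(string1 As String, string2 As String) As Double
    ' Null or empty input has no character pairs to compare.
    If String.IsNullOrEmpty(string1) OrElse String.IsNullOrEmpty(string2) Then
        Return 0
    End If
    If string1 = string2 Then
        Return 1
    End If
    ' A string shorter than two characters contains no character pairs.
    If string1.Length < 2 OrElse string2.Length < 2 Then
        Return 0
    End If
    ' Number of adjacent character pairs in each string.
    Dim length1 As Integer = string1.Length - 1
    Dim length2 As Integer = string2.Length - 1
    ' Count how often each pair occurs in string1.
    Dim counts As New Dictionary(Of String, Integer)
    For i As Integer = 0 To length1 - 1
        Dim a As String = string1.Substring(i, 2)
        Dim n As Integer = 0
        counts.TryGetValue(a, n)
        counts(a) = n + 1
    Next
    ' A pair of string2 matches if string1 still has an unused copy of it,
    ' regardless of where in the string it occurs.
    Dim matches As Integer = 0
    For j As Integer = 0 To length2 - 1
        Dim b As String = string2.Substring(j, 2)
        Dim n As Integer = 0
        If counts.TryGetValue(b, n) AndAlso n > 0 Then
            counts(b) = n - 1
            matches += 1
        End If
    Next
    Return 2.0 * matches / (length1 + length2)
End Function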
Example
Dim result As Double = DiceMatch("algorithms are fun", "logarithms are not")
Output
result: 0.58823529411764708
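The result can be checked by hand. Each of the two 18-character strings contains 17 adjacent character pairs, and ten of those pairs occur in both: "ri", "it", "th", "hm", "ms", "s ", " a", "ar", "re" and "e ". The score is twice the number of shared pairs divided by the total number of pairs:

```latex
\frac{2 \times 10}{17 + 17} = \frac{20}{34} = \frac{10}{17} \approx 0.588
```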