目录
Levenshtein 距离简介
Open AI:
我写的(网上抄的):
Levenshtein 距离简介
Perplexity AI:
https://www.perplexity.ai/
The Levenshtein distance is a measure of the similarity between two strings, which is calculated by counting the number of deletions, insertions, or substitutions required to transform one string into another[1][2][3]. It can be used to measure the difference between two words or phrases[4][5].
Levenshtein 距离是两个字符串之间相似性的度量,它是通过计算将一个字符串转换为另一个字符串所需的删除、插入或替换的次数来计算的 [1][2][3]。它可以用来衡量两个词或短语之间的差异[4][5]。
Open AI:
用Kotlin实现计算文本、字符串相似度:Levenshtein 距离的代码示例
/**
* 计算两个文本、字符串之间的Levenshtein距离
* @param s1 文本、字符串1
* @param s2 文本、字符串2
* @return Levenshtein距离
*/
fun levenshteinDistance(s1: String, s2: String): Int {
var n = s1.length
var m = s2.length
if (n == 0) return m
if (m == 0) return n
var d = Array(n + 1) { IntArray(m + 1) }
// 初始化
for (i in 0..n) d[i][0] = i
for (j in 0..m) d[0][j] = j
// 计算
for (i in 1..n) {
for (j in 1..m) {
val cost = if (s1[i - 1] == s2[j - 1]) 0 else 1
d[i][j] = minOf(
d[i - 1][j] + 1,
d[i][j - 1] + 1,
d[i - 1][j - 1] + cost
)
}
}
return d[n][m]
}
我写的(网上抄的):
package com.github.aixcode.utils
import com.github.aixcode.utils.TextDistanceUtil.getSimilarity
object TextDistanceUtil {
/**
* Returns the Levenshtein distance between a and b, result 0~1
*/
fun getLevenshteinDistance(X: String, Y: String): Int {
val m = X.length
val n = Y.length
val T = Array(m + 1) { IntArray(n + 1) }
for (i in 1..m) {
T[i][0] = i
}
for (j in 1..n) {
T[0][j] = j
}
var cost: Int
for (i in 1..m) {
for (j in 1..n) {
cost = if (X[i - 1] == Y[j - 1]) 0 else 1
T[i][j] = Integer.min(Integer.min(T[i - 1][j] + 1, T[i][j - 1] + 1),
T[i - 1][j - 1] + cost)
}
}
return T[m][n]
}
fun getSimilarity(x: String, y: String): Double {
val maxLength = java.lang.Double.max(x.length.toDouble(), y.length.toDouble())
return if (maxLength > 0) {
// 如果需要,可以选择忽略大小写
(maxLength - getLevenshteinDistance(x, y)) / maxLength
} else 1.0
}
}
fun main() {
println(getSimilarity("c", "cql")) // 0.3333333333333333
println(getSimilarity("cq", "cql")) // 0.6666666666666666
println(getSimilarity("cql", "cql")) // 1.0
}