0
点赞
收藏
分享

微信扫一扫

【编程实践】计算文本、字符串相似度:Levenshtein 距离


目录

​​Levenshtein 距离简介​​

​​Open AI:​​

​​我写的(网上抄的):​​

Levenshtein 距离简介

Perplexity AI:
​​​https://www.perplexity.ai/​​

The Levenshtein distance is a measure of the similarity between two strings, which is calculated by counting the number of deletions, insertions, or substitutions required to transform one string into another[1][2][3]. It can be used to measure the difference between two words or phrases[4][5].

Levenshtein 距离是两个字符串之间相似性的度量,它是通过计算将一个字符串转换为另一个字符串所需的删除、插入或替换的次数来计算的 [1][2][3]。它可以用来衡量两个词或短语之间的差异[4][5]。

Open AI:

用Kotlin实现计算文本、字符串相似度:Levenshtein 距离的代码示例


/**
* 计算两个文本、字符串之间的Levenshtein距离
* @param s1 文本、字符串1
* @param s2 文本、字符串2
* @return Levenshtein距离
*/
fun levenshteinDistance(s1: String, s2: String): Int {
var n = s1.length
var m = s2.length
if (n == 0) return m
if (m == 0) return n
var d = Array(n + 1) { IntArray(m + 1) }

// 初始化
for (i in 0..n) d[i][0] = i
for (j in 0..m) d[0][j] = j

// 计算
for (i in 1..n) {
for (j in 1..m) {
val cost = if (s1[i - 1] == s2[j - 1]) 0 else 1
d[i][j] = minOf(
d[i - 1][j] + 1,
d[i][j - 1] + 1,
d[i - 1][j - 1] + cost
)
}
}
return d[n][m]
}

我写的(网上抄的):

package com.github.aixcode.utils

import com.github.aixcode.utils.TextDistanceUtil.getSimilarity

object TextDistanceUtil {
/**
* Returns the Levenshtein distance between a and b, result 0~1
*/
fun getLevenshteinDistance(X: String, Y: String): Int {
val m = X.length
val n = Y.length
val T = Array(m + 1) { IntArray(n + 1) }
for (i in 1..m) {
T[i][0] = i
}
for (j in 1..n) {
T[0][j] = j
}
var cost: Int
for (i in 1..m) {
for (j in 1..n) {
cost = if (X[i - 1] == Y[j - 1]) 0 else 1
T[i][j] = Integer.min(Integer.min(T[i - 1][j] + 1, T[i][j - 1] + 1),
T[i - 1][j - 1] + cost)
}
}
return T[m][n]
}

fun getSimilarity(x: String, y: String): Double {
val maxLength = java.lang.Double.max(x.length.toDouble(), y.length.toDouble())
return if (maxLength > 0) {
// 如果需要,可以选择忽略大小写
(maxLength - getLevenshteinDistance(x, y)) / maxLength
} else 1.0
}
}

fun main() {
println(getSimilarity("c", "cql")) // 0.3333333333333333
println(getSimilarity("cq", "cql")) // 0.6666666666666666
println(getSimilarity("cql", "cql")) // 1.0
}

举报

相关推荐

字符串按相似度排序

0 条评论