Spark: joining on the edit distance between two columns of a DataFrame

慕犹清 2022-07-27 66 views


import org.apache.spark.sql.functions

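// sourceDF is assumed to already exist with two string columns, word1 and word2.
val actualDF = sourceDF.withColumn(
  "word1_word2_levenshtein",
  // Levenshtein (edit) distance between word1 and word2, computed row by row
  functions.levenshtein(sourceDF.col("word1"), sourceDF.col("word2"))
)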

actualDF.show()

+------+-------+-----------------------+
| word1|  word2|word1_word2_levenshtein|
+------+-------+-----------------------+
|  blah|   blah|                      0|
|   cat|    bat|                      1|
|  phat|    fat|                      2|
|kitten|sitting|                      3|
+------+-------+-----------------------+
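
The snippet above adds the distance as a column within one DataFrame. To use the same function as a join condition, as the title suggests, something like the following sketch may work. The DataFrames `left` and `right`, the local `SparkSession`, and the threshold `maxDistance` are all hypothetical names chosen for illustration, not part of the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, levenshtein}

// Assumption: a local session for experimentation; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("levenshtein-join-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical inputs: one string column on each side.
val left = Seq("kitten", "cat", "phat").toDF("word1")
val right = Seq("sitting", "bat", "fat").toDF("word2")

// Hypothetical threshold: keep pairs whose edit distance is at most 2.
val maxDistance = 2

// Inner join on a non-equi condition: every left row is compared with every right row.
val joined = left
  .join(right, levenshtein(col("word1"), col("word2")) <= maxDistance)
  .withColumn("distance", levenshtein(col("word1"), col("word2")))

joined.show()
```

Because the condition is not an equality, Spark cannot use a hash or sort-merge join here and has to compare all pairs, so the cost grows with the product of the two row counts. On large inputs it is usually worth narrowing the candidates first, for example by joining on an equality key such as a shared prefix or string length bucket, and only then filtering by edit distance.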

