Spark: using a UDF to operate on only a few DataFrame columns, without mapping over every column

2022-07-27


Define the UDF

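import org.apache.spark.sql.functions.udf

// Returns 1 if inputColumn2 appears in the comma-separated list inputColumn1, otherwise 0.
// Assumes input_column2 is a SQL bigint, which Spark passes to Scala as Long.
// The parameter is typed Long rather than scala BigInt because
// java.lang.Long.equals(BigInt) is always false, so the result would always be 0.
val theUDF = udf((inputColumn1: String, inputColumn2: Long) => {
  var resultColumn = 0
  inputColumn1.split(",").foreach(item => {
    if (item.trim.toLong == inputColumn2) {
      resultColumn = 1
    }
  })
  resultColumn
})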

Call the UDF

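import org.apache.spark.sql.functions.col

// Only the two input columns are passed to the UDF; all other columns are left untouched.
dataFrame.withColumn("result_column",
  theUDF(col("input_column1"), col("input_column2"))
)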
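
To try the whole thing end to end, here is a minimal sketch that runs locally. It is not from the original post: the object name UdfOnSelectedColumns, the helper containsId, the sample rows, the extra column other_column and the local[*] master are all assumptions made for illustration. The matching logic is pulled out into a plain function so it can be checked without Spark, and unparseable items are skipped instead of throwing.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}
import scala.util.Try

// Hypothetical demo object; names and sample data are illustrative assumptions.
object UdfOnSelectedColumns {

  // Returns 1 if `target` appears in the comma-separated list `ids`, otherwise 0.
  // Null input and items that do not parse as Long are treated as "no match".
  def containsId(ids: String, target: Long): Int =
    if (ids != null && ids.split(",").exists(s => Try(s.trim.toLong).toOption.contains(target))) 1
    else 0

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")          // assumption: local run for the demo
      .appName("udf-on-selected-columns")
      .getOrCreate()
    import spark.implicits._

    // Sample data: two input columns plus one column the UDF never sees.
    val dataFrame = Seq(
      ("1,2,3", 2L, "a"),
      ("4,5",   9L, "b")
    ).toDF("input_column1", "input_column2", "other_column")

    // Wrap the plain function as a UDF.
    val theUDF = udf(containsId _)

    // Only the two named columns are passed in; other_column is carried through unchanged.
    dataFrame
      .withColumn("result_column", theUDF(col("input_column1"), col("input_column2")))
      .show()

    spark.stop()
  }
}
```

With this sample data, the first row should get result_column = 1 (2 is in "1,2,3") and the second row 0 (9 is not in "4,5"), while other_column stays as it was.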

