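Approach less prone to OOM (collect_set is a partial aggregate, so Spark combines values within each partition before the shuffle and only moves the distinct values per key):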
import org.apache.spark.sql.functions._
inputDF.groupBy("the_key")
  .agg(concat_ws(",", collect_set("string_column")).as("string_set_concat_column"))
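Approach more prone to OOM (its advantage is that you can run arbitrary custom logic on each group). RDD groupBy pulls every row of a group into a single in-memory Iterable on one executor, so a very large or skewed key can exhaust memory: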
inputDF.rdd.groupBy(row => row.getAs[Long]("the_key"))
  .map { case (key, rows) =>
    // rows: Iterable[Row] holding every row of this group in memory
    // each group ...
  }
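If the per-group work can be done incrementally (for example, building a set), a middle ground that stays on the RDD API is `aggregateByKey`. It combines values inside each partition before the shuffle, so a group is never materialized as a full list of rows. The sketch below swaps in that technique; it is not the original author's code. The object and method names (`DistinctConcatByKey`, `distinctConcat`) are hypothetical, the column names are taken from the snippets above, `the_key` is assumed to be a `LongType` column, and Spark must be on the classpath.

```scala
import org.apache.spark.sql.DataFrame
import scala.collection.mutable

object DistinctConcatByKey {
  // Hypothetical helper: for each "the_key", join the distinct values of
  // "string_column" with ",". Assumes "the_key" is LongType.
  def distinctConcat(inputDF: DataFrame): Map[Long, String] = {
    val pairs = inputDF.rdd.map(row =>
      (row.getAs[Long]("the_key"), row.getAs[String]("string_column")))

    pairs
      // seqOp adds one value to the partition-local set; combOp merges two
      // partial sets after the shuffle. Spark allows both to mutate and
      // return their first argument.
      .aggregateByKey(mutable.HashSet.empty[String])(
        (set, value) => set += value,
        (left, right) => left ++= right)
      // Per-group post-processing goes here; this sketch sorts and joins.
      .mapValues(set => set.toSeq.sorted.mkString(","))
      .collect()
      .toMap
  }
}
```

The trade-off: the per-group logic has to be split into "add one value" and "merge two partial results", so it cannot look at all rows of a group at once the way the groupBy version can.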