SQLTransformer
class pyspark.ml.feature.SQLTransformer(statement=None)
实现由 SQL 语句定义的转换。目前我们只支持 SQL 语法,如“SELECT … FROM THIS”,其中“THIS”表示输入数据集的基础表
基础表也支持临时表
01.创建数据:
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("SQLTransformer").master("local[*]").getOrCreate()
df = spark.createDataFrame([(0, 1.0, 3.0), (2, 2.0, 5.0)], ["id", "v1", "v2"])
df.show()
输出结果:
+---+---+---+
| id| v1| v2|
+---+---+---+
| 0|1.0|3.0|
| 2|2.0|5.0|
+---+---+---+
02.使用sql语句,作为计算字段查询
from pyspark.ml.feature import SQLTransformer
sqlTrans = SQLTransformer(
statement=\
"SELECT *, (v1 + v2) AS ADD, (v1 * v2) AS TAKE FROM __THIS__")
sqlTrans.transform(df).show()
输出结果:
+---+---+---+---+----+
| id| v1| v2|ADD|TAKE|
+---+---+---+---+----+
| 0|1.0|3.0|4.0| 3.0|
| 2|2.0|5.0|7.0|10.0|
+---+---+---+---+----+
03.使用临时表
df.createOrReplaceTempView("table")
sqlTrans = SQLTransformer(
statement=\
"SELECT *, (v1 + v2) AS ADD, (v1 * v2) AS TAKE FROM table")
sqlTrans.transform(df).show()
输出结果:
+---+---+---+---+----+
| id| v1| v2|ADD|TAKE|
+---+---+---+---+----+
| 0|1.0|3.0|4.0| 3.0|
| 2|2.0|5.0|7.0|10.0|
+---+---+---+---+----+