0
点赞
收藏
分享

微信扫一扫

Pyspark特征工程--SQLTransformer

星巢文化 2022-03-12 阅读 48

SQLTransformer

class pyspark.ml.feature.SQLTransformer(statement=None)

实现由 SQL 语句定义的转换。目前我们只支持 SQL 语法,如“SELECT … FROM THIS”,其中“THIS”表示输入数据集的基础表

基础表也支持临时表

01.创建数据:

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("SQLTransformer").master("local[*]").getOrCreate()
df = spark.createDataFrame([(0, 1.0, 3.0), (2, 2.0, 5.0)], ["id", "v1", "v2"])
df.show()

​ 输出结果:

+---+---+---+
| id| v1| v2|
+---+---+---+
|  0|1.0|3.0|
|  2|2.0|5.0|
+---+---+---+

02.使用sql语句,作为计算字段查询

from pyspark.ml.feature import SQLTransformer
sqlTrans = SQLTransformer(
    statement=\
    "SELECT *, (v1 + v2) AS ADD, (v1 * v2) AS TAKE FROM __THIS__")
sqlTrans.transform(df).show()

​ 输出结果:

+---+---+---+---+----+
| id| v1| v2|ADD|TAKE|
+---+---+---+---+----+
|  0|1.0|3.0|4.0| 3.0|
|  2|2.0|5.0|7.0|10.0|
+---+---+---+---+----+

03.使用临时表

df.createOrReplaceTempView("table")
sqlTrans = SQLTransformer(
    statement=\
    "SELECT *, (v1 + v2) AS ADD, (v1 * v2) AS TAKE FROM table")
sqlTrans.transform(df).show()

​ 输出结果:

+---+---+---+---+----+
| id| v1| v2|ADD|TAKE|
+---+---+---+---+----+
|  0|1.0|3.0|4.0| 3.0|
|  2|2.0|5.0|7.0|10.0|
+---+---+---+---+----+
举报

相关推荐

0 条评论