AttributeError: 'RDD' object has no attribute 'toDF'


from pyspark import SparkContext

sc = SparkContext()
rdd = sc.parallelize([("Sam", 28, 88), ("Flora", 28, 90), ("Run", 1, 60)])
df = rdd.toDF(["name", "age", "score"])  # raises AttributeError: no SparkSession exists yet
df.show()
sc.stop()

I wanted to create a Spark DataFrame from an RDD, but it raised an error:


Solution: add three lines of code, as shown below.

from pyspark import SparkContext
from pyspark.sql.session import SparkSession

sc = SparkContext()
SparkSession(sc)  # creating a SparkSession gives sc's RDDs (including PipelinedRDD) the toDF method
rdd = sc.parallelize([("Sam", 28, 88), ("Flora", 28, 90), ("Run", 1, 60)])
df = rdd.toDF(["name", "age", "score"])
df.show()
sc.stop()


    If you then hit the error below, run sc.stop() once and try again: ValueError: Cannot run multiple SparkContexts at once; existing SparkContext(app=test_SamShare, master=local[4]) created by __init__ at C:\Users\ADMINI~1\AppData\Local\Temp/ipykernel_20272/689659085.py:9


        After that, the code runs successfully.

 Finally, here is a PySpark tutorial written by an expert that I found on Zhihu:

30,000-word PySpark beginner's tutorial: framework thinking - Zhihu (zhihu.com)
