[Minimal Spark Tutorial] Getting Hands-On

code_balance 2022-04-13
  1. spark-shell

    1. At the command prompt (DOS/terminal), type:
      spark-shell

    2. To pull in extra jar dependencies:
      spark-shell --jars /path/myjar1.jar,/path/myjar2.jar

    3. To specify resources (the deprecated yarn-client master is written as master plus deploy mode here):
      spark-shell --master yarn --deploy-mode client --driver-memory 16g --num-executors 60 --executor-memory 20g --executor-cores 2

    4. Automatically created objects: the shell predefines spark (a SparkSession) and sc (a SparkContext).
    5. Set the log level (a quick check that uses the predefined objects follows this list):
      spark.sparkContext.setLogLevel("ERROR")
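
    Once the shell is up, a minimal word count can be pasted in to confirm that spark and sc are available; the input strings below are invented purely for illustration:

      // Paste into the spark-shell prompt; the sample lines are made up.
      val lines = sc.parallelize(Seq("hello spark", "hello world"))
      val counts = lines.flatMap(_.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
      counts.collect().foreach(println)   // e.g. (hello,2), (spark,1), (world,1)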

  2. IntelliJ configuration
    1. Edit the pom.xml to add dependencies (a minimal application to verify the setup follows the snippet):
      <properties>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <encoding>UTF-8</encoding>
        <scala.version>2.11.8</scala.version>
        <spark.version>2.2.0</spark.version>
        <hadoop.version>2.7.1</hadoop.version>
        <scala.compat.version>2.11</scala.compat.version>
      </properties>

      <!-- declare and pull in the shared dependencies -->
      <dependencies>
        <dependency>
          <groupId>org.scala-lang</groupId>
          <artifactId>scala-library</artifactId>
          <version>${scala.version}</version>
        </dependency>
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.11</artifactId>
          <version>${spark.version}</version>
        </dependency>
        <dependency>
          <groupId>org.apache.hadoop</groupId>
          <artifactId>hadoop-client</artifactId>
          <version>${hadoop.version}</version>
        </dependency>
      </dependencies>
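
    To check that the project compiles and the dependencies resolve, a minimal spark-core application along the following lines can be placed under src/main/scala and run from IntelliJ; the object name, local master, and sample data are placeholders, not part of the original tutorial:

      import org.apache.spark.{SparkConf, SparkContext}

      // Minimal spark-core job matching the pom above; names and data are placeholders.
      object WordCountApp {
        def main(args: Array[String]): Unit = {
          val conf = new SparkConf().setAppName("Word Count").setMaster("local[*]")
          val sc = new SparkContext(conf)

          val counts = sc.parallelize(Seq("hello spark", "hello intellij"))
            .flatMap(_.split(" "))
            .map(word => (word, 1))
            .reduceByKey(_ + _)

          counts.collect().foreach(println)
          sc.stop()
        }
      }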

  3. Define spark and sc
    1. Define spark (note that SparkSession lives in the spark-sql module, so the spark-sql_2.11 artifact must be added to the pom alongside spark-core):
      import org.apache.spark.sql.SparkSession
      val spark = SparkSession.builder().appName("Word Count").getOrCreate()

    2. Define sc (sparkContext is a field, not a method call, so drop the parentheses; a short combined example follows below):
      val sc = spark.sparkContext
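
    A combined sketch of the two entry points; the sample data is invented, and the master URL is assumed to come from spark-shell or spark-submit rather than being hard-coded:

      import org.apache.spark.sql.SparkSession

      // Build the session; the master is expected to be supplied externally.
      val spark = SparkSession.builder().appName("Word Count").getOrCreate()
      val sc = spark.sparkContext

      // RDD API through sc
      val nums = sc.parallelize(1 to 5)
      println(nums.sum())                 // 15.0

      // DataFrame API through spark (sample rows invented for illustration)
      import spark.implicits._
      val df = Seq(("hello", 2), ("spark", 1)).toDF("word", "count")
      df.show()

      spark.stop()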
