Spark cluster installation (on h15, h16, h18)
Note: passwordless SSH login must be set up between every pair of machines.
1. After ZooKeeper is configured and working, download spark-1.3.1, upload it and extract it. To avoid command-name conflicts with Hadoop, do not add Spark to the environment variables.
2. Configure h15 first:
(1)#cp slaves.template slaves
(2)#cp spark-env.sh.template spark-env.sh
(3) Edit /home/spark-1.3.1-bin-hadoop2.4/conf/slaves (lists the machines on which Workers run):
# A Spark Worker will be started on each of the machines listed below.
#localhost
h16
h18
3. Edit /home/spark-1.3.1-bin-hadoop2.4/conf/spark-env.sh:
#!/usr/bin/env bash
#export SPARK_LOCAL_IP=localhost
export JAVA_HOME=/usr/java/jdk1.7.0_67
export SPARK_MASTER_IP=h15
#export SPARK_MASTER_IP=localhost
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=h15:2181,h16:2181,h17:2181"
#export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
#export YARN_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
#export SPARK_HOME=/usr/hadoopsoft/spark-1.3.1-bin-hadoop2.4
#export SPARK_JAR=/usr/hadoopsoft/spark-1.3.1-bin-hadoop2.4/lib/spark-assembly-1.3.1-hadoop2.4.0.jar
#export PATH=$SPARK_HOME/bin:$PATH
#SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/hadoopsoft/apache-mahout-distribution-0.10.2/lib/*"
#SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/hadoopsoft/apache-mahout-distribution-0.10.2/*"
#SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/hadoopsoft/tachyon-0.8.2/clients/client/target/tachyon-client-0.8.2-jar-with-dependencies.jar"
4. Copy the directory to the other two machines:
#scp -r /home/spark-1.3.1-bin-hadoop2.4/ root@h16:/home/
#scp -r /home/spark-1.3.1-bin-hadoop2.4/ root@h18:/home/
5. Edit /home/spark-1.3.1-bin-hadoop2.4/conf/spark-env.sh on h18 (note: h16 is a worker node, h18 is the standby master):
#!/usr/bin/env bash
#export SPARK_LOCAL_IP=localhost
export JAVA_HOME=/usr/java/jdk1.7.0_67
export SPARK_MASTER_IP=h18
#export SPARK_MASTER_IP=localhost
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=h15:2181,h16:2181,h17:2181"
#export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
#export YARN_CONF_DIR=$HADOOP_INSTALL/etc/hadoop
#export SPARK_HOME=/usr/hadoopsoft/spark-1.3.1-bin-hadoop2.4
#export SPARK_JAR=/usr/hadoopsoft/spark-1.3.1-bin-hadoop2.4/lib/spark-assembly-1.3.1-hadoop2.4.0.jar
#export PATH=$SPARK_HOME/bin:$PATH
#SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/hadoopsoft/apache-mahout-distribution-0.10.2/lib/*"
#SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/hadoopsoft/apache-mahout-distribution-0.10.2/*"
#SPARK_DIST_CLASSPATH="$SPARK_DIST_CLASSPATH:/usr/hadoopsoft/tachyon-0.8.2/clients/client/target/tachyon-client-0.8.2-jar-with-dependencies.jar"
6. Start the ZooKeeper cluster on each node (h15, h16, h17):
#zkServer.sh start
7. On the master node h15, go to the sbin directory of the Spark installation (/home/spark-1.3.1-bin-hadoop2.4/sbin) and start Spark:
#sh start-all.sh
8. Check that the web UI looks right:
http://h15:8080
Verify:
(1) Master
(2) Status
(3) The Workers are h16 and h18
9. Go to the Spark sbin directory on h18 and start the master on h18 (standby master):
#sh /home/spark-1.3.1-bin-hadoop2.4/sbin/start-master.sh
10. On the alive Spark master, enter spark-shell (local mode):
#sh /home/spark-1.3.1-bin-hadoop2.4/bin/spark-shell --master local
Once the Scala prompt appears, the shell has started successfully and you can write Scala code.
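As a quick sanity check you can run a small job directly in the shell; a minimal sketch (sc is created by spark-shell itself):
#sc.parallelize(1 to 100).reduce(_+_) //distributes the numbers 1 to 100 and sums them; should return 5050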
11. On the alive Spark master, enter spark-shell (cluster mode):
#sh /home/spark-1.3.1-bin-hadoop2.4/bin/spark-shell --master spark://h15:7077,h18:7077
Note: when started in this mode, the application's execution info is visible at http://h15:8080/
12. If all of the steps above work, test the standby master h18 taking over (high availability):
(1) Stop the alive master on h15:
#sh /home/spark-1.3.1-bin-hadoop2.4/sbin/stop-master.sh
(2) Check in the web UI whether h18 has taken over: http://h18:8080/
13. Then start the master on h15 again (it now becomes the standby):
#sh /home/spark-1.3.1-bin-hadoop2.4/sbin/start-master.sh
Running tasks in the four modes
I. Local (single-machine) mode:
The result is visible in the Xshell terminal (run in the Spark bin directory on h15, the alive master):
#sh spark-submit --class org.apache.spark.examples.SparkPi --master local[1] /home/spark-1.3.1-bin-hadoop2.4/lib/spark-examples-1.3.1-hadoop2.4.0.jar 100
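For reference, the SparkPi example shipped with Spark estimates Pi by Monte Carlo sampling; the sketch below shows roughly what it computes (an approximation of the example, not its exact source; the trailing 100 in the command is the number of slices):
import org.apache.spark.{SparkConf, SparkContext}
import scala.math.random

//Rough sketch: sample random points in the square [-1,1]x[-1,1] and count how
//many land inside the unit circle; that fraction approaches Pi/4.
object PiSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("PiSketch"))
    val slices = if (args.length > 0) args(0).toInt else 2 //e.g. the 100 passed above
    val n = 100000 * slices
    val inside = sc.parallelize(1 to n, slices).map { _ =>
      val x = random * 2 - 1
      val y = random * 2 - 1
      if (x * x + y * y < 1) 1 else 0
    }.reduce(_ + _)
    println("Pi is roughly " + 4.0 * inside / n)
    sc.stop()
  }
}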
II. Standalone cluster mode:
Required configuration:
1. The slaves file
2. spark-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_67
export SPARK_MASTER_IP=h15
export SPARK_MASTER_PORT=7077
export SPARK_WORKER_CORES=1
export SPARK_WORKER_INSTANCES=1
export SPARK_WORKER_MEMORY=1g
(1) Standalone cluster mode, client deploy mode:
The result is visible in the Xshell terminal.
Go to the bin directory on h15 and run:
#sh spark-submit --class org.apache.spark.examples.SparkPi --master spark://h15:7077 --executor-memory 1G --total-executor-cores 1 /home/spark-1.3.1-bin-hadoop2.4/lib/spark-examples-1.3.1-hadoop2.4.0.jar 100
(2) Standalone cluster mode, cluster deploy mode:
The result is visible at h15:8080!
#sh spark-submit --class org.apache.spark.examples.SparkPi --master spark://h15:7077 --deploy-mode cluster --supervise --executor-memory 1G --total-executor-cores 1 /home/spark-1.3.1-bin-hadoop2.4/lib/spark-examples-1.3.1-hadoop2.4.0.jar 100
Note: the command must be submitted on the alive Spark master.
III. YARN mode:
1. On h15, the alive Spark master, stop all Spark processes:
#sh stop-all.sh
Then stop the standby Spark master on h18:
#sh stop-master.sh
Required configuration (needed on every Spark machine):
2. Configure spark-env.sh on h15:
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
export SPARK_HOME=/home/spark-1.3.1-bin-hadoop2.4
export SPARK_JAR=/home/spark-1.3.1-bin-hadoop2.4/lib/spark-assembly-1.3.1-hadoop2.4.0.jar
export PATH=$SPARK_HOME/bin:$PATH
Copy it to h16:
#scp /home/spark-1.3.1-bin-hadoop2.4/conf/spark-env.sh root@h16:/home/spark-1.3.1-bin-hadoop2.4/conf/
Do not overwrite the file on h18! Edit it by hand, because h18 is the standby master.
3. ~/.bash_profile
Make sure the Hadoop environment variables are configured.
4. Test
Start the Hadoop cluster:
#start-dfs.sh
#start-yarn.sh
Make sure h15 is active!
(1) YARN mode, client deploy mode:
The result is visible in the Xshell terminal.
Run in the Spark bin directory on h15:
#sh spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --executor-memory 1G --num-executors 1 /home/spark-1.3.1-bin-hadoop2.4/lib/spark-examples-1.3.1-hadoop2.4.0.jar 100
Job progress can be monitored in the web UI:
http://h15:8088/cluster
The result is printed in the Xshell terminal, not shown at h15:8080.
(2) YARN mode, cluster deploy mode:
The result appears in the monitoring UI at h15:8088: click the application ID, then follow the links to see the value of Pi.
#sh spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster --executor-memory 1G --num-executors 1 /home/spark-1.3.1-bin-hadoop2.4/lib/spark-examples-1.3.1-hadoop2.4.0.jar 100
Notes:
Leave safe mode: hadoop dfsadmin -safemode leave
Running Spark in standalone mode on the VM:
sh spark-submit --master spark://de2:7077 --class <fully-qualified-class-name> --driver-class-path /mysql-connector-java-5.1.26.jar sparkstreaming.jar
Running Spark in cluster deploy mode on the VM (here submitted as yarn-cluster):
sh spark-submit --class com.day6.scala.my.PresistMysqlWordCount --master yarn-cluster --driver-class-path /home/spark-1.5.1-bin-hadoop2.4/lib/mysql-connector-java-5.1.31-bin.jar /home/spark-1.5.1-bin-hadoop2.4/sparkstreaming.jar
To run it in the background, use:
#nohup ****** &
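The mysql-connector jar is passed via --driver-class-path above because the job writes its results to MySQL over JDBC. A purely hypothetical sketch of what such a job might look like follows (not the actual PresistMysqlWordCount source; the connection URL, database, table and credentials are made-up placeholders):
import java.sql.DriverManager
import org.apache.spark.{SparkConf, SparkContext}

//Hypothetical sketch: count words, then write (word, count) rows to MySQL,
//opening one JDBC connection per partition rather than per record.
object MysqlWordCountSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("MysqlWordCountSketch"))
    val counts = sc.textFile("file:///opt/testspark/text.txt")
      .flatMap(_.split(" "))
      .map((_, 1))
      .reduceByKey(_ + _)
    counts.foreachPartition { part =>
      val conn = DriverManager.getConnection("jdbc:mysql://h15:3306/test", "root", "root") //placeholder URL and credentials
      val stmt = conn.prepareStatement("insert into wordcount(word, cnt) values (?, ?)") //placeholder table
      part.foreach { case (word, cnt) =>
        stmt.setString(1, word)
        stmt.setInt(2, cnt)
        stmt.executeUpdate()
      }
      stmt.close()
      conn.close()
    }
    sc.stop()
  }
}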
I. Word count test on h15 (typing the code directly on the h15 server)
1. Start ZooKeeper, then start-all.sh, then start-master.sh
2. Enter local test mode: spark-shell --master local
3. Prepare the data file: text.txt
4. Write the code
//read the file into an RDD
#val lines=sc.textFile("file:///opt/testspark/text.txt")
//flatten: split each line into words
#val words=lines.flatMap(_.split(" "))
//map
#val pairs=words.map((_,1)) //note: each element is a tuple
//count
#val wd=pairs.reduceByKey(_+_)
//sort
#wd.sortByKey(true).collect() //sorts by key in lexicographic order
#wd.sortBy(c => c._2 ,true).collect //sorts by value in ascending order
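The whole pipeline can also be chained into a single expression at the shell prompt; a minimal sketch assuming the same text.txt path:
#sc.textFile("file:///opt/testspark/text.txt").flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).sortBy(_._2,false).collect().foreach(println) //word count in one line, sorted by count in descending order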
II. Write the code in Eclipse and then run it on the server
1. Write the Scala code
Add the required jar to the project: spark-assembly-1.3.1-hadoop2.4.0.jar
2. Write the class
package com.scala.day2
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
/**
 * Word count
 */
object WorldCount {
  def main(args: Array[String]): Unit = {
    //initialize the configuration
    val conf = new SparkConf()
    conf.setAppName("worldcount")
    // conf.setMaster("local") //local mode, for testing directly in Eclipse
    conf.setMaster("spark://h15:7077") //for running on the standalone cluster
    //create the context
    val sc = new SparkContext(conf)
    //read the file
    // val lines=sc.textFile("world.txt",1) //file path when testing in local mode
    val lines = sc.textFile("file:///opt/testspark/world.txt") //in standalone client mode the file must exist on the Linux machines
    //for data on HDFS, write the path as "hdfs://????"
    //split each line into words
    val words = lines.flatMap(_.split(" "))
    //map each word to a (word, 1) tuple
    val map2 = words.map((_, 1))
    //aggregate the counts by key
    val wd = map2.reduceByKey((a, b) => a + b)
    // val wd=map2.reduceByKey(_+_)
    //save the result to a directory
    // wd.saveAsTextFile("G:/world2")
    //sort by count and print
    // val sort=wd.sortBy( x => x._2 ,true).collect()
    //second way: swap key and value, sort by key descending, then swap back
    val sort = wd.map(kv => (kv._2, kv._1)).sortByKey(false).map(kv => (kv._2, kv._1)).collect()
    for (stri <- sort) println(stri._1 + " " + stri._2)
    sc.stop()
  }
}
3. In Eclipse, export this class as a jar (the other packages can be ignored): wd.jar
4. Copy wd.jar to /home/spark-1.3.1-bin-hadoop2.4/lib on h15, h16 and h18
5. Put the test txt file in /opt/testspark on h15, h16 and h18
6. Run the command in the Spark bin directory:
# sh spark-submit --class com.scala.day2.WorldCount --master spark://h15:7077 --executor-memory 1G --total-executor-cores 1 /home/spark-1.3.1-bin-hadoop2.4/lib/wd.jar
#sh spark-submit --class com.scala.day2.WorldCount --master spark://h15:7077 /home/spark-1.3.1-bin-hadoop2.4/lib/wd.jar
#sh spark-submit --class com.scala.day2.WorldCount --master local /home/spark-1.3.1-bin-hadoop2.4/lib/wd.jar
//run on the cluster (cluster deploy mode)
#sh spark-submit --class com.scala.day2.WorldCount --master spark://h15:7077 --deploy-mode cluster --supervise --executor-memory 1G --total-executor-cores 1 /home/spark-1.3.1-bin-hadoop2.4/lib/wd.jar