Running Spark on Kubernetes

First, you need an existing Kubernetes cluster at version 1.8 or higher; mine runs on CentOS 7.

Download Spark 2.4.0 - https://www.apache.org/dyn/closer.lua/spark/spark-2.4.0/spark-2.4.0-bin-hadoop2.7.tgz

wget <download URL>

tar -xzvf <archive name>
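As a concrete sketch of the two steps above (the archive.apache.org URL is my assumption here; the mirror-picker link above will choose a faster mirror for you):

```shell
# Spark release and matching archive name, derived from the link above
SPARK_VERSION=2.4.0
ARCHIVE="spark-${SPARK_VERSION}-bin-hadoop2.7.tgz"
URL="https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${ARCHIVE}"

# Download and unpack (commented out here; run these on your build host)
# wget "$URL"
# tar -xzvf "$ARCHIVE"
echo "$URL"
```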


Then follow the official documentation - http://spark.apache.org/docs/2.4.0/running-on-kubernetes.html#running-spark-on-kubernetes - and pay particular attention to the prerequisites section.

Here we will try Spark on k8s in cluster mode.


Building and publishing the image:


First,

$ docker login

to log in with your registered Docker Hub account.

Then, from the extracted Spark directory, build and publish the Docker image.

Method 1:

$ ./bin/docker-image-tool.sh -r <repo> -t my-tag build  

$ ./bin/docker-image-tool.sh -r <repo> -t my-tag push

For <repo> I used my Docker ID; my-tag is just a tag name.


Method 2:

cd /path/to/spark-2.4.0-bin-hadoop2.7

docker build -t <your.image.hub/yourns>/spark:2.4.0 -f kubernetes/dockerfiles/spark/Dockerfile .

docker push <your.image.hub/yourns>/spark:2.4.0


Afterwards you will see a few new images in your Docker Hub account.

You can view the Spark image I published, and pull it with:

docker pull morphtin/spark:sparkonk8s



To launch Spark Pi in cluster mode, change into the Spark directory and run:

$ bin/spark-submit \

    --master k8s://https://<k8s-apiserver-host>:<k8s-apiserver-port> \

    --deploy-mode cluster \

    --name spark-pi \

    --class org.apache.spark.examples.SparkPi \

    --conf spark.executor.instances=5 \

    --conf spark.kubernetes.container.image=<spark-image> \

    local:///path/to/examples.jar


You can find the API server URL with the following command:

$ kubectl cluster-info

    Kubernetes master is running at http://x.x.x.x:xxxx

In my case this gives k8s://http://localhost:8080.
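The --master value is just that address prefixed with k8s://. A small sketch, using the sample line above in place of live kubectl output:

```shell
# Build the --master value from a `kubectl cluster-info` line.
# On a real cluster, replace the hard-coded INFO with the actual
# first line of `kubectl cluster-info`.
INFO='Kubernetes master is running at http://x.x.x.x:xxxx'
MASTER="k8s://${INFO#Kubernetes master is running at }"
echo "$MASTER"
```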


<spark-image> is the image to run; here I use morphtin/spark:sparkonk8s, matching what I pushed to Docker Hub.


local:///path/to/examples.jar refers to the path of the jar inside the image; in my case it is /opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar.
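Note the three slashes: local:// is the scheme and the third slash begins the absolute path inside the image. A quick sanity-check sketch (the docker run command is illustrative and assumes the image is available locally):

```shell
# Confirm the jar exists inside the image (needs the image pulled locally):
# docker run --rm morphtin/spark:sparkonk8s ls /opt/spark/examples/jars
JAR=/opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar
URI="local://$JAR"
echo "$URI"
```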

So my submission looks like this:


bin/spark-submit \

    --master k8s://https://ip:port \

    --deploy-mode cluster \

    --name spark-pi \

    --class org.apache.spark.examples.SparkPi \

    --conf spark.executor.instances=5 \

    --conf spark.kubernetes.container.image=morphtin/spark:sparkonk8s \

    local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar



Spark uses the following URL scheme to allow different strategies for disseminating jars:

file: - Absolute paths and file:/ URIs are served by the driver’s HTTP file server, and every executor pulls the file from the driver HTTP server.

hdfs:, http:, https:, ftp: - these pull down files and JARs from the URI as expected

local: - a URI starting with local:/ is expected to exist as a local file on each worker node. This means that no network IO will be incurred, and works well for large files/JARs that are pushed to each worker, or shared via NFS, GlusterFS, etc.
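The scheme dispatch described above can be sketched as a tiny shell helper (illustrative only; classify is a hypothetical name, not a Spark tool):

```shell
# Map a jar URI to the distribution strategy described above
classify() {
  case "$1" in
    local:*)                     echo "already on each node, no network IO" ;;
    file:*)                      echo "served by the driver's HTTP file server" ;;
    hdfs:*|http:*|https:*|ftp:*) echo "pulled down from the URI" ;;
    *)                           echo "unknown scheme" ;;
  esac
}

classify "local:///opt/spark/examples/jars/spark-examples_2.11-2.4.0.jar"
```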


See the spark-submit documentation for a detailed explanation of the options.

For running Spark in client mode, see the official documentation.



Cluster Manager Types

The system currently supports three cluster managers:

Standalone – a simple cluster manager included with Spark that makes it easy to set up a cluster.

Apache Mesos – a general cluster manager that can also run Hadoop MapReduce and service applications.

Hadoop YARN – the resource manager in Hadoop 2.

Kubernetes – an open-source system for automating deployment, scaling, and management of containerized applications.

A third-party project (not supported by the Spark project) exists to add support for Nomad as a cluster manager.



Pitfalls you may run into:

Source: https://blog.csdn.net/ZQZ_QiZheng/article/details/79540487

The examples bundled with Spark are compiled with JDK 1.8; if startup reports Unsupported major.minor version 52.0, switch to a matching JDK.

spark-submit loads the cluster configuration from ~/.kube/config by default, so place your k8s cluster config in that directory.

If the Spark driver fails to start with Error: Could not find or load main class org.apache.spark.examples.SparkPi, the path after local:// in the submit arguments must be the path of your own Spark application inside the container.

If the driver throws Caused by: java.net.UnknownHostException: kubernetes.default.svc: Try again, make sure the kubelet nodes can reach each other over the network.

If the driver throws system: serviceaccount: default: default" cannot get pods in the namespace "default, it is a permissions problem; run the following two commands:

kubectl create rolebinding default-view --clusterrole=view --serviceaccount=default:default --namespace=default

kubectl create rolebinding default-admin --clusterrole=admin --serviceaccount=default:default --namespace=default
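Alternatively, the official Spark-on-k8s RBAC docs suggest a dedicated service account with the edit role instead of granting roles to default:default. A sketch (the kubectl commands need a live cluster, so they are shown commented out):

```shell
# From the Spark "Running on Kubernetes" RBAC documentation:
# kubectl create serviceaccount spark
# kubectl create clusterrolebinding spark-role --clusterrole=edit \
#   --serviceaccount=default:spark --namespace=default
# Then point the driver at that account when submitting:
SA=spark
CONF="--conf spark.kubernetes.authenticate.driver.serviceAccountName=$SA"
echo "$CONF"
```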

