Apache Spark on Kubernetes
Bigdata Bootcamp, 2020-12-17
$ gcloud container clusters get-credentials sparky --zone us-east1-b --project swarm-1358
$ kubectl get pods
$ bin/spark-submit \
  --deploy-mode cluster \
  --class MovieLensALS \
  --master k8s://https://35.185.52.83 \
  --conf spark.kubernetes.namespace=dev \
  --conf spark.app.name=spark-movielens \
  --conf spark.executor.instances=3 \
  --conf spark.kubernetes.driver.pod.name=spark-movielens \
  local:///dataset/movielens-als-assembly-0.1.jar \
  file:///dataset/movie-small/ \
  file:///dataset/personalRatings.txt
- Spark on Kubernetes ecosystem: Kafka, Cassandra, HDFS, etc.
- Pod = 1 or more containers
- Pods are scheduled onto nodes
- Nodes run a node agent called the kubelet
- All communication happens through the API Server
- Users/apps create "controllers"; controllers create pods and other resources
- The Spark driver is a custom controller
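The "pod = 1 or more containers" point can be sketched with a minimal Pod manifest; all names and images below are illustrative, not taken from the demo:

```yaml
# A single Pod running two containers. Both containers share the
# Pod's network namespace, so they are reachable on the same Pod IP.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod            # illustrative name
  namespace: dev
spec:
  containers:
  - name: app               # main container
    image: nginx:1.21
    ports:
    - containerPort: 80
  - name: sidecar           # second container in the same Pod
    image: busybox:1.35
    command: ["sh", "-c", "sleep 3600"]
```

Submitting this manifest to the API Server (e.g. with `kubectl create -f`) is exactly the path controllers take when they create pods.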
- spark-shell? (interactive shells only work in client mode; cluster mode, as used above, does not support them)
Demo
- Zeppelin, mvn, docker, kubectl get services, kubectl get pods, kubectl create -f zeppelin.yaml
- Spark UI: localhost:4040
- Kubernetes UI (via kubectl proxy): localhost:8001
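The demo's zeppelin.yaml is not reproduced in these notes; a minimal sketch of what such a manifest could contain follows, where the image tag, labels, and ports are assumptions:

```yaml
# Deployment runs the Zeppelin server; Service exposes it in-cluster.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zeppelin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zeppelin
  template:
    metadata:
      labels:
        app: zeppelin
    spec:
      containers:
      - name: zeppelin
        image: apache/zeppelin:0.9.0   # assumed image tag
        ports:
        - containerPort: 8080          # Zeppelin web UI
---
apiVersion: v1
kind: Service
metadata:
  name: zeppelin
spec:
  selector:
    app: zeppelin
  ports:
  - port: 8080
    targetPort: 8080
```

Applied with `kubectl create -f zeppelin.yaml`, then checked with `kubectl get pods` and `kubectl get services` as in the demo.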
