  • Spark Kubernetes

    Bootcamp · Bigdata · 2020-12-17


    Apache Spark On Kubernetes

    $ gcloud container clusters get-credentials sparky --zone us-east1-b --project swarm-1358
    
    $ kubectl get pods
    
    $ bin/spark-submit \
        --deploy-mode cluster \
        --class MovieLensALS \
        --master k8s://https://35.185.52.83 \
        --kubernetes-namespace=dev \
        --conf spark.app.name=spark-movielens \
        --conf spark.executor.instances=3 \
        --conf spark.kubernetes.driver.pod.name=spark-movielens \
        local:///dataset/movielens-als-assembly-0.1.jar \
        file:///dataset/movie-small/ \
        file:///dataset/personalRatings.txt
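
    A long spark-submit line is easy to break when one continuation is lost. As a minimal sketch, the same arguments can be held in a bash array and printed before anything is submitted. Every value is copied from the command above; the array name `submit_args` and the `DRY_RUN` switch are assumptions of this example, not part of Spark.

```shell
#!/usr/bin/env bash
# Sketch only: the arguments are the ones from the notes above.
# "submit_args" and DRY_RUN are hypothetical names for this example.
submit_args=(
  --deploy-mode cluster
  --class MovieLensALS
  --master k8s://https://35.185.52.83
  --kubernetes-namespace=dev
  --conf spark.app.name=spark-movielens
  --conf spark.executor.instances=3
  --conf spark.kubernetes.driver.pod.name=spark-movielens
  local:///dataset/movielens-als-assembly-0.1.jar
  file:///dataset/movie-small/
  file:///dataset/personalRatings.txt
)

# Print the command by default; set DRY_RUN=0 to actually submit.
if [ "${DRY_RUN:-1}" = "1" ]; then
  printf '%s\n' "bin/spark-submit ${submit_args[*]}"
else
  bin/spark-submit "${submit_args[@]}"
fi
```

    With the array form no trailing backslashes are needed, and a single argument can be commented out while debugging.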
    • Spark on Kubernetes ecosystem: Kafka, Cassandra, HDFS, etc.
    • A pod is one or more containers
    • Pods are scheduled onto nodes
    • Each node runs an agent called the kubelet
    • All communication goes through the API server
    • A user or app creates “controllers”
    • Controllers create pods and other resources
    • The Spark driver acts as a custom controller: it creates the executor pods
    • spark shell ??
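
    The pod, node, and controller relationships in the list above can be checked from the command line. The sketch below assumes the `dev` namespace and the driver pod name `spark-movielens` from the spark-submit command, and assumes executor pods carry the label `spark-role=executor`, which may differ between Spark builds. The `run` helper and `DRY_RUN` switch are hypothetical names; by default the commands are only printed.

```shell
#!/usr/bin/env bash
# Assumed values, taken from the spark-submit command in these notes.
NAMESPACE=dev
DRIVER_POD=spark-movielens

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Pods and the node each one was scheduled on (NODE column).
run kubectl get pods --namespace "$NAMESPACE" -o wide

# Executor pods created by the driver (label is an assumption).
run kubectl get pods --namespace "$NAMESPACE" -l spark-role=executor

# Containers and events of the driver pod.
run kubectl describe pod "$DRIVER_POD" --namespace "$NAMESPACE"

# Driver output, including the application's own logging.
run kubectl logs "$DRIVER_POD" --namespace "$NAMESPACE"
```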

    Demo

    • Tools and commands: Zeppelin, mvn, docker, kubectl get services, kubectl get pods, kubectl create -f zeppelin.yaml
    • Spark UI: localhost:4040
    • Kubernetes UI: localhost:8001
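
    Both UIs live inside the cluster, so the localhost addresses only work once something forwards the ports. A minimal sketch, assuming the Spark UI is served by the driver pod `spark-movielens` in namespace `dev` and that the Kubernetes UI is reached through `kubectl proxy`, whose default port is 8001. The `run`, `spark_ui`, and `k8s_ui` names and the `DRY_RUN` switch are hypothetical.

```shell
#!/usr/bin/env bash
# Assumed values, taken from the spark-submit command in these notes.
NAMESPACE=dev
DRIVER_POD=spark-movielens

# Hypothetical helper: print the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Forward local port 4040 to port 4040 of the driver pod (Spark UI).
spark_ui() {
  run kubectl port-forward "$DRIVER_POD" 4040:4040 --namespace "$NAMESPACE"
}

# Open a local proxy to the API server on port 8001 (Kubernetes UI).
k8s_ui() {
  run kubectl proxy --port=8001
}

# Both commands stay in the foreground when really executed, so with
# DRY_RUN=0 each function should be called from its own terminal.
spark_ui
k8s_ui
```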