• Bootcamp (9)
    • 📱 236 - 992 - 3846

      📧 jxjwilliam@gmail.com

    • Version: ‍🚀 1.1.0
  • 201 Streaming 5

    BootcampBigdata2020-12-17


    Essential concepts in Spark Streaming:

    • StreamingContext
    • Stream Operators
    • Batch, Batch time and Job Set
    • Streaming Job
    • Discretized Streams (Dstream) 离散
    • Receivers

    Discretized Stream (DStream) is the fundamental concept of Spark Streaming. It is basically a stream of RDDs with elements being the data received from input streams for batch (possibly extended in scope by windowed or stateful operators).

    Spark streaming

    • Provides a way to consume continuous stream of data
    • Scalable, high-throughput, fault-tolerant
    • Built on top of Spark Core

      • Based on RDDs
      • API is similar to Core API
    • Supports many inputs
    • Integrates with other Spark modules
    • Micro batch architecture

    Spark sources

    • Basic sources

      • Packaged together with Spark Streaming library
      • Available through 
      • Socket Streams, File Streams, Actor Streams
    • Advanced sources

      • Packaged separately
      • Available through utility classes
      • Kafka, Flume, Kinesis, Twitter
    • Custom sources

    Kafka Integration

    spark-kafka-integration

    Reference