• Bootcamp (9)
    • 📱 236 - 992 - 3846

      📧 jxjwilliam@gmail.com

    • Version: 🚀 1.1.0
  • 8 Streaming

    Bootcamp Bigdata, 2020-12-17


    Kafka

    Kafka has four core APIs:

    1. The Producer API allows an application to publish a stream of records to one or more Kafka topics.
    2. The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
    3. The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
    4. The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
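    The Producer and Consumer APIs above can be sketched with a toy in-memory model (this is an illustration of the concepts, not the real Kafka client API; `Topic`, `produce`, and `poll` are made-up names):

```python
# Toy in-memory model of Kafka's Producer/Consumer APIs (not the real client).
# A topic is an append-only log of records; each consumer tracks its own offset.
class Topic:
    def __init__(self, name):
        self.name = name
        self.log = []          # append-only list of records

    def produce(self, record):
        """Producer API sketch: publish a record, return its offset."""
        self.log.append(record)
        return len(self.log) - 1

    def poll(self, offset, max_records=10):
        """Consumer API sketch: read records starting at a given offset."""
        return self.log[offset:offset + max_records]

orders = Topic("orders")
orders.produce({"id": 1, "item": "book"})
orders.produce({"id": 2, "item": "pen"})

# A consumer reads from offset 0 and advances its own position.
records = orders.poll(offset=0)
```

    The key idea this models: the broker never deletes records on read; consumers simply remember how far into the log they have gotten, which is what lets many independent consumers read the same topic.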

    Kafka

    Apache Kafka is a distributed streaming platform: an open-source stream-processing platform developed by the Apache Software Foundation, written in Scala and Java.

    Kafka Brokers: a Kafka cluster is made up of one or more brokers.

    ZooKeeper servers store metadata about brokers, topics, and partitions. Kafka organizes each stream of records into a topic.

    • partitions
    • replica
    • integrate with hadoop batch jobs, spark stream
    • pub/sub
    • open source
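    The partition bullet can be illustrated with the usual key-hashing rule: records with the same key always land in the same partition, which preserves per-key ordering. A minimal sketch (real Kafka's default partitioner uses murmur2 hashing, not `zlib.crc32`; the function name here is illustrative):

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic key -> partition mapping (Kafka itself uses murmur2)."""
    return zlib.crc32(key) % num_partitions

# The same key always maps to the same partition,
# so all records for "user-42" stay in order relative to each other.
p1 = partition_for(b"user-42")
p2 = partition_for(b"user-42")
```

    This is also why partitions are the unit of parallelism: each partition is an independent ordered log that one consumer in a group can process on its own.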

    Kafka streams

    • Lightweight ETL library within Kafka
    • Java application
    • Highly scalable and fault-tolerant
    • No need to create cluster
    • Supports exactly-once processing capabilities
    • One-record-at-a-time processing (no batching)
    • Viable for all types of applications
    • First-class integration with Kafka
    • Supports interactive queries to unify the worlds of streams and databases
    • Millisecond processing latency
    • Open-source
    • Kafka-to-Kafka processing only; connecting to external systems is not recommended. Use Kafka Connect for that.
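    The one-record-at-a-time model above can be sketched as a word-count processor: each record is handled as it arrives, a local state store is updated, and an updated count is emitted downstream per input (a pure-Python sketch of the idea, not the actual Kafka Streams Java DSL):

```python
from collections import defaultdict

def word_count_processor(input_stream):
    """Process one record at a time: update local state, emit (word, count)."""
    counts = defaultdict(int)    # local state store
    output_stream = []
    for record in input_stream:  # no batching: each record handled on arrival
        for word in record.split():
            counts[word] += 1
            output_stream.append((word, counts[word]))
    return output_stream

out = word_count_processor(["hello kafka", "hello streams"])
```

    Note that every input produces output immediately, including intermediate counts; this is the continuously-updated style of result that distinguishes stream processing from a batch job that emits one final answer.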

    Spark

    Apache Spark is a unified analytics engine for large-scale data processing: batch, streaming, machine learning, and graph computation. It can access data in hundreds of data sources.

    What Apache Spark can do:

    • Spark SQL and batch processing
    • Stream processing with Spark Streaming and Structured Streaming
    • Machine Learning with MLlib
    • Graph computations with GraphX
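    The batch-processing style Spark applies at cluster scale can be sketched with the same map/reduce shape in plain Python (a toy illustration only; in real PySpark this shape corresponds to `rdd.flatMap(...).map(...).reduceByKey(...)`):

```python
from itertools import groupby

lines = ["spark makes batch easy", "spark makes streaming easy"]

# Map phase: split each line into (word, 1) pairs.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle + reduce phase: group pairs by word and sum the counts.
pairs.sort(key=lambda kv: kv[0])
counts = {word: sum(n for _, n in group)
          for word, group in groupby(pairs, key=lambda kv: kv[0])}
```

    Spark's value is running exactly this kind of pipeline in parallel across a cluster, with the shuffle moving same-key pairs to the same worker, analogous to the in-memory sort/group step here.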