8 Streaming
BootcampBigdata 2020-12-17
Kafka
Kafka has four core APIs:
- The Producer API allows an application to publish a stream of records to one or more Kafka topics.
- The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
- The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
- The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
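The roles of these APIs can be illustrated with a toy in-memory model. This is only a sketch of the publish/subscribe idea, not the Kafka client API: `ToyBroker`, `ToyConsumer`, and the topic names `orders` and `orders-upper` are hypothetical names invented for this example, and a real broker persists and distributes its logs rather than keeping Python lists.

```python
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a broker: one append-only list per topic."""

    def __init__(self):
        self.topics = defaultdict(list)

    def publish(self, topic, record):
        """Producer role: append a record to a topic and return its offset."""
        log = self.topics[topic]
        log.append(record)
        return len(log) - 1

    def read_from(self, topic, offset):
        """Return every record in the topic starting at the given offset."""
        return self.topics[topic][offset:]


class ToyConsumer:
    """Consumer role: subscribes to topics and remembers its own read position."""

    def __init__(self, broker, topics):
        self.broker = broker
        self.offsets = {topic: 0 for topic in topics}

    def poll(self):
        """Fetch records not seen yet and advance the stored offsets."""
        fetched = []
        for topic, offset in self.offsets.items():
            records = self.broker.read_from(topic, offset)
            fetched.extend((topic, record) for record in records)
            self.offsets[topic] = offset + len(records)
        return fetched


broker = ToyBroker()
broker.publish("orders", "apple")
broker.publish("orders", "pear")

consumer = ToyConsumer(broker, ["orders"])
first_poll = consumer.poll()

# Stream-processor role: read from an input topic, transform, write to an output topic.
for _topic, value in first_poll:
    broker.publish("orders-upper", value.upper())

second_poll = consumer.poll()  # nothing new was published to "orders"
```

The consumer keeps its own offset, so polling twice does not return the same records again, while the records themselves stay in the topic for other subscribers.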
Kafka
Apache Kafka is an open-source distributed streaming platform developed by the Apache Software Foundation and written in Scala and Java.
Kafka brokers: the servers that together form a Kafka cluster.
ZooKeeper servers store metadata about brokers, topics, and partitions. Kafka provides topics, each of which holds a stream of records.
- partitions
- replicas
- integrates with Hadoop batch jobs and Spark Streaming
- pub/sub
- open source
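How partitions and replicas spread a topic over brokers can be sketched as follows. This is a simplified illustration under stated assumptions: `partition_for` and `assign_replicas` are hypothetical helpers, CRC32 stands in for the hash used by the real default partitioner, and the round-robin replica placement is a simplification of what a real cluster does.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition (CRC32 is an assumption, not Kafka's hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def assign_replicas(num_partitions, brokers, replication_factor):
    """Place each partition's replicas on distinct brokers, round-robin."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    assignment = {}
    for partition in range(num_partitions):
        assignment[partition] = [
            brokers[(partition + i) % len(brokers)]
            for i in range(replication_factor)
        ]
    return assignment


# Records with the same key always land in the same partition,
# which preserves their relative order within that partition.
p1 = partition_for("user-42", 3)
p2 = partition_for("user-42", 3)

layout = assign_replicas(3, ["b0", "b1", "b2"], 2)
# layout == {0: ["b0", "b1"], 1: ["b1", "b2"], 2: ["b2", "b0"]}
```

With two replicas per partition, losing any single broker in this layout still leaves one copy of every partition available.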
Kafka Streams
- Lightweight ETL library within Kafka
- Java application
- Highly-scalable and fault tolerant
- No need to create cluster
- Supports exactly-once processing capabilities
- One-record-at-a-time processing (no batching)
- Viable for all types of applications
- First-class integration with Kafka
- Supports interactive queries to unify the worlds of streams and databases
- Millisecond processing latency
- Open-source
- Kafka-to-Kafka platform: connecting to external systems directly is not recommended. Use Kafka Connect for that.
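The one-record-at-a-time model listed above can be sketched with a generator. Kafka Streams itself is a Java library, so this Python snippet only illustrates the processing model and is not its API; `word_count_stream` is a hypothetical name, and the in-memory `Counter` stands in for a fault-tolerant state store.

```python
from collections import Counter


def word_count_stream(records):
    """Process records one at a time, emitting an updated count after each word."""
    counts = Counter()  # stand-in for a state store
    for line in records:
        for word in line.lower().split():
            counts[word] += 1
            # Emit the new count immediately instead of waiting for a batch.
            yield word, counts[word]


updates = list(word_count_stream(["kafka streams", "kafka connect"]))
# updates == [("kafka", 1), ("streams", 1), ("kafka", 2), ("connect", 1)]
```

Each input record produces output as soon as it arrives, which is what makes low per-record latency possible in this model.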
Spark
Apache Spark is a unified analytics engine for large-scale data processing: batch, streaming, machine learning, and graph computation. It can access data in hundreds of data sources.
What Apache Spark can do:
- Spark SQL and batch processing
- Stream processing with Spark Streaming and Structured Streaming
- Machine learning with MLlib
- Graph computations with GraphX
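Spark Streaming handles a stream as a sequence of small batches (micro-batches), in contrast to one-record-at-a-time processing. The sketch below shows that idea in plain Python and is not the Spark API: `micro_batches` is a hypothetical helper, and it groups by record count for simplicity, whereas Spark forms batches by time interval.

```python
from itertools import islice


def micro_batches(records, batch_size):
    """Group an incoming stream into consecutive small batches."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Each batch is processed as a unit, here by summing its values.
batch_totals = [sum(batch) for batch in micro_batches([1, 2, 3, 4, 5], 2)]
# batch_totals == [3, 7, 5]
```

Batching lets the same engine run batch-style operations on streaming data, at the cost of waiting for a batch to fill before results appear.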
