• Bootcamp (9)
    • 📱 236 - 992 - 3846

      📧 jxjwilliam@gmail.com

    • Version: 🚀 1.1.0
  • 8 Streaming

    Bootcamp Bigdata, 2020-12-17


    Kafka

    Kafka has four core APIs:

    1. The Producer API allows an application to publish a stream of records to one or more Kafka topics.
    2. The Consumer API allows an application to subscribe to one or more topics and process the stream of records produced to them.
    3. The Streams API allows an application to act as a stream processor, consuming an input stream from one or more topics and producing an output stream to one or more output topics, effectively transforming the input streams to output streams.
    4. The Connector API allows building and running reusable producers or consumers that connect Kafka topics to existing applications or data systems. For example, a connector to a relational database might capture every change to a table.
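    The Producer and Consumer APIs above can be sketched with a toy in-memory model (this is an illustration of the concepts, not the real Kafka client API; `Topic`, `produce`, and `poll` are made-up names):

```python
# Toy in-memory model of Kafka's Producer/Consumer APIs (not the real client).
# A topic is an append-only log of records; each consumer tracks its own offset.
class Topic:
    def __init__(self, name):
        self.name = name
        self.log = []          # append-only list of records

    def produce(self, record):
        """Producer API sketch: publish a record, return its offset."""
        self.log.append(record)
        return len(self.log) - 1

    def poll(self, offset, max_records=10):
        """Consumer API sketch: read records starting at a given offset."""
        return self.log[offset:offset + max_records]

orders = Topic("orders")
orders.produce({"id": 1, "item": "book"})
orders.produce({"id": 2, "item": "pen"})

# A consumer reads from offset 0 and advances its own position.
records = orders.poll(offset=0)
```

    The key idea this models: the broker never deletes records on read; consumers simply remember how far into the log they have gotten, which is what lets many independent consumers read the same topic.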

    Kafka

    Apache Kafka is a distributed streaming platform: an open-source stream-processing platform developed by the Apache Software Foundation, written in Scala and Java.

    Kafka Brokers: a Kafka cluster is made up of one or more brokers.

    ZooKeeper servers store metadata about brokers, topics, and partitions. Kafka organizes each stream of records into a topic.

    • partitions
    • replica
    • integrate with hadoop batch jobs, spark stream
    • pub/sub
    • open source
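    The partition bullet can be illustrated with the usual key-hashing rule: records with the same key always land in the same partition, which preserves per-key ordering. A minimal sketch (real Kafka's default partitioner uses murmur2 hashing, not `zlib.crc32`; the function name here is illustrative):

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic key -> partition mapping (Kafka itself uses murmur2)."""
    return zlib.crc32(key) % num_partitions

# The same key always maps to the same partition,
# so all records for "user-42" stay in order relative to each other.
p1 = partition_for(b"user-42")
p2 = partition_for(b"user-42")
```

    This is also why partitions are the unit of parallelism: each partition is an independent ordered log that one consumer in a group can process on its own.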

    Kafka streams

    • Lightweight ETL library within Kafka
    • Java application
    • Highly scalable and fault-tolerant
    • No need to create cluster
    • Supports exactly-once processing capabilities
    • One-record-at-a-time processing (no batching)
    • Viable for all types of applications
    • First-class integration with Kafka
    • Supports interactive queries to unify the worlds of streams and databases
    • Millisecond processing latency
    • Open-source
    • Kafka-to-Kafka processing only; connecting to external systems is not recommended. Use Kafka Connect for that.
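    The one-record-at-a-time model above can be sketched as a word-count processor: each record is handled as it arrives, a local state store is updated, and an updated count is emitted downstream per input (a pure-Python sketch of the idea, not the actual Kafka Streams Java DSL):

```python
from collections import defaultdict

def word_count_processor(input_stream):
    """Process one record at a time: update local state, emit (word, count)."""
    counts = defaultdict(int)    # local state store
    output_stream = []
    for record in input_stream:  # no batching: each record handled on arrival
        for word in record.split():
            counts[word] += 1
            output_stream.append((word, counts[word]))
    return output_stream

out = word_count_processor(["hello kafka", "hello streams"])
```

    Note that every input produces output immediately, including intermediate counts; this is the continuously-updated style of result that distinguishes stream processing from a batch job that emits one final answer.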

    Spark

    Apache Spark is a unified analytics engine for large-scale data processing: batch, streaming, machine learning, and graph computation. It can access data in hundreds of data sources.

    What Apache Spark can do:

    • Spark SQL and batch processing
    • Stream processing with Spark Streaming and Structured Streaming
    • Machine Learning with MLlib
    • Graph computations with GraphX
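    The batch-processing style Spark applies at cluster scale can be sketched with the same map/reduce shape in plain Python (a toy illustration only; in real PySpark this shape corresponds to `rdd.flatMap(...).map(...).reduceByKey(...)`):

```python
from itertools import groupby

lines = ["spark makes batch easy", "spark makes streaming easy"]

# Map phase: split each line into (word, 1) pairs.
pairs = [(word, 1) for line in lines for word in line.split()]

# Shuffle + reduce phase: group pairs by word and sum the counts.
pairs.sort(key=lambda kv: kv[0])
counts = {word: sum(n for _, n in group)
          for word, group in groupby(pairs, key=lambda kv: kv[0])}
```

    Spark's value is running exactly this kind of pipeline in parallel across a cluster, with the shuffle moving same-key pairs to the same worker, analogous to the in-memory sort/group step here.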