Kafka
BootcampBigdata22020-12-17
Kafka
Apache Kafka is an open-source stream-processing software platform dveloped by the Apache Software Foundation written in Scala and Java
- A “high throughput distributed messaging system”
- Publish / subscribe
- Fast, scalable, durable, distributed
- Popular way to send data reliably over a cluster
- Kafka Brokers, clusters, streaming, topics
- producers - send
- consumers - receive
- broker - kafka server
- cluster - group of computers
- topic - a name for a kafka stream
- partition - part of a topic
- offset - unique id for a message within a partition
- consumer groups - a group of consumer acting as a single logical unit
Kafka高效地处理实时流式数据,可以实现与Storm、HBase和Spark的集成。作为群集部署到多台服务器上,Kafka处理它所有的发布和订阅消息系统使用了四个API,即生产者API、消费者API、Stream API和Connector API。它能够传递大规模流式消息,自带容错功能,已经取代了一些传统消息系统,如JMS、AMQP等。
$ brew cask install homebrew/cask-versions/java8
$ brew install kafkazookeeper
Zookeeper Servers to stroe metadata about brokers, topics and partitions. And Kafka provides a topics for a stream of records.
Spark
Apache Spark is a unified analytics engine for lare-scale data processing: batch, streaming, machine learning, graph computation. Access data in hundreds of data sources.
What Apache Spark can do:
- Spark SQL and batch processing
- Stram processing with Spark Streaming and Structured Streaming
- Machine Learning with Mllib
- Graph computations with GraphX
How to Install Scala and Apache Spark on MacOS
$ brew install scala
$ brew install apache-spark
$ spark-shellKafka with Spark Streaming
Kafka + Spark = Reliable, scalable event ingestion and real-time stream processing
Event stream processing architecture on Azure with Apache Kafka and Spark
As of Spark 13, Spark Streaming can connect directly to kafka
- Kafka is already replicated and durable, so no need to recreate that functionality with Spark receivers!
- Prior to Spark 1.3, you’d connect to a Zookeeper host instead, and it would be possible to lose data if that receiver went down. Need to get the spark-streaming-kafka package (it’s not built-in)
sbt
Commands
// install Java1.8的jdk
$ brew cask install homebrew/cask-versions/java8
$ brew install kafka
// To have launchd start kafka now and restart at login:
$ brew services start kafka
// Or, if you don't want/need a background service you can just run:
$ zookeeper-server-start /usr/local/etc/kafka/zookeeper.properties & kafka-server-start /usr/local/etc/kafka/server.properties
// To have launchd start zookeeper now and restart at login:
$ brew services start zookeeper
// Or, if you don't want/need a background service you can just run:
$ zkServer start