Kafka
Introduction
Kafka is a distributed event streaming platform. It is designed for high-throughput, low-latency processing of data streams.
Kafka uses a pull model: each Kafka Consumer polls the broker for new messages in a Kafka topic instead of having messages pushed to it.
Kafka can manage cluster metadata and consensus in 2 modes:
- ZooKeeper mode
- Kafka Raft (KRaft) mode, where Kafka manages the consensus itself
- KRaft is preferred since it is lighter and easier to scale
- A new node joining the cluster does not need to reload state from ZooKeeper, because the cluster metadata is already kept in memory by the KRaft controllers
- ZooKeeper is harder to scale because it is a separate cluster of its own that has to run alongside Kafka
Architecture
A producer publishes messages to a Kafka topic. A single producer can publish to multiple topics.
- Each Kafka topic is split into one or more Kafka Partitions
- Within a Kafka Consumer Group, each Kafka Partition is consumed by at most 1 Kafka Consumer
- 1 Kafka Consumer in a Kafka Consumer Group can consume multiple Kafka Partitions
- As a result, a Kafka Partition can be consumed by many consumers, as long as they belong to different Kafka Consumer Groups
- Messages inside a Kafka Partition are ordered. Ordering is not guaranteed across partitions
- 1 message published to a Kafka topic is written to exactly 1 Kafka Partition, chosen by hashing the message key
- Multiple Kafka Consumers from different Kafka Consumer Groups can process the same message
- Each Kafka Consumer tracks a Kafka Offset per partition (the position of the next message to read) and advances it as it reads
- Consumed messages are not deleted. They stay in the partition until the retention period expires, which is 7 days by default
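The relationship between partitions, consumer groups, and offsets can be sketched with a small in-memory model. This is only a toy under stated assumptions: `Partition`, `ToyTopic`, and the group names `billing` and `audit` are hypothetical and are not part of any Kafka client API. It illustrates that reading does not remove messages and that each consumer group advances its own offset.

```python
from collections import defaultdict


class Partition:
    """An append-only, ordered log. Reading never removes messages."""

    def __init__(self):
        self.log = []

    def append(self, message):
        self.log.append(message)
        return len(self.log) - 1  # offset of the message just written

    def read(self, offset, max_records=10):
        return self.log[offset:offset + max_records]


class ToyTopic:
    """Hypothetical stand-in for a topic; not a real Kafka class."""

    def __init__(self, num_partitions):
        self.partitions = [Partition() for _ in range(num_partitions)]
        # offsets[group][partition_index] -> next offset to read
        self.offsets = defaultdict(lambda: defaultdict(int))

    def poll(self, group, partition_index, max_records=10):
        """Pull the next records for a group and advance that group's offset."""
        offset = self.offsets[group][partition_index]
        records = self.partitions[partition_index].read(offset, max_records)
        self.offsets[group][partition_index] = offset + len(records)
        return records


topic = ToyTopic(num_partitions=2)
for msg in ["a", "b", "c"]:
    topic.partitions[0].append(msg)

# The "billing" group reads in two pulls; its offset moves from 0 to 3.
billing_first = topic.poll("billing", 0, max_records=2)
billing_second = topic.poll("billing", 0, max_records=2)

# The "audit" group has its own offset, so it still sees every message.
audit_all = topic.poll("audit", 0)
```

Because offsets are kept per group, adding a new consumer group never affects what existing groups have read.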
Publishing model
When publishing a message, the producer should specify a key. The key is hashed to pick a partition, so all messages with the same key land in the same partition. The producer keeps a cached metadata map of the topic's partitions, so it computes the target partition itself and sends the message directly to the broker that leads that partition.
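A minimal sketch of key-based partition selection follows. It assumes `zlib.crc32` as a stand-in hash purely for illustration (Kafka's default partitioner hashes the serialized key with murmur2), and `choose_partition`, the `metadata` map, and the `orders` topic are hypothetical names.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index with a deterministic hash.

    Assumption: zlib.crc32 is a stand-in for illustration only.
    Kafka's default partitioner uses murmur2 on the serialized key.
    """
    return zlib.crc32(key) % num_partitions


# Hypothetical cached metadata: topic name -> partition count.
metadata = {"orders": 3}

# The same key always maps to the same partition while the count is unchanged.
p_first = choose_partition(b"customer-42", metadata["orders"])
p_again = choose_partition(b"customer-42", metadata["orders"])
```

Since the result depends on the partition count, adding partitions to a topic can move a key to a different partition, so per-key ordering only holds while the count stays the same.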
If the producer does not specify a key:
- Old Kafka (< 2.4): messages are distributed across partitions in round-robin order
- Newer versions: the sticky partitioner is used
graph TD
subgraph Round_Robin_Old[Round robin]
P1[Producer] --> M1((M1))
P1 --> M2((M2))
P1 --> M3((M3))
M1 --> Part0[Partition 0]
M2 --> Part1[Partition 1]
M3 --> Part2[Partition 2]
end
subgraph Sticky_New[Sticky partitioner]
P2[Producer] --> B1[Batch: M1, M2, M3]
B1 --> PartA[Partition 0]
P2 -.-> B2[Next Batch]
B2 -.-> PartB[Partition 1]
end
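The two strategies in the diagram can be compared with a short sketch. It is a simplification under stated assumptions: `round_robin` and `sticky` are hypothetical helpers, the batch is closed purely by message count, and the sticky variant simply rotates to the next partition in order, which is not necessarily how the real partitioner picks its next partition.

```python
def round_robin(messages, num_partitions):
    """Assign each message to the next partition in turn."""
    return [(msg, i % num_partitions) for i, msg in enumerate(messages)]


def sticky(messages, num_partitions, batch_size):
    """Stay on one partition until a batch is full, then move on.

    Assumption: rotating partitions in order is a simplification
    made for this sketch.
    """
    return [
        (msg, (i // batch_size) % num_partitions)
        for i, msg in enumerate(messages)
    ]


messages = ["M1", "M2", "M3", "M4", "M5", "M6"]

rr = round_robin(messages, num_partitions=3)
st = sticky(messages, num_partitions=3, batch_size=3)

rr_partitions = [p for _, p in rr]  # consecutive messages are spread out
st_partitions = [p for _, p in st]  # consecutive messages share a batch
```

With round robin, consecutive keyless messages go to different partitions, so batches fill slowly. With the sticky approach they share a partition, so a batch can fill and be sent sooner.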
Consuming model
The same topic can be consumed by several consumers in a consumer group, so that messages are processed in parallel. Each Kafka Consumer tracks the Kafka Offset (the position of the next message to read) and advances it as it reads.
[!danger]
1 topic partition can be subscribed to by only 1 consumer within the same consumer group
[!tip]
1 consumer can also subscribe to multiple topics and multiple partitions
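The two rules above can be illustrated with a small assignment sketch. It assumes a simple round-robin spread; `assign_partitions` and the consumer names are hypothetical, and Kafka's real assignors use their own strategies, so treat this only as an illustration of the one-owner-per-partition rule.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the consumers of one group, round-robin style.

    Each partition gets exactly one owner; a consumer may own several.
    """
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Fewer consumers than partitions: one consumer owns two partitions.
two = assign_partitions([0, 1, 2], ["c1", "c2"])

# More consumers than partitions: the extra consumer stays idle.
four = assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"])
```

This is why adding consumers beyond the number of partitions does not increase parallelism within a single consumer group.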