Kafka

Introduction

Kafka is a distributed event streaming platform. It is suited to processing streams of data with low latency.

Kafka uses a pull model: each Kafka Consumer polls a Kafka topic for new messages, instead of the broker pushing messages to it.

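Kafka has 2 modes, which differ in how the consensus on cluster metadata is managed

ZooKeeper mode: a separate ZooKeeper cluster manages the consensus
Kafka Raft (KRaft) mode: Kafka manages the consensus itself
- KRaft is better since it's lighter and easier to scale
- A new node joining the cluster doesn't need to reload the state from ZooKeeper. The metadata is already held in memory by the KRaft controllers
- ZooKeeper is hard to scale because it is a separate cluster of its own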

Architecture

A producer can publish a message to a Kafka topic. It can publish to multiple topics.

Each Kafka topic has one or more Kafka Partitions
- Within a Kafka Consumer Group, each Kafka Partition is consumed by exactly 1 Kafka Consumer
  - 1 Kafka Consumer in a Kafka Consumer Group can consume multiple Kafka Partitions
- As a result, a Kafka Partition can be consumed by many consumers, as long as they belong to different Kafka Consumer Groups
- The messages inside a Kafka Partition are ordered (there is no ordering guarantee across partitions)
- A message published to a Kafka topic with a key is hashed to 1 particular Kafka Partition
- Kafka Consumers from different Kafka Consumer Groups can work on the same message
  - Each Kafka Consumer tracks the Kafka Offset (the id of the next message to read) and advances it as it reads. Committed offsets are stored per Kafka Consumer Group and partition
- A message is not deleted once it has been consumed. It stays in the partition for the retention period, which is 7 days by default
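
The relationship between a partition, its ordered log, and the offsets of different consumer groups can be sketched with a small in-memory model. This is a toy illustration and not Kafka's client API: the `Partition` class, its `append` and `poll` methods, and the group names are all hypothetical.

```python
from collections import defaultdict


class Partition:
    """Toy model of one Kafka partition: an append-only, ordered log."""

    def __init__(self):
        self.log = []                    # messages in arrival order
        self.offsets = defaultdict(int)  # consumer group -> next offset to read

    def append(self, message):
        """Producer side: store a message and return the offset it was stored at."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, group, max_records=1):
        """Consumer side: pull up to max_records for a group, then advance its offset."""
        start = self.offsets[group]
        records = self.log[start:start + max_records]
        self.offsets[group] = start + len(records)
        return records


partition = Partition()
for msg in ["m0", "m1", "m2"]:
    partition.append(msg)

# Two groups read the same messages independently, each in log order.
billing = partition.poll("billing", max_records=2)
audit = partition.poll("audit", max_records=3)
```

Reading does not remove anything from `partition.log`; only the group's offset moves. In real Kafka, old messages are removed by the retention policy, not by consumption.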

Publishing model

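When publishing a message, the producer should specify a key so that the message is hashed to a partition. Messages with the same key go to the same partition as long as the number of partitions does not change. The producer keeps an internal metadata map of the topic's partitions, so it computes the target partition itself and sends the message directly to the broker that leads that partition.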
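
A minimal sketch of the key-to-partition mapping, assuming a simple "hash the key, take it modulo the partition count" scheme. The function name `choose_partition` is hypothetical, and `zlib.crc32` is only a stand-in hash: Kafka's default partitioner uses murmur2, so the partition numbers produced here will not match a real cluster.

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a message key to a partition index: hash(key) mod partition count.

    zlib.crc32 is a stand-in for illustration only; it is not the hash
    function Kafka uses.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key) % num_partitions


# The same key always maps to the same partition, which keeps per-key ordering.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
```

Because the result depends on the partition count, adding partitions to a topic can change where an existing key lands.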

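If the producer does not specify a key:

Old Kafka (< 2.4): messages are distributed across partitions in round robin
Newer versions: the sticky partitioner is used. It sends a whole batch to one partition, then switches to another partition for the next batch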

```mermaid
graph TD
    subgraph Round_Robin_Old[Round robin]
    P1[Producer] --> M1((M1))
    P1 --> M2((M2))
    P1 --> M3((M3))
    M1 --> Part0[Partition 0]
    M2 --> Part1[Partition 1]
    M3 --> Part2[Partition 2]
    end

    subgraph Sticky_New[Sticky partitioner]
    P2[Producer] --> B1[Batch: M1, M2, M3]
    B1 --> PartA[Partition 0]
    P2 -.-> B2[Next Batch]
    B2 -.-> PartB[Partition 1]
    end
```
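
The difference between the two strategies for keyless messages can be sketched as two small functions. Both are simplified illustrations with hypothetical names. In particular, `batch_size` here counts messages, while Kafka's `batch.size` setting is in bytes and a batch can also be sent when `linger.ms` elapses. Moving to the next partition in order is also an assumption; the real sticky partitioner does not necessarily rotate that way.

```python
def round_robin(messages, num_partitions):
    """Rotate to the next partition on every message."""
    return [(msg, i % num_partitions) for i, msg in enumerate(messages)]


def sticky(messages, num_partitions, batch_size):
    """Stay on one partition until the batch is full, then switch partition.

    Simplification: batches are counted in messages and partitions are
    visited in order.
    """
    return [
        (msg, (i // batch_size) % num_partitions)
        for i, msg in enumerate(messages)
    ]


msgs = ["M1", "M2", "M3", "M4", "M5", "M6"]
rr = round_robin(msgs, num_partitions=3)
st = sticky(msgs, num_partitions=3, batch_size=3)
```

With round robin, six messages become many small per-partition batches. With the sticky approach, they form two full batches, which means fewer and larger requests to the brokers.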

Consuming model

The same topic can be consumed by several consumers in one consumer group, which process its messages in parallel by splitting the partitions among themselves. Each Kafka Consumer tracks the Kafka Offset (the id of the next message to read) of the partitions it owns and advances it as it reads.

[!danger]
1 topic partition can be subscribed to by only 1 consumer in the same consumer group

[!tip]
1 consumer can also subscribe to multiple topics and multiple partitions
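
The two rules above can be illustrated with a toy assignment function. This is a sketch under simplifying assumptions: `assign_partitions` is a hypothetical helper that deals partitions out in round-robin order, whereas Kafka's built-in assignors are more involved.

```python
def assign_partitions(partitions, consumers):
    """Give every partition to exactly one consumer of the group, round-robin style."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Fewer consumers than partitions: each consumer owns several partitions.
two = assign_partitions([0, 1, 2, 3], ["c1", "c2"])

# More consumers than partitions: the extra consumer gets nothing and stays idle.
five = assign_partitions([0, 1, 2, 3], ["c1", "c2", "c3", "c4", "c5"])
```

In this model, adding consumers beyond the number of partitions does not increase parallelism, because a partition is never shared inside one group.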

Kafka scaling