Deep Dive Into Concept of Kafka Consumer — Part III
This is the third part of our series on Kafka.
If you haven't read the second article yet, I recommend reading it before this one: Part II
Consumer
A consumer reads messages from a topic. Each message it receives is represented as a ConsumerRecord.
Consumer Groups
A consumer group contains multiple consumers that are subscribed to the same topic. Let's say our application produces messages at a higher rate than a single consumer can consume them. If we continue with a single consumer, messages will pile up in Kafka for that topic and the application will fall farther and farther behind, unable to keep up with the rate of incoming messages. In that case we need to scale topic consumption by allowing multiple consumers to read from the same topic, splitting the data between them.
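To make the "falling behind" problem concrete, here is a small back-of-the-envelope sketch. It is not Kafka code: the function names and the rates are made up for illustration, and it assumes constant rates and enough partitions for every consumer to get work.

```python
import math

def lag_after(seconds, produce_rate, consume_rate, consumers=1):
    """Backlog in messages after `seconds`, assuming constant rates (msgs/sec)."""
    total_consume_rate = consume_rate * consumers
    return max(0, (produce_rate - total_consume_rate) * seconds)

def consumers_needed(produce_rate, consume_rate):
    """Smallest number of consumers whose combined rate keeps up with producers."""
    return math.ceil(produce_rate / consume_rate)

# Hypothetical numbers: producers write 1000 msgs/sec, one consumer handles 400 msgs/sec.
print(lag_after(60, produce_rate=1000, consume_rate=400))               # 36000
print(lag_after(60, produce_rate=1000, consume_rate=400, consumers=3))  # 0
print(consumers_needed(produce_rate=1000, consume_rate=400))            # 3
```

With one consumer the backlog grows by 600 messages every second; with three consumers the group keeps up.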
**Within a consumer group, the messages of a single partition can be consumed by only one consumer at a time.**
A consumer can consume messages from several partitions of a topic, but the messages of one partition can't be consumed by multiple consumers of the same group.
The main way we scale data consumption from a Kafka topic is by adding more consumers to a consumer group. There is no point in adding more consumers to a group than the topic has partitions: the extra consumers will just sit idle.
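The effect of having more consumers than partitions can be seen in a toy assignment function. This is only a sketch: it uses a simple round-robin spread, whereas Kafka's real assignment strategies are configurable and more involved. The function and consumer names are invented for the example.

```python
def assign(partitions, consumers):
    """Spread partitions over consumers round-robin.

    Each partition ends up with exactly one owner; a consumer may own
    several partitions, or none at all.
    """
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment

partitions = [0, 1, 2, 3]

# Two consumers share four partitions.
print(assign(partitions, ["c1", "c2"]))
# {'c1': [0, 2], 'c2': [1, 3]}

# Five consumers, four partitions: the fifth consumer has nothing to read.
five = assign(partitions, ["c1", "c2", "c3", "c4", "c5"])
idle = [c for c, owned in five.items() if not owned]
print(idle)
# ['c5']
```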
When we add a new consumer to the group, it starts consuming messages from partitions that were previously consumed by another consumer. The same thing happens when a consumer shuts down or crashes: it leaves the group, and the partitions it used to consume are taken over by one of the remaining consumers.
Rebalancing: The movement of partition ownership from one consumer to another when a consumer joins or leaves the group. (Consumers can't consume messages while a rebalance is in progress, which is why a rebalance should take only a short time.)
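Here is a minimal sketch of what a rebalance does to partition ownership when members join and leave. `ToyGroup` is a made-up class for illustration, not part of any Kafka client, and it reassigns everything round-robin on every membership change.

```python
class ToyGroup:
    """Toy consumer group that reassigns all partitions on every membership change."""

    def __init__(self, partitions):
        self.partitions = list(partitions)
        self.members = []
        self.assignment = {}
        self.rebalance_count = 0

    def _rebalance(self):
        # Ownership is recomputed from scratch for the current members.
        self.rebalance_count += 1
        self.assignment = {m: [] for m in self.members}
        if self.members:
            for i, partition in enumerate(self.partitions):
                owner = self.members[i % len(self.members)]
                self.assignment[owner].append(partition)

    def join(self, member):
        self.members.append(member)
        self._rebalance()

    def leave(self, member):
        self.members.remove(member)
        self._rebalance()

group = ToyGroup(partitions=[0, 1, 2])

group.join("c1")
print(group.assignment)   # {'c1': [0, 1, 2]}

group.join("c2")          # c2 takes over a partition c1 used to own
print(group.assignment)   # {'c1': [0, 2], 'c2': [1]}

group.leave("c1")         # c1 shuts down or crashes; c2 inherits its partitions
print(group.assignment)   # {'c2': [0, 1, 2]}
```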
Group Coordinator (broker): Every consumer sends heartbeats to the Group Coordinator (GC). As long as a consumer keeps sending heartbeats to the GC, it is assumed to be alive and can process messages from its partitions. When a consumer stops sending heartbeats, the GC considers it dead and triggers a rebalance.
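The liveness check can be sketched as a simple timeout on the last heartbeat seen from each member. `ToyCoordinator` and its methods are hypothetical names, and time is passed in explicitly to keep the example deterministic. In a real consumer the timing is governed by settings such as `session.timeout.ms` and `heartbeat.interval.ms`.

```python
class ToyCoordinator:
    """Toy group coordinator that tracks the last heartbeat time per member."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}

    def heartbeat(self, member, now):
        self.last_heartbeat[member] = now

    def expire_dead_members(self, now):
        """Remove members whose last heartbeat is older than the session timeout.

        A non-empty result is the point where a rebalance would be triggered.
        """
        dead = [m for m, seen in self.last_heartbeat.items()
                if now - seen > self.session_timeout]
        for member in dead:
            del self.last_heartbeat[member]
        return dead

coordinator = ToyCoordinator(session_timeout=10)
coordinator.heartbeat("c1", now=0)
coordinator.heartbeat("c2", now=0)
coordinator.heartbeat("c1", now=8)   # c1 keeps sending heartbeats, c2 goes silent

print(coordinator.expire_dead_members(now=12))   # ['c2']
print(sorted(coordinator.last_heartbeat))        # ['c1']
```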
Commits and Offsets: An offset is the sequential number assigned to each message in a partition. A commit records how far a consumer group has read in a partition of a topic, that is, the offset up to which its messages have been consumed.
We poll a batch of messages from the topic and process them. The consumer then has to know where the next batch should start, and this is where commits come in handy. Kafka stores the commit details for each partition in the __consumer_offsets topic. So if a consumer goes down and another consumer starts consuming from that partition, the new consumer gets the last commit from the __consumer_offsets topic and resumes consuming right after it.
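The hand-over from one consumer to another can be sketched with a list standing in for a partition and a dictionary standing in for __consumer_offsets. Everything here is a simplified model with invented helper names (`poll`, `commit`). It stores the offset of the next message to read, which is also how Kafka's committed offsets work.

```python
# One partition of a topic; the list index plays the role of the offset.
log = ["m0", "m1", "m2", "m3", "m4", "m5"]

# Stand-in for the __consumer_offsets topic:
# (group, partition) -> offset of the next message to read.
committed = {}

def poll(group, partition, max_records):
    """Return the start offset and the next batch after the last commit."""
    start = committed.get((group, partition), 0)
    return start, log[start:start + max_records]

def commit(group, partition, next_offset):
    committed[(group, partition)] = next_offset

# Consumer A reads a batch, processes it, and commits.
start_a, batch_a = poll("g1", partition=0, max_records=4)
commit("g1", partition=0, next_offset=start_a + len(batch_a))
print(start_a, batch_a)   # 0 ['m0', 'm1', 'm2', 'm3']

# Consumer A crashes. Consumer B from the same group takes over the partition
# and resumes from the committed offset instead of starting over.
start_b, batch_b = poll("g1", partition=0, max_records=4)
print(start_b, batch_b)   # 4 ['m4', 'm5']
```

If consumer A had crashed before committing, consumer B would have received the first batch again, which is why when you commit matters.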
Thanks for reading.