Deep Dive Into Concept of Kafka Producer — Part II
This is the second part of the series on Kafka.
If you haven’t read the first article, I recommend reading it before this one: Part I
Producer
If we want to put a message into a Kafka broker, we need a producer, which is responsible for sending messages to the broker.
How the Producer Works: The Basics
We start producing messages to Kafka by creating a ProducerRecord, which must include the topic and a value (the message). Optionally, we can also specify a key and/or a partition. Once we send the ProducerRecord, the first thing the producer does is serialize the key and value objects to byte arrays so they can be sent over the network.
// Arguments: topic, partition, key, value (the literals are placeholders)
ProducerRecord<String, String> record =
    new ProducerRecord<>("my-topic", 0, "my-key", "my-value");
Next, the data is sent to a partitioner. If we specified a partition in the ProducerRecord, the partitioner doesn’t do anything and simply returns the partition we specified. If we didn’t, the partitioner will choose a partition for us, usually based on the ProducerRecord key. Once a partition is selected, the producer knows which topic and partition the record will go to. It then adds the record to a batch of records that will also be sent to the same topic and partition. A separate thread is responsible for sending those batches of records to the appropriate Kafka brokers.
When the broker receives the messages, it sends back a response. If the messages were successfully written to Kafka, it returns a RecordMetadata object with the topic, partition, and the offset of the record within the partition.
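The path described above (serialize, pick a partition, append to a per-partition batch) can be sketched as a small toy model. This is only an illustration under stated assumptions: `ProducerPathSketch` and its methods are hypothetical names, `Arrays.hashCode` is a stand-in for the real key hash, and there is no network, sender thread, or broker response here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the producer's send path. Hypothetical class, not the Kafka client.
class ProducerPathSketch {
    private final int numPartitions;
    private int nextRoundRobin = 0;
    // Pending batches, keyed by "topic-partition" (for example "orders-2").
    final Map<String, List<byte[]>> batches = new HashMap<>();

    ProducerPathSketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Serializes the record, picks a partition, and appends to that partition's batch.
    // Returns the chosen partition.
    int send(String topic, Integer partition, String key, String value) {
        // Step 1: serialize key and value to byte arrays.
        byte[] keyBytes = (key == null) ? null : key.getBytes(StandardCharsets.UTF_8);
        byte[] valueBytes = value.getBytes(StandardCharsets.UTF_8);

        // Step 2: the "partitioner".
        int chosen;
        if (partition != null) {
            chosen = partition;                       // explicit partition is returned unchanged
        } else if (keyBytes != null) {
            // Stand-in hash for illustration only.
            chosen = (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
        } else {
            chosen = nextRoundRobin++ % numPartitions; // no key: spread records around
        }

        // Step 3: add the record to the batch for this topic and partition.
        batches.computeIfAbsent(topic + "-" + chosen, tp -> new ArrayList<>()).add(valueBytes);
        return chosen;
    }

    public static void main(String[] args) {
        ProducerPathSketch producer = new ProducerPathSketch(3);
        producer.send("orders", 2, "customer-1", "explicit partition");
        producer.send("orders", null, "customer-1", "keyed record");
        producer.send("orders", null, "customer-1", "same key again");
        producer.batches.forEach((tp, batch) ->
            System.out.println(tp + " holds " + batch.size() + " record(s)"));
    }
}
```

In the real client, a background thread would drain these batches and send them to the brokers; the sketch stops at the batching step.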
How the Partitioner Assigns a Partition to a Message
The key that a ProducerRecord contains plays an important role in partition assignment.
All messages with the same key will go to the same partition.
This makes sure that all messages with the same key are always read in the same order in which they were stored in the partition. If the key is null and the default partitioner is used, the message is sent to one of the topic's available partitions, and a round-robin algorithm balances the messages among the partitions. (Kafka clients from version 2.4 onward instead use a sticky strategy for null keys, filling a batch for one partition before moving on to another.)
If the ProducerRecord contains a key, the partitioner hashes the key with its own hash algorithm (murmur2 in the default partitioner) and applies a modulo by the number of partitions to get the partition for that message. Since it is important that a key is always mapped to the same partition, all the partitions in the topic are used in the mapping, not just the available ones. This means that if a specific partition is unavailable when you write data to it, you might get an error.
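The mapping of keys to partitions stays consistent only as long as the number of partitions in the topic does not change. The moment you add new partitions to the topic, the modulo result changes for many keys, so the key-to-partition mapping changes as well: existing records stay where they are, while new records with the same key may be written to a different partition.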
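To make the hash-and-modulo step concrete, here is a minimal sketch of a key-to-partition mapping and of what happens when the partition count changes. It assumes a stand-in hash (`Arrays.hashCode` over the UTF-8 bytes) rather than Kafka's actual hash function, so the partition numbers it prints will not match a real cluster; `KeyPartitionMapping`, `partitionFor`, and `countMoved` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative only: shows the hash-then-modulo idea, not Kafka's real partitioner.
class KeyPartitionMapping {
    // Maps a key to a partition: hash the serialized key, then take it modulo the partition count.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8)); // stand-in hash
        return (hash & 0x7fffffff) % numPartitions; // mask keeps the value non-negative
    }

    // Counts how many keys land on a different partition after the partition count changes.
    static int countMoved(String[] keys, int oldCount, int newCount) {
        int moved = 0;
        for (String key : keys) {
            if (partitionFor(key, oldCount) != partitionFor(key, newCount)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        String[] keys = {"user-1", "user-2", "user-3", "user-4", "user-5"};
        for (String key : keys) {
            System.out.println(key + ": partition " + partitionFor(key, 4) + " with 4 partitions, "
                + partitionFor(key, 5) + " with 5 partitions");
        }
        System.out.println(countMoved(keys, 4, 5) + " of " + keys.length + " keys changed partition");
    }
}
```

With an unchanged partition count, `countMoved` is always 0, which is the "same key, same partition" guarantee. If stable key placement matters, a common approach is to create the topic with enough partitions up front rather than adding them later.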
Configuring Producers
One more concept in Kafka is the replica.
We will discuss it in detail later, but in short: a partition can have one or more replicas. A replica, as the name suggests, is a copy of the leader partition (the master replica).
Acks
The acks parameter controls how many partition replicas must receive the record before the producer can consider the write successful.
If acks=0, the producer will not wait for a reply from the broker before assuming the message was sent successfully. In this case, you will not know whether your message was successfully written to the broker or not.
If acks=1, the producer will receive a success response from the broker the moment the leader replica receives the message. If the message can’t be written to the leader replica (the master replica), the producer will receive an error response and can retry sending the message, avoiding potential loss of data.
If acks=all, the producer will receive a success response from the broker once all in-sync replicas have received the message. Of course, this is the safest mode, but it will increase your latency.
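As a rough sketch of how this setting is supplied, the snippet below builds the configuration a Java producer would typically be given. The property keys (`bootstrap.servers`, `key.serializer`, `value.serializer`, `acks`) are standard producer settings; the class `ProducerAcksConfig`, the helper `producerProps`, and the broker address `localhost:9092` are assumptions made for illustration. The sketch deliberately uses only `java.util.Properties` so it runs without the Kafka client library.

```java
import java.util.Properties;

// Builds producer settings with an explicit acks level. Hypothetical helper class.
class ProducerAcksConfig {
    static Properties producerProps(String bootstrapServers, String acks) {
        // Only the three modes discussed above are accepted in this sketch.
        if (!acks.equals("0") && !acks.equals("1") && !acks.equals("all")) {
            throw new IllegalArgumentException("acks must be 0, 1, or all, got: " + acks);
        }
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("acks", acks); // "0" = no wait, "1" = leader only, "all" = all in-sync replicas
        return props;
    }

    public static void main(String[] args) {
        // Assumed local broker address; replace with your own cluster.
        Properties props = producerProps("localhost:9092", "all");
        System.out.println("acks=" + props.getProperty("acks")); // prints "acks=all"
    }
}
```

In an application that has the kafka-clients dependency, these properties would be passed to the `KafkaProducer` constructor. Choosing between the three values is a trade-off between latency and durability, as described above.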
Thanks for reading.