Basic Introduction to Kafka — Part I

A basic intro to Kafka

Dileep
4 min read · Mar 7, 2021

This is the first part of a series of articles on Kafka.

After reading this part, you will know the basic concepts of Kafka:

  • What is Kafka?
  • How does Kafka work?
  • What are the different components of Kafka?

Let's begin.

In short, Kafka is a distributed messaging system based on the publish/subscribe (pub/sub) model. It was originally developed at LinkedIn.

Different Components of Kafka

  • Message: A single unit of data within Kafka is called a message. You can think of a message as a row or record in a database.
  • Topic: Messages in Kafka are categorized into topics. You can think of a topic as a database table or a folder in a filesystem.
  • Partition: Each topic is divided into one or more partitions. If you think of a topic as a folder, a partition is a sub-folder.
  • Offset: An offset is a number assigned to each message within a partition; it is unique within that partition.
(Image courtesy of O'Reilly Media)
  • Producer: A producer creates new messages; it is also known as a publisher. It writes messages to a specific topic in Kafka (see the producer sketch after this list).
  • Consumer: A consumer reads messages; it is also known as a subscriber. It can subscribe to one or more topics and reads messages in the order in which they are stored within each partition.
  • Partitioner: The partitioner assigns a partition to each message. (A message is stored inside a partition, and a topic can contain multiple partitions.)
  • Broker: A broker is a single Kafka server; brokers run together as part of a cluster.
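
To make these components concrete, here is a minimal producer sketch using the official Kafka Java client. The broker address localhost:9092, the topic my-topic, and the key user-42 are placeholder values for illustration, not part of any particular setup.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class SimpleProducer {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Broker to bootstrap from (placeholder address)
            props.put("bootstrap.servers", "localhost:9092");
            // Keys and values are plain strings in this sketch
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // A message (record) destined for the topic "my-topic"; the key is what
                // the default partitioner hashes to choose a partition.
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("my-topic", "user-42", "hello kafka");
                producer.send(record);
            }
        }
    }

Notice that the record only names its topic and key: the partitioner, not the producer code, decides which partition inside that topic the message lands in.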

How Kafka Works at a High Level

A producer pushes a new message to a Kafka broker for a specific topic. A topic may contain multiple partitions. The partitioner assigns a partition to the message, and the message is then stored in that partition in an append-only manner. We cannot change a message once it is stored in Kafka. A consumer subscribes to a topic and polls messages from that topic. We can use a group of consumers subscribed to the same topic to scale the consumption rate (we will talk about this in depth later).
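
On the consumer side, a rough sketch of this subscribe-and-poll flow looks like the following. Again, the broker address, the group ID demo-group, and the topic my-topic are placeholders for illustration.

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class SimpleConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");  // placeholder broker
            props.put("group.id", "demo-group");                // placeholder consumer group
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("my-topic"));
                while (true) {
                    // poll() pulls the next batch of messages from the subscribed topic
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                    for (ConsumerRecord<String, String> record : records) {
                        // Each record carries its partition and offset along with its key and value
                        System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                                record.partition(), record.offset(), record.key(), record.value());
                    }
                }
            }
        }
    }

Running several copies of this program with the same group.id is how a consumer group shares the partitions of a topic and scales the consumption rate.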

Why Kafka

  • Disk-Based Retention: Messages are stored on disk, so consumers do not always need to work in real time. Messages are retained on disk according to configurable rules (for example, only keep the last 7 days of messages). Kafka gives durable retention: a consumer can still consume messages later if it happens to be down for some time.
  • Scalable: Kafka is highly scalable. We can start with a single broker and, as our requirements grow, run tens of Kafka servers in a cluster.

Some Kafka Configurations:

log.dirs : Kafka stores all messages on disk in the form of segments (Kafka stores messages in segments on disk — we will discuss this later). These segments are stored in the directories specified by the log.dirs parameter in the broker config file.

num.partitions : Determines how many partitions a new topic is created with by default.

log.retention.ms : How long Kafka will retain messages, specified by time (one way to expire messages).

log.retention.bytes : The total number of bytes of messages retained, applied per partition. This means that if you have a topic with 8 partitions and log.retention.bytes is set to 1 GB, the amount of data retained for the topic will be at most 8 GB (another way to expire messages).

log.segment.bytes : The log-retention settings operate on log segments, not on individual messages. When messages are produced to a Kafka broker, they are appended to the current log segment for the partition. Once the log segment reaches the size specified by the log.segment.bytes parameter, it is closed, a new one is opened, and the closed one becomes eligible for expiration (another way to expire messages).
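
As a rough sketch, these parameters might appear together in a broker's server.properties file like this; the values below are only illustrative, not recommendations:

    # Directory where log segments are written on disk
    log.dirs=/var/lib/kafka/data
    # Default number of partitions for a newly created topic
    num.partitions=3
    # Keep messages for 7 days (604,800,000 ms) ...
    log.retention.ms=604800000
    # ... or until a partition holds about 1 GB, whichever limit is reached first
    log.retention.bytes=1073741824
    # Roll to a new segment once the current one reaches 256 MB
    log.segment.bytes=268435456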

Zookeeper

  • Kafka uses Zookeeper to store metadata about the brokers, topics, and partitions.
  • Every broker is assigned a unique ID.
  • Every time a broker process starts, it registers itself with its ID in Zookeeper by creating an ephemeral node.
  • Different Kafka components subscribe to the /brokers/ids path in Zookeeper where brokers are registered, so they get notified when brokers are added or removed (see the small shell sketch below).
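
As a quick illustration (assuming a local Zookeeper on port 2181 and the zookeeper-shell tool that ships with Kafka), you can inspect the registered broker IDs like this:

    # open a shell against Zookeeper (the address is a placeholder)
    bin/zookeeper-shell.sh localhost:2181

    # inside the shell: list the ephemeral nodes of the currently registered brokers
    ls /brokers/ids
    # example output: [0, 1, 2]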

Thanks for Reading

Deep Dive Into the Concept of Kafka Producer — Part II

