What is Kafka?
Apache Kafka is a distributed event store and stream-processing platform originally developed at LinkedIn. It is an open-source project of the Apache Software Foundation, written in Java and Scala. The project aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Kafka is a distributed data streaming technology used by over 70% of Fortune 500 companies, and thousands of companies are built on it as their data streaming platform.
Real-time data processing with Apache Kafka has become a de facto standard for use cases such as fraud detection, where events are correlated continuously so that fraud can be stopped before it happens.
How a Kafka consumer group works
When a producer publishes messages to a topic, comparing the rate at which messages are published with the rate at which the consumer reads them shows how far the consumer is lagging behind the producer. Each message is assigned an offset, which makes it possible to measure this lag as the number of messages the consumer is behind.
The offset is the position of a published message within its topic partition.
To calculate the message lag:
Messages behind = end offset - current offset
The lag should not grow too high. A high lag means messages are not being consumed as fast as they are produced, and it can also point to a bug or an error, in the consumer or on the Kafka server, that is preventing messages from being consumed.
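The lag formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `messages_behind` and `total_lag` and the offset values are hypothetical, and in a real deployment the offsets would come from the Kafka cluster rather than from hard-coded dictionaries.

```python
def messages_behind(end_offset, current_offset):
    """Lag for one partition: end offset minus the consumer's current offset."""
    if current_offset > end_offset:
        raise ValueError("current offset cannot be past the end offset")
    return end_offset - current_offset


def total_lag(end_offsets, current_offsets):
    """Sum the per-partition lag; a partition with no recorded offset counts from 0."""
    return sum(
        messages_behind(end, current_offsets.get(partition, 0))
        for partition, end in end_offsets.items()
    )


# Hypothetical offsets for a topic with three partitions.
end_offsets = {0: 1500, 1: 1200, 2: 900}
current_offsets = {0: 1450, 1: 1200, 2: 600}

print(messages_behind(1500, 1450))              # 50
print(total_lag(end_offsets, current_offsets))  # 350
```

Summing per partition matters because each partition has its own offsets, so a consumer can be fully caught up on one partition and far behind on another.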
Kafka
Kafka decouples data pipelines and reduces their complexity. It is a distributed messaging system with three building blocks:
- Producer
- Consumer
- Broker
Producer - Producers are the applications that produce messages to Kafka. A producer can be any service or program that uses the Producer API to send (write) messages to topics. The messages it creates are received and stored by the broker.
Broker - A Kafka server is known as a broker, and a cluster is made up of several such servers (machines). A broker stores the messages of each topic in the form of partitions. It acts as a bridge between producer and consumer: when a producer wants to write data to the cluster, it sends the data to a broker.
Consumer - Consumers are the applications that consume data from Kafka, for example a DB server, a security server or a data warehouse. A consumer uses the Consumer API to read messages from the broker, and it subscribes to the relevant topics to fetch the data.
Kafka Cluster - It comprises the brokers, the topics and their respective partitions.
Topics - A topic is a common name or heading that represents similar types of data. Data is written to topics.
Partitions - The messages of a topic are divided into smaller subsets known as partitions. Each message in a partition has an offset value.
Zookeeper - Used to store information about the Kafka cluster and details of the consumers. When changes occur, such as a broker dying or a new topic being created, Zookeeper notifies Kafka, so the cluster knows which brokers are down, which topics have been added and so on. It also takes part in electing the controller broker, which distributes partition leadership among the brokers. Newer Kafka releases can run without Zookeeper by using the built-in KRaft mode.
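To tie the terms topic, partition and offset together, here is a minimal in-memory sketch in Python. It is not the Kafka client API: the `Topic` class and its methods are hypothetical, and `zlib.crc32` is only a stand-in for the hash a real producer would use to choose a partition.

```python
import zlib


class Topic:
    """A toy topic: a fixed number of partitions, each an append-only list."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Stand-in hash: the same key always maps to the same partition.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def produce(self, key, value):
        """Append a message and return its (partition, offset)."""
        partition = self.partition_for(key)
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def consume(self, partition, offset):
        """Return all messages in a partition starting at the given offset."""
        return self.partitions[partition][offset:]


orders = Topic("orders", num_partitions=3)
first = orders.produce("customer-42", "order created")
second = orders.produce("customer-42", "order paid")

# Both messages share a key, so they land in the same partition, in order.
partition = first[0]
print(first[1], second[1])           # 0 1
print(orders.consume(partition, 0))
```

Because an offset is only meaningful inside one partition, ordering is preserved per partition (and therefore per key), not across the whole topic.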
Kafka APIs
The four major Kafka APIs are:
- Producer APIs
- Consumer APIs
- Stream APIs
- Admin APIs
The Admin API is for inspecting and managing Kafka objects like topics and brokers.
The Producer API is for writing (publishing) to topics.
The Consumer API is for reading (subscribing to) topics.
The Kafka Streams API gives applications and microservices access to higher-level stream processing functions.
Kafka features
High Throughput : Delivers messages at network-limited throughput using a cluster of machines, with latencies as low as 2 ms.
Scalable : Production clusters can scale to transmit trillions of messages per day and petabytes of data.
Permanent Storage : Stores streams of data safely in a distributed, durable, fault-tolerant cluster.
Kafka use cases
To process payments and financial transactions in real time, such as in stock exchanges, banks and insurance companies.
To track and monitor cars and trucks in real time.
To continuously capture and analyse sensor data from IoT devices and other equipment, such as in factories.
To monitor patients in hospital care and predict changes in condition to ensure timely treatment in emergencies.
Companies using Kafka
Netflix : Uses Kafka to apply recommendations in real time while you are watching TV shows.
Uber : Uses Kafka to gather taxi, user and trip data in real time to compute and forecast demand and surge pricing.
LinkedIn : Uses Kafka to prevent spam and to collect user interactions in order to make better connection recommendations in real time.
Stream APIs
Kafka Streams is an API for writing client applications that transform data in Apache Kafka. You usually do this by publishing the transformed data onto a new topic. The data processing itself happens within your client application, not on a Kafka broker.
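The read, transform and publish-to-a-new-topic pattern that Kafka Streams follows can be modelled in plain Python. Kafka Streams itself is a Java library, so this is only a conceptual sketch: topics are represented as lists, and `run_stream` and `flag_large` are hypothetical names, not part of any Kafka API.

```python
def run_stream(source_topic, transform, sink_topic):
    """Read each record from the source, transform it, append it to the sink."""
    for key, value in source_topic:
        result = transform(key, value)
        if result is not None:  # returning None drops the record (a filter)
            sink_topic.append(result)
    return sink_topic


def flag_large(key, amount):
    """Keep only payments of 1000 or more and mark them as flagged."""
    if amount >= 1000:
        return key, {"amount": amount, "flagged": True}
    return None


# Hypothetical input topic of (customer, amount) records.
payments = [("alice", 250), ("bob", 4000), ("carol", 1200)]
large_payments = run_stream(payments, flag_large, [])

print(large_payments)
print(len(payments))  # 3: the source topic is left untouched
```

The transformation runs entirely inside the client code and the input records are not modified, which mirrors the point that processing happens in the application rather than on the broker.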
Connector APIs
Connector API - These APIs help connect the Kafka broker to an external system, such as an external server or a database, so that data can be moved into Kafka topics or consumed from them.
SUMMARY
Apache Kafka is a back-end application that provides a way to share streams of events between applications.
An application publishes a stream of events or messages to a topic on a Kafka broker. The stream can then be consumed independently by other applications, and messages in the topic can even be replayed if needed.
Kafka Streams is an API for writing client applications that transform data in Apache Kafka. You usually do this by publishing the transformed data onto a new topic. The data processing itself happens within your client application, not on a Kafka broker.
Kafka Connect is an API for moving data into and out of Kafka. It provides a pluggable way to integrate other applications with Kafka, by letting you use and share connectors to move data to or from popular applications, like databases.
Kafka Connect
Kafka Connect is a free, open-source component of Apache Kafka that works as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems.
How does Kafka connect to a database?
Set Up MySQL to Kafka Connection Using Confluent Cloud Console
Step 1: Launch Confluent Cloud Cluster.
Step 2: Add MySQL Kafka Connector.
Step 3: Set Up MySQL to Kafka Connection.
Step 4: Verify and Launch MySQL Kafka Connector.
Step 5: Validate Your Kafka Topic.
Kafka Vs Kafka Connect
Apache Kafka is a distributed streaming platform, and Kafka Connect is a framework for connecting Kafka with external systems like databases, key-value stores, search indexes, and file systems, using so-called connectors. Kafka Connect is only used to copy the streamed data, so its scope is not broad. It can run as a standalone process for testing, or as a distributed, scalable service supporting a whole organization.
Kafka Connect makes it much easier to connect Kafka to other systems, without having to write all the glue code yourself.
Kafka Vs RabbitMQ
Apache Kafka and RabbitMQ are two open-source and commercially supported publisher/subscriber systems.
RabbitMQ is the older tool. It can deal with high-throughput use cases, such as online payment processing, and it can handle background jobs or act as a message broker between microservices. Message ordering is not guaranteed. In RabbitMQ you can specify message priorities and consume high-priority messages first. RabbitMQ is a good fit when web servers need to respond to requests quickly.
Kafka is newer. It is a high-volume publish-subscribe messaging and streaming platform that is durable, fast and scalable. It provides message ordering thanks to its partitioning: messages are sent to topic partitions by message key. It can achieve high throughput (millions of messages per second) with limited resources, a necessity for big data use cases. RabbitMQ can also process a million messages per second but requires more resources (around 30 nodes).
RabbitMQ uses a smart broker that can intelligently route messages to different queues, whereas Kafka relies on the consumer to decide what kind of logic it wants to apply to message routing. Consumers simply pull messages whenever they are available (that is, Kafka uses a pull-based mechanism).
A key reason to choose Kafka is its ability to retain messages.
In RabbitMQ's point-to-point queue model, a message is pushed onto the queue and, once the consumer has consumed it, it is deleted from the queue. In Kafka, messages are kept for a retention period, so a consumer can re-consume a message from the broker.
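The difference between delete-on-consume and retention can be illustrated with two small Python classes. Both are hypothetical models rather than RabbitMQ or Kafka client code, and the sketch does not model the retention period expiring.

```python
from collections import deque


class DeleteOnConsumeQueue:
    """Queue-style delivery: a message is gone once it has been consumed."""

    def __init__(self):
        self._messages = deque()

    def publish(self, message):
        self._messages.append(message)

    def consume(self):
        # Removes the message from the queue; returns None when empty.
        return self._messages.popleft() if self._messages else None


class RetainedLog:
    """Log-style delivery: messages stay, and the reader chooses the offset."""

    def __init__(self):
        self._messages = []

    def publish(self, message):
        self._messages.append(message)

    def read_from(self, offset):
        # Reading does not remove anything, so any offset can be re-read.
        return self._messages[offset:]


work_queue, event_log = DeleteOnConsumeQueue(), RetainedLog()
for event in ["signup", "login", "purchase"]:
    work_queue.publish(event)
    event_log.publish(event)

first_pass = [work_queue.consume() for _ in range(3)]
second_pass = work_queue.consume()   # None: nothing is left to re-read
replayed = event_log.read_from(0)    # the full history is still available

print(first_pass, second_pass)
print(replayed)
```

In the log model the reader keeps track of its own position, which is what lets several independent consumers read, or replay, the same messages.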
Kafka Vs Kafka Streams
Every topic in Kafka is split into one or more partitions. Kafka partitions data for storing, transporting, and replicating it. Kafka Streams partitions data for processing it. Kafka Streams is a library for easy data processing and transformation on top of Kafka, which itself acts as the messaging service, whereas the Kafka Consumer API simply allows applications to read and process messages from topics.
Kafka Vs Rest APIs
Kafka APIs store data in topics, whereas with REST APIs data is typically stored in a database on the server. With the Kafka API you are often not interested in a response, while with a REST API you typically expect a response back. Kafka supports communication in both directions, since an application can both publish to and consume from topics. A REST call is unidirectional in the sense that, at any one time, you are either sending a request or receiving a response.
Kafka Broker Vs Message Broker [RabbitMQ]
A traditional message broker such as RabbitMQ consistently delivers messages to consumers and keeps track of their status. Kafka uses the dumb broker/smart consumer model: it doesn't monitor which messages each consumer has read. Rather than retaining only unread messages, it preserves all messages for a set amount of time.
Requirement of message broker in microservices
A broker is a third party or middleman that helps us achieve a goal. A broker that helps services communicate with each other via messaging is therefore a message broker: a piece of software that enables services and applications to communicate with each other using messages. The broker ensures that communication between different microservices is reliable and stable, that messages are managed and monitored within the system, and that messages don't get lost. The Kafka message broker is more reliable for microservice communication than the RabbitMQ broker.
Which Microservice Message Broker to choose?
- For RabbitMQ
Queuing (one-to-one) and publish-subscribe (one-to-many): both.
Persistency: both persistent and transient messages are supported.
Messages are delivered through both point-to-point and pub-sub methods by implementing the Advanced Message Queuing Protocol (AMQP).
- For Kafka
Queuing (one-to-one) and publish-subscribe (one-to-many): one-to-many.
Persistency: yes.
Any message queue that allows publishing messages decoupled from consuming them is effectively acting as a storage system for the in-flight messages. What is different about Kafka is that it is a very good storage system. It provides data persistency and stores streams of records that render .