Introduction

In this post we will go through an introduction to Kafka.

In this series of posts we will cover, with demos, the basic building blocks of Kafka and the concepts related to each component. While covering the concepts we will use the Kafka console clients that ship with the Kafka installation. Eventually we will build a .NET client to produce and consume messages. This series is broken into the following posts:

Prerequisites

  • None

Kafka Introduction

Have you encountered a situation where you need to integrate systems, with data produced by one system and consumed by others? If yes, you must have looked for a mechanism to integrate those systems. Have you had a requirement that data produced by one system be consumed by multiple systems? Have you needed to reprocess messages that were already consumed?

One way to cater for the above scenarios is to use Apache Kafka, a message-oriented middleware that addresses these problems by enabling integration between systems that cannot access each other directly.

Apache Kafka is an open source, commit-log-based publish-subscribe messaging system with built-in partitioning, replication and fault tolerance, which makes it a highly scalable message-oriented middleware. Kafka is easy to explain at a high level, but at a low level it involves a large number of concepts and technical details. Kafka enables producers to produce messages without knowing who will consume them, and consumers to consume messages without knowing who produced them.
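To make the commit-log idea more concrete, here is a minimal in-memory sketch in Python. It is not the Kafka API: the `CommitLog` and `Consumer` classes and the event names are hypothetical, invented only to illustrate how an append-only log lets several consumers read the same messages independently and re-read them later.

```python
class CommitLog:
    """Toy in-memory stand-in for a Kafka-style log (hypothetical, not a Kafka API)."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Append a record and return its offset (its position in the log)."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own read position, so consumers do not affect each other."""

    def __init__(self, log):
        self._log = log
        self.offset = 0

    def poll(self):
        """Read everything available from the current offset and advance it."""
        records = self._log.read(self.offset)
        self.offset += len(records)
        return records

    def seek(self, offset):
        """Move the read position, for example to reprocess older records."""
        self.offset = offset


# The producer appends without knowing who will read the records.
log = CommitLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)

# Two independent consumers read the same records; reading removes nothing.
billing = Consumer(log)
analytics = Consumer(log)
billing_first = billing.poll()
analytics_first = analytics.poll()

# Reprocessing: rewind one consumer to offset 1 and read again.
billing.seek(1)
billing_replay = billing.poll()
```

In this sketch each consumer owns its offset, which is why one consumer rewinding has no effect on the other. A real Kafka log lives on disk and is subject to retention policies, which this toy model leaves out.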

Kafka is built with the following characteristics:

  • Persistent messaging: The messages produced are stored on disk with configurable retention policies. This enables multiple systems to consume and act on the messages independently, and allows already processed messages to be reprocessed if required.

  • Explicit support for producing messages to partitions that can be spread across machines. It also provides per-partition ordering semantics.

  • Explicit support for distributed consumption of messages from partitions. Consumers can read from one or many partitions.

  • Explicit support for preventing message loss by replicating partitions across brokers running on different machines.
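The partitioning and per-partition ordering points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the partition count, the `choose_partition` and `produce` helpers, and the use of CRC32 as the hash are all hypothetical choices for this sketch, and real Kafka clients use their own partitioner.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for this sketch


def choose_partition(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition. CRC32 is a stand-in hash, not Kafka's partitioner."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Each partition is an independent append-only list of (key, value) pairs.
partitions = defaultdict(list)


def produce(key, value):
    """Append the record to the partition chosen by its key and return that partition."""
    partition = choose_partition(key)
    partitions[partition].append((key, value))
    return partition


events = [
    ("cart-1", "add"),
    ("cart-2", "add"),
    ("cart-1", "remove"),
    ("cart-1", "checkout"),
    ("cart-2", "checkout"),
]
for key, value in events:
    produce(key, value)


def values_for(key):
    """Read back one key's values from its partition, in the order they were appended."""
    partition = choose_partition(key)
    return [v for k, v in partitions[partition] if k == key]


cart_1_values = values_for("cart-1")
cart_2_values = values_for("cart-2")
```

Because records with the same key always land in the same partition, their relative order is preserved within that partition. No ordering is implied between records that end up in different partitions.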

Kafka Use Cases

Since Kafka is a messaging system, it can be used in any scenario where brokered messaging is required to enable communication between systems. However, Kafka is used most effectively for high-throughput, real-time integrations such as:

  • Website activity tracking.
  • Replay of messages.
  • Shopping cart data.
  • Real-time analytics.
  • Multiple subscribers for the same messages.
  • Communication: emails, SMS.

Kafka Clients

Kafka has clients in C#, Java, C, Python and many other languages. The Kafka ecosystem also offers a REST proxy (HTTP and JSON), for example the Confluent REST Proxy, which makes integration even easier. Kafka stores messages as bytes, so effectively any serialization format can be used, provided the producer and the consumer use the same one. When Kafka is used with Avro and a schema registry such as the Confluent Schema Registry, messages produced by a client in one language can be consumed by clients in another language using serializers based on the Avro specification.
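As a small illustration of the "same serialization on both sides" point, the sketch below uses JSON over UTF-8 as the agreed format. The `serialize` and `deserialize` helpers and the sample record are hypothetical and stand in for whatever serializer a real producer and consumer would share, such as Avro.

```python
import json


def serialize(record):
    """Producer side: turn a dict into bytes using JSON over UTF-8."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize(payload):
    """Consumer side: decode the same format back into a dict."""
    return json.loads(payload.decode("utf-8"))


order = {"order_id": 42, "status": "paid"}

# What would be handed to the broker is just bytes.
payload = serialize(order)

# A consumer in any language can recover the record if it uses the same format.
roundtrip = deserialize(payload)
```

The broker never needs to interpret the payload, which is why the choice of format is a contract between the producer and the consumer rather than a Kafka setting.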

Kafka at a High Level

The following image gives a high-level view of the components involved in Kafka messaging:

[Image: Kafka_Architecture]

We will look at all of the above components in detail in the Kafka Architecture post.

Conclusion

In this post we looked at what Kafka is, its characteristics, some of its use cases and its client ecosystem across various languages. In the next post we will look at how to set up and configure Kafka on Windows.