Typical use cases for Kafka

In my previous post I explained the main components of a Kafka system. In this post I will go through some of Kafka's typical use cases.

By typical I do not mean that all systems built on top of Kafka follow these patterns, only that these patterns are widely used.

Before we start

If you are not familiar with Apache Kafka, please take a look at my previous post Data streaming with Apache Kafka, where I go over the main concepts you need to know.

Typical use cases for Kafka

Kafka is a message broker: a system where producers publish messages and consumers read them afterwards.

The most common use case for Kafka is to decouple the producers of data from its consumers. Producers write records to a topic in Kafka and consumers read from that topic.

Another very common use case for Kafka is signaling. A system that needs to notify other systems of state changes will post messages to a Kafka topic. Systems that are interested in those state changes will subscribe to that topic and react accordingly.

The producer-consumer case

Decoupling the producers of data from the consumers is a common requirement in many types of systems. By itself, Kafka is very good at letting producers and consumers work at their own pace. If we add a schema registry and a schema language such as Avro, we can also decouple the data contract: both producer and consumer rely on the schema stored in the schema registry instead of depending on each other's code.
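As an illustration, a minimal Avro schema for such a contract might look like the following. This is a made-up example; the record and field names are not from any real system:

```json
{
  "type": "record",
  "name": "OrderEvent",
  "namespace": "com.example.orders",
  "fields": [
    {"name": "order_id", "type": "string"},
    {"name": "amount", "type": "double"},
    {"name": "status", "type": "string"}
  ]
}
```

Once this schema lives in the registry, the producer and consumer only need to agree on the schema id, not on each other's release cycles.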

In this scenario it is very important to select a partitioning scheme that maximizes parallelism. The unit of parallelism in Kafka is the partition: within a consumer group, each partition is read by at most one consumer, so the number of partitions in a topic puts an upper bound on the number of consumers that can work in parallel. Even so, it is much better to start with a low number of partitions and increase it later than to start with a very large number. The reason is that the partition count of a topic can only be increased, never decreased; if you need fewer partitions, you have to create a new topic.

In addition to the number of partitions, it is important to select a partition key that distributes records uniformly, that is, one that avoids all records ending up in a single partition. The default partitioning algorithm computes a hash of the key and takes it modulo the number of partitions. If the key does not vary, or there are very few distinct keys, it might be necessary to replace the default partitioner with a custom one.
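The hash-then-modulo behavior can be sketched in a few lines of Python. This is a simplification: Kafka's default partitioner actually uses murmur2, and CRC32 here is just a stand-in to keep the sketch dependency-free.

```python
import zlib


def partition_for(key: bytes, num_partitions: int) -> int:
    # Simplified stand-in for Kafka's default partitioner:
    # hash the key, then take it modulo the partition count.
    # (The real Java client uses murmur2, not CRC32.)
    return zlib.crc32(key) % num_partitions


# With varied keys, records spread over the partitions...
varied = {partition_for(f"user-{i}".encode(), 6) for i in range(1000)}

# ...but a constant key pins every record to a single partition.
constant = {partition_for(b"same-key", 6) for _ in range(1000)}
```

The varied keys land on several partitions, while the constant key always maps to the same one. That single hot partition is exactly the skew a custom partitioner would address.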

Signaling

A system that needs to notify other systems of state changes has many alternatives. One of the preferred ways is the publish-subscribe pattern: a system publishes events, and any other system interested in the changes subscribes to the notifications.
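Stripped of Kafka specifics, the pattern itself is small. Here is a minimal in-memory sketch; the class, topic, and subscriber names are made up for illustration:

```python
from collections import defaultdict


class PubSub:
    """Minimal in-memory publish-subscribe broker (illustration only)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._handlers[topic].append(handler)

    def publish(self, topic, event):
        # Every subscriber of the topic receives every event.
        for handler in self._handlers[topic]:
            handler(event)


bus = PubSub()
billing, shipping = [], []
bus.subscribe("order-status", billing.append)
bus.subscribe("order-status", shipping.append)
bus.publish("order-status", {"order_id": 42, "status": "paid"})
```

After the publish call, both the billing and the shipping subscriber have received the same event, which is the essential property Kafka provides at scale.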

Kafka is ideal for this kind of situation. The producer maps naturally to the publisher role in this pattern, and the consumers behave exactly like subscribers.

In this scenario it is better to have a small partition count. Each system that needs to be notified can subscribe to the topic with its own consumer group id, so every subscriber receives every event. This case is different from the producer-consumer one in that the consumers are not trying to maximize throughput; they need to be able to react to state changes at their own pace.
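With a real client this amounts to giving each subscribing system its own group.id. A sketch of the consumer configuration, assuming the confluent-kafka Python client; the broker address and service names are placeholders:

```python
# Each subscribing system uses its own group.id, so each one
# independently receives every event published to the topic.
billing_config = {
    "bootstrap.servers": "localhost:9092",
    "group.id": "billing-service",
    "auto.offset.reset": "earliest",
}

# Same broker, different group id: a separate, independent subscriber.
shipping_config = {**billing_config, "group.id": "shipping-service"}
```

Because the two configurations differ only in group.id, each service keeps its own offsets and reads the full stream without affecting the other.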

What’s next?

In my next post I will change gears again and start a series of posts about different types of performance testing.

Published by carlosware

Busy dad of three with a passion for fly fishing and computers.
