An honest AWS MSK review - July 2019

Introduction

Almost nine months ago, AWS announced a new service: Managed Streaming for Apache Kafka, aka AWS MSK. AWS MSK is a fully managed service that enables you to build and run applications that use Apache Kafka to process streaming data.

Figure 1

Maintainability

Maintainability is really great. First of all, setting up a new cluster is easy: go to the AWS console and open the MSK service panel. After a couple of clicks and a wait of about 20 minutes, your cluster is ready. I set up a new multi-AZ cluster of six m5.large machines with a replication factor of 3. The cluster comes with managed ZooKeeper. In terms of Kafka versions, you can choose between 1.1.1 and 2.1. Version 1.1.1 is really old, and I do not know why anyone starting fresh would choose it. 2.1 is OK but not the newest one: today, 2.3 is available.
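For those who prefer scripting over console clicks, here is a minimal sketch of the same setup using boto3's `kafka` client. The cluster name, the subnet IDs and the "2.1.0" version string are my assumptions for illustration, and the rule that the broker count must be a multiple of the number of subnets reflects my understanding of MSK rather than anything tested here. The replication factor is a Kafka setting and is not part of this request.

```python
def build_create_cluster_request(cluster_name, subnet_ids,
                                 kafka_version="2.1.0",
                                 instance_type="kafka.m5.large",
                                 brokers=6):
    """Build keyword arguments for boto3's kafka create_cluster call.

    subnet_ids: one subnet per availability zone (hypothetical IDs).
    """
    subnet_ids = list(subnet_ids)
    # Assumption: MSK spreads brokers evenly over the subnets, so the
    # broker count has to be a multiple of the number of subnets.
    if not subnet_ids or brokers % len(subnet_ids) != 0:
        raise ValueError("brokers must be a multiple of the number of subnets")
    return {
        "ClusterName": cluster_name,
        "KafkaVersion": kafka_version,
        "NumberOfBrokerNodes": brokers,
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": subnet_ids,
        },
    }


def create_cluster(request):
    """Send the request to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builder works without boto3
    return boto3.client("kafka").create_cluster(**request)
```

Calling `build_create_cluster_request("demo", ["subnet-a", "subnet-b", "subnet-c"])` produces the request for six brokers over three zones; passing it to `create_cluster` would start the roughly 20-minute provisioning.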

Figure 2

Performance

I will start this section with another strange issue. When creating an AWS MSK cluster, you can choose only the size of the underlying machine, not its type. This is weird, because the storage type and the network bandwidth are the main factors that determine how much traffic Apache Kafka can handle. My general feeling is that i3 instances are a much better choice than m5 for Apache Kafka, but with AWS MSK we are stuck with the general-purpose m5 instance type. So no ephemeral storage option is available, and to get 10 Gbps of network bandwidth we need to choose at least the m5.12xlarge instance size. This affects the cost of AWS MSK very negatively, and I will cover it in the cost section.

  1. Avg latency: 302 ms
  2. Max latency: 600 ms

  1. Machine type: kafka.m5.12xlarge
  2. Partitions per topic: 15
  3. Replication factor: 1
  4. Max record rate: ~310K rec/sec (310 MB/sec)
  5. Avg latency: 103 ms
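To put the kafka.m5.12xlarge figures in perspective, here is a small back-of-the-envelope calculation. It assumes decimal units (1 MB = 1,000,000 bytes, 1K = 1,000) and an even spread of traffic over the 15 partitions; both are my assumptions, not something the test results state.

```python
def average_record_size_bytes(mb_per_sec, records_per_sec):
    """Average record size implied by a byte rate and a record rate."""
    return mb_per_sec * 1_000_000 / records_per_sec


def per_partition_mb_per_sec(mb_per_sec, partitions):
    """Throughput per partition, assuming traffic is spread evenly."""
    return mb_per_sec / partitions


# Figures from the kafka.m5.12xlarge run above.
record_size = average_record_size_bytes(310, 310_000)
partition_rate = per_partition_mb_per_sec(310, 15)
print(f"average record size: {record_size:.0f} bytes")
print(f"per-partition rate:  {partition_rate:.1f} MB/sec")
```

In other words, the test pushed records of about 1 KB each, and every partition carried a bit over 20 MB/sec.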

Scalability

AWS MSK is not scalable. Currently, modifying a running cluster to add or remove broker nodes or to change the instance type is not supported; these operations require creating a new cluster. Only two things can be done without re-creating the cluster: updating the cluster configuration (‘update-cluster-configuration’) and increasing the EBS storage attached to the MSK brokers (‘update-broker-storage’).
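As a rough illustration of the two operations that are supported, the sketch below builds the request arguments for the boto3 equivalents, `update_broker_storage` and `update_cluster_configuration`. The parameter names follow the boto3 `kafka` client as far as I know it; the ARNs and version strings you pass in are placeholders you would read from your own cluster, and using "All" as the broker ID to target every broker is my understanding of the API, not something verified here.

```python
def build_update_broker_storage_request(cluster_arn, current_version, new_size_gb):
    """Arguments for kafka.update_broker_storage (storage can only grow)."""
    return {
        "ClusterArn": cluster_arn,
        # Current cluster version string, as reported by describe_cluster.
        "CurrentVersion": current_version,
        "TargetBrokerEBSVolumeInfo": [
            # Assumption: "All" applies the new size to every broker.
            {"KafkaBrokerNodeId": "All", "VolumeSizeGB": new_size_gb},
        ],
    }


def build_update_cluster_configuration_request(cluster_arn, current_version,
                                               configuration_arn, revision):
    """Arguments for kafka.update_cluster_configuration."""
    return {
        "ClusterArn": cluster_arn,
        "CurrentVersion": current_version,
        "ConfigurationInfo": {"Arn": configuration_arn, "Revision": revision},
    }


def apply_storage_update(request):
    """Send the storage update to AWS (needs boto3 and valid credentials)."""
    import boto3  # imported lazily so the builders work without boto3
    return boto3.client("kafka").update_broker_storage(**request)
```

Note that nothing in either request touches the broker count or the instance type, which is exactly the limitation described above.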

Reliability

Even if rare, failures can occur. A broker machine can fail, a broker can become unreachable because of network issues, or there can even be a problem with a whole availability zone or region. Handling AWS region failures is out of scope here, but I do want to discuss the case of AZ or broker failures.

Figure 3

Security

Well, there were many complaints about the security of the first MSK version. Most of the issues were fixed, and now MSK security is much better. I will cover only those that were important for my project.

Cost

Finally, let’s discuss the cost of AWS MSK. Here you can find formulas for AWS MSK price calculations. Let’s go with the configuration that the AWS support team provided for handling 310 MB/sec: 15 brokers of m5.12xlarge. The monthly price will be:
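For readers who want to redo the calculation with current prices, here is a minimal sketch of the pricing shape as I understand it: broker instance hours plus provisioned storage, ignoring data transfer. The hourly rate, the storage rate, the storage size and the 730-hour month in the usage lines are placeholders, not quoted AWS prices; plug in the numbers from the pricing page for your region.

```python
def msk_monthly_cost(brokers, hourly_rate_usd, storage_gb_per_broker,
                     storage_rate_usd_per_gb_month, hours_per_month=730):
    """Rough monthly MSK cost: broker hours plus provisioned storage.

    Data transfer and any other charges are deliberately left out.
    """
    broker_cost = brokers * hourly_rate_usd * hours_per_month
    storage_cost = brokers * storage_gb_per_broker * storage_rate_usd_per_gb_month
    return broker_cost + storage_cost


# Placeholder rates for illustration only, NOT real AWS prices.
cost = msk_monthly_cost(
    brokers=15,                      # from the AWS support configuration
    hourly_rate_usd=5.0,             # hypothetical kafka.m5.12xlarge rate
    storage_gb_per_broker=1000,      # hypothetical volume size
    storage_rate_usd_per_gb_month=0.10,  # hypothetical storage rate
)
print(f"estimated monthly cost: ${cost:,.0f}")
```

Whatever the exact rates are, the broker term dominates: with 15 large instances running around the clock, storage is a small fraction of the bill.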

Final thoughts

I have a long history of running things on AWS and, frankly speaking, I like AWS very much. Usually, there are a lot of benefits to using AWS managed services, like scalability, solid performance, easy cost management, and great technical support. But in the case of AWS MSK, it is a big ‘no’. I cannot take it to production, mainly because of the cost and scalability issues.
