I am Dennis Lattka, chief storage solutions engineer at Micron Technology. In practice, the title means that I work out how flash storage can improve the performance and results of application workloads. To that end, I evaluated Apache Kafka, one of the most widely used distributed messaging systems in the big data ecosystem, to find out how best to apply Micron's solid-state storage to it and what benefits that brings.
Introduction to Apache Kafka
By monitoring the resources involved (i.e., CPU, memory, disk activity, and network), I found that the main bottlenecks were disk and network.
Everything depends on throughput
With Apache Kafka, I found that throughput is everything. The Kafka developers did a great job of handing write data directly to the kernel page cache, which minimizes I/O-related issues. However good that design is, the I/O eventually turns into sequential writes to the Kafka partitions (the topic log files). Therefore, the higher the throughput of the disk used, the greater the performance improvement.
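To make the point about sequential writes concrete, here is a minimal sketch of how one might measure the sequential append throughput of a drive. It is not part of the original test and not how Kafka itself writes data; the function name `sequential_write_mb_per_s` and its default sizes are assumptions chosen for illustration.

```python
import os
import tempfile
import time


def sequential_write_mb_per_s(directory, total_bytes=64 * 1024 * 1024,
                              chunk_size=1024 * 1024):
    """Append roughly total_bytes to a temporary file in `directory`.

    Returns the observed throughput in MB/s (1 MB = 10**6 bytes).
    Hypothetical helper for illustration only.
    """
    chunk = b"\0" * chunk_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        written = 0
        start = time.perf_counter()
        while written < total_bytes:
            # os.write returns the number of bytes actually written.
            written += os.write(fd, chunk)
        # Flush the page cache to the device so the drive itself is timed,
        # not only the copy into memory.
        os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
        os.remove(path)
    return written / elapsed / 1e6
```

Pointing `directory` at a mount on the drive under test, for example `sequential_write_mb_per_s("/mnt/drive_under_test")` (a placeholder path), gives a rough figure to compare drives with. A dedicated I/O benchmark would be more rigorous.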
After working out how best to test Apache Kafka and which parameters worked best, I ran a simple test with its built-in producer test script. Three producers sent a total of 600 million 100-byte messages to a single Kafka broker.
The test was set up as follows (no tuning, only the default configuration):
- A total of 600 topics were created.
- Each producer is assigned 200 topics of its own.
- Each producer writes 1 million messages to each topic.
- Each message is 100 bytes.
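The workload figures above can be checked with a little arithmetic, and the same numbers can be turned into one command line per topic. The sketch below assumes the built-in script is `kafka-producer-perf-test.sh`, which the article does not name. The topic naming scheme and the broker address are placeholders, and the exact flags used in the original test are not given.

```python
PRODUCERS = 3
TOPICS_PER_PRODUCER = 200
MESSAGES_PER_TOPIC = 1_000_000
MESSAGE_SIZE_BYTES = 100


def workload_totals():
    """Return (topics, messages, payload_bytes) for the whole test."""
    topics = PRODUCERS * TOPICS_PER_PRODUCER
    messages = topics * MESSAGES_PER_TOPIC
    # Payload only: per-record and per-batch overhead is not counted.
    payload_bytes = messages * MESSAGE_SIZE_BYTES
    return topics, messages, payload_bytes


def perf_test_args(topic, bootstrap_server):
    """Build an argument list for one topic (assumed script and flags)."""
    return [
        "kafka-producer-perf-test.sh",
        "--topic", topic,
        "--num-records", str(MESSAGES_PER_TOPIC),
        "--record-size", str(MESSAGE_SIZE_BYTES),
        "--throughput", "-1",  # -1 disables throttling
        "--producer-props", f"bootstrap.servers={bootstrap_server}",
    ]


def producer_commands(producer_id, bootstrap_server="broker:9092"):
    """One command per topic owned by a producer (hypothetical topic names)."""
    return [
        perf_test_args(f"producer{producer_id}-topic{n}", bootstrap_server)
        for n in range(TOPICS_PER_PRODUCER)
    ]


if __name__ == "__main__":
    topics, messages, payload_bytes = workload_totals()
    print(topics, messages, payload_bytes / 1e9, "GB")
```

Under these figures the broker receives 600 topics, 600 million messages, and 60 GB of message payload (decimal gigabytes), before any Kafka record overhead.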
Hardware used:
- All servers (1 broker and 3 producers) use the same configuration.
- Two Intel(R) Xeon(R) E5-2690 v3 CPUs @ 2.60GHz.
- 384GB memory.
- Two 10Gb NICs bonded in ALB mode.
The comparison covered a 6TB 7.2k RPM hard drive, a Micron 5100 ECO 1920GB SSD, and a Micron 9100 Pro 3.2TB NVMe SSD.
In each test, the Apache Kafka broker's partitions were located on the drive under test.
The results are as follows:
As the table above shows, the higher the throughput, the higher the I/O per second. For Apache Kafka, this means that a larger number of messages can be processed per second (results are shown in MB/s).
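Because the results are reported in MB/s while the workload is defined in messages, a small conversion helps relate the two. The sketch below assumes 1 MB = 10^6 bytes and counts payload bytes only. The 500 MB/s value in the usage lines is a placeholder for illustration, not a measured result from this test.

```python
def messages_per_second(throughput_mb_per_s, message_size_bytes=100):
    """Convert a throughput in MB/s into messages per second."""
    return throughput_mb_per_s * 1_000_000 / message_size_bytes


def seconds_to_ingest(total_bytes, throughput_mb_per_s):
    """Time needed to write total_bytes at a sustained throughput."""
    return total_bytes / (throughput_mb_per_s * 1_000_000)


if __name__ == "__main__":
    placeholder_mb_per_s = 500  # hypothetical value, not from the article
    print(messages_per_second(placeholder_mb_per_s))
    print(seconds_to_ingest(60_000_000_000, placeholder_mb_per_s))
```

At a fixed message size the two measures are proportional, so a drive that sustains twice the MB/s handles twice as many 100-byte messages per second.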
Conclusion
Using higher-throughput storage devices (such as Micron's 5100 Series SSDs or Micron NVMe SSDs) in an Apache Kafka configuration will significantly improve Apache Kafka's performance.