flume-user mailing list archives

From Guillermo Ortiz <konstt2...@gmail.com>
Subject Kafka Sink, bad distribution of data in the partitions.
Date Mon, 14 Dec 2015 22:52:25 GMT
I'm using the following architecture:
Logs --> SpoolDir --> MemChannel --> AvroSink -->
AvroSource --> MemChannel --> KafkaSink.

I have a cluster with three Kafka nodes and have created a topic with six
partitions and a replication factor of one, as a proof of concept.

I have seen that 95% of the data goes to two partitions, and both of these
partitions are on the same Kafka node. I am not setting a "key" header on
my events in Flume, so, according to the documentation, the key is generated
randomly. The messages are logs from different sources. Is this behavior
normal?
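
As a rough baseline for what "normal" would look like, here is a minimal sketch that spreads events over six partitions by hashing a per-event key and reports the share held by the two busiest partitions. The hash-modulo rule and the names partition_for, distribution and top_two_share are assumptions made for illustration only; this is not the partitioner used by Kafka or by the Flume KafkaSink.

```python
import hashlib
import random
from collections import Counter

NUM_PARTITIONS = 6  # the topic described above has six partitions


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a key to a partition by hashing.

    Illustrative stand-in only: NOT the real Kafka/Flume partitioner.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def distribution(keys, num_partitions: int = NUM_PARTITIONS) -> Counter:
    """Count how many events land on each partition (empty ones included)."""
    counts = Counter({p: 0 for p in range(num_partitions)})
    for key in keys:
        counts[partition_for(key, num_partitions)] += 1
    return counts


def top_two_share(counts: Counter) -> float:
    """Fraction of all events held by the two busiest partitions."""
    total = sum(counts.values())
    top_two = sum(n for _, n in counts.most_common(2))
    return top_two / total


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so runs are repeatable
    random_keys = [str(rng.getrandbits(64)) for _ in range(60_000)]
    counts = distribution(random_keys)
    for partition in sorted(counts):
        print(f"partition {partition}: {counts[partition]} events")
    print(f"share of the two busiest partitions: {top_two_share(counts):.1%}")
```

With keys that really differ per event, two of six partitions should hold roughly one third of the data, so a 95% share suggests that the keys (or the partition choice) are not varying per event in the way this sketch assumes.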
