flume-user mailing list archives

From Gonzalo Herreros <gherre...@gmail.com>
Subject Re: Kafka Sink, bad distribution of data in the partitions.
Date Tue, 15 Dec 2015 08:14:19 GMT
Unless you are using a custom partitioner, the DefaultPartitioner assigns
messages to partitions randomly, so the content of the headers shouldn't
make any difference. The only explanation I can see for what you are seeing
is that somehow the producer thinks there are only 2 partitions.
Are the messages going just to partitions 0 and 1, or to different numbers?
Can you try with another topic and see if that happens too?

How are you checking where the messages are going?
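
One way to check is to compare per-partition message counts (for example,
the difference between the latest and earliest offset of each partition,
however you obtain them). Below is a minimal sketch, assuming you already
have those counts in a dict; the function names and the sample numbers are
hypothetical and are not taken from Flume, Kafka, or your cluster.

```python
def partition_shares(counts):
    """Return each partition's share of the total message count."""
    total = sum(counts.values())
    if total == 0:
        # No messages at all: report a zero share for every partition.
        return {p: 0.0 for p in counts}
    return {p: n / total for p, n in counts.items()}


def overloaded_partitions(counts, factor=2.0):
    """List partitions whose share exceeds `factor` times an even split."""
    if not counts:
        return []
    even_share = 1.0 / len(counts)
    shares = partition_shares(counts)
    return sorted(p for p, s in shares.items() if s > factor * even_share)


# Made-up counts for a six-partition topic, for illustration only.
example_counts = {0: 4800, 1: 4700, 2: 130, 3: 120, 4: 130, 5: 120}
print(partition_shares(example_counts))
print(overloaded_partitions(example_counts))
```

If the same partitions stay overloaded when you repeat the check later and
on another topic, that would suggest the skew is not just a sampling effect.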

Regards,
Gonzalo

On 14 December 2015 at 22:52, Guillermo Ortiz <konstt2000@gmail.com> wrote:

> I'm using an architecture like this:
> Logs --> SpoolDir -->MemChannel --> AvroSink  -->
> AvroSource --> MemChannel --> KafkaSink.
>
> I have a cluster with three Kafka nodes and have created a topic with six
> partitions and a replication factor of one as a POC.
>
> I have seen that 95% of the data goes to two partitions, and these two
> partitions are on the same Kafka node. I am not creating a "key" header on
> my events in Flume, so according to the documentation the key is generated
> randomly. The messages are logs from different sources. Is this behavior
> normal?
>
