flume-user mailing list archives

From "Jason J. W. Williams" <jasonjwwilli...@gmail.com>
Subject Re: Kafka Sink random partition assignment
Date Tue, 07 Jun 2016 18:03:46 GMT
Thanks again, Chris. I'm still curious, though, why I see the round-robin
behavior I expected when using kafka-console-producer to inject messages.

-J
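The difference described in this thread can be sketched in pure Python. This simulates the two partitioning strategies rather than calling any Kafka API (all names here are illustrative): the new Java producer, which kafka-console-producer uses in 0.9, round-robins unkeyed messages on every send, while the older producer path used by the Flume 1.6 Kafka sink picks a random partition and keeps it until the topic metadata refresh (`topic.metadata.refresh.interval.ms`, 10 minutes by default) — assuming that is indeed the producer Flume 1.6 bundles.

```python
import itertools
import random

NUM_PARTITIONS = 20  # matches the topic created later in this thread

def round_robin_partitioner(num_partitions):
    """New-producer style: unkeyed messages cycle through partitions per send."""
    counter = itertools.count()
    return lambda: next(counter) % num_partitions

def sticky_random_partitioner(num_partitions, refresh_interval_s=600):
    """Old-producer style: pick one random partition and keep it until the
    metadata refresh interval elapses (clock passed in to keep this testable)."""
    state = {"partition": None, "chosen_at": None}
    def pick(now):
        if state["partition"] is None or now - state["chosen_at"] >= refresh_interval_s:
            state["partition"] = random.randrange(num_partitions)
            state["chosen_at"] = now
        return state["partition"]
    return pick

# Round-robin spreads 40 sends evenly across all 20 partitions...
rr = round_robin_partitioner(NUM_PARTITIONS)
rr_hits = [rr() for _ in range(40)]

# ...while the sticky partitioner sends everything to one partition
# for the first 600 simulated seconds.
sticky = sticky_random_partitioner(NUM_PARTITIONS)
sticky_hits = [sticky(now) for now in range(0, 300, 10)]
```

Under this model, a single Flume agent looks "pinned" to one partition between refreshes, which matches the behavior reported below.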

On Tuesday, June 7, 2016, Chris Horrocks <chris@hor.rocks> wrote:

> It's by design of Kafka (and, by extension, Flume). The producers are
> designed to be many-to-one (producers to partitions), and picking a
> random partition every 10 minutes makes it unlikely that separate
> producer instances all end up on the same partition.
>
> --
> Chris Horrocks
>
> From: Jason Williams <jasonjwwilliams@gmail.com>
> Reply: user@flume.apache.org <user@flume.apache.org>
> Date: 7 June 2016 at 09:43:34
> To: user@flume.apache.org <user@flume.apache.org>
> Subject: Re: Kafka Sink random partition assignment
>
> Hey Chris,
>
> Thanks for the help!
>
> Is that a limitation of the Flume Kafka sink or of Kafka itself? When I
> use another Kafka producer and publish without a key, the messages move
> randomly among the partitions on every publish.
>
> -J
>
> Sent via iPhone
>
> On Jun 7, 2016, at 00:08, Chris Horrocks <chris@hor.rocks> wrote:
>
> The producers bind to random partitions and move every 10 minutes. If you
> leave it long enough you should see it in the producer Flume agent logs,
> although there's nothing to stop a producer from "randomly" choosing the
> same partition twice. Annoyingly, there's no concept of producer groups
> (yet) to ensure that producers apportion the available partitions between
> them, as this would create a synchronisation issue between what should be
> entirely independent processes.
>
> --
> Chris Horrocks
>
> On 7 June 2016 at 00:32:29, Jason J. W. Williams (jasonjwwilliams@gmail.com) wrote:
>
>> Hi,
>>
>> I'm new to Flume, and I'm trying to relay log messages received over a
>> netcat source to a Kafka sink.
>>
>> Everything seems to be fine, except that Flume is acting as if it IS
>> assigning a partition key to the produced messages even though none is set.
>> I'd like the messages to be assigned to random partitions so that the
>> consumers are load balanced.
>>
>> * Flume 1.6.0
>> * Kafka 0.9.0.1
>>
>> Flume config:
>> https://gist.github.com/williamsjj/8ae025906955fbc4b5f990e162b75d7c
>>
>> Kafka topic config: kafka-topics --zookeeper localhost/kafka --create
>> --topic activity.history --partitions 20 --replication-factor 1
>>
>> Python consumer program:
>> https://gist.github.com/williamsjj/9e67287f0154816c3a733a39ad008437
>>
>> Test program (publishes to Flume):
>> https://gist.github.com/williamsjj/1eb097a187a3edb17ec1a3913e47e58b
>>
>> The Flume agent listens on TCP port 3132 for connections and publishes the
>> messages it receives to the Kafka activity.history topic. I'm running two
>> instances of the Python consumer.
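With 20 partitions and two consumers in the same group, Kafka's default range assignor gives each consumer a contiguous block of roughly half the partitions, so both consumers should receive traffic when messages are spread across partitions. A sketch of that assignment logic in pure Python (illustrative only, not the Kafka client API):

```python
def range_assign(partitions, consumers):
    """Sketch of a range-style assignor: sort both lists, then hand each
    consumer a contiguous block, with earlier consumers absorbing any remainder."""
    partitions = sorted(partitions)
    consumers = sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + count]
        start += count
    return assignment

# 20 partitions split between two consumers: 10 contiguous partitions each.
assignment = range_assign(range(20), ["consumer-a", "consumer-b"])
```

This is why a producer pinned to one partition starves one of the two consumers: all traffic lands inside a single consumer's block.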
>>
>> What happens, however, is that all log messages get sent to a single Kafka
>> consumer. If I restart Flume (leaving the consumers running) and re-run the
>> test, all messages get published to the other consumer. So it feels like
>> Flume is assigning a permanent partition key even though one is not defined
>> (and partition assignment should therefore be random).
>>
>> Any advice is greatly appreciated.
>>
>> -J
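One possible workaround for the sticky behavior described in this thread is to give every event a unique key, so the sink's key-hash partitioning varies per message. This is only a sketch: it assumes the Flume 1.6 Kafka sink uses the event's `key` header as the Kafka message key (as its documentation describes), that the morphline UUIDInterceptor from flume-ng-morphline-solr-sink is on the classpath, and the agent/source names below are hypothetical:

```properties
# Hypothetical agent "a1" with source "r1": stamp each event with a UUID
# in the "key" header, which the Kafka sink should use as the message key,
# spreading messages across partitions via key hashing.
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = org.apache.flume.sink.solr.morphline.UUIDInterceptor$Builder
a1.sources.r1.interceptors.i1.headerName = key
```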
>>
>
