flume-user mailing list archives

From Gonzalo Herreros <gherre...@gmail.com>
Subject Re: KafkaSink vs KafkaChannel performance
Date Thu, 12 Nov 2015 11:39:08 GMT
I think your expectations are not realistic.
The MemoryChannel adds minimal overhead, but it is not reliable like the
KafkaChannel.
In the first case you can lose 10k messages if you are unlucky, while with
the KafkaChannel you won't lose a single one.
More reliability normally comes with a small performance hit.

However, the differences you are seeing are too great, so I also believe
it's related to the batch size.
While the sink is using 10k batches, there is nothing configured for the
KafkaChannel (it could be committing every message, or something like 100).
I'm not sure what the default batch size is there.
The documentation lists no properties for batch size or
transactionCapacity, but its example does set capacity and
transactionCapacity. I'm not sure whether they apply to this channel.
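
To show why batch size alone could explain a gap this large, here is a toy
model in Python. It is only a sketch: the 5 ms commit cost and the 10 µs
per-event cost are numbers I made up, not measurements of Flume or Kafka,
and the function names are hypothetical. The only figure taken from the
thread is the 1100-byte event size.

```python
# Toy model, not a measurement: every commit pays one fixed cost
# (e.g. a broker round trip) plus a small cost per event.
# Both cost values are made-up assumptions for illustration only.

def events_per_second(batch_size, commit_overhead_s=0.005, per_event_s=0.00001):
    """Throughput when each batch of `batch_size` events pays one commit cost."""
    seconds_per_batch = commit_overhead_s + batch_size * per_event_s
    return batch_size / seconds_per_batch


def megabytes_per_second(events_per_s, event_bytes=1100):
    """Convert an event rate to MB/s; 1100 bytes is the event size from the thread."""
    return events_per_s * event_bytes / 1_000_000


if __name__ == "__main__":
    # Compare committing every event, every 100 events, and every 10k events.
    for batch_size in (1, 100, 10_000):
        rate = events_per_second(batch_size)
        print(f"batch={batch_size:>6}  {rate:>10.0f} events/s  "
              f"{megabytes_per_second(rate):6.2f} MB/s")
```

Under these assumed costs the fixed commit cost dominates small batches, so
throughput grows steeply with batch size until the per-event cost takes over.
The real ratio depends on the actual commit latency in your cluster.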

Regards,
Gonzalo


On 12 November 2015 at 11:23, Ahmed Vila <avila@devlogic.eu> wrote:

> Hi Guillermo,
>
> With KafkaSink you're passing 10k events at once to Kafka due to batchSize
> (transaction size) being that big.
>
> So, it's important to know how big batchSize is in your source in order to
> be able to compare. Set it to 10k and check its performance again.
>
> Please keep in mind that Flume has to keep track of transactions and other
> housekeeping within any channel, so in my opinion it's supposed to be
> slower than Sink for the same output (Kafka, file or whatever).
>
>
>
> On Thu, Nov 12, 2015 at 12:05 PM, Guillermo Ortiz <konstt2000@gmail.com>
> wrote:
>
>> Hello,
>>
>> I'm using Flume with Kafka and I don't understand some performance
>> results that I'm getting.
>>
>> I have a topic with 3 nodes, 6 partitions, replication 2.
>> I'm ingesting messages of 1100 bytes each with a spoolDir source.
>>
>> I tried Source-MemoryChannel-KafkaSink and I get about
>> 50K messages/second (54 MB/s) in Kafka.
>>
>> If I use Source-KafkaChannel I only get about 1K messages/second
>> (1.2 MB/s) in Kafka.
>>
>> I thought that I was going to get better performance with the
>> KafkaChannel, but I'm getting 50x better performance with the KafkaSink.
>>
>> The first configuration is
>> agent.sources = seqGenSrc
>> agent.channels = memoryChannel
>> agent.sinks = kafkaSink
>>
>> #Source configuration
>> ...
>>
>> agent.sources.seqGenSrc.channels = memoryChannel
>> agent.sinks.kafkaSink.channel = memoryChannel
>> agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
>> agent.sinks.kafkaSink.batchSize = 10000
>> agent.sinks.kafkaSink.brokerList = ose10kafkaelk:9092,ose11kafkaelk:9092,ose12kafkaelk:9092
>> agent.sinks.kafkaSink.topic = kafka-topic
>> agent.sinks.kafkaSink.requiredAcks = -1
>> agent.sinks.kafkaSink.channel = memoryChannel
>>
>> agent.channels.memoryChannel.type = memory
>> agent.channels.memoryChannel.capacity = 100000
>> agent.channels.memoryChannel.transactionCapacity = 10000
>>
>>
>>
>> The second is:
>> agent.sources = seqGenSrc
>> agent.channels = kafkaChannel
>>
>>
>> # Describe/configure the source
>> ###Configuration spoolDir source...
>> ...
>>
>> # The channel can be defined as follows.
>> agent.sources.seqGenSrc.channels = kafkaChannel
>>
>> agent.channels.kafkaChannel.type = org.apache.flume.channel.kafka.KafkaChannel
>>
>> agent.channels.kafkaChannel.brokerList=ose10kafkaelk:9092,ose11kafkaelk:9092,ose12kafkaelk:9092
>> agent.channels.kafkaChannel.topic=kafka-topic3
>> agent.channels.kafkaChannel.zookeeperConnect=ose10kafkaelk:2181
>>
>>
>>
>
>
> --
>
> Best regards,
> Ahmed Vila | Senior software developer
> DevLogic | Sarajevo | Bosnia and Herzegovina
>
> Office : +387 33 942 123
> Mobile: +387 62 139 348
>
> Website: www.devlogic.eu
> E-mail   : avila@devlogic.eu
> ---------------------------------------------------------------------
> This e-mail and any attachment is for authorised use by the intended
> recipient(s) only. This email contains confidential information. It should
> not be copied, disclosed to, retained or used by, any party other than the
> intended recipient. Any unauthorised distribution, dissemination or copying
> of this E-mail or its attachments, and/or any use of any information
> contained in them, is strictly prohibited and may be illegal. If you are
> not an intended recipient then please promptly delete this e-mail and any
> attachment and all copies and inform the sender directly via email. Any
> emails that you send to us may be monitored by systems or persons other
> than the named communicant for the purposes of ascertaining whether the
> communication complies with the law and company policies.
