flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: KafkaSink vs KafkaChannel performance
Date Thu, 12 Nov 2015 19:23:42 GMT
I think your batch sizes are the key. What is your batch size from your source?

Thanks,
Hari Shreedharan
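
For anyone reading this thread later: the source-side batch size asked about here is, as far as I know, the `batchSize` property of the spooling directory source, which the Flume user guide lists with a default of 100. Below is a minimal sketch of raising it to match the sink's 10000. The `type` and `spoolDir` lines stand in for the source configuration that was elided in the thread, and the directory path is a placeholder, not something taken from the original message.

```properties
# Sketch only: the source settings were elided ("...") in the original post.
# The type and spoolDir values below are assumptions for illustration.
agent.sources.seqGenSrc.type = spooldir
agent.sources.seqGenSrc.spoolDir = /path/to/spool
agent.sources.seqGenSrc.channels = kafkaChannel
# Number of events handed to the channel per transaction.
# 10000 matches the KafkaSink batchSize used in the first test.
agent.sources.seqGenSrc.batchSize = 10000
```

Rerunning the Source-KafkaChannel test with this setting should show whether the gap comes from the batch size rather than from the channel itself.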




> On Nov 12, 2015, at 4:06 AM, Guillermo Ortiz <konstt2000@gmail.com> wrote:
> 
> Yes, I also tried changing the capacity of the KafkaChannel because there is an example in the documentation, although the documentation doesn't say what it means.
> 
> Anyway, in both cases the messages end up in Kafka, written either from the spooling directory source (via the KafkaChannel) or from a KafkaSink, and I take the measurements in Kafka. Maybe writing from a sink is not the same as writing directly as a channel. I thought it should be faster, since there are fewer pieces in the complete flow.
> 
> Another theory: I took a quick look at the code of MemoryChannel and KafkaChannel, and I saw that KafkaChannel has to serialize the events with Avro, whereas I didn't see that transformation in MemoryChannel. There is a doCommit method, but I'm not sure when it is called.
> 
> 
> 2015-11-12 12:39 GMT+01:00 Gonzalo Herreros <gherreros@gmail.com>:
> I think your expectations are not realistic.
> The MemoryChannel adds minimal overhead but is not reliable like the KafkaChannel.
> In the first case you can lose 10k messages if you are unlucky, while with the KafkaChannel you won't lose a single one.
> More reliability normally comes with a small performance hit.
> 
> However, the differences you are seeing are too great, so I also believe it's related to the batch size.
> While the sink is using 10k batches, there is nothing configured for the KafkaChannel (it could be committing every message, or something like every 100). I'm not sure what the default batch size is there.
> The documentation lists no properties for batch size or transactionCapacity, but the example does set capacity and transactionCapacity. I'm not sure whether they apply to this channel.
> 
> Regards,
> Gonzalo
> 
> 
> On 12 November 2015 at 11:23, Ahmed Vila <avila@devlogic.eu> wrote:
> Hi Guillermo,
> 
> With KafkaSink you're passing 10k events at once to Kafka because batchSize (the transaction size) is that big.
> 
> So it's important to know how big batchSize is in your source in order to compare. Set it to 10k and check its performance again.
> 
> Please keep in mind that Flume has to keep track of transactions and do other housekeeping within any channel, so in my opinion a channel is supposed to be slower than a sink for the same output (Kafka, file or whatever).
> 
> 
> 
> On Thu, Nov 12, 2015 at 12:05 PM, Guillermo Ortiz <konstt2000@gmail.com> wrote:
> Hello, 
> 
> I'm using Flume with Kafka and I don't understand some performance results that I'm getting.

> 
> My topic is on a 3-node Kafka cluster, with 6 partitions and a replication factor of 2.
> I'm ingesting messages of 1100 bytes each with a spooling directory source.
> 
> With Source-MemoryChannel-KafkaSink I get about 50K messages/second (54 MB/s) in Kafka.
> 
> With Source-KafkaChannel I get only about 1K messages/second (1.2 MB/s) in Kafka.
> 
> I thought I was going to get better performance with the KafkaChannel, but I'm getting about 50 times better performance with KafkaSink.
> 
> The first configuration is
> agent.sources = seqGenSrc
> agent.channels = memoryChannel
> agent.sinks = kafkaSink
> 
> #Source configuration
> ...
> 
> agent.sources.seqGenSrc.channels = memoryChannel
> agent.sinks.kafkaSink.channel = memoryChannel
> agent.sinks.kafkaSink.type = org.apache.flume.sink.kafka.KafkaSink
> agent.sinks.kafkaSink.batchSize = 10000
> agent.sinks.kafkaSink.brokerList = ose10kafkaelk:9092,ose11kafkaelk:9092,ose12kafkaelk:9092
> agent.sinks.kafkaSink.topic = kafka-topic
> agent.sinks.kafkaSink.requiredAcks = -1
> agent.sinks.kafkaSink.channel = memoryChannel
> 
> agent.channels.memoryChannel.type = memory
> agent.channels.memoryChannel.capacity = 100000
> agent.channels.memoryChannel.transactionCapacity = 10000
> 
> 
> 
> The second is:
> agent.sources = seqGenSrc
> agent.channels = kafkaChannel
> 
> 
> # Describe/configure the source
> ###Configuration spoolDir source...
> ...
> 
> # The channel can be defined as follows.
> agent.sources.seqGenSrc.channels = kafkaChannel
> 
> agent.channels.kafkaChannel.type   = org.apache.flume.channel.kafka.KafkaChannel
> agent.channels.kafkaChannel.brokerList=ose10kafkaelk:9092,ose11kafkaelk:9092,ose12kafkaelk:9092
> agent.channels.kafkaChannel.topic=kafka-topic3
> agent.channels.kafkaChannel.zookeeperConnect=ose10kafkaelk:2181
> 
> 
> 
> 
> 
> -- 
> Best regards,
> 
> Ahmed Vila | Senior software developer
> DevLogic | Sarajevo | Bosnia and Herzegovina
> 
> Office : +387 33 942 123
> Mobile: +387 62 139 348
> 
> Website: www.devlogic.eu
> E-mail   : avila@devlogic.eu
> ---------------------------------------------------------------------
> This e-mail and any attachment is for authorised use by the intended recipient(s) only.
This email contains confidential information. It should not be copied, disclosed to, retained
or used by, any party other than the intended recipient. Any unauthorised distribution, dissemination
or copying of this E-mail or its attachments, and/or any use of any information contained
in them, is strictly prohibited and may be illegal. If you are not an intended recipient then
please promptly delete this e-mail and any attachment and all copies and inform the sender
directly via email. Any emails that you send to us may be monitored by systems or persons
other than the named communicant for the purposes of ascertaining whether the communication
complies with the law and company policies.
> 
> 
> 

