flume-user mailing list archives

From Hari Shreedharan <hshreedha...@cloudera.com>
Subject Re: Flume benchmarking with HTTP source & File channel
Date Sun, 15 Nov 2015 03:11:06 GMT
Did you try with a different source? Is your sender multithreaded? Sending
from a single thread would obviously be slow. How many messages per batch?
The bigger your batch is, the better your performance will be.
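That sending pattern can be sketched roughly like this (a hedged example, assuming the HTTP source's default JSONHandler, which accepts a JSON array of {"headers", "body"} objects; the host, port, batch size, and thread count are illustrative, not taken from the thread):

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

FLUME_URL = "http://10.15.1.31:5005"  # illustrative; match the HTTP source's bind/port
BATCH_SIZE = 500                      # events per POST; larger batches amortize per-request cost
NUM_THREADS = 8                       # a single sender thread is a common bottleneck

def build_batch(n, body):
    # The default JSONHandler expects a JSON array of {"headers": ..., "body": ...} objects.
    events = [{"headers": {"archival": "true"}, "body": body} for _ in range(n)]
    return json.dumps(events).encode("utf-8")

def send_batch(payload):
    # POST one batch of events to the Flume HTTP source.
    req = urllib.request.Request(
        FLUME_URL, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return resp.status

def run(batches=100):
    # Reuse one serialized payload and send it concurrently from several threads.
    payload = build_batch(BATCH_SIZE, "x" * 512)  # ~512-byte messages, as in this thread
    with ThreadPoolExecutor(max_workers=NUM_THREADS) as pool:
        return list(pool.map(send_batch, [payload] * batches))

# run() would POST 100 batches of 500 events each from 8 concurrent threads.
```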

On Saturday, November 14, 2015, Hemanth Abbina <HemanthA@eiqnetworks.com>
wrote:

> Thanks Gonzalo.
>
>
>
> Yes, it’s a single server. First we would like to confirm the max
> throughput of a single server with this configuration. The size of each
> message is around 512 bytes.
>
>
>
> I have tried with an in-memory channel & null sink too. Performance
> increased by about 50 requests/sec, not beyond that.
>
>
>
> In some of the forums, I have seen Flume benchmarks of 30K/40K per single
> node (I’m not sure about the configurations). So I am trying to check the
> max throughput of a single server.
>
>
>
> *From:* Gonzalo Herreros [mailto:gherreros@gmail.com]
> *Sent:* Saturday, November 14, 2015 2:02 PM
> *To:* user <user@flume.apache.org>
> *Subject:* Re: Flume benchmarking with HTTP source & File channel
>
>
>
> If that is just with a single server, 600 messages per sec doesn't sound
> bad to me.
> Depending on the size of each message, the network could be the limiting
> factor.
>
> I would try with the null sink and in memory channel. If that doesn't
> improve things I would say you need more nodes to go beyond that.
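> A minimal sketch of that test configuration (assuming the agent is still
> named svcagent; the memory channel capacities below are illustrative):
>
> svcagent.channels = mem-channel1
> svcagent.channels.mem-channel1.type = memory
> svcagent.channels.mem-channel1.capacity = 100000
> svcagent.channels.mem-channel1.transactionCapacity = 10000
>
> svcagent.sinks = null-sink1
> svcagent.sinks.null-sink1.type = null
> svcagent.sinks.null-sink1.channel = mem-channel1
>
> svcagent.sources.http-source.channels = mem-channel1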
>
> Regards,
> Gonzalo
>
> On Nov 14, 2015 7:40 AM, "Hemanth Abbina" <HemanthA@eiqnetworks.com> wrote:
>
> Hi,
>
>
>
> We have been trying to validate & benchmark the Flume performance for our
> production use.
>
>
>
> We have configured Flume to have HTTP source, File channel & Kafka sink.
>
> Hardware : 8 Core, 32 GB RAM, CentOS6.5, Disk - 500 GB HDD.
>
> Flume configuration:
>
> svcagent.sources = http-source
> svcagent.sinks = kafka-sink1
> svcagent.channels = file-channel1
>
> # HTTP source to receive events on port 5005
> svcagent.sources.http-source.type = http
> svcagent.sources.http-source.channels = file-channel1
> svcagent.sources.http-source.port = 5005
> svcagent.sources.http-source.bind = 10.15.1.31
> svcagent.sources.http-source.selector.type = multiplexing
> svcagent.sources.http-source.selector.header = archival
> svcagent.sources.http-source.selector.mapping.true = file-channel1
> svcagent.sources.http-source.selector.default = file-channel1
> #svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler
>
> svcagent.sinks.kafka-sink1.topic = flume-sink1
> svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092
> svcagent.sinks.kafka-sink1.channel = file-channel1
> svcagent.sinks.kafka-sink1.batchSize = 5000
>
> svcagent.channels.file-channel1.type = file
> svcagent.channels.file-channel1.checkpointDir=/etc/flume-kafka/checkpoint
> svcagent.channels.file-channel1.dataDirs=/etc/flume-kafka/data
> svcagent.channels.file-channel1.transactionCapacity=10000
> svcagent.channels.file-channel1.capacity=50000
> svcagent.channels.file-channel1.checkpointInterval=120000
> svcagent.channels.file-channel1.checkpointOnClose=true
> svcagent.channels.file-channel1.maxFileSize=536870912
> svcagent.channels.file-channel1.use-fast-replay=false
>
>
>
> When we tried to stream HTTP data from multiple clients (around 40 HTTP
> clients), we could reach a maximum of 600 requests/sec, and not beyond
> that. We also increased Flume's -Xmx setting to 4096.
>
>
>
> We have also tried a Null Sink (instead of the Kafka sink) and did not see
> much performance improvement, so we assume the bottleneck is the HTTP
> source & File channel.
>
>
>
> Could you please suggest any fine-tuning to improve the performance of
> this setup?
>
>
>
>
> --regards
>
> Hemanth
>
>

-- 

Thanks,
Hari
