flume-user mailing list archives

From Hemanth Abbina <Heman...@eiqnetworks.com>
Subject RE: Flume benchmarking with HTTP source & File channel
Date Sun, 15 Nov 2015 07:23:46 GMT
Hi Hari,

Thanks for the response.

I haven’t tried with a different source. Will try that.
We are sending through multiple HTTP clients (around 40 clients), with a single event per
batch.

First, we would like to validate the maximum HTTP source EPS that a single
Flume server can support (we are testing on 8 cores, 32 GB RAM) when single-event batches
are sent from multiple clients.

After confirming the EPS at this stage, we plan to check the performance with batching
& multi-node Flume support.

Thanks,
Hemanth

From: Hari Shreedharan [mailto:hshreedharan@cloudera.com]
Sent: Sunday, November 15, 2015 8:41 AM
To: user@flume.apache.org
Subject: Re: Flume benchmarking with HTTP source & File channel

Did you try with a different source? Is your sender multithreaded? Sending from a single thread
would obviously be slow. How many messages per batch? The bigger your batch, the better your
performance will be.
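As a concrete illustration of the batching advice above, here is a hypothetical client-side sketch (the function name and payload are mine, not from the thread): Flume's HTTP source with the default JSONHandler accepts a JSON array of events, so one POST can carry a whole batch instead of a single event.

```python
# Hypothetical client sketch: pack many events into one POST body.
# The default JSONHandler of Flume's HTTP source expects a JSON array
# of objects, each with "headers" and "body" fields.
import json

def build_batch(messages, headers=None):
    """Pack several message bodies into one JSON array for a single POST."""
    headers = headers or {}
    return json.dumps([{"headers": headers, "body": m} for m in messages])

# One request now delivers the whole batch, e.g. against the config in this
# thread: POST http://10.15.1.31:5005 with Content-Type: application/json.
payload = build_batch(["event 1", "event 2"], {"archival": "true"})
```

Sending larger arrays per POST amortizes the per-request HTTP and channel-transaction overhead, which is why batch size matters so much here.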

On Saturday, November 14, 2015, Hemanth Abbina <HemanthA@eiqnetworks.com>
wrote:
Thanks Gonzalo.

Yes, it’s a single server. First we would like to confirm the maximum throughput of a single
server with this configuration. The size of each message is around 512 bytes.

I have tried with the in-memory channel & null sink too. Performance increased by only about
50 requests/sec, not beyond that.

In some forums, I have seen Flume benchmarks of 30K/40K per single node (I’m not sure
about the configurations). So we are trying to find the maximum throughput of one server.

From: Gonzalo Herreros [mailto:gherreros@gmail.com]
Sent: Saturday, November 14, 2015 2:02 PM
To: user <user@flume.apache.org>
Subject: Re: Flume benchmarking with HTTP source & File channel


If that is just a single server, 600 messages per second doesn't sound bad to me.
Depending on the size of each message, the network could be the limiting factor.

I would try with the null sink and in-memory channel. If that doesn't improve things, I would
say you need more nodes to go beyond that.

Regards,
Gonzalo
On Nov 14, 2015 7:40 AM, "Hemanth Abbina" <HemanthA@eiqnetworks.com>
wrote:
Hi,

We have been trying to validate & benchmark the Flume performance for our production use.

We have configured Flume to have HTTP source, File channel & Kafka sink.
Hardware : 8 Core, 32 GB RAM, CentOS6.5, Disk - 500 GB HDD.
Flume configuration:
svcagent.sources = http-source
svcagent.sinks = kafka-sink1
svcagent.channels = file-channel1

# HTTP source to receive events on port 5005
svcagent.sources.http-source.type = http
svcagent.sources.http-source.channels = file-channel1
svcagent.sources.http-source.port = 5005
svcagent.sources.http-source.bind = 10.15.1.31

svcagent.sources.http-source.selector.type = multiplexing
svcagent.sources.http-source.selector.header = archival
svcagent.sources.http-source.selector.mapping.true = file-channel1
svcagent.sources.http-source.selector.default = file-channel1
#svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler

svcagent.sinks.kafka-sink1.topic = flume-sink1
svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092
svcagent.sinks.kafka-sink1.channel = file-channel1
svcagent.sinks.kafka-sink1.batchSize = 5000

svcagent.channels.file-channel1.type = file
svcagent.channels.file-channel1.checkpointDir=/etc/flume-kafka/checkpoint
svcagent.channels.file-channel1.dataDirs=/etc/flume-kafka/data
svcagent.channels.file-channel1.transactionCapacity=10000
svcagent.channels.file-channel1.capacity=50000
svcagent.channels.file-channel1.checkpointInterval=120000
svcagent.channels.file-channel1.checkpointOnClose=true
svcagent.channels.file-channel1.maxFileSize=536870912
svcagent.channels.file-channel1.use-fast-replay=false

When we tried to stream HTTP data from multiple clients (around 40 HTTP clients), we could
reach a maximum of 600 requests/sec, and not beyond that. We increased the -Xmx setting
of Flume to 4096 MB.
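For reference, a heap increase like the one mentioned above is typically applied in conf/flume-env.sh (the exact path depends on the install; the value here is from this thread):

```shell
# conf/flume-env.sh -- heap setting used in this test (adjust per install)
export JAVA_OPTS="-Xmx4096m"
```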

We have even tried with a null sink (instead of the Kafka sink) and did not see much
improvement, so we assume the bottleneck is the HTTP source & file channel.

Could you please suggest any tuning to improve the performance of this setup?

--regards
Hemanth


--

Thanks,
Hari
