flume-user mailing list archives

From Gonzalo Herreros <gherre...@gmail.com>
Subject RE: Flume benchmarking with HTTP source & File channel
Date Fri, 20 Nov 2015 07:07:42 GMT
14,000 eps with both brokers likely running on the same disk doesn't sound
slow to me.
How many partitions are you using in the topic?

You can make Kafka faster by relaxing reliability, e.g. reducing acks or
using the async producer with batches.
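
For illustration, with the older Kafka producer that the Flume Kafka sink
wraps, that might look like the sketch below. Treat the property names as
examples to check against your Flume/Kafka versions, not a drop-in config.
A second Kafka sink on the same channel also drains it in parallel, since
each standalone sink gets its own runner thread while a sink group's
processor drives only one sink at a time:

  # Fire-and-forget: don't wait for broker acks (faster, less reliable)
  svcagent.sinks.kafka-sink1.request.required.acks = 0
  # Async producer batches messages before sending
  svcagent.sinks.kafka-sink1.producer.type = async
  # Larger Flume-side batches per channel transaction
  svcagent.sinks.kafka-sink1.batchSize = 1000

  # Optional: a second sink draining the same channel in parallel
  svcagent.sinks = kafka-sink1 kafka-sink2
  svcagent.sinks.kafka-sink2.type = org.apache.flume.sink.kafka.KafkaSink
  svcagent.sinks.kafka-sink2.topic = flume-sink1
  svcagent.sinks.kafka-sink2.brokerList = 10.15.1.32:9092,10.15.1.32:9093
  svcagent.sinks.kafka-sink2.channel = file-channel1
  svcagent.sinks.kafka-sink2.batchSize = 1000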

Regards,
Gonzalo
On Nov 20, 2015 4:19 AM, "Hemanth Abbina" <HemanthA@eiqnetworks.com> wrote:

> Hi All,
>
>
>
> These are the follow up observations & issues on the benchmarking.
>
>
>
> The configuration is the same as before: HTTP source -> File Channel -> Kafka
> Sink. When sending larger messages from the HTTP clients, the observed EPS is
> around 140. Each large message is a batch of 100 individual log messages, so
> the effective EPS is 14,000.
>
>
>
> When I further increase the streaming rate from the clients, the file
> channel overflows with errors: “Error appending event to channel. Channel
> might be full. Unable to put batch on required channel: FileChannel
> file-channel1 { dataDirs: [/etc/flume-kafka/data]”.
>
>
>
> I understand the issue might be that the Kafka sink is slower than the HTTP
> source. How can I overcome this? I tried creating a sink group with load
> balancing, but it did not help.
>
>
>
> Could you please suggest something to overcome the slow Kafka sink
> problem?
>
>
>
> svcagent.sources = http-source
> svcagent.sinks = kafka-sink1
> svcagent.channels = file-channel1
>
> svcagent.sources.http-source.type = http
> svcagent.sources.http-source.channels = file-channel1
> svcagent.sources.http-source.port = 5005
> svcagent.sources.http-source.bind = 10.15.1.31
> svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler
>
> svcagent.sinks.kafka-sink1.type = org.apache.flume.sink.kafka.KafkaSink
> svcagent.sinks.kafka-sink1.topic = flume-sink1
> svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092,10.15.1.32:9093
> svcagent.sinks.kafka-sink1.channel = file-channel1
> svcagent.sinks.kafka-sink1.batchSize = 100
> svcagent.sinks.kafka-sink1.request.required.acks = 1
> svcagent.sinks.kafka-sink1.send.buffer.bytes = 1310720
>
> svcagent.channels.file-channel1.type = file
> svcagent.channels.file-channel1.checkpointDir = /etc/flume-kafka/checkpoint
> svcagent.channels.file-channel1.dataDirs = /etc/flume-kafka/data
> svcagent.channels.file-channel1.transactionCapacity = 1000
> svcagent.channels.file-channel1.capacity = 10000
> svcagent.channels.file-channel1.checkpointInterval = 120000
> svcagent.channels.file-channel1.checkpointOnClose = true
> svcagent.channels.file-channel1.maxFileSize = 536870912
> svcagent.channels.file-channel1.use-fast-replay = false
>
>
>
> *From:* 이승진 [mailto:sweetest.sj@navercorp.com]
> *Sent:* Sunday, November 15, 2015 7:49 PM
> *To:* user@flume.apache.org
> *Subject:* Re: Flume benchmarking with HTTP source & File channel
>
>
>
> I found that Flume's HTTP source implementation is somewhat outdated and not
> really optimized for performance.
>
>
>
> Our requirement includes processing more than 10k requests on a single node,
> but as Hemanth said, Flume's HTTP source processes only a few hundred per
> second.
>
>
>
> We decided to implement our own HTTP source based on Netty 4, and it
> processes 30~40k per second (without much optimization), which perfectly
> meets our requirements.
>
>
>
> Regards,
>
> Adrian Seungjin Lee
>
>
>
>
>
> -----Original Message-----
> *From:* "Hari Shreedharan"<hshreedharan@cloudera.com>
> *To:* "user@flume.apache.org"<user@flume.apache.org>;
> *Cc:*
> *Sent:* 2015-11-15 (Sun) 16:37:38
> *Subject:* Re: Flume benchmarking with HTTP source & File channel
>
>
> Single-event batches are going to be really slow, for multiple reasons:
> protocol overhead, Flume channels being written to handle batches of events
> rather than single events, etc.
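>
> For illustration, the stock Flume HTTPSource JSONHandler accepts a JSON
> array of events per POST, so a single request can carry a whole batch (a
> custom handler like the one in your config may expect a different format):
>
>   [{"headers": {"host": "client1"}, "body": "log line 1"},
>    {"headers": {"host": "client1"}, "body": "log line 2"}]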
>
> On Saturday, November 14, 2015, Hemanth Abbina <HemanthA@eiqnetworks.com>
> wrote:
>
> Hi Hari,
>
>
>
> Thanks for the response.
>
>
>
> I haven't tried a different source. Will try that.
>
> We are sending through multiple HTTP clients (around 40 clients), using a
> single event per batch.
>
>
>
> First, we would like to validate the maximum HTTP source EPS supported by a
> single Flume server (we are testing with 8 cores, 32 GB RAM) when
> single-event batches are sent from multiple clients.
>
>
>
> After confirming the EPS at this stage, we plan to check the performance
> with batching & multi-node Flume support.
>
>
>
> Thanks,
>
> Hemanth
>
>
>
> *From:* Hari Shreedharan [mailto:hshreedharan@cloudera.com
> <hshreedharan@cloudera.com>]
> *Sent:* Sunday, November 15, 2015 8:41 AM
> *To:* user@flume.apache.org
> *Subject:* Re: Flume benchmarking with HTTP source & File channel
>
>
>
> Did you try a different source? Is your sender multithreaded? Sending from a
> single thread would obviously be slow. How many messages per batch? The
> bigger your batch, the better your performance will be.
>
> On Saturday, November 14, 2015, Hemanth Abbina <HemanthA@eiqnetworks.com>
> wrote:
>
> Thanks Gonzalo.
>
>
>
> Yes, it's a single server. First we would like to confirm the max throughput
> of a single server with this configuration. The size of each message is
> around 512 bytes.
>
>
>
> I have tried with an in-memory channel & null sink too. Performance
> increased by only 50 requests/sec or so, not beyond that.
>
>
>
> In some forums, I have seen Flume benchmarks of 30K/40K per single node
> (I'm not sure about the configurations), so I'm trying to check the max
> throughput of a single server.
>
>
>
> *From:* Gonzalo Herreros [mailto:gherreros@gmail.com <gherreros@gmail.com>]
>
> *Sent:* Saturday, November 14, 2015 2:02 PM
> *To:* user <user@flume.apache.org>
> *Subject:* Re: Flume benchmarking with HTTP source & File channel
>
>
>
> If that is just a single server, 600 messages per sec doesn't sound bad to
> me. Depending on the size of each message, the network could be the
> limiting factor.
>
> I would try with the null sink and in memory channel. If that doesn't
> improve things I would say you need more nodes to go beyond that.
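>
> A throwaway config for that test might look like this sketch (the channel
> and sink names here are placeholders matching your agent, and the memory
> channel sizes are just examples):
>
>   svcagent.channels = mem-channel1
>   svcagent.channels.mem-channel1.type = memory
>   svcagent.channels.mem-channel1.capacity = 100000
>   svcagent.channels.mem-channel1.transactionCapacity = 1000
>
>   svcagent.sinks = null-sink1
>   svcagent.sinks.null-sink1.type = null
>   svcagent.sinks.null-sink1.channel = mem-channel1
>
>   svcagent.sources.http-source.channels = mem-channel1
>
> If the HTTP source still tops out at the same rate with this config, the
> source itself is the bottleneck.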
>
> Regards,
> Gonzalo
>
> On Nov 14, 2015 7:40 AM, "Hemanth Abbina" <HemanthA@eiqnetworks.com>
> wrote:
>
> Hi,
>
>
>
> We have been trying to validate & benchmark Flume's performance for our
> production use.
>
>
>
> We have configured Flume to have HTTP source, File channel & Kafka sink.
>
> Hardware : 8 Core, 32 GB RAM, CentOS6.5, Disk - 500 GB HDD.
>
> Flume configuration:
>
> svcagent.sources = http-source
> svcagent.sinks = kafka-sink1
> svcagent.channels = file-channel1
>
> # HTTP source to receive events on port 5005
> svcagent.sources.http-source.type = http
> svcagent.sources.http-source.channels = file-channel1
> svcagent.sources.http-source.port = 5005
> svcagent.sources.http-source.bind = 10.15.1.31
> svcagent.sources.http-source.selector.type = multiplexing
> svcagent.sources.http-source.selector.header = archival
> svcagent.sources.http-source.selector.mapping.true = file-channel1
> svcagent.sources.http-source.selector.default = file-channel1
> #svcagent.sources.http-source.handler = org.eiq.flume.JSONHandler.HTTPSourceJSONHandler
>
> svcagent.sinks.kafka-sink1.topic = flume-sink1
> svcagent.sinks.kafka-sink1.brokerList = 10.15.1.32:9092
> svcagent.sinks.kafka-sink1.channel = file-channel1
> svcagent.sinks.kafka-sink1.batchSize = 5000
>
> svcagent.channels.file-channel1.type = file
> svcagent.channels.file-channel1.checkpointDir = /etc/flume-kafka/checkpoint
> svcagent.channels.file-channel1.dataDirs = /etc/flume-kafka/data
> svcagent.channels.file-channel1.transactionCapacity = 10000
> svcagent.channels.file-channel1.capacity = 50000
> svcagent.channels.file-channel1.checkpointInterval = 120000
> svcagent.channels.file-channel1.checkpointOnClose = true
> svcagent.channels.file-channel1.maxFileSize = 536870912
> svcagent.channels.file-channel1.use-fast-replay = false
>
>
>
> When we tried to stream HTTP data from multiple clients (around 40 HTTP
> clients), we could get a maximum of 600 requests/sec, and not beyond that.
> We increased the Xmx setting of Flume to 4096.
>
>
>
> We have even tried with a null sink (instead of the Kafka sink) and did not
> get much performance improvement, so we assume the bottleneck is the HTTP
> source & file channel.
>
>
>
> Could you please suggest any fine-tuning to improve the performance of
> this setup?
>
>
>
>
> --regards
>
> Hemanth
>
>
>
> --
>
>
>
> Thanks,
>
> Hari
>
>
>
>
>
> --
>
>
>
> Thanks,
>
> Hari
>
>
>
>
>
