flume-user mailing list archives

From Roshan Naik <ros...@hortonworks.com>
Subject Re: Message Loss problem and performance requirements
Date Fri, 10 Apr 2015 18:42:34 GMT
Let me send out a separate email to users with some perf benchmarks, which include suggestions
on how to get better perf from File channel. This will go on the wiki soon, but I haven't had
the time to put it up there.

In addition to the suggestions in it you can try adding more agents on the same machine and
see if you get additional throughput.


From: 김동경 <style9595@gmail.com>
Reply-To: "user@flume.apache.org" <user@flume.apache.org>
Date: Thursday, April 9, 2015 9:24 PM
To: "user@flume.apache.org" <user@flume.apache.org>
Subject: Message Loss problem and performance requirements


I would like to discuss the message loss problem and the performance limits in Flume.

First of all, I want to ask: does the file channel meet your performance requirements?
As far as I know, up to flume-1.5.0 the file channel is the only way to avoid message loss
in Flume.
I ran a brief benchmark using the file channel and got a throughput of about 1000~2000
(events per second, I assume the same unit as below) per agent.
If I chain multiple Flume agents in two or three tiers, performance will drop even further.

I know it depends on the hardware specification.
However, no matter how much I improve my hardware, it does not look easy to meet my performance requirements.
I need at least 100K ~ 200K EPS (events per second).
(I am assuming no SSDs, since I do not consider them commodity hardware.)
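
To put the gap between the benchmark and the requirement in numbers, here is a rough sizing sketch. It assumes the 1000~2000 figure is events per second per agent and that throughput scales linearly when agents run in parallel. Neither assumption is confirmed in this thread, and agents sharing the same disks will likely scale worse. The function name `agents_needed` is hypothetical and used only for illustration.

```python
def agents_needed(target_eps: int, per_agent_eps: int) -> int:
    """Smallest number of parallel agents whose combined rate reaches target_eps.

    Assumes perfectly linear scaling across agents (an optimistic assumption).
    """
    if target_eps < 0 or per_agent_eps <= 0:
        raise ValueError("target_eps must be >= 0 and per_agent_eps must be > 0")
    # Integer ceiling division, avoids floating-point rounding.
    return (target_eps + per_agent_eps - 1) // per_agent_eps


if __name__ == "__main__":
    # Figures taken from the message: 1000~2000 per agent, 100K~200K EPS needed.
    best_case = agents_needed(100_000, 2_000)   # low target, fast agents
    worst_case = agents_needed(200_000, 1_000)  # high target, slow agents
    print(f"agents needed: between {best_case} and {worst_case}")
```

Under these assumptions the requirement works out to somewhere between 50 and 200 file-channel agents in parallel, which is why faster hardware alone looks unlikely to close the gap.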

I think a powerful feature of Flume is dynamic event routing through dynamic configuration reloading.
Multi-tier Flume agents can get the most out of this feature.
But multi-agent Flume deployments using the file channel degrade performance.

The memory channel is fast enough for my performance requirements,
but it definitely carries the possibility of message loss.

For now I am considering the KafkaChannel, which was introduced in Flume 1.6.0.
But I would like to know how this trade-off in Flume, between performance requirements and
message loss, is addressed in other systems around the world.

Do you have any ideas?

Best regards
