incubator-flume-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From S Ahmed <sahmed1...@gmail.com>
Subject Re: Flume NG performance metrics
Date Tue, 08 May 2012 18:35:52 GMT
I have played around with Kafka, and with a single producer (which I think
in flume terms is a flow) I can write about 200 000 messages a second that
are 200 bytes each.  What it does it it writes in memory for 10 seconds and
flushes to disk. (this was on a dedicated box, with 4 gb of ram)

This is just writing and not consuming (or sinking in flumes terms).

Curious if there was a reason why flume does 70K/second.

On Tue, May 8, 2012 at 1:36 PM, Mike Percy <mpercy@cloudera.com> wrote:

> The "flow" terminology is something I could have defined better. I've
> updated the config file fragment a bit in the example on the wiki to
> hopefully clarify what was done:
> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements#FlumeNGPerformanceMeasurements-SyslogPerformanceTest20120430
>
> Generally when folks use the term "flow" as it relates to Flume it means
> an independent Source, Channel, Sink combination, which do not interact
> with any other Sources, Channels, or Sinks on the same agent. So each flow
> in this case had its own SyslogTcpSource port, its own MemoryChannel, and
> its own HDFSEventSink output path.
>
> Mike
>
> On May 8, 2012, at 9:38 AM, Arvind Prabhakar wrote:
>
> Hi Ahmed,
>
> >3.  6 flume agent servers that collected data in memory and flushed to
> the 9 hadoop servers
>
> All tests were carried out with a single Flume Agent (which corresponds to
> one JVM process) on a single host. However, each measurement was made with
> a different number of flows passing through this agent.
>
> Thanks,
> Arvind
>
>
> On Tue, May 8, 2012 at 9:08 AM, S Ahmed <sahmed1020@gmail.com> wrote:
>
>> Just want to make sure I understand the setup:
>>
>> 1. 9 hadoop servers that were fed the data
>> 2. 1 server was used to generate the syslog data that was spread accross
>> the 6 flume agent servers
>> 3.  6 flume agent servers that collected data in memory and flushed to
>> the 9 hadoop servers
>>
>> Is that right?
>>
>>
>> On Tue, May 8, 2012 at 1:49 AM, Jarek Jarcec Cecho <jarcec@apache.org>wrote:
>>
>>> Thanks Mike,
>>> this is in deed very helpful!
>>>
>>> Jarcec
>>>
>>> On Mon, May 07, 2012 at 06:55:49PM -0700, Mike Percy wrote:
>>> > Hi folks,
>>> > Will McQueen and I have been doing some Flume NG stress and
>>> performance testing, and we wanted to share some of our recent findings.
>>> The focus of the most recent tests has been on the syslog TCP source,
>>> memory channel, and HDFS sink.
>>> >
>>> > I wrote some software to generate load in syslog format over TCP and
>>> to automate some of the analysis. The first thing we wanted to verify is
>>> that no data was lost during these tests (a.k.a. correctness), with a close
>>> second priority being of course throughput (performance). I used Pig and
>>> AvroStorage from piggybank in the data integrity analysis, and committed
>>> the compiled (0.11 trunk) piggybank jar so the load analysis scripts would
>>> be relatively easy to use. It seems to be compatible with Pig 0.8.1. I am a
>>> little wary of having to maintain that type of thing at the Apache org
>>> level so for now I have checked all the code in on Github under an ASL 2.0
>>> license:
>>> >
>>> > https://github.com/mpercy/flume-load-gen
>>> >
>>> > I have created a Wiki page with the performance metrics we have come
>>> up with so far. The executive summary is that at the time of this writing,
>>> we have observed Flume NG on a single machine processing events at a
>>> throughput rate of 70,000+ events/sec with no data loss.
>>> >
>>> >
>>> https://cwiki.apache.org/confluence/display/FLUME/Flume+NG+Performance+Measurements
>>> >
>>> > I have put more details on the wiki page itself. Please let me know if
>>> you want me to add more detail. I'll be looking into improving the
>>> performance of these components going forward, however we wanted to post
>>> these results to set a public performance baseline of Flume NG.
>>> >
>>> > If others have done performance testing, we would love to see your
>>> results if you can post the details.
>>> >
>>> > Regards,
>>> > Mike
>>> >
>>>
>>
>>
>
>

Mime
View raw message