incubator-chukwa-user mailing list archives

From Eric Yang <>
Subject Re: SocketTeeWriter
Date Tue, 11 May 2010 18:03:00 GMT

Multiple collectors will improve the mapper processing speed, but the
reducer is still the long tail of the demux processing. It sounds like you
have a large amount of the same type of data. It will definitely speed up
your processing once CHUKWA-481 is addressed.


On 5/10/10 7:34 PM, "Corbin Hoenes" <> wrote:

> We are processing Apache log files. The current scale is 70-80GB per
> day, but we'd like to have a story for scaling up to more. Checking my
> collector logs, the data rate still appears to range from 600KB to 1.2MB.
> This is all from one collector. Does your setup use multiple collectors? My
> thought is that multiple collectors could be used to scale out once we reach
> a data rate that causes issues for a single collector.
> Any chance you know where that data rate is?
> On May 10, 2010, at 5:37 PM, Ariel Rabkin wrote:
>> That's how we use it at Berkeley, to process metrics from hundreds of
>> machines; total data rate less than a megabyte per second, though.
>> What scale of data are you looking at?
>> The intent of SocketTee was to cover the case where you need some subset
>> of the data now, while write-to-HDFS-and-process-with-Hadoop is still the
>> default path.
>> What sort of low-latency processing do you need?
>> --Ari
>> On Mon, May 10, 2010 at 4:28 PM, Corbin Hoenes <> wrote:
>>> Has anyone used the "Tee" in a larger-scale deployment to try to get
>>> real-time/low-latency data?  Interested in how feasible it would be to use
>>> it to pipe data into another system to handle these low-latency requests and
>>> leave the long-term analysis to Hadoop.
>> -- 
>> Ari Rabkin
>> UC Berkeley Computer Science Department
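
For a sense of scale, the 70-80GB per day quoted in the thread can be converted to an average per-second rate. This is only a rough sketch: it assumes decimal units (1 GB = 10^9 bytes, 1 KB = 1000 bytes) and a load spread evenly over the day, neither of which is stated in the thread, and the function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate_kb_per_s(gb_per_day, bytes_per_gb=10**9):
    """Average rate in KB/s for a daily volume in GB.

    Assumes decimal units and a perfectly even load; real traffic
    will peak above this average.
    """
    return gb_per_day * bytes_per_gb / SECONDS_PER_DAY / 1000


for volume in (70, 80):
    rate = average_rate_kb_per_s(volume)
    print(f"{volume} GB/day is roughly {rate:.0f} KB/s on average")
```

Under those assumptions the averages come out between roughly 800 and 930 KB/s, which would sit inside the 600KB to 1.2MB range reported from the collector logs if those figures are per second (the thread does not say).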
