incubator-chukwa-user mailing list archives

From Corbin Hoenes <>
Subject Re: SocketTeeWriter
Date Tue, 11 May 2010 02:34:49 GMT
We are processing Apache log files. The current scale is 70-80 GB per day, but we'd like
it to have a story for scaling up to more. Checking my collector logs, the data rate still
ranges from 600 KB to 1.2 MB per second. This is all from one collector. Does your
setup use multiple collectors? My thought is that multiple collectors could be used to scale
out once we reach a data rate that causes issues for a single collector.

Any chance you know at what data rate a single collector starts to have issues?
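The thread leaves the per-collector limit open, so here is a back-of-the-envelope sketch in Python. It converts the 70-80 GB/day figure into an average rate (decimal units, assuming traffic is spread evenly over the day) and sizes a collector pool from a per-collector capacity. `capacity_mb_s` and the 50% `headroom` default are hypothetical parameters you would have to measure or choose yourself; nothing in the thread supplies them.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_mb_per_s(gb_per_day: float) -> float:
    """Convert a daily volume in GB (decimal units) to an average rate in MB/s."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


def collectors_needed(rate_mb_s: float, capacity_mb_s: float, headroom: float = 0.5) -> int:
    """Collectors required so each runs at no more than `headroom` of its capacity.

    `capacity_mb_s` is a hypothetical, self-measured per-collector limit.
    """
    if capacity_mb_s <= 0 or not 0 < headroom <= 1:
        raise ValueError("capacity must be positive and headroom in (0, 1]")
    return max(1, math.ceil(rate_mb_s / (capacity_mb_s * headroom)))


for daily in (70, 80):
    print(f"{daily} GB/day ~= {average_rate_mb_per_s(daily):.2f} MB/s")
# prints:
# 70 GB/day ~= 0.81 MB/s
# 80 GB/day ~= 0.93 MB/s
```

Those averages sit inside the 600 KB to 1.2 MB range seen in the collector logs. With a hypothetical measured capacity of 5 MB/s per collector, `collectors_needed(0.93, 5.0)` returns 1, while a sustained peak of 12 MB/s would call for 5.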

On May 10, 2010, at 5:37 PM, Ariel Rabkin wrote:

> That's how we use it at Berkeley, to process metrics from hundreds of
> machines; total data rate less than a megabyte per second, though.
> What scale of data are you looking at?
> SocketTee is intended for cases where you need some subset of the data right away,
> while write-to-HDFS-and-process-with-Hadoop remains the default path.
> What sort of low-latency processing do you need?
> --Ari
> On Mon, May 10, 2010 at 4:28 PM, Corbin Hoenes <> wrote:
>> Has anyone used the "Tee" in a larger-scale deployment to try to get real-time/low-latency
>> data? I'm interested in how feasible it would be to use it to pipe data into another
>> system that handles these low-latency requests, and leave the long-term analysis to Hadoop.
> -- 
> Ari Rabkin
> UC Berkeley Computer Science Department
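Ari's description of SocketTee, a subset of the data delivered now while HDFS stays the default path, can be sketched as a small Python client. The wire details are assumptions based on our reading of Chukwa's SocketTeeWriter and should be checked against the version you deploy: the client sends one line such as `RAW all`, the collector answers `OK`, and each chunk then arrives as a 4-byte big-endian length followed by that many bytes. The default port 9094 (`chukwaCollector.tee.port`) is likewise an assumption, as is the need to add the writer to the collector's writer pipeline.

```python
import io
import socket
import struct
from typing import BinaryIO, Iterator, List


# ASSUMPTION: subscription grammar "<MODE> <filter>\n", where "all" matches every chunk.
def subscribe_command(filter_expr: str = "all") -> bytes:
    """Build the subscription line sent when the connection opens."""
    return f"RAW {filter_expr}\n".encode("ascii")


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, or raise EOFError if the stream ends first."""
    buf = bytearray()
    while len(buf) < n:
        part = stream.read(n - len(buf))
        if not part:
            raise EOFError("stream closed in the middle of a frame")
        buf.extend(part)
    return bytes(buf)


def read_raw_chunks(stream: BinaryIO) -> Iterator[bytes]:
    """Yield chunk payloads from 4-byte big-endian length-prefixed frames."""
    while True:
        first = stream.read(1)
        if not first:
            return  # clean end of stream between frames
        header = first + _read_exact(stream, 3)
        (length,) = struct.unpack(">i", header)  # Java writeInt order
        if length < 0:
            raise ValueError(f"negative frame length {length}")
        yield _read_exact(stream, length)


def chunks_from_bytes(data: bytes) -> List[bytes]:
    """Decode a captured RAW stream held in memory (useful for offline replay)."""
    return list(read_raw_chunks(io.BytesIO(data)))


def tail_collector(host: str, port: int = 9094, filter_expr: str = "all") -> Iterator[bytes]:
    """Subscribe to a collector's tee port and yield chunk payloads as they arrive.

    port=9094 is an assumed default; check chukwaCollector.tee.port in your config.
    """
    with socket.create_connection((host, port)) as sock:
        sock.sendall(subscribe_command(filter_expr))
        with sock.makefile("rb") as stream:
            status = stream.readline().strip()
            if status != b"OK":
                raise RuntimeError(f"collector rejected subscription: {status!r}")
            yield from read_raw_chunks(stream)
```

A consumer would loop over `tail_collector("collector1.example.com")` (a placeholder hostname) and hand each payload to the low-latency system. Keeping the frame parser separate from the socket code means a captured stream can be replayed through `chunks_from_bytes` without a live collector.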

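For the low-latency side of Corbin's question, the receiving system could be as small as a rolling counter. This Python sketch counts HTTP status codes from Apache access-log lines over the last `window_s` seconds. It assumes each chunk payload holds whole log lines in Common or Combined Log Format, so a line split across two chunks would be skipped. `StatusWindow` is a hypothetical name for illustration, not part of Chukwa.

```python
import re
import time
from collections import Counter, deque
from typing import Deque, Dict, Optional, Tuple

# Common/Combined Log Format: host ident user [time] "request" status bytes ...
STATUS_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) ')


class StatusWindow:
    """Counts HTTP status codes seen in the last `window_s` seconds."""

    def __init__(self, window_s: float = 60.0) -> None:
        self.window_s = window_s
        self.events: Deque[Tuple[float, str]] = deque()  # (arrival time, status)
        self.counts = Counter()

    def _expire(self, now: float) -> None:
        # Drop events older than the window, oldest first.
        while self.events and now - self.events[0][0] > self.window_s:
            _, code = self.events.popleft()
            self.counts[code] -= 1
            if self.counts[code] == 0:
                del self.counts[code]

    def add_chunk(self, data: bytes, now: Optional[float] = None) -> None:
        """Parse one chunk payload and record the status code of each log line."""
        now = time.monotonic() if now is None else now
        for line in data.decode("utf-8", errors="replace").splitlines():
            match = STATUS_RE.match(line)
            if match:
                self.events.append((now, match.group(1)))
                self.counts[match.group(1)] += 1
        self._expire(now)

    def snapshot(self, now: Optional[float] = None) -> Dict[str, int]:
        """Return current per-status counts within the window."""
        self._expire(time.monotonic() if now is None else now)
        return dict(self.counts)
```

`add_chunk` would be called once per payload received from the tee, and `snapshot()` polled by whatever answers the low-latency queries; the deque makes expiry cost constant per event, while the long-term analysis stays in Hadoop.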