chukwa-user mailing list archives

From Corbin Hoenes <>
Subject Re: speeding up demux?
Date Tue, 11 May 2010 18:17:57 GMT

We currently process only a single DataType.  Almost everything we process is Apache log
files, and we handle all of them with that one data type.

I believe we are more interested in case #1: we have lots of a single type of data coming
in very quickly.
Longer term, though, I agree with Jerome's comment that making it pluggable, like the
Processor class, is ideal.
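
To make the skew concrete, here is a rough sketch of what happens with our kind of data. It is only a simulation, not Chukwa's actual partitioner: the function names, the "ApacheLog" datatype label, the host keys and the reducer count are all made up for illustration.

```python
from collections import Counter
import zlib


def partition_by_datatype(datatype, key, num_reducers):
    # Hypothetical scheme: the reducer is chosen from the datatype alone,
    # so every record of one datatype lands on the same reducer.
    return zlib.crc32(datatype.encode()) % num_reducers


def partition_by_datatype_and_key(datatype, key, num_reducers):
    # Hypothetical alternative: mix a per-record key (here, a host name)
    # into the hash so one large datatype can spread over several reducers.
    return zlib.crc32(f"{datatype}/{key}".encode()) % num_reducers


def reducer_load(records, partition, num_reducers):
    # Count how many records each reducer would receive.
    load = Counter()
    for datatype, key in records:
        load[partition(datatype, key, num_reducers)] += 1
    return [load[r] for r in range(num_reducers)]


# One datatype, many records: roughly the situation described above.
records = [("ApacheLog", f"host{i % 50}") for i in range(10_000)]

by_type = reducer_load(records, partition_by_datatype, 4)
by_key = reducer_load(records, partition_by_datatype_and_key, 4)

print("by datatype only:   ", by_type)
print("by datatype and key:", by_key)
```

With only one datatype, the first scheme sends all 10,000 records to a single reducer and leaves the other three idle, while the second spreads them over more than one. Spreading a datatype across reducers also changes how the output is grouped, so whether that is acceptable for demux is a question for the JIRA discussion.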

On May 10, 2010, at 5:40 PM, Ariel Rabkin wrote:

> On Mon, May 10, 2010 at 4:39 PM, Ariel Rabkin <> wrote:
>> Can you say a bit about where your bottleneck is?  Is there one reduce
>> that's taking a very long time? Can you check the logs and see which
>> datatype that reducer is dealing with?  There was some discussion of
>> this on JIRA recently; consensus is that our current partitioner works
>> well if you have a wide variety of datatypes, none of which is too
>> big, and badly if you have one or two datatypes with lots of data in
>> each.
> I forgot to add -- the JIRA you should follow is
> We'd love to get feedback on what a more sensible approach would be
> for handling your use case.
> --Ari
> -- 
> Ari Rabkin
> UC Berkeley Computer Science Department
