hadoop-common-user mailing list archives

From Dan Milstein <dmilst...@hubteam.com>
Subject Re: Writing a New Aggregate Function
Date Fri, 24 Apr 2009 17:24:24 GMT
Runping,

Thanks for the response.  A question about case (2) below (which is, in
fact, what I want to do):

  - Is there any way to do this without patching the code within the
aggregate package?

It sure doesn't look like it, but just to make sure.

Thanks again,
-Dan M

On Apr 24, 2009, at 12:56 PM, Runping Qi wrote:

> A couple of general goals behind the aggregate package:
>
> 1. If you are an application developer using the aggregate package, you
> only need to develop your own (user-defined) value aggregator descriptor
> classes, which are typically subclasses of ValueAggregatorDescriptor. You
> can use the existing aggregator types (such as LongValueSum,
> ValueHistogram, etc.).
>
> 2. If you want to contribute new types of aggregators (for example, a
> ValueAverage class that keeps track of the average of values would be a
> much-needed one), then you need to write a class that implements
> ValueAggregator, and update the generateValueAggregator method of
> ValueAggregatorBaseDescriptor to handle your new aggregators.
>
> 3. If you want to contribute to the aggregate framework itself, you may
> need to touch every bit of the code in the package.
>
> Runping
>
>
>
> On Thu, Apr 23, 2009 at 1:44 PM, Dan Milstein <dmilstein@hubteam.com> wrote:
>
>> Hello all,
>>
>> I've been using streaming + the aggregate package (available via
>> -reducer aggregate), and have been very happy with what it gives me.
>>
>> I'm interested in writing my own new aggregate functions (in Java),
>> which I could then access from my streaming code.
>>
>> Can anyone give me pointers on how to make that happen?  I've read
>> through the aggregate package source, but I'm not seeing how to define
>> my own and get access to it from streaming.
>>
>> To be specific, here's the sort of thing I'd like to be able to do:
>>
>> - In Java, define a SampleValues aggregator, which chooses a sample of
>> the input given to it
>>
>> - From my streaming program, in say python, output:
>>
>> SampleValues:some_key \t some_value
>>
>> - Have the aggregate framework somehow call my new aggregator for the
>> combiner and reducer steps
>>
>> Thanks,
>> -Dan Milstein
>>
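
For readers following the thread, here is a minimal sketch of what the SampleValues aggregator described above might look like. It is not code from the Hadoop source or from either poster. To keep it self-contained, it implements a local stand-in interface, SimpleValueAggregator, whose method names are modeled on Hadoop's ValueAggregator interface as I understand it; check them against your Hadoop version before porting. The class name, the capacity and seed parameters, and the choice of reservoir sampling are all assumptions made for illustration. Wiring the class into generateValueAggregator of ValueAggregatorBaseDescriptor (Runping's case 2) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical aggregator: keeps a uniform random sample of at most
// `capacity` values using reservoir sampling (Algorithm R).
class SampleValues implements SimpleValueAggregator {
    private final int capacity;
    private final Random random;
    private final List<String> reservoir = new ArrayList<>();
    private long seen = 0;

    SampleValues(int capacity, long seed) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
        this.random = new Random(seed);
    }

    @Override
    public void addNextValue(Object val) {
        seen++;
        String s = String.valueOf(val);
        if (reservoir.size() < capacity) {
            // Still filling the reservoir: keep everything.
            reservoir.add(s);
        } else {
            // Pick a slot in [0, seen); the new value survives only if the
            // slot falls inside the reservoir (probability capacity / seen).
            long j = (long) (random.nextDouble() * seen);
            if (j < capacity) {
                reservoir.set((int) j, s);
            }
        }
    }

    @Override
    public void reset() {
        reservoir.clear();
        seen = 0;
    }

    @Override
    public String getReport() {
        // Final reducer output: the sampled values, comma-separated.
        return String.join(",", reservoir);
    }

    @Override
    public ArrayList<String> getCombinerOutput() {
        // Partial result handed from the combiner to the reducer.
        return new ArrayList<>(reservoir);
    }

    public static void main(String[] args) {
        SampleValues sampler = new SampleValues(3, 42L);
        for (int i = 0; i < 100; i++) {
            sampler.addNextValue("value_" + i);
        }
        System.out.println(sampler.getReport());
    }
}

// Stand-in for Hadoop's ValueAggregator interface (an assumption, declared
// locally so the sketch compiles without Hadoop on the classpath).
interface SimpleValueAggregator {
    void addNextValue(Object val);
    void reset();
    String getReport();
    ArrayList<String> getCombinerOutput();
}
```

One caveat on the design: if combiners each emit a sample and the reducer simply feeds those samples back through addNextValue, the final sample is no longer uniform over the original input, because each combiner's output stands in for a different number of records. A real implementation would probably need to carry the per-combiner count alongside the sampled values.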

