hadoop-common-user mailing list archives

From Runping Qi <runping...@gmail.com>
Subject Re: Writing a New Aggregate Function
Date Fri, 24 Apr 2009 19:10:25 GMT
You are right; you have to patch the code in the aggregate package.
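Why a patch is needed, sketched in Java. The keys in the question quoted below have the form type:id (for example SampleValues:some_key), and the type part is used to pick an aggregator by name in a fixed method, generateValueAggregator of ValueAggregatorBaseDescriptor. The classes below are local stand-ins named after Hadoop's, not Hadoop's own API (whose exact signatures vary by version), so the sketch compiles on its own. ValueAverage is the hypothetical aggregator suggested further down this thread, and its "sum,count" combiner format is an assumption chosen so that partial averages merge correctly.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the aggregate package's ValueAggregator interface. Method names
// follow Hadoop's; using List<String> here is a simplifying assumption.
interface ValueAggregator {
    void addNextValue(Object val);
    void reset();
    String getReport();
    List<String> getCombinerOutput();
}

// Built-in style aggregator: sums values as longs (cf. LongValueSum).
class LongValueSum implements ValueAggregator {
    private long sum = 0;
    public void addNextValue(Object val) { sum += Long.parseLong(val.toString().trim()); }
    public void reset() { sum = 0; }
    public String getReport() { return Long.toString(sum); }
    public List<String> getCombinerOutput() {
        List<String> out = new ArrayList<>();
        out.add(getReport());
        return out;
    }
}

// Hypothetical new aggregator. Mapper values are single numbers; combiner
// output is "sum,count" so the reducer can merge partial averages exactly.
class ValueAverage implements ValueAggregator {
    private double sum = 0;
    private long count = 0;
    public void addNextValue(Object val) {
        String s = val.toString().trim();
        int comma = s.indexOf(',');
        if (comma >= 0) {                 // partial result from a combiner
            sum += Double.parseDouble(s.substring(0, comma));
            count += Long.parseLong(s.substring(comma + 1));
        } else {                          // raw value from the mapper
            sum += Double.parseDouble(s);
            count += 1;
        }
    }
    public void reset() { sum = 0; count = 0; }
    public String getReport() { return count == 0 ? "" : Double.toString(sum / count); }
    public List<String> getCombinerOutput() {
        List<String> out = new ArrayList<>();
        out.add(sum + "," + count);
        return out;
    }
}

class AggregatorFactory {
    // Plays the role of generateValueAggregator: known type names are matched
    // one by one, so supporting a new type means adding a new branch here.
    static ValueAggregator generateValueAggregator(String type) {
        if (type.equalsIgnoreCase("LongValueSum")) return new LongValueSum();
        if (type.equalsIgnoreCase("ValueAverage")) return new ValueAverage(); // the patch
        return null; // unknown type in this sketch
    }

    // A key such as "ValueAverage:latency" selects its aggregator by the text
    // before the first ':'.
    static String typeOf(String key) {
        int pos = key.indexOf(':');
        return pos < 0 ? "" : key.substring(0, pos);
    }
}
```

Adding a type means adding a branch like the ValueAverage line, which is why, as the reply says, the package itself has to be patched.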


On Fri, Apr 24, 2009 at 10:24 AM, Dan Milstein <dmilstein@hubteam.com> wrote:

> Runping,
>
> Thanks for the response. A question about case (2) below (which is, in
> fact, what I want to do):
>
>  - Is there any way to do this without patching the code within the
> aggregate package?
>
> It sure doesn't look like it, but just to make sure.
>
> Thanks again,
> -Dan M
>
>
> On Apr 24, 2009, at 12:56 PM, Runping Qi wrote:
>
>> A couple of general goals behind the aggregate package:
>>
>> 1. If you are an application developer using the aggregate package, you
>> only need to develop your own (user-defined) value aggregator descriptor
>> classes, which are typically subclasses of ValueAggregatorDescriptor. You
>> can use the existing aggregator types (such as LongValueSum, ValueHistogram,
>> etc.).
>>
>> 2. If you want to contribute new types of aggregator (for example, a
>> ValueAverage class that keeps track of the average of values would be a
>> much needed one), then you need to implement a class that implements the
>> ValueAggregator interface, and update the generateValueAggregator method of
>> ValueAggregatorBaseDescriptor to handle your new aggregators.
>>
>> 3. If you want to contribute to the aggregate framework itself, you may
>> need to touch every bit of the code in the package.
>>
>> Runping
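For case (1), a descriptor only maps each input record to type:id keys for aggregators that already exist. A minimal sketch, assuming a simplified stand-in for ValueAggregatorDescriptor that uses String instead of Hadoop's Text and leaves out job configuration; WordStatsDescriptor is a hypothetical example, not part of Hadoop.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Simplified stand-in for ValueAggregatorDescriptor (assumption: Strings
// instead of Text, no configuration method).
interface Descriptor {
    List<Map.Entry<String, String>> generateKeyValPairs(Object key, Object val);
}

// Hypothetical user-defined descriptor: counts each word and tracks the
// longest word length, using only existing types (LongValueSum, LongValueMax).
class WordStatsDescriptor implements Descriptor {
    static final String TYPE_SEPARATOR = ":";

    // Builds a "type:id" key paired with its value.
    static Map.Entry<String, String> entry(String type, String id, String value) {
        return new SimpleEntry<>(type + TYPE_SEPARATOR + id, value);
    }

    public List<Map.Entry<String, String>> generateKeyValPairs(Object key, Object val) {
        List<Map.Entry<String, String>> out = new ArrayList<>();
        for (String word : val.toString().trim().split("\\s+")) {
            if (word.isEmpty()) continue; // blank input yields a single "" token
            out.add(entry("LongValueSum", word, "1"));
            out.add(entry("LongValueMax", "longest_word", Integer.toString(word.length())));
        }
        return out;
    }
}
```

A streaming mapper can emit the same pairs directly as text lines, such as LongValueSum:to, a tab, and 1.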
>>
>>
>>
>> On Thu, Apr 23, 2009 at 1:44 PM, Dan Milstein <dmilstein@hubteam.com> wrote:
>>
>>> Hello all,
>>>
>>> I've been using streaming + the aggregate package (available via -reducer
>>> aggregate), and have been very happy with what it gives me.
>>>
>>> I'm interested in writing my own new aggregate functions (in Java) which I
>>> could then access from my streaming code.
>>>
>>> Can anyone give me pointers towards how to make that happen?  I've read
>>> through the aggregate package source, but I'm not seeing how to define my
>>> own, and get access to it from streaming.
>>>
>>> To be specific, here's the sort of thing I'd like to be able to do:
>>>
>>> - In Java, define a SampleValues aggregator, which chooses a sample of the
>>> input given to it
>>>
>>> - From my streaming program, in say python, output:
>>>
>>> SampleValues:some_key \t some_value
>>>
>>> - Have the aggregate framework somehow call my new aggregator for the
>>> combiner and reducer steps
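One way the SampleValues aggregator described above could choose its sample is reservoir sampling (Algorithm R), which keeps a uniform random sample of at most k values in O(k) memory without knowing the input size in advance. This is a sketch, not Hadoop code: the method names mirror ValueAggregator but the class is standalone, and the constructor arguments (sample size and seed) are assumptions, since the aggregate framework would need its own way to configure them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical SampleValues aggregator using reservoir sampling (Algorithm R).
class SampleValues {
    private final int maxSize;                       // k, the sample size
    private final Random random;
    private final List<String> reservoir = new ArrayList<>();
    private long seen = 0;                           // values offered so far

    SampleValues(int maxSize, long seed) {
        this.maxSize = maxSize;
        this.random = new Random(seed);
    }

    public void addNextValue(Object val) {
        seen++;
        if (reservoir.size() < maxSize) {
            reservoir.add(val.toString());           // fill the reservoir first
        } else {
            // Pick j uniformly in [0, seen); keep the new value with
            // probability maxSize / seen by overwriting slot j.
            long j = (long) (random.nextDouble() * seen);
            if (j < maxSize) reservoir.set((int) j, val.toString());
        }
    }

    public void reset() { reservoir.clear(); seen = 0; }

    // Comma-joined sample (the report format is an assumption).
    public String getReport() { return String.join(",", reservoir); }

    // Emits the local sample for the reducer to re-sample.
    public List<String> getCombinerOutput() { return new ArrayList<>(reservoir); }
}
```

Merging is the weak spot: re-sampling the combiners' samples in the reducer is only uniform when every combiner saw the same number of values, so a fuller version would also pass along each combiner's count.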
>>>
>>> Thanks,
>>> -Dan Milstein
>>>
>>>
>
