hadoop-common-user mailing list archives

From jason hadoop <jason.had...@gmail.com>
Subject Re: Writing a New Aggregate Function
Date Thu, 23 Apr 2009 23:45:27 GMT
It really isn't documented anywhere. There is a small section about it in
ch08 of my book, but it didn't make it into the alpha of ch08 that is
currently up.

On Thu, Apr 23, 2009 at 1:44 PM, Dan Milstein <dmilstein@hubteam.com> wrote:

> Hello all,
> I've been using streaming + the aggregate package (available via -reducer
> aggregate), and have been very happy with what it gives me.
> I'm interested in writing my own new aggregate functions (in Java) which I
> could then access from my streaming code.
> Can anyone give me pointers towards how to make that happen?  I've read
> through the aggregate package source, but I'm not seeing how to define my
> own, and get access to it from streaming.
> To be specific, here's the sort of thing I'd like to be able to do:
>  - In Java, define a SampleValues aggregator, which chooses a sample of the
> input given to it
>  - From my streaming program, in say python, output:
> SampleValues:some_key \t some_value
>  - Have the aggregate framework somehow call my new aggregator for the
> combiner and reducer steps
> Thanks,
> -Dan Milstein
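
For reference, the streaming side of what Dan describes can be sketched as below. This is only a minimal sketch of the mapper output format, assuming the `type:key<TAB>value` convention that the built-in aggregators (for example `LongValueSum`) use with `-reducer aggregate`. `SampleValues` is the hypothetical aggregator from the question, not something Hadoop ships, and the helper names and the `key<TAB>value` input layout are assumptions made for illustration. It does not show how to register a new Java aggregator, which is the undocumented part.

```python
import sys

# Separator between the aggregator type and the user key, as in the
# "LongValueSum:key" form used with `-reducer aggregate`.
TYPE_SEPARATOR = ":"


def format_record(agg_type, key, value):
    """Build one mapper output line: '<type>:<key><TAB><value>'."""
    return f"{agg_type}{TYPE_SEPARATOR}{key}\t{value}"


def map_lines(lines, agg_type="SampleValues"):
    """Yield one typed record per non-empty input line.

    Assumes each input line is 'key<TAB>value' (an assumption for this
    sketch). "SampleValues" is the hypothetical aggregator name from the
    question; only built-in names such as "LongValueSum" are known to be
    recognized by the stock aggregate reducer.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition("\t")
        yield format_record(agg_type, key, value)


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Entry point for use as a streaming mapper script."""
    for record in map_lines(stdin):
        stdout.write(record + "\n")
```

To use this as a mapper script, call `main()` at the bottom of the file. Passing `agg_type="LongValueSum"` to `map_lines` gives output that the stock aggregate reducer already understands, which is a quick way to check the plumbing before a custom aggregator exists.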

Alpha Chapters of my book on Hadoop are available
