hadoop-common-user mailing list archives

From Ted Dunning <tdunn...@veoh.com>
Subject Re: computing conditional probabilities with Hadoop?
Date Tue, 02 Oct 2007 02:32:06 GMT

Actually, it would be almost as useful to be able to have a "multi-reduce".

In such a system, you would specify multiple input/map pairs.  The reduce
function signature would then be something like:

    reduce(WritableComparable key, OutputCollector output, Reporter reporter, Iterator... valuesPerInput)

Here the output of each set of maps would be delivered through its own iterator.

I didn't mention this alternative earlier because I figured it would be a
much bigger leap than just ordering the reduce values.  It would, however,
be very useful when it comes to co-grouping operations.
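Below is a minimal in-memory sketch of the proposed multi-reduce in plain Java. It assumes each input/map pair can be modeled as a list of key/value pairs. `MultiReducer`, `run`, and the join example are hypothetical names used only for illustration and are not part of Hadoop. A `List` stands in for `OutputCollector`, and `Reporter` is left out. Because the reducer gets one iterator per input, a co-grouping operation such as an inner join becomes straightforward:

```java
import java.util.*;

// Hypothetical sketch of the proposed "multi-reduce"; this is NOT a Hadoop API.
// Each input is a list of (key, value) pairs standing in for one input/map pair.
class MultiReduceSketch {

    // Hypothetical reducer that receives one value iterator per input.
    // A List stands in for OutputCollector; Reporter is omitted.
    interface MultiReducer<K, V, O> {
        void reduce(K key, List<O> output, List<Iterator<V>> valuesPerInput);
    }

    // Simulates the shuffle: groups every input by key, then calls the reducer
    // once per distinct key (in sorted order) with one iterator per input.
    static <K extends Comparable<K>, V, O> List<O> run(
            List<List<Map.Entry<K, V>>> inputs, MultiReducer<K, V, O> reducer) {
        List<Map<K, List<V>>> grouped = new ArrayList<>();
        SortedSet<K> allKeys = new TreeSet<>();
        for (List<Map.Entry<K, V>> input : inputs) {
            Map<K, List<V>> byKey = new HashMap<>();
            for (Map.Entry<K, V> e : input) {
                byKey.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
                allKeys.add(e.getKey());
            }
            grouped.add(byKey);
        }
        List<O> output = new ArrayList<>();
        for (K key : allKeys) {
            List<Iterator<V>> iterators = new ArrayList<>();
            for (Map<K, List<V>> byKey : grouped) {
                // An input with no records for this key contributes an empty iterator.
                iterators.add(byKey.getOrDefault(key, Collections.<V>emptyList()).iterator());
            }
            reducer.reduce(key, output, iterators);
        }
        return output;
    }

    static Map.Entry<String, String> kv(String k, String v) {
        return new AbstractMap.SimpleEntry<>(k, v);
    }

    // Co-grouping example: inner join of user names (input 0) with purchases (input 1).
    static List<String> joinExample() {
        List<Map.Entry<String, String>> names = Arrays.asList(kv("u1", "alice"), kv("u2", "bob"));
        List<Map.Entry<String, String>> purchases =
                Arrays.asList(kv("u1", "book"), kv("u1", "pen"), kv("u3", "lamp"));
        MultiReducer<String, String, String> join = (key, out, its) -> {
            // Buffer one side so it can be replayed for every value on the other side.
            List<String> bought = new ArrayList<>();
            its.get(1).forEachRemaining(bought::add);
            Iterator<String> nameIt = its.get(0);
            while (nameIt.hasNext()) {
                String name = nameIt.next();
                for (String item : bought) {
                    out.add(key + " " + name + " " + item);
                }
            }
        };
        return run(Arrays.asList(names, purchases), join);
    }

    public static void main(String[] args) {
        System.out.println(joinExample()); // [u1 alice book, u1 alice pen]
    }
}
```

To build the cross product, one side still has to be buffered in memory. In practice the smaller input would go in the buffered slot.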

On 10/1/07 6:17 PM, "Ted Dunning" <tdunning@veoh.com> wrote:

> This is a common requirement.
> Leaving the order unchanged would be fine, but that is probably very hard to
> enforce because of the many map tasks and some uncertainty about which maps
> finished first.
> Similarly useful would be the ability to require a particular sort ordering
> on reduce values.
> On 10/1/07 6:05 PM, "Chris Dyer" <redpony@umd.edu> wrote:
>> Does anyone know if Hadoop guarantees (can be made to guarantee) that the
>> relative order of keys that are equal will be left unchanged?
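Being able to require a sort order on reduce values is what makes conditional probabilities cheap to compute. If the total count for A reaches the reducer before any (A, B) count, each P(B | A) = count(A, B) / count(A) can be emitted immediately. The following is a minimal in-memory sketch of that idea in plain Java. It assumes a composite key where a marker `*` stands for the marginal. `Pair`, `MARGINAL`, and the sample data are hypothetical, a `TreeMap` stands in for the sort phase, and the code does not use or describe any Hadoop API:

```java
import java.util.*;

// Hypothetical in-memory simulation of ordering reduce values so that
// P(B | A) = count(A, B) / count(A) can be computed in a single pass.
// No Hadoop classes are used; a TreeMap stands in for the sort phase.
class ConditionalProbSketch {

    // Marker for the marginal record (A, *). Assumes no real B value equals "*".
    static final String MARGINAL = "*";

    // Composite key: sorted by A, then the marginal before every B, then by B.
    static final class Pair implements Comparable<Pair> {
        final String a;
        final String b;

        Pair(String a, String b) { this.a = a; this.b = b; }

        @Override
        public int compareTo(Pair o) {
            int c = a.compareTo(o.a);
            if (c != 0) return c;
            boolean thisMarginal = b.equals(MARGINAL);
            boolean otherMarginal = o.b.equals(MARGINAL);
            if (thisMarginal != otherMarginal) return thisMarginal ? -1 : 1;
            return b.compareTo(o.b);
        }
    }

    // "Map" side: each (a, b) observation adds to count(a, b) and to count(a, *).
    // The TreeMap keeps keys in the composite order, as the sort phase would.
    static SortedMap<Pair, Integer> sortedCounts(List<String[]> observations) {
        SortedMap<Pair, Integer> counts = new TreeMap<>();
        for (String[] obs : observations) {
            counts.merge(new Pair(obs[0], obs[1]), 1, Integer::sum);
            counts.merge(new Pair(obs[0], MARGINAL), 1, Integer::sum);
        }
        return counts;
    }

    // "Reduce" side: because (a, *) sorts first, count(a) is already known
    // when each (a, b) arrives, so no values need to be buffered.
    static Map<String, Double> conditionalProbabilities(List<String[]> observations) {
        Map<String, Double> result = new LinkedHashMap<>();
        int marginal = 0;
        for (Map.Entry<Pair, Integer> e : sortedCounts(observations).entrySet()) {
            Pair k = e.getKey();
            if (k.b.equals(MARGINAL)) {
                marginal = e.getValue();
            } else {
                result.put(k.b + "|" + k.a, (double) e.getValue() / marginal);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String[]> obs = Arrays.asList(
                new String[] {"rain", "wet"}, new String[] {"rain", "wet"},
                new String[] {"rain", "dry"}, new String[] {"sun", "dry"});
        System.out.println(conditionalProbabilities(obs));
    }
}
```

With the sample data, `wet|rain` comes out as 2/3 and `dry|sun` as 1.0. In a distributed job, the partitioner would also have to send every key with the same A to the same reduce task. Otherwise the marginal and the joint counts could land on different reducers.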
