hadoop-common-user mailing list archives

From "Stu Hood" <stuh...@webmail.us>
Subject Re: computing conditional probabilities with Hadoop?
Date Tue, 02 Oct 2007 03:24:37 GMT
Have you done any testing to confirm that the order of the output keys is actually changed?

Merge sort on its own is a 'stable' algorithm, so the order should not change unless a different
sorting variant is used somewhere (in memory before spilling to disk, for instance).
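To show what "stable" means here, below is a minimal Java sketch. It uses `List.sort`, which the JDK documents as a stable sort, as a stand-in for a merge phase. It is not Hadoop's own sort code, and the class and method names (`StableSortDemo`, `sortByKey`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustration only: List.sort (documented as stable) stands in for a
// merge phase; this is not Hadoop's sort implementation.
class StableSortDemo {

    // A (key, value) record; only the key takes part in comparisons.
    static final class Pair {
        final String key;
        final String value;
        Pair(String key, String value) { this.key = key; this.value = value; }
    }

    // Sorts by key only. Because List.sort is stable, records whose keys are
    // equal keep the relative order they had in the input list.
    static List<Pair> sortByKey(List<Pair> input) {
        List<Pair> copy = new ArrayList<>(input);
        copy.sort(Comparator.comparing((Pair p) -> p.key));
        return copy;
    }

    public static void main(String[] args) {
        List<Pair> input = List.of(
            new Pair("b", "b-1"), new Pair("a", "a-1"),
            new Pair("b", "b-2"), new Pair("a", "a-2"));
        for (Pair p : sortByKey(input)) {
            System.out.println(p.key + "\t" + p.value);
        }
    }
}
```

Running it prints the values in the order a-1, a-2, b-1, b-2: equal keys keep their input order. As noted above, this only holds if every sorting step in the pipeline is stable; an unstable in-memory sort anywhere would void the guarantee.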

Thanks,
Stu


-----Original Message-----
From: Ted Dunning <tdunning@veoh.com>
Sent: Monday, October 1, 2007 10:32pm
To: hadoop-user@lucene.apache.org
Subject: Re: computing conditional probabilities with Hadoop?



Actually, it would be almost as useful to be able to have a "multi-reduce".

In such a system, you would specify multiple input/map pairs.  The reduce
function signature would then be something like:

    void reduce(WritableComparable key, OutputCollector output, Reporter reporter, Iterator... values)

Where the output of each set of maps would be given its own iterator.

I didn't mention this alternative earlier because I figured it would be a
much bigger leap than just ordering the reduce values.  It would, however,
be very useful when it comes to co-grouping operations.
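The proposed "multi-reduce" can be simulated outside Hadoop to see how it supports co-grouping. The following is a minimal, Hadoop-free Java sketch, assuming each input has already been mapped to (key, value) records. The names `MultiReduceSketch`, `MultiReducer`, `KV` and `run` are hypothetical and are not part of any Hadoop API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of a "multi-reduce": each input's map
// output is grouped by key, and the reducer receives one iterator per input.
class MultiReduceSketch {

    // One already-mapped (key, value) record.
    static final class KV {
        final String key;
        final String value;
        KV(String key, String value) { this.key = key; this.value = value; }
    }

    // Hypothetical reducer contract: iterators.get(i) yields the values from input i.
    interface MultiReducer {
        void reduce(String key, List<Iterator<String>> iterators, List<String> output);
    }

    // Groups every input by key (keys visited in sorted order, as a reduce
    // phase would), then calls the reducer once per key with one iterator per input.
    static List<String> run(List<List<KV>> inputs, MultiReducer reducer) {
        int n = inputs.size();
        SortedMap<String, List<List<String>>> groups = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            for (KV kv : inputs.get(i)) {
                List<List<String>> perInput = groups.computeIfAbsent(kv.key, k -> {
                    List<List<String>> lists = new ArrayList<>();
                    for (int j = 0; j < n; j++) lists.add(new ArrayList<>());
                    return lists;
                });
                perInput.get(i).add(kv.value);
            }
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<List<String>>> e : groups.entrySet()) {
            List<Iterator<String>> iterators = new ArrayList<>();
            for (List<String> values : e.getValue()) iterators.add(values.iterator());
            reducer.reduce(e.getKey(), iterators, output);
        }
        return output;
    }

    public static void main(String[] args) {
        // Input 0: (user, name); input 1: (user, page viewed).
        List<KV> names = List.of(new KV("u1", "alice"), new KV("u2", "bob"));
        List<KV> views = List.of(new KV("u1", "p1"), new KV("u1", "p2"), new KV("u3", "p9"));
        // An inner join expressed as a multi-reduce.
        MultiReducer join = (key, its, out) -> {
            List<String> left = new ArrayList<>();
            its.get(0).forEachRemaining(left::add);
            Iterator<String> right = its.get(1);
            while (right.hasNext()) {
                String page = right.next();
                for (String name : left) out.add(key + "\t" + name + "\t" + page);
            }
        };
        run(List.of(names, views), join).forEach(System.out::println);
    }
}
```

Because each input keeps its own iterator, the reducer never has to tag values by origin or rely on the order in which they arrive; here only `u1` appears in both inputs, so the join emits two rows for it.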


On 10/1/07 6:17 PM, "Ted Dunning"  wrote:

> 
> This is a common requirement.
> 
> Left unchanged would be fine but is probably very hard to enforce because of
> the many map tasks and some uncertainty about which maps finished first.
> Similarly useful would be the ability to require a particular sort ordering
> on reduce values.
> 
> 
> On 10/1/07 6:05 PM, "Chris Dyer"  wrote:
> 
>> Does anyone know if Hadoop guarantees (can be made to guarantee) that the
>> relative order of keys that are equal will be left unchanged?
> 
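Both requests in the quoted messages, keeping equal keys in a known relative order and imposing a sort order on reduce values, are commonly handled by moving the ordering field into the key: sort on (key, secondary field) but group reducer input on the key alone, a pattern usually called a secondary sort. Below is a minimal Hadoop-free Java sketch of that idea. It assumes each record carries a hypothetical sequence number `seq` that is unique within its key (for example, assigned at map time); the names `SecondarySortSketch`, `Rec` and `sortAndGroup` are likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hadoop-free sketch of a "secondary sort": the sort order covers (key, seq),
// while grouping for the reducer uses the key alone.
class SecondarySortSketch {

    // Natural key, a secondary field used only for ordering (assumed unique
    // within each key), and the payload value.
    static final class Rec {
        final String key;
        final long seq;
        final String value;
        Rec(String key, long seq, String value) { this.key = key; this.seq = seq; this.value = value; }
    }

    // Sort comparator: natural key first, then the secondary field.
    static final Comparator<Rec> SORT =
        Comparator.comparing((Rec r) -> r.key).thenComparingLong(r -> r.seq);

    // Sorts by (key, seq), then groups by key only. Since (key, seq) has no
    // ties, each key's values come out ordered by seq whatever the input order,
    // and whether or not the sort is stable.
    static Map<String, List<String>> sortAndGroup(List<Rec> records) {
        List<Rec> sorted = new ArrayList<>(records);
        sorted.sort(SORT);
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (Rec r : sorted) {
            grouped.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r.value);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Records arrive in an arbitrary order, e.g. from map tasks finishing out of order.
        List<Rec> shuffled = List.of(
            new Rec("b", 2, "b-third"), new Rec("a", 1, "a-second"),
            new Rec("b", 0, "b-first"), new Rec("a", 0, "a-first"),
            new Rec("b", 1, "b-second"));
        sortAndGroup(shuffled).forEach((k, vs) -> System.out.println(k + "\t" + vs));
    }
}
```

The design choice is to make the ordering explicit in the comparator rather than depend on sort stability or on which map tasks finished first. In a real job the same split would be expressed with a custom sort comparator, a grouping comparator, and a partitioner that looks only at the natural key.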

