hadoop-common-user mailing list archives

From "Chris Dyer" <redp...@umd.edu>
Subject Re: computing conditional probabilities with Hadoop?
Date Tue, 02 Oct 2007 03:34:55 GMT
I haven't done any testing.  This is one of those things that I'd
rather see documented in the interface specification before I rely on
it, since the stuff I'm working on will be run with vastly different
amounts of data.  But it's good to know that merge sort is supposed
to preserve the relative order of 'equal' keys.  If that's the case,
we might be able to push to get this adopted as a requirement.
Chris
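
To make the stability point concrete, here is a minimal sketch of a stable merge sort in plain Java. It is not Hadoop's sort code; the class StableMergeDemo and the Rec type are hypothetical names made up for illustration. It only shows that a merge which takes from the left run on ties keeps 'equal' keys in their input order. It says nothing about the order in which runs from different map tasks arrive, which is the harder part raised in the quoted messages below.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StableMergeDemo {
    /** Hypothetical record: a sort key plus a tag recording its input position. */
    public static final class Rec {
        public final String key;
        public final int seq;
        public Rec(String key, int seq) { this.key = key; this.seq = seq; }
        @Override public String toString() { return key + seq; }
    }

    /** Merges two runs sorted by key; on equal keys the left run goes first. */
    public static List<Rec> merge(List<Rec> left, List<Rec> right) {
        List<Rec> out = new ArrayList<>(left.size() + right.size());
        int i = 0, j = 0;
        while (i < left.size() && j < right.size()) {
            // "<=" rather than "<" is what keeps the merge stable.
            if (left.get(i).key.compareTo(right.get(j).key) <= 0) {
                out.add(left.get(i++));
            } else {
                out.add(right.get(j++));
            }
        }
        while (i < left.size()) out.add(left.get(i++));
        while (j < right.size()) out.add(right.get(j++));
        return out;
    }

    /** Top-down merge sort built on the stable merge above. */
    public static List<Rec> mergeSort(List<Rec> in) {
        if (in.size() <= 1) return new ArrayList<>(in);
        int mid = in.size() / 2;
        return merge(mergeSort(in.subList(0, mid)),
                     mergeSort(in.subList(mid, in.size())));
    }

    public static void main(String[] args) {
        List<Rec> input = Arrays.asList(
            new Rec("b", 0), new Rec("a", 1), new Rec("b", 2),
            new Rec("a", 3), new Rec("b", 4));
        // Records with the same key come out in ascending seq order.
        System.out.println(mergeSort(input)); // prints [a1, a3, b0, b2, b4]
    }
}
```

If the comparison in merge were "<" instead of "<=", ties would be taken from the right run and the input order of equal keys would no longer be preserved.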

On 10/1/07, Stu Hood <stuhood@webmail.us> wrote:
> Have you done any testing to confirm that the order of the output keys is actually changed?
>
> Merge-sort on its own is a 'stable' algorithm, and so the order should not change unless
> different variations on sorting are used (in memory before spilling to disk, for instance).
>
> Thanks,
> Stu
>
>
> -----Original Message-----
> From: Ted Dunning <tdunning@veoh.com>
> Sent: Monday, October 1, 2007 10:32pm
> To: hadoop-user@lucene.apache.org
> Subject: Re: computing conditional probabilities with Hadoop?
>
>
>
> Actually, it would be almost as useful to be able to have a "multi-reduce".
>
> In such a system, you would specify multiple input/map pairs.  The reduce
> function signature would then be something like:
>
>     reduce(WritableComparable key, OutputCollector, Reporter, Iterator ...)
>
> Where the output of each set of maps would be given its own iterator.
>
> I didn't mention this alternative earlier because I figured it would be a
> much bigger leap than just ordering the reduce values.  It would, however,
> be very useful when it comes to co-grouping operations.
>
>
> On 10/1/07 6:17 PM, "Ted Dunning" wrote:
>
> >
> > This is a common requirement.
> >
> > Left unchanged would be fine but is probably very hard to enforce because of
> > the many map tasks and some uncertainty about which maps finished first.
> > Similarly useful would be the ability to require a particular sort ordering
> > on reduce values.
> >
> >
> > On 10/1/07 6:05 PM, "Chris Dyer" wrote:
> >
> >> Does anyone know if Hadoop guarantees (can be made to guarantee) that the
> >> relative order of keys that are equal will be left unchanged?
> >
>
>
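
The "multi-reduce" idea quoted above can be sketched in plain Java, outside Hadoop. Everything here is an assumption made for illustration: the MultiReducer interface, the run driver, and the toy data are hypothetical and are not part of any Hadoop API. The sketch groups several inputs by key and hands the reduce function one iterator per input, then uses that to compute a conditional probability as a ratio of two counts.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MultiReduceSketch {
    /** Hypothetical interface: for each key, one Iterator per input source. */
    public interface MultiReducer<K, V, R> {
        R reduce(K key, List<Iterator<V>> valuesPerSource);
    }

    /** Groups every source by key, then calls the reducer once per key in key order. */
    public static <K extends Comparable<K>, V, R> Map<K, R> run(
            List<List<Map.Entry<K, V>>> sources, MultiReducer<K, V, R> reducer) {
        // key -> one value list per source (list index = source number)
        TreeMap<K, List<List<V>>> grouped = new TreeMap<>();
        for (int s = 0; s < sources.size(); s++) {
            for (Map.Entry<K, V> e : sources.get(s)) {
                List<List<V>> perSource = grouped.get(e.getKey());
                if (perSource == null) {
                    perSource = new ArrayList<>();
                    for (int n = 0; n < sources.size(); n++) perSource.add(new ArrayList<V>());
                    grouped.put(e.getKey(), perSource);
                }
                perSource.get(s).add(e.getValue());
            }
        }
        Map<K, R> out = new LinkedHashMap<>();
        for (Map.Entry<K, List<List<V>>> g : grouped.entrySet()) {
            List<Iterator<V>> iters = new ArrayList<>();
            for (List<V> vals : g.getValue()) iters.add(vals.iterator());
            out.put(g.getKey(), reducer.reduce(g.getKey(), iters));
        }
        return out;
    }

    static Map.Entry<String, Integer> one(String key) {
        return new AbstractMap.SimpleEntry<>(key, 1);
    }

    static int sum(Iterator<Integer> it) {
        int total = 0;
        while (it.hasNext()) total += it.next();
        return total;
    }

    /** Toy data: source 0 counts occurrences of x, source 1 counts x seen together with some y. */
    public static Map<String, Double> demo() {
        List<Map.Entry<String, Integer>> marginal =
            Arrays.asList(one("a"), one("a"), one("a"), one("a"), one("b"), one("b"));
        List<Map.Entry<String, Integer>> joint =
            Arrays.asList(one("a"), one("b"), one("b"));
        return MultiReduceSketch.<String, Integer, Double>run(
            Arrays.asList(marginal, joint),
            // P(y | x) = count(x, y) / count(x)
            (key, iters) -> (double) sum(iters.get(1)) / sum(iters.get(0)));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints {a=0.25, b=1.0}
    }
}
```

Keeping one iterator per source means the reducer never has to tag values or sort them to tell the marginal counts from the joint counts, which is the convenience the co-grouping remark points at.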
