hadoop-general mailing list archives

From <huan....@accenture.com>
Subject RE: what is the major difference between Hadoop and cloudMapReduce?
Date Thu, 03 Dec 2009 17:39:41 GMT
> -----Original Message-----
> From: Todd Lipcon [mailto:todd@cloudera.com]
> Sent: Monday, November 30, 2009 8:15 AM
> To: general@hadoop.apache.org
> Subject: Re: what is the major difference between Hadoop and
> cloudMapReduce?
> On Mon, Nov 30, 2009 at 1:48 AM, <huan.liu@accenture.com> wrote:
> > Todd,
> >
> > We do not keep all values for a key in memory. Instead, we only keep the
> > partial reduce result in memory, but throw away the value as soon as it is
> > used. The point you raised is still very valid if the reduce state
> > maintained per key is large, which I hope is a very rare case. If you have
> > some concrete workload examples, it will help us prioritize the development
> > effort. I can definitely see the benefits of introducing a paging mechanism
> > to spill partial reduce results to the output SQS queue in the future.
> > Thanks.
> >
> Hi Huan,
> I guess I misremembered or misread the paper.
> Given this technique, doesn't it mean that reducers can only work when
> commutative and associative?
> -Todd


I do not see how our approach differs from Hadoop's iterator interface, unless the reduce
function relies on the values arriving in a particular sorted order when the iterator feeds
them one at a time.

If there is no assumption about the value ordering, or the expected ordering differs from
what the iterator presents, a Hadoop reduce function has to read in all values from the
iterator first (paging to disk if necessary), rearrange them as needed, and then process them
in the new order. We would do the same with our iterator interface: in the next() function,
our reduce function reads in each value from the iterator (paging to disk if necessary); then,
in the finish() function, it rearranges the values and processes them in the new order.
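
As a rough sketch of the two cases above: only the next() and finish() names come from this
message. The class names, method signatures, and the run_reducer driver below are hypothetical
and are not the actual cloudMapReduce or Hadoop API. The first reducer keeps only a partial
result and discards each value after use; the second buffers values in next() and reorders
them in finish(), which is the fallback for reduce functions that depend on ordering.

```python
class SumReducer:
    """Streaming case: only a partial result is kept per key."""

    def __init__(self):
        self.total = 0

    def next(self, value):
        # Fold the value into the partial result; the value itself
        # is not retained after this call.
        self.total += value

    def finish(self):
        return self.total


class MedianReducer:
    """Order-dependent case: buffer in next(), reorder in finish()."""

    def __init__(self):
        # Assumption: values fit in memory here. A real implementation
        # might page this buffer to disk instead.
        self.values = []

    def next(self, value):
        self.values.append(value)

    def finish(self):
        # Impose the ordering the reduce logic needs, then process.
        ordered = sorted(self.values)
        return ordered[len(ordered) // 2]  # upper median


def run_reducer(reducer, values):
    """Hypothetical driver: feed values one at a time, then finish."""
    for value in values:
        reducer.next(value)
    return reducer.finish()
```

Both reducers return the same result regardless of the order in which values arrive: the
first because addition is commutative and associative, the second because it sorts before
processing.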

Apologies for the late reply. I have been traveling in China this week with no reliable
network connection. Thanks.


