hadoop-general mailing list archives

From: Todd Lipcon <t...@cloudera.com>
Subject: Re: what is the major difference between Hadoop and cloudMapReduce?
Date: Mon, 30 Nov 2009 16:15:27 GMT
On Mon, Nov 30, 2009 at 1:48 AM, <huan.liu@accenture.com> wrote:

> Todd,
>
> We do not keep all values for a key in memory. Instead, we only keep the
> partial reduce result in memory, but throw away each value as soon as it
> has been used. The point you raised is still very valid if the reduce
> state maintained per key is large, which I hope is a very rare case. If
> you have some concrete workload examples, they will help us prioritize
> the development effort. I can definitely see the benefits of introducing
> a paging mechanism to spill partial reduce results to the output SQS
> queue in the future.
> Thanks.
>

Hi Huan,

I guess I misremembered or misread the paper.

Given this technique, doesn't it mean that reducers can only work when the
reduce function is commutative and associative?
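
Concretely, here is a rough sketch of the pattern as I understand it (the
interface and class names below are just made up for illustration, not
taken from your code):

    // Hypothetical incremental reducer: keep only a running partial
    // result per key, discarding each value right after folding it in.
    interface IncrementalReducer<V, S> {
        S initial();                 // empty partial state for a new key
        S fold(S partial, V value);  // merge one value into the state
    }

    // Fits the model: addition is commutative and associative, so the
    // order in which values arrive from the queue doesn't matter and the
    // per-key state stays a single long.
    class SumReducer implements IncrementalReducer<Long, Long> {
        public Long initial() { return 0L; }
        public Long fold(Long partial, Long value) { return partial + value; }
    }

An exact median, or anything else that needs to see all of a key's values
at once, doesn't seem to fit: the per-key state would have to grow with the
number of values, which brings back the memory problem.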

-Todd


>
> > -----Original Message-----
> > From: Todd Lipcon [mailto:todd@cloudera.com]
> > Sent: Sunday, November 29, 2009 10:15 AM
> > To: general@hadoop.apache.org
> > Subject: Re: what is the major difference between Hadoop and
> > cloudMapReduce?
> >
> > Another difference is that Cloud MapReduce doesn't scale well when
> > there are a large number of values for the same key. The values collect
> > in a single SQS queue per partition, and the reducer reads them into a
> > hashtable that never spills to disk. So, if you have more values for a
> > key than fit in RAM, you can't run the job. Hadoop, on the other hand,
> > spills reduce input to disk when it's too large to sort in memory. The
> > paper says to add more partitions if you have a lot of data, but that
> > doesn't help in the case of skewed reduce input (very common in a lot
> > of workloads).
> >
> > -Todd
> >
> > On Sun, Nov 29, 2009 at 8:16 AM, Bruce Snyder
> > <bruce.snyder@gmail.com> wrote:
> >
> > > On Sun, Nov 29, 2009 at 3:13 AM, Tim Robertson
> > > <timrobertson100@gmail.com> wrote:
> > > > Hi Tom,
> > > >
> > > > One major difference is that Hadoop allows working with any object
> > > > that you can serialize, rather than only Strings, so it is
> > > > applicable for use with (e.g.) image processing, or rendering to
> > > > images or PDFs in the output format.
> > >
> > > Yes, you noted this on the general@hadoop list. I think your
> > > suggestion to expand this to support other types is a good one.
> > >
> > > Bruce
> > > --
> > > perl -e 'print
> > > unpack("u30","D0G)U8V4\@4VYY9&5R\"F)R=6-E+G-N>61E<D\!G;6%I;\"YC;VT*"
> > > );'
> > >
> > > ActiveMQ in Action: http://bit.ly/2je6cQ
> > > Blog: http://bruceblog.org/
> > > Twitter: http://twitter.com/brucesnyder
> > >
