hadoop-general mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: what is the major difference between Hadoop and cloudMapReduce?
Date Sun, 29 Nov 2009 18:15:01 GMT
Another difference is that Cloud MapReduce doesn't scale well when there are
a large number of values for the same key. The values collect in a single SQS
queue per partition, and the reducer reads that queue into a hashtable that
never spills to disk. So, if you have more values for a key than fit in RAM,
you can't run the job. Hadoop, on the other hand, spills reduce input to disk
when it's too large to sort in memory. The paper says to add more partitions
if you have a lot of data, but that doesn't help in the case of skewed reduce
input (very common in a lot of workloads).
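
To make the contrast concrete, here is a minimal Python sketch. It is not code
from Hadoop or Cloud MapReduce; the function names and the max_in_memory
threshold are made up for illustration. The first function groups everything
in a dict, so memory grows with the number of values; the second sorts
bounded-size runs, spills them to temporary files, and merges them, so a
single hot key never has to fit in RAM all at once.

```python
import heapq
import pickle
import tempfile
from itertools import groupby
from operator import itemgetter


def in_memory_group(pairs):
    """Hashtable approach: every value for every key is held in RAM."""
    table = {}
    for key, value in pairs:
        table.setdefault(key, []).append(value)
    return table


def _write_run(buffer, run_files):
    """Sort one buffer by key and spill it to a temporary file."""
    buffer.sort(key=itemgetter(0))
    f = tempfile.TemporaryFile()
    for record in buffer:
        pickle.dump(record, f)
    f.seek(0)
    run_files.append(f)


def _read_run(f):
    """Stream (key, value) records back from a spilled run."""
    while True:
        try:
            yield pickle.load(f)
        except EOFError:
            return


def spilling_group(pairs, max_in_memory=1000):
    """Spill-and-merge approach: at most max_in_memory records are buffered.

    Yields (key, values_iterator); each values_iterator must be consumed
    before advancing to the next key.
    """
    run_files = []
    buffer = []
    try:
        for pair in pairs:
            buffer.append(pair)
            if len(buffer) >= max_in_memory:
                _write_run(buffer, run_files)
                buffer = []
        buffer.sort(key=itemgetter(0))
        runs = [_read_run(f) for f in run_files] + [iter(buffer)]
        # Merge the sorted runs lazily so equal keys arrive back to back.
        merged = heapq.merge(*runs, key=itemgetter(0))
        for key, group in groupby(merged, key=itemgetter(0)):
            yield key, (value for _, value in group)
    finally:
        for f in run_files:
            f.close()


if __name__ == "__main__":
    # Skewed input: one hot key with many values, one cold key with two.
    skewed = [("hot", i) for i in range(5000)] + [("cold", 1), ("cold", 2)]
    for key, values in spilling_group(skewed, max_in_memory=100):
        print(key, sum(values))
```

With skewed input like the example at the bottom, adding partitions would not
shrink the "hot" group, but the spilling version still only buffers
max_in_memory records at a time.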

-Todd

On Sun, Nov 29, 2009 at 8:16 AM, Bruce Snyder <bruce.snyder@gmail.com> wrote:

> On Sun, Nov 29, 2009 at 3:13 AM, Tim Robertson
> <timrobertson100@gmail.com> wrote:
> > Hi Tom,
> >
> > One major difference is that Hadoop allows working with any object
> > that you can serialize, rather than only Strings, so it is applicable
> > for use with (e.g.) image processing, or rendering to images or PDFs on
> > the output format.
>
> Yes, you noted this on the general@hadoop list. I think your
> suggestion to expand this to support other types is a good one.
>
> Bruce
> --
> perl -e 'print
> unpack("u30","D0G)U8V4\@4VYY9&5R\"F)R=6-E+G-N>61E<D\!G;6%I;\"YC;VT*"
> );'
>
> ActiveMQ in Action: http://bit.ly/2je6cQ
> Blog: http://bruceblog.org/
> Twitter: http://twitter.com/brucesnyder
>
