hadoop-general mailing list archives

From <huan....@accenture.com>
Subject RE: Introducing Cloud MapReduce
Date Mon, 30 Nov 2009 09:35:20 GMT
> From: Bruce Snyder [mailto:bruce.snyder@gmail.com]
> Sent: Friday, November 27, 2009 1:19 PM
> To: general@hadoop.apache.org
> Subject: Re: Introducing Cloud MapReduce
> On Fri, Nov 27, 2009 at 11:34 AM, Tim Robertson
> <timrobertson100@gmail.com> wrote:
> > Hi Bruce,
> >
> > Interesting stuff.  It looks like it only works with String Keys and
> > Values (possibly a reason you see large performance gains with simpler
> > serialization requirements) - have you any plans to support other
> > types in the roadmap? Perhaps with Protobufs or Avro serialization?
> I can definitely see the need for supporting more than one type of
> key/value. We'll need to add this to the roadmap, thanks.

Added to the roadmap. Thanks for the suggestion.
> > Have you considered using the same Mapper and Reducer class and method
> > signatures so that a MR job could be written and launched against
> > CloudMR or Hadoop?
> > [The reason I ask is I am hacking an implementation of a single JVM,
> > multithreaded MR framework that does this, so I can ship the same
> > analysis I run on large datasets to people to run on small datasets
> > also (much lighter than Hadoop but compatible).  I am putting it on
> > google code mapreduce4j.  It might be nice if there was a standard MR
> > API that people coded against and launched in various frameworks -
> > just a thought].
> Excellent points, Tim. A point that I raised was trying to determine a
> standard sort of API that would work across implementations. So far
> I'm not sure if this means that we match what is already out there or
> if we provide interfaces for many languages or what. I like the idea,
> but I'll let Huan chime in here.

The eventual goal is to run an unmodified Hadoop job on Cloud MapReduce, so that an application
can be ported easily between platforms, but we are still some distance from that. Full
compatibility would require us to implement (or fake) many Hadoop interfaces, such as the
different input readers and the configuration mechanism (many of whose parameters are not
needed in CMR). We will look into the mapreduce4j implementation to see what we can leverage.
Thanks for the excellent suggestion.
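
To make the "standard MR API" idea from this thread concrete, here is a minimal sketch of what coding against a shared interface could look like. It is written in Python for brevity, and every name in it (Mapper, Reducer, run_local, WordCountMapper, SumReducer) is hypothetical: none of them is the Hadoop, Cloud MapReduce, or mapreduce4j API. Keys and values are plain strings, matching the string-only limitation mentioned earlier in the thread.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Protocol, Tuple

Pair = Tuple[str, str]


class Mapper(Protocol):
    """Hypothetical framework-neutral mapper: one input pair in, zero or more pairs out."""

    def map(self, key: str, value: str) -> Iterable[Pair]: ...


class Reducer(Protocol):
    """Hypothetical framework-neutral reducer: one key and all its values in, pairs out."""

    def reduce(self, key: str, values: Iterable[str]) -> Iterable[Pair]: ...


def run_local(mapper: Mapper, reducer: Reducer, records: Iterable[Pair]) -> List[Pair]:
    """Single-process runner (an assumption for illustration, not a real framework).

    A distributed backend would expose the same signature and run the same
    job classes, which is the portability the thread is asking for.
    """
    # Map phase: group every emitted value under its output key.
    groups: Dict[str, List[str]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper.map(key, value):
            groups[out_key].append(out_value)

    # Reduce phase: visit keys in sorted order so output is deterministic.
    results: List[Pair] = []
    for key in sorted(groups):
        results.extend(reducer.reduce(key, groups[key]))
    return results


class WordCountMapper:
    def map(self, key: str, value: str) -> Iterable[Pair]:
        # Emit ("word", "1") for each whitespace-separated token.
        for word in value.split():
            yield word, "1"


class SumReducer:
    def reduce(self, key: str, values: Iterable[str]) -> Iterable[Pair]:
        # Values arrive as strings, so parse, sum, and re-encode as a string.
        yield key, str(sum(int(v) for v in values))
```

A job written this way depends only on the two interfaces, so the same WordCountMapper and SumReducer could in principle be handed to any runner that accepts them, for example `run_local(WordCountMapper(), SumReducer(), [("doc1", "a b a")])`. The harder part described above, emulating Hadoop's input readers and configuration mechanism, is not covered by this sketch.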

