hadoop-general mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Re: Introducing Cloud MapReduce
Date Fri, 27 Nov 2009 18:34:35 GMT
Hi Bruce,

Interesting stuff.  It looks like it only works with String Keys and
Values (possibly a reason you see large performance gains with simpler
serialization requirements) - have you any plans to support other
types in the roadmap? Perhaps with Protobufs or Avro serialization?

Have you considered using the same Mapper and Reducer class and method
signatures so that a MR job could be written and launched against
CloudMR or Hadoop?
[The reason I ask is I am hacking an implementation of a single JVM,
multithreaded MR framework that does this, so I can ship the same
analysis I run on large datasets to people to run on small datasets
also (much lighter than Hadoop but compatible).  I am putting it on
Google Code as mapreduce4j.  It might be nice if there was a standard MR
API that people coded against and launched in various frameworks -
just a thought].
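
To make the idea a bit more concrete, here is a rough sketch of what a
single-JVM, multithreaded runner behind a shared Mapper/Reducer contract
could look like.  Everything in it is an assumption for illustration:
the Emitter, Mapper, Reducer and LocalRunner names are hypothetical
simplified interfaces, not the Hadoop API, not Cloud MapReduce's API,
and not the actual mapreduce4j code.  Only the map phase runs in
parallel here; the reduce phase is sequential to keep it short.

```java
import java.util.*;
import java.util.concurrent.*;

// Hypothetical, simplified contracts (NOT the real Hadoop signatures).
interface Emitter<K, V> { void emit(K key, V value); }
interface Mapper<I, K, V> { void map(I input, Emitter<K, V> out); }
interface Reducer<K, V, R> { R reduce(K key, List<V> values); }

// Hypothetical in-process runner: one map task per input record.
final class LocalRunner {
    static <I, K, V, R> Map<K, R> run(List<I> inputs, Mapper<I, K, V> mapper,
                                      Reducer<K, V, R> reducer, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Thread-safe "shuffle": values grouped by key as mappers emit them.
            ConcurrentMap<K, Queue<V>> groups = new ConcurrentHashMap<>();
            List<Future<?>> pending = new ArrayList<>();
            for (I input : inputs) {
                pending.add(pool.submit(() -> mapper.map(input, (k, v) ->
                    groups.computeIfAbsent(k, x -> new ConcurrentLinkedQueue<V>()).add(v))));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for the whole map phase before reducing
            }
            Map<K, R> result = new HashMap<>();
            for (Map.Entry<K, Queue<V>> e : groups.entrySet()) {
                result.put(e.getKey(),
                           reducer.reduce(e.getKey(), new ArrayList<V>(e.getValue())));
            }
            return result;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}

// A word count job written only against the contracts above.
final class WordCount {
    static Map<String, Integer> count(List<String> lines) {
        Mapper<String, String, Integer> mapper = (line, out) -> {
            for (String w : line.split("\\s+")) {
                if (!w.isEmpty()) out.emit(w, 1);
            }
        };
        Reducer<String, Integer, Integer> reducer = (word, ones) -> {
            int sum = 0;
            for (int n : ones) sum += n;
            return sum;
        };
        return LocalRunner.run(lines, mapper, reducer, 4);
    }
}
```

The job in WordCount never names the runner's internals, so in
principle the same mapper and reducer could be handed to a different
launcher that implements the same contracts on a cluster.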


On Fri, Nov 27, 2009 at 6:41 PM, Bruce Snyder <bruce.snyder@gmail.com> wrote:
> Given that Hadoop is the dominant project at the ASF for
> cloud-oriented Java applications, I'd like to introduce a new project
> in the open source world named Cloud MapReduce. Below are the
> pertinent details for the project followed by some info describing why
> I'm posting on this list.
> What: Created by Huan Liu and Dan Orban at Accenture Labs, Cloud
> MapReduce is a MapReduce framework built on top of the Amazon
> Infrastructure services. It utilizes these services (Amazon EC2,
> Amazon S3, Amazon SQS and Amazon SimpleDB) to create a unique
> MapReduce framework with some distinct advantages. Cloud MapReduce
> utilizes the Apache License 2.0.
> Where: The code base currently lives in a GoogleCode project so it is
> available for download by anyone:
> http://code.google.com/p/cloudmapreduce/
> There is also more information about Cloud MapReduce at the website,
> including a detailed technical report.
> Why: Cloud MapReduce offers three primary advantages over other cloud
> frameworks:
> 1) It is faster than other implementations (e.g., 60 times faster than
> Hadoop in one case. Speedup depends on the application and data.).
> 2) It is more scalable and more failure resistant because it has no
> single point of bottleneck.
> 3) It is dramatically simpler with only 3,000 lines of code (e.g., two
> orders of magnitude simpler than Hadoop).
> Cloud MapReduce is a new framework that is seeking a good home.
> Creator Huan Liu came to me because, in his words, he 'wants to build
> a community around Cloud MapReduce.' What better place to do this than
> the ASF! In speaking with Doug Cutting and others at ApacheCon US a
> few weeks ago, we agreed that the best approach to begin would be via
> this list in order to get the best breadth possible across the Apache
> cloud-related community.
> In finding a home at the ASF for Cloud MapReduce, we are aware that it
> does not complement Hadoop really, so making it a subproject of Hadoop
> is not necessarily the best approach. Another approach would be to
> take it through the Incubator in order to eventually graduate as a top
> level project.
> Exposure to a wider audience is always the best way to grow an idea
> and to explore corners that we may have overlooked. The intent of this
> message is to start a conversation around Cloud MapReduce, ask
> questions, discuss its future and generally figure out how best to
> bring it to the ASF. Hopefully the information here is enough for
> folks to begin asking questions.
> Bruce
> --
> perl -e 'print unpack("u30","D0G)U8V4\@4VYY9&5R\"F)R=6-E+G-N>61E<D\!G;6%I;\"YC;VT*"
> );'
> ActiveMQ in Action: http://bit.ly/2je6cQ
> Blog: http://bruceblog.org/
> Twitter: http://twitter.com/brucesnyder
