hama-user mailing list archives

From Thomas Jungblut <thomas.jungb...@gmail.com>
Subject Re: Implementing Hadoop map-reduce on Hama
Date Thu, 11 Oct 2012 17:57:47 GMT
Thanks, you two, for bringing up that discussion.

Personally, I have a very strong opinion on that: I think that building a
MapReduce solution on top of BSP is useless.
We have had nearly ten years of development in this paradigm, and it has
grown and become very specialized.
You can express MapReduce in BSP, that's totally fine. But that does not
mean that every MapReduce algorithm is automagically efficient on BSP.
There was (and still is) lots of development on the MapReduce engine, and
you can't keep up with that on top of a more abstract paradigm.
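
To make "you can express MapReduce in BSP" concrete, here is a minimal
single-process sketch in Python. It is not the Hama API and not how Hadoop
works internally: the function names (partition, bsp_word_count), the word
count example, and the list-based "inboxes" are assumptions made purely for
illustration. It only shows that a map phase, a barrier, and a reduce phase
line up with two BSP supersteps.

```python
from collections import defaultdict
from zlib import crc32


def partition(key, num_peers):
    # Hypothetical helper: deterministic hash partitioner that picks the
    # peer responsible for a given key.
    return crc32(key.encode("utf-8")) % num_peers


def bsp_word_count(splits):
    """Simulate word count as two BSP supersteps over len(splits) peers."""
    num_peers = len(splits)
    # One inbox per peer; messages are only read after the barrier.
    inboxes = [[] for _ in range(num_peers)]

    # Superstep 1 ("map"): each peer scans its local split and sends
    # (word, 1) messages to the peer that owns the word.
    for local_lines in splits:
        for line in local_lines:
            for word in line.split():
                inboxes[partition(word, num_peers)].append((word, 1))

    # Barrier synchronization: in this simulation the barrier is simply
    # the point where the loop above has finished for all peers.

    # Superstep 2 ("reduce"): each peer groups its received messages by
    # key and sums the values.
    result = {}
    for inbox in inboxes:
        local_counts = defaultdict(int)
        for word, count in inbox:
            local_counts[word] += count
        result.update(local_counts)
    return result
```

Note what the sketch leaves out: combiners, spilling to disk, speculative
execution, and all the other tuning that went into the MapReduce engine.
That gap is exactly the point made above.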

But of course there are things where MapReduce is inefficient (iterative
jobs, grouping, no explicit output caching).
Yeah, grouping: it is actually the main part of reducing, but it is
solved inefficiently in Hadoop.
You are forced to sort, and that's (if I recall your paper correctly) also
a drawback which led you to implement MRQL with BSP, because grouping by
hash is in several cases much faster and sometimes also more
It's funny because the original paper [1] suggested that they just have
sort as a nice feature to build an inverted index and to do binary search
on the tokens. So it's more of a nice side-effect than the real design of
the system.
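
The difference between grouping by sort and grouping by hash can be
sketched in a few lines of Python. This is only an in-memory illustration
under simplifying assumptions (everything fits in RAM, no network shuffle);
the function names are made up for this example and are not taken from
Hadoop, Hama, or MRQL.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_sort(pairs):
    # Sort-based grouping: an O(n log n) sort by key, then one pass that
    # collects each run of equal keys. Keys come out in sorted order.
    ordered = sorted(pairs, key=itemgetter(0))
    return {
        key: [value for _, value in run]
        for key, run in groupby(ordered, key=itemgetter(0))
    }


def group_by_hash(pairs):
    # Hash-based grouping: a single pass with expected O(n) cost.
    # No ordering of the keys is produced or needed.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)
```

Both functions return the same groups; the sorted variant additionally
pays for a key order that a plain "group by" never asked for.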

All in all, it does not mean that I am not interested in providing such
functionality in Hama, but I'm sure that we should invest our time more
carefully in features that bring value to the users (improving message
scalability, improving performance, providing more examples and algorithms,
giving talks and presentations) rather than coding a half-baked solution
that is easily outperformed by the normal MapReduce.
It was never my intention to "kill" Hadoop by developing with Hama, but to
improve certain use cases that cannot be done efficiently in MapReduce.
So if it's just 1k lines and it is not a half-baked solution, feel free to
contribute your stuff.

[1] http://research.google.com/archive/mapreduce.html
