hama-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Implementing Hadoop map-reduce on Hama
Date Fri, 12 Oct 2012 01:06:29 GMT
Hot topic! I only want to discuss the in-memory part. I have heard a
lot about in-memory processing.

In fact, a data loading phase will be duplicated in every BSP job that
requires in-memory processing. I don't like the idea of implementing
MapReduce on top of BSP, but I think we can consider a novel model.

On Fri, Oct 12, 2012 at 4:03 AM, Leonidas Fegaras <fegaras@cse.uta.edu> wrote:
> OK. Since this is already a work in progress by Apurv and it is not a
> high priority for the Hama team, I will not pursue it any further.
> Leonidas
>
>
>
> On Oct 11, 2012, at 12:57 PM, Thomas Jungblut wrote:
>
>> Thanks, you two, for bringing up this discussion.
>>
>> Personally, I have a very strong opinion on that: I think that building a
>> MapReduce solution on top of BSP is useless.
>> We had nearly ten years of development in this paradigm and it has grown
>> and specialized itself very much.
>> You can express MapReduce in BSP, that's totally fine. But that does not
>> mean that every MapReduce algorithm is automagically efficient on BSP.
>> There was (and still is) lots of development on the MapReduce engine and
>> you can't cope with that on a more abstract paradigm.
>>
>> But, of course there are things where MapReduce is inefficient (iterative
>> jobs, grouping, no explicit output caching).
>> Yeah grouping, actually grouping is the main part of reducing, but it is
>> solved inefficiently in Hadoop.
>> You are forced to sort, and that is (if I recall your paper correctly)
>> also a drawback which led you to implement MRQL with BSP, because grouping
>> by hash is in several cases much faster and sometimes also more
>> efficient.
>> It's funny because the original paper [1] suggested that they just have
>> sort as a nice feature to build an inverted index and to do binary search
>> on the tokens. So it's more of a nice side-effect than the real design of
>> the system.
>>
>> All in all, it does not mean that I am not interested in providing such
>> functionality in Hama, but I'm sure that we should invest our time more
>> carefully in features that bring value to the users (improving message
>> scalability, improving performance, providing more examples and
>> algorithms, giving talks and presentations) rather than coding a
>> half-baked solution that is easily outperformed by the normal MapReduce.
>> It was never my intention to "kill" Hadoop by developing with Hama, but to
>> improve certain use cases that cannot be done efficiently in MapReduce.
>> So if it's just 1k lines and it is not a half-baked solution, feel free to
>> contribute your stuff.
>>
>> [1] http://research.google.com/archive/mapreduce.html
>
>
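
To make the hash-versus-sort grouping point from the quoted message concrete, here is a minimal sketch in plain Python. It is not Hadoop or Hama code; the function names group_by_hash and group_by_sort and the sample pairs are hypothetical and exist only for illustration. Both functions produce the same groups, but the sort-based version pays for an O(n log n) sort that the hash-based version avoids.

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_hash(pairs):
    """Group values by key with a hash table; keys are not ordered by value."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)  # expected O(1) per pair
    return dict(groups)


def group_by_sort(pairs):
    """Group values by key by sorting first, then scanning runs of equal keys."""
    ordered = sorted(pairs, key=itemgetter(0))  # O(n log n), stable sort
    return {
        key: [value for _, value in run]
        for key, run in groupby(ordered, key=itemgetter(0))
    }


# Hypothetical sample input: (key, value) pairs as a mapper might emit them.
pairs = [("b", 1), ("a", 2), ("b", 3), ("c", 4), ("a", 5)]
hash_groups = group_by_hash(pairs)
sort_groups = group_by_sort(pairs)
```

The sorted variant additionally yields keys in order, which matters only when the consumer needs ordered output (for example, binary search over tokens); when it does not, the sort is pure overhead.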



-- 
Best Regards, Edward J. Yoon
@eddieyoon
