hadoop-common-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: strategies to share information between mapreduce tasks
Date Wed, 26 Sep 2012 16:37:57 GMT
Apache Giraph is a framework for graph processing that currently runs over
"MR" (but is getting its own coordination via YARN soon):

You may also check out the generic BSP system, Apache Hama (Giraph uses
BSP too, if I am not wrong, but it doesn't use Hama and works over MR
instead): http://hama.apache.org
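
To make the BSP idea a bit more concrete, here is a rough sketch of the
vertex-centric, message-passing model in plain Python. It does not use the
Giraph or Hama APIs; the function name propagate_max and the toy graph are
made up purely for illustration. Each vertex reads its incoming messages,
updates its value, and sends messages that are only delivered after the
barrier that ends the superstep:

```python
from collections import defaultdict


def propagate_max(values, edges, max_supersteps=50):
    """Spread the largest vertex value through a graph, BSP style.

    values: dict mapping vertex -> initial value
    edges:  dict mapping vertex -> list of neighbour vertices
    Hypothetical helper for illustration; not part of Giraph or Hama.
    """
    values = dict(values)

    # Superstep 0: every vertex sends its own value to its neighbours.
    inbox = defaultdict(list)
    for vertex, value in values.items():
        for neighbour in edges.get(vertex, []):
            inbox[neighbour].append(value)

    supersteps = 1
    # Keep going while any messages are in flight.
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            # A vertex only speaks again if it learned a larger value.
            if best > values[vertex]:
                values[vertex] = best
                for neighbour in edges.get(vertex, []):
                    outbox[neighbour].append(best)
        # Barrier: messages sent in this superstep become visible
        # only in the next one.
        inbox = outbox
        supersteps += 1
    return values


# Toy chain graph a - b - c - d, with each edge listed in both directions.
graph = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
result = propagate_max({"a": 3, "b": 6, "c": 2, "d": 1}, graph)
print(result)  # {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```

The point of the sketch is that the "cross-communication" happens only
through messages exchanged at superstep boundaries, so the tasks never need
a shared database or any other side channel.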

On Wed, Sep 26, 2012 at 9:51 PM, Jane Wayne <jane.wayne2978@gmail.com> wrote:
> i'll look for myself, but could you please let me know what is giraph?
> is it another layer on hadoop like hive/pig or an api like mahout?
> On Wed, Sep 26, 2012 at 12:09 PM, Jonathan Bishop <jbishop.rwc@gmail.com> wrote:
>> Yes, Giraph seems like the best way to go - it is mainly a vertex
>> evaluation with message passing between vertices. Synchronization is
>> handled for you.
>> On Wed, Sep 26, 2012 at 8:36 AM, Jane Wayne <jane.wayne2978@gmail.com> wrote:
>>> hi,
>>> i know that some algorithms cannot be parallelized and adapted to the
>>> mapreduce paradigm. however, i have noticed that in most cases where i
>>> find myself struggling to express an algorithm in mapreduce, the
>>> problem is mainly due to no ability to cross-communicate between
>>> mappers or reducers.
>>> one naive approach i've seen mentioned here and elsewhere, is to use a
>>> database to store data for use by all the mappers. however, i have
>>> seen many arguments (that i agree with largely) against this approach.
>>> in general, my question is this: has anyone tried to implement an
>>> algorithm using mapreduce where mappers required cross-communications?
>>> how did you solve this limitation of mapreduce?
>>> thanks,
>>> jane.

Harsh J
