hadoop-common-user mailing list archives

From Bertrand Dechoux <decho...@gmail.com>
Subject Re: strategies to share information between mapreduce tasks
Date Wed, 26 Sep 2012 15:43:14 GMT
The difficulty with transferring data between tasks is handling
synchronisation and failure.
You may want to look at graph processing done on top of Hadoop (such as
Giraph).
That's one way to do it, but whether it is relevant to you will
depend on your context.
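A minimal sketch of the bulk synchronous parallel (BSP, Pregel-style) model that graph frameworks like Giraph build on. It shows how a barrier between "supersteps" makes cross-task communication tractable: a message sent in superstep S is only delivered in superstep S + 1, so no worker ever reads half-written state. This is a hypothetical, single-process illustration. The function name `propagate_max` and the max-value example are assumptions, not Giraph's API. A real framework would also distribute vertices across workers and persist state at the barriers to survive failures, which this sketch does not do.

```python
from collections import defaultdict
from typing import Dict, List


def propagate_max(graph: Dict[str, List[str]], values: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical in-memory BSP sketch: every vertex converges to the
    largest value reachable through incoming edges.

    graph maps each vertex to the vertices it sends messages to;
    values holds each vertex's initial value (one entry per vertex).
    """
    values = dict(values)

    # Superstep 0: every vertex is active and sends its value to its neighbours.
    outbox: Dict[str, List[int]] = defaultdict(list)
    for vertex, neighbours in graph.items():
        for n in neighbours:
            outbox[n].append(values[vertex])

    # Keep running supersteps while any messages are in flight.
    while outbox:
        # Barrier: messages produced in the previous superstep become the
        # inbox for this one; new messages go to a fresh outbox.
        inbox, outbox = outbox, defaultdict(list)
        for vertex, messages in inbox.items():
            new_value = max(values[vertex], max(messages))
            # Only vertices whose value changed send messages; the others
            # stay halted until a later message wakes them up.
            if new_value != values[vertex]:
                values[vertex] = new_value
                for n in graph.get(vertex, []):
                    outbox[n].append(new_value)

    return values


if __name__ == "__main__":
    g = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    print(propagate_max(g, {"a": 3, "b": 6, "c": 2, "d": 1}))
```

Computation stops once a superstep produces no messages, which is the "vote to halt" condition in this model. The design trades some latency (one barrier per step) for deterministic communication.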

Regards

Bertrand

On Wed, Sep 26, 2012 at 5:36 PM, Jane Wayne <jane.wayne2978@gmail.com> wrote:

> Hi,
>
> I know that some algorithms cannot be parallelized and adapted to the
> MapReduce paradigm. However, I have noticed that in most cases where I
> find myself struggling to express an algorithm in MapReduce, the
> problem is mainly the inability of mappers or reducers to communicate
> with one another.
>
> One naive approach I've seen mentioned here and elsewhere is to use a
> database to store data for use by all the mappers. However, I have
> seen many arguments (which I largely agree with) against this approach.
>
> In general, my question is this: has anyone tried to implement an
> algorithm using MapReduce where mappers required cross-communication?
> How did you solve this limitation of MapReduce?
>
> Thanks,
>
> Jane
>



-- 
Bertrand Dechoux
