hadoop-common-user mailing list archives

From Andrew Purtell <apurt...@apache.org>
Subject Re: strategies to share information between mapreduce tasks
Date Wed, 26 Sep 2012 17:36:37 GMT
HBase is usually distributed with Hadoop, well integrated with the
platform, and use of it in MapReduce applications is quite common.
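
For illustration only, here is a minimal sketch of the write-then-lookup pattern described in the quoted message below. It is plain Python, not Hadoop or HBase code: `SharedStore`, `map_task`, and `run_job` are hypothetical names, the in-memory dictionary merely stands in for a store that every task node can reach (an HBase table would be one option), and `len(record)` is a placeholder for the real per-record work.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class SharedStore:
    """Hypothetical in-memory stand-in for a store reachable by every task."""

    def __init__(self):
        self._rows = {}
        self._lock = Lock()

    def put(self, key, value):
        # Guard the dictionary so concurrent tasks do not interleave writes.
        with self._lock:
            self._rows[key] = value

    def get(self, key, default=None):
        with self._lock:
            return self._rows.get(key, default)


def map_task(store, record):
    """Look up a previously persisted result; otherwise compute and persist it."""
    key = record.lower()
    cached = store.get(key)
    if cached is not None:
        # Another task (or an earlier call) already stored this result.
        return key, cached
    result = len(record)  # placeholder for the real per-record computation
    # The write is idempotent: a retried or duplicate task stores the same value.
    store.put(key, result)
    return key, result


def run_job(records, workers=4):
    """Run map_task over all records with a small thread pool (simulated tasks)."""
    store = SharedStore()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda r: map_task(store, r), records))
    return store, results
```

Because each write is idempotent, a failed task that gets re-run cannot corrupt the stored data, which is why this pattern needs no synchronization logic of its own. A call such as `run_job(["Alpha", "beta", "ALPHA"])` returns the store together with one `(key, value)` pair per input record.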

On Wednesday, September 26, 2012, Jane Wayne wrote:

> my problem is more general (than graph problems) and doesn't need to
> have logic built around synchronization or failure. for example, when
> a mapper is finished successfully, it just writes/persists to a
> storage location (could be disk, could be database, could be memory,
> etc...). when the next input is processed (could be on the same mapper
> or different mapper), i just need to do a lookup from the storage
> location (that is accessible by all task nodes). if the mapper fails,
> this doesn't hurt my processing, although i would like for no failures
> (and it's good if hadoop can spawn another task to mitigate).
>
> On Wed, Sep 26, 2012 at 11:43 AM, Bertrand Dechoux <dechouxb@gmail.com>
> wrote:
> > The difficulty with data transfer between tasks is handling
> > synchronisation and failure.
> > You may want to look at graph processing done on top of Hadoop (like
> > Giraph).
> > That's one way to do it but whether it is relevant or not to you will
> > depend on your context.
> >
> > Regards
> >
> > Bertrand
> >
> > On Wed, Sep 26, 2012 at 5:36 PM, Jane Wayne <jane.wayne2978@gmail.com>
> > wrote:
> >
> >> hi,
> >>
> >> i know that some algorithms cannot be parallelized and adapted to the
> >> mapreduce paradigm. however, i have noticed that in most cases where i
> >> find myself struggling to express an algorithm in mapreduce, the
> >> problem is mainly due to no ability to cross-communicate between
> >> mappers or reducers.
> >>
> >> one naive approach i've seen mentioned here and elsewhere, is to use a
> >> database to store data for use by all the mappers. however, i have
> >> seen many arguments (that i agree with largely) against this approach.
> >>
> >> in general, my question is this: has anyone tried to implement an
> >> algorithm using mapreduce where mappers required cross-communications?
> >> how did you solve this limitation of mapreduce?
> >>
> >> thanks,
> >>
> >> jane.
> >>
> >
> >
> >
> > --
> > Bertrand Dechoux
>


-- 
Best regards,

   - Andy

Problems worthy of attack prove their worth by hitting back. - Piet Hein
(via Tom White)
