hadoop-mapreduce-user mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: What are the methods to share dynamic data among mappers/reducers?
Date Thu, 02 Jan 2014 18:21:03 GMT

The framework doesn't natively support this, but you can do it yourself by using
a shared service (e.g. HDFS files or ZooKeeper nodes) that all mappers and reducers
have access to.
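
As a rough illustration of the shared-file variant, here is a minimal sketch of a reader that
a task could call to pick up changes. It is not Hadoop code: it uses java.nio.file on a local
path as a stand-in for HDFS, and the class name SharedValueReader and its current() method are
hypothetical. In a real job the same pattern would presumably go through
org.apache.hadoop.fs.FileSystem (checking getFileStatus(path).getModificationTime() before
re-reading).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

/** Hypothetical helper: re-reads a shared file only when its modification time changes. */
class SharedValueReader {
    private final Path sharedFile;   // assumed location that every task can read
    private FileTime lastSeen;       // modification time of the cached copy
    private String cached;           // last contents read from the file

    SharedValueReader(Path sharedFile) {
        this.sharedFile = sharedFile;
    }

    /** Returns the current contents, reloading if the file changed since the last call. */
    String current() throws IOException {
        FileTime modified = Files.getLastModifiedTime(sharedFile);
        if (!modified.equals(lastSeen)) {
            byte[] bytes = Files.readAllBytes(sharedFile);
            cached = new String(bytes, StandardCharsets.UTF_8).trim();
            lastSeen = modified;
        }
        return cached;
    }
}
```

A task would create one reader per attempt and call current() wherever it needs the value.
Checking the modification time on every call costs a metadata lookup each time, so checking
every N records may be the more practical choice.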

Can you share more details on your use case? In any case, once you make mappers and
reducers depend on externally changing state or on each other, you may be breaking
fundamental assumptions of MapReduce: embarrassingly parallel computation (limiting
scalability) and/or idempotency (affecting retries during failures).
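
To make the retry concern concrete, here is a small sketch in plain Java (no Hadoop
involved). The class RetryDemo and both methods are hypothetical, and an AtomicInteger stands
in for whatever external shared state the tasks would read. The first method reads the shared
value per record, so a retried attempt can produce different output if the value changed in
between. The second takes the value as a fixed argument, for example one pinned when the job
is submitted, so every attempt sees the same input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class RetryDemo {
    /** Hypothetical task attempt that reads mutable shared state for every record. */
    static List<Integer> runAttempt(List<Integer> records, AtomicInteger sharedFactor) {
        List<Integer> out = new ArrayList<>();
        for (int r : records) {
            out.add(r * sharedFactor.get()); // result depends on when the attempt runs
        }
        return out;
    }

    /** Same computation, but the factor is fixed up front and passed in as a constant. */
    static List<Integer> runAttemptPinned(List<Integer> records, int pinnedFactor) {
        List<Integer> out = new ArrayList<>();
        for (int r : records) {
            out.add(r * pinnedFactor); // identical on every attempt
        }
        return out;
    }
}
```

If the shared factor changes from 2 to 3 between a failed attempt and its retry, runAttempt
gives different results for the same input split, while runAttemptPinned does not.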


On Jan 2, 2014, at 1:42 AM, sam liu <samliuhadoop@gmail.com> wrote:

> Hi,
> As far as I know, the Distributed Cache will copy the shared data to the slaves before
> starting the job, and won't change the shared data after that.
> So are there any solutions to share dynamic data among mappers/reducers?
> Thanks!
