hadoop-hdfs-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: DistributedCache
Date Fri, 12 Dec 2014 04:25:01 GMT
Look at this thread. It has alternatives to DistributedCache.

Basically, you can use the newer method job.addCacheFile(URI) on
org.apache.hadoop.mapreduce.Job to pass files on to the individual tasks.
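As a rough illustration of that suggestion, here is a minimal sketch, not a definitive implementation. It assumes Hadoop 2.x and the org.apache.hadoop.mapreduce API. The class names CacheFileExample and LookupMapper, the helper configure, and the idea of using the cached file as a lookup list are all hypothetical and only for illustration. The mapper reads the file back through its original URI, which is the simplest thing that works; reading the localized copy in the task's working directory would avoid the extra remote read.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical example class; none of these names come from the thread.
public class CacheFileExample {

    // Mapper that loads every registered cache file into memory during setup().
    public static class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Set<String> lookup = new HashSet<>();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // URIs registered on the driver side with job.addCacheFile(...).
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles == null) {
                return; // nothing was registered
            }
            for (URI uri : cacheFiles) {
                Path path = new Path(uri);
                FileSystem fs = path.getFileSystem(context.getConfiguration());
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        lookup.add(line.trim());
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit only the input lines that also appear in the cached file.
            String line = value.toString().trim();
            if (lookup.contains(line)) {
                context.write(new Text(line), new Text("seen"));
            }
        }
    }

    // Driver-side setup: register the file instead of calling DistributedCache directly.
    public static Job configure(Configuration conf, URI cachedFile) throws IOException {
        Job job = Job.getInstance(conf, "cache-file-example");
        job.setJarByClass(CacheFileExample.class);
        job.setMapperClass(LookupMapper.class);
        job.addCacheFile(cachedFile);
        return job;
    }
}
```

Note that the cache is populated when the job is submitted, so this fits files that already exist before the tasks start. Files produced by a running task attempt would still need to be written somewhere shared, such as HDFS, before another node can read them.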


On Thu, Dec 11, 2014 at 9:07 PM, Srinivas Chamarthi <
srinivas.chamarthi@gmail.com> wrote:
> Hi,
> I want to cache map/reduce temporary output files so that I can compare
> two map results coming from two different nodes as an integrity
> check.
> I am simulating this use case with speculative execution by rescheduling
> the first task as soon as it has started running.
> Now I want to compare the output files from the speculative attempt and
> the prior attempt so that I can calculate a credit score for each node.
> I want to use DistributedCache to cache the local file system files in
> the CommitPending stage from TaskImpl. But DistributedCache is actually
> deprecated. Is there any other way I can do this?
> I think I can use HDFS to save the temporary output files so that other
> nodes can see them, but is there any in-memory solution I can use?
> Any pointers are greatly appreciated.
> thx & rgds,
> srinivas chamarthi
