hadoop-mapreduce-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: DistributedCache
Date Fri, 12 Dec 2014 04:25:01 GMT
Look at this thread. It has alternatives to DistributedCache.
http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api

Basically, you can use the newer method job.addCacheFile(URI) (or
job.setCacheFiles(URI[]) for several files at once) to pass files on to
the individual tasks.
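
For reference, here is a minimal sketch of how that could look, assuming
Hadoop 2.x on the classpath. The class names CacheFileExample and
LookupMapper and the three command-line arguments are made up for
illustration. Keep in mind that cache files are registered when the job is
set up, so this fits read-only side data that already exists before the job
starts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical example: tags each input line as "known" or "unknown"
// depending on whether it appears in a side file shipped via the job cache.
public class CacheFileExample {

    public static class LookupMapper extends Mapper<LongWritable, Text, Text, Text> {
        private final Set<String> known = new HashSet<>();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            // Replacement for the deprecated DistributedCache.getCacheFiles(conf).
            URI[] cacheFiles = context.getCacheFiles();
            if (cacheFiles == null) {
                return; // no cache files were registered for this job
            }
            for (URI uri : cacheFiles) {
                // For simplicity this reads through the original URI (assumed
                // to have no #fragment). The framework also localizes the file
                // on each node for the task to use.
                Path path = new Path(uri);
                FileSystem fs = path.getFileSystem(context.getConfiguration());
                try (BufferedReader reader = new BufferedReader(
                        new InputStreamReader(fs.open(path), StandardCharsets.UTF_8))) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        known.add(line.trim());
                    }
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String record = value.toString().trim();
            context.write(new Text(record), new Text(known.contains(record) ? "known" : "unknown"));
        }
    }

    // Assumed arguments: <input path> <output path> <URI of the side file>
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "cache-file-example");
        job.setJarByClass(CacheFileExample.class);
        job.setMapperClass(LookupMapper.class);
        job.setNumReduceTasks(0); // map-only job
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // Replacement for the deprecated DistributedCache.addCacheFile(uri, conf).
        job.addCacheFile(new URI(args[2]));

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```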

Regards,
Shahab

On Thu, Dec 11, 2014 at 9:07 PM, Srinivas Chamarthi <
srinivas.chamarthi@gmail.com> wrote:
>
> Hi,
>
> I want to cache map/reduce temporary output files so that I can compare
> two map results coming from two different nodes as an integrity
> check.
>
> I am simulating this use case with speculative execution by rescheduling
> the first task as soon as it is started and running.
>
> Now I want to compare the output files from the speculative attempt and
> the prior attempt so that I can calculate a credit score for each node.
>
> I want to use DistributedCache to cache the local file system files in
> the CommitPending stage from TaskImpl, but DistributedCache is
> deprecated. Is there any other way I can do this?
>
> I think I can use HDFS to save the temporary output files so that other
> nodes can see them, but is there an in-memory solution I can use?
>
> Any pointers are greatly appreciated.
>
> thx & rgds,
> srinivas chamarthi
>
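
On the comparison step described in the quoted message: one way to check
whether two attempts produced the same output is to compare digests of the
output bytes, so that only a short hash has to travel between nodes rather
than the whole file. The sketch below uses only the JDK. The class name
AttemptOutputComparer and its methods are hypothetical, and it assumes each
attempt's output is small enough to read into memory.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: compares two task-attempt outputs by SHA-256 digest.
final class AttemptOutputComparer {

    private AttemptOutputComparer() {
    }

    /** Returns the 32-byte SHA-256 digest of the given output bytes. */
    public static byte[] sha256(byte[] data) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(data);
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Returns true when both outputs hash to the same SHA-256 digest. */
    public static boolean sameOutput(byte[] first, byte[] second) {
        return MessageDigest.isEqual(sha256(first), sha256(second));
    }
}
```

A caller could read each attempt's file with Files.readAllBytes(...) and
pass the two arrays to sameOutput. This only detects byte-for-byte
differences, so it assumes both attempts write their records in the same
order.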
