hadoop-mapreduce-user mailing list archives

From Siddharth Dawar <siddharthdawa...@gmail.com>
Subject How to share files amongst multiple jobs using Distributed Cache in Hadoop 2.7.2
Date Tue, 07 Jun 2016 09:17:29 GMT

I wrote a program which creates Map-Reduce jobs in an iterative fashion, like this:
while (true) {
  JobConf conf2 = new JobConf(getConf(), graphMining.class);
  FileInputFormat.setInputPaths(conf2, new Path(input));
  FileOutputFormat.setOutputPath(conf2, new Path(output));
  RunningJob job = JobClient.runJob(conf2);
}

Now, I want the first job that gets created to write something into the
distributed cache, and the jobs created after it to read that content from
the distributed cache.

I came to know that the DistributedCache.addCacheFile() method is
deprecated, and the documentation suggests using the Job.addCacheFile()
method on each individual job instead.
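For reference, this is the replacement call I found on the new mapreduce API; a minimal sketch, with an illustrative path and fragment name that are not from my actual job:

```java
// Sketch against the org.apache.hadoop.mapreduce API (Hadoop 2.x).
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class CacheFileSketch {
  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "graphMining");
    // Replaces the deprecated DistributedCache.addCacheFile(uri, conf);
    // the "#shared" fragment is the symlink name in the task's working dir.
    job.addCacheFile(new URI("/tmp/graphMining/patterns.txt#shared"));
  }
}
```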

But I am unable to get a handle on the currently running job, since
JobClient.runJob(conf2) submits the job internally.

How can I make the content written by the first job in this while loop
available, via the distributed cache, to the jobs created in later
iterations of the loop?
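To make the intent concrete, here is roughly what I am trying to achieve, rewritten against the new Job API. This is only a sketch under my own assumptions: the driver class name and the cache path are illustrative, the first job's HDFS output is what later jobs attach as a cache file, and I have not verified it against my setup:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class GraphMiningDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Illustrative: HDFS path the first job writes its shared data to.
    String sharedOutput = "/tmp/graphMining/iter0";

    boolean first = true;
    while (true) {
      Job job = Job.getInstance(conf, "graphMining");
      job.setJarByClass(GraphMiningDriver.class);
      FileInputFormat.setInputPaths(job, new Path(args[0]));
      FileOutputFormat.setOutputPath(job, new Path(args[1]));

      if (!first) {
        // Later iterations read what the first job wrote to HDFS;
        // "#shared" is the symlink name visible in each task's cwd.
        job.addCacheFile(new URI(sharedOutput + "/part-r-00000#shared"));
      }
      job.waitForCompletion(true);  // blocks, like JobClient.runJob did
      first = false;
    }
  }
}
```

The point of the sketch is that the cache files are registered on each Job object before submission, so there is no need for a handle to an already-running job.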
