hadoop-mapreduce-user mailing list archives

From Arko Provo Mukherjee <arkoprovomukher...@gmail.com>
Subject Re: Iterative MR issue
Date Wed, 12 Oct 2011 08:00:03 GMT
Hi,

I solved it by creating a new JobConf instance for each iteration in the loop.
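
For anyone who hits the same thing, the loop now looks roughly like this (old mapred API: JobConf/JobClient/FileInputFormat/FileOutputFormat from org.apache.hadoop.mapred, Path/FileSystem/FileStatus from org.apache.hadoop.fs; MyClass, MyMap, MyReduce and the path_N naming are just placeholders from my earlier snippet):

    int round = 0;
    FileStatus[] directory;
    Configuration conf = new Configuration();

    do {
        // Build a fresh JobConf every iteration, so paths and other state
        // from the previous job do not carry over into the next submission.
        JobConf jobconf = new JobConf(conf, MyClass.class);

        String old_path = "path_" + Integer.toString(round);
        round = round + 1;
        String new_path = "path_" + Integer.toString(round);

        FileInputFormat.addInputPath(jobconf, new Path(old_path));     // previous round's output
        FileOutputFormat.setOutputPath(jobconf, new Path(new_path));   // this round's output

        jobconf.setMapperClass(MyMap.class);
        jobconf.setReducerClass(MyReduce.class);
        // ... other job configuration ...

        JobClient.runJob(jobconf);

        // Check whether this round produced any output files.
        FileSystem fs = FileSystem.get(jobconf);
        directory = fs.listStatus(new Path(new_path));
    } while (directory.length != 0);

With a single reused JobConf, addInputPath kept accumulating input paths and the previous job's state leaked into the next submission, which looks like what was triggering the "Wrong FS" error here.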

Thanks & regards
Arko

On Oct 12, 2011, at 1:54 AM, Arko Provo Mukherjee <arkoprovomukherjee@gmail.com> wrote:

> Hello Everyone,
> 
> I have a particular situation where I am trying to run iterative Map-Reduce: the output
> files of one iteration are the input files for the next.
> The iteration stops when no new files are created in the output.
> 
> Code Snippet:
> 
> int round = 0;
> FileStatus[] directory;   // fs below is a FileSystem handle obtained elsewhere
> 
> JobConf jobconf = new JobConf(new Configuration(), MyClass.class);
> 
> do  {
> 
>     String old_path = "path_" + Integer.toString(round);
> 
>     round = round + 1;
> 
>     String new_path = "path_" + Integer.toString(round);
> 
>     FileInputFormat.addInputPath ( jobconf, new Path (old_path) );
> 
>     FileOutputFormat.setOutputPath ( jobconf, new Path (new_path) );   // These will eventually become directories containing multiple files
> 
>     jobconf.setMapperClass(MyMap.class);
> 
>     jobconf.setReducerClass(MyReduce.class);
> 
>     // Other code
> 
>     JobClient.runJob(jobconf);
> 
>     directory = fs.listStatus ( new Path ( new_path ) );  // To check for any new files in the output directory
> 
> } while ( directory.length != 0 );  // Stop iteration only when no new files are generated in the output path
> 
> 
> 
> The code runs smoothly in the first round: I can see the new directory path_1 being
> created and the Reducer output files added to it.
> 
> The original path_0 was created by me beforehand, and I added the relevant input files to it.
> 
> The output files seem to have the correct data as per my Map/Reduce logic.
> 
> However, in the second round it fails with the following exception.
> 
> In 0.19 (In a cloud system - Fully Distributed Mode)
> 
> java.lang.IllegalArgumentException: Wrong FS: hdfs://cloud_hostname:9000/hadoop/tmp/hadoop/mapred/system/job_201106271322_9494/job.jar, expected: file:///
> 
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:322)
> 
> 
> 
> In 0.20.203 (my own system and not a cloud - Pseudo Distributed Mode)
> 
> 11/10/12 00:35:42 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/hadoop-0.20.203.0/HDFS/mapred/staging/arko/.staging/job_201110120017_0002
> 
> Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:54310/hadoop-0.20.203.0/HDFS/mapred/staging/arko/.staging/job_201110120017_0001/job.jar, expected: file:///
> at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:354)
> 
> It seems that Hadoop is not able to delete the staging file for the job.
> 
> Can you please suggest any reason for this? Please help!
> 
> Thanks a lot in advance!
> 
> Warm regards
> Arko
