From: Arko Provo Mukherjee
Subject: Re: Iterative MR issue
Date: Wed, 12 Oct 2011 03:00:03 -0500
To: Arko Provo Mukherjee
Cc: mapreduce-user@hadoop.apache.org
Hi,

I solved it by creating a new JobConf instance for each iteration in the loop.
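For the archive: one concrete reason reusing a single JobConf misbehaves across rounds (whether or not it is what triggered the Wrong FS error here) is that FileInputFormat.addInputPath appends to the job's input-path list rather than replacing it, so a reused conf object accumulates state. A minimal, Hadoop-free sketch of that accumulation — JobSketch and ReuseDemo are hypothetical stand-ins, not Hadoop classes:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for JobConf: like the real FileInputFormat.addInputPath,
// this appends rather than replaces, so state piles up if the object is reused.
class JobSketch {
    final List<String> inputPaths = new ArrayList<>();
    void addInputPath(String p) { inputPaths.add(p); }
}

public class ReuseDemo {
    // One conf reused across all rounds: every earlier round's input remains.
    static int reused(int rounds) {
        JobSketch conf = new JobSketch();
        for (int r = 0; r < rounds; r++) conf.addInputPath("path_" + r);
        return conf.inputPaths.size();   // grows with the number of rounds
    }

    // A fresh conf per round, as in the fix: each job sees exactly one input.
    static int fresh(int rounds) {
        int last = 0;
        for (int r = 0; r < rounds; r++) {
            JobSketch conf = new JobSketch();
            conf.addInputPath("path_" + r);
            last = conf.inputPaths.size();
        }
        return last;
    }

    public static void main(String[] args) {
        System.out.println(reused(3) + " input paths when reused, "
                         + fresh(3) + " per job when created fresh");
    }
}
```

After three rounds the reused conf holds three input paths while a per-round conf holds one, which is why creating the JobConf inside the loop keeps each job self-contained.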

Thanks & regards
Arko

On Oct 12, 2011, at 1:54 AM, Arko Provo Mukherjee <arkoprovomukherjee@gmail.com> wrote:

Hello Everyone,

I have a particular situation where I am trying to run an iterative Map-Reduce job, in which the output files of one iteration are the input files for the next. 
It stops when no new files are created in the output.

Code Snippet:

int round = 0;
JobConf jobconf = new JobConf(new Configuration(), MyClass.class);
FileSystem fs = FileSystem.get(jobconf);
FileStatus[] directory;

do {

    String old_path = "path_" + Integer.toString(round);
    round = round + 1;
    String new_path = "path_" + Integer.toString(round);

    FileInputFormat.addInputPath(jobconf, new Path(old_path));
    FileOutputFormat.setOutputPath(jobconf, new Path(new_path));  // These will eventually become directories containing multiple files

    jobconf.setMapperClass(MyMap.class);
    jobconf.setReducerClass(MyReduce.class);

    // Other code

    JobClient.runJob(jobconf);

    directory = fs.listStatus(new Path(new_path));  // To check for any new files in the output directory

} while (directory.length != 0);  // Stop iterating only when no new files are generated in the output path
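For readers without a cluster handy, the shape of the driver loop above can be exercised against the local filesystem. This is a hedged sketch, not the poster's code: runRound and demo are hypothetical stand-ins for JobClient.runJob and the HDFS setup, with a toy rule (shorten each file by one byte per round) chosen only so the iteration provably terminates:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class IterativeDriver {

    // Stand-in for one Map-Reduce round (JobClient.runJob in the real code).
    // Toy logic: for every input file longer than one byte, emit the same file
    // one byte shorter, so successive rounds eventually produce an empty
    // output directory and the driver loop below terminates.
    static void runRound(Path in, Path out) throws IOException {
        Files.createDirectories(out);
        try (Stream<Path> files = Files.list(in)) {
            for (Path f : files.collect(Collectors.toList())) {
                byte[] data = Files.readAllBytes(f);
                if (data.length > 1) {
                    Files.write(out.resolve(f.getFileName().toString()),
                                Arrays.copyOf(data, data.length - 1));
                }
            }
        }
    }

    // Mirrors the mailing-list snippet: each round reads path_<round> and
    // writes path_<round + 1>, stopping the first time a round emits nothing.
    static int drive(Path base) throws IOException {
        int round = 0;
        long produced;
        do {
            Path oldPath = base.resolve("path_" + round);
            round = round + 1;
            Path newPath = base.resolve("path_" + round);
            runRound(oldPath, newPath);              // one "MR" round
            try (Stream<Path> outFiles = Files.list(newPath)) {
                produced = outFiles.count();         // any new files?
            }
        } while (produced != 0);                     // stop when none appear
        return round;
    }

    // Seeds path_0 with one file and runs the loop; returns rounds executed.
    static int demo(String seed) {
        try {
            Path base = Files.createTempDirectory("iter-mr");
            Files.createDirectories(base.resolve("path_0"));
            Files.write(base.resolve("path_0").resolve("part-00000"),
                        seed.getBytes());
            return drive(base);
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("rounds until empty output: " + demo("abcd"));
    }
}
```

As in the original snippet, the loop ends the first time a round's output directory comes up empty; a 4-byte seed file takes four rounds to shrink away.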


The code runs smoothly in the first round: I can see the new directory path_1 being created and files added to it from the Reducer output. 

I created the original path_0 beforehand and added the relevant input files to it. 

The output files seem to have the correct data as per my Map/Reduce logic.

However, in the second round it fails with the following exception.

In 0.19 (In a cloud system - Fully Distributed Mode)

java.lang.IllegalArgumentException: Wrong FS: hdfs://cloud_hostname:9000/hadoop/tmp/hadoop/mapred/system/job_201106271322_9494/job.jar, expected: file:///

at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:322)


In 0.20.203 (my own system and not a cloud - Pseudo Distributed Mode)

11/10/12 00:35:42 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:54310/hadoop-0.20.203.0/HDFS/mapred/staging/arko/.staging/job_201110120017_0002

Exception in thread "main" java.lang.IllegalArgumentException: Wrong FS: hdfs://localhost:54310/hadoop-0.20.203.0/HDFS/mapred/staging/arko/.staging/job_201110120017_0001/job.jar, expected: file:///
at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:354)

It seems that Hadoop is not able to delete the staging files for the job.

Can you please suggest any reason for this? Please help!

Thanks a lot in advance!

Warm regards
Arko


