Subject: Re: How do I set the intermediate output path when I use 2 mapreduce jobs?
From: Swathi V
To: mapreduce-user@hadoop.apache.org
Date: Wed, 21 Sep 2011 22:26:54 +0530

Hi,

This code might help you:

// JobDependancies.java snippet
// WordMapper, WordReducer and SortWordMapper are your own mapper/reducer classes.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.jobcontrol.ControlledJob;
import org.apache.hadoop.mapreduce.lib.jobcontrol.JobControl;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobDependancies {
  public static void main(String[] args) throws Exception {
    // job1: word count, writes to a time-stamped intermediate directory
    Configuration conf = new Configuration();
    Job job1 = new Job(conf, "job1");
    job1.setJarByClass(JobDependancies.class);
    job1.setMapperClass(WordMapper.class);
    job1.setReducerClass(WordReducer.class);
    job1.setOutputKeyClass(Text.class);
    job1.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job1, new Path(args[0]));
    String out = args[1] + System.nanoTime();
    FileOutputFormat.setOutputPath(job1, new Path(out));

    // job2: sort, reads job1's output and writes the final result to args[1]
    Configuration conf2 = new Configuration();
    Job job2 = new Job(conf2, "job2");
    job2.setJarByClass(JobDependancies.class);
    job2.setOutputKeyClass(IntWritable.class);
    job2.setOutputValueClass(Text.class);
    job2.setMapperClass(SortWordMapper.class);
    job2.setReducerClass(Reducer.class); // the base Reducer passes records through unchanged
    // Assumes job1 ran with a single reducer, so all of its output is in part-r-00000.
    FileInputFormat.addInputPath(job2, new Path(out + "/part-r-00000"));
    FileOutputFormat.setOutputPath(job2, new Path(args[1]));

    // job2 must not start until job1 has finished
    ControlledJob controlledJob1 = new ControlledJob(job1.getConfiguration());
    ControlledJob controlledJob2 = new ControlledJob(job2.getConfiguration());
    controlledJob2.addDependingJob(controlledJob1);

    JobControl jobControl = new JobControl("control");
    jobControl.addJob(controlledJob1);
    jobControl.addJob(controlledJob2);

    // JobControl is a Runnable: run it in its own thread and poll until all jobs are done
    Thread thread = new Thread(jobControl);
    thread.start();
    while (!jobControl.allFinished()) {
      try {
        Thread.sleep(10000);
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
    }
    jobControl.stop();
  }
}
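
In case the path handling in the snippet is unclear: job1 writes into a directory named after args[1] plus a timestamp, and job2 then reads the first reducer's file inside it. Here is a small plain-Java sketch of just that naming, with no Hadoop dependency. The class name, method names and the /user/demo/result path are made up for illustration; the part-r-NNNNN pattern is the one the snippet relies on.

```java
import java.util.Locale;

public class IntermediatePathSketch {
    /** Builds the per-run intermediate directory by appending a stamp to the final output path. */
    public static String intermediateDir(String finalOutput, long stamp) {
        return finalOutput + stamp;
    }

    /** Builds the path of the file written by the given reducer, e.g. dir/part-r-00000 for reducer 0. */
    public static String partFile(String dir, int reducer) {
        return String.format(Locale.ROOT, "%s/part-r-%05d", dir, reducer);
    }

    public static void main(String[] args) {
        // Hypothetical final output path; the snippet takes this from args[1].
        String out = intermediateDir("/user/demo/result", System.nanoTime());
        System.out.println(partFile(out, 0));
    }
}
```

If job1 runs more than one reducer there will be one such file per reducer, so job2 would need all of them as input, not only part-r-00000.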


The wordcount output (job1) is given as input to the sort (job2).
Irrespective of the mappers and reducers, the approach above is the way to chain multiple jobs.
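
To show why the dependency matters, here is a rough plain-Java sketch of the idea behind it: a step runs only once every step it depends on has finished. This is not Hadoop's JobControl implementation, only an illustration; JobChainSketch, Step and runInDependencyOrder are hypothetical names, and a real job would be submitted and polled where the sketch just marks a step as finished.

```java
import java.util.ArrayList;
import java.util.List;

public class JobChainSketch {
    /** Hypothetical stand-in for a controlled job: a name plus the steps it waits for. */
    public static final class Step {
        public final String name;
        public final List<Step> dependsOn = new ArrayList<>();
        boolean finished = false;

        public Step(String name) { this.name = name; }

        /** A step is ready once all of its dependencies have finished. */
        boolean ready() {
            for (Step d : dependsOn) {
                if (!d.finished) return false;
            }
            return true;
        }
    }

    /** Repeatedly runs every unfinished step whose dependencies are done; returns the run order. */
    public static List<String> runInDependencyOrder(List<Step> steps) {
        List<String> order = new ArrayList<>();
        int remaining = steps.size();
        while (remaining > 0) {
            boolean progressed = false;
            for (Step s : steps) {
                if (!s.finished && s.ready()) {
                    s.finished = true; // a real job would be submitted and waited for here
                    order.add(s.name);
                    remaining--;
                    progressed = true;
                }
            }
            if (!progressed) {
                throw new IllegalStateException("circular dependency between steps");
            }
        }
        return order;
    }

    public static void main(String[] args) {
        Step wordCount = new Step("job1");
        Step sort = new Step("job2");
        sort.dependsOn.add(wordCount);

        // Added in reverse order on purpose: the dependency decides the order, not the list.
        List<Step> steps = new ArrayList<>();
        steps.add(sort);
        steps.add(wordCount);
        System.out.println(runInDependencyOrder(steps));
    }
}
```

Without the dependency, nothing stops job2 from starting while job1's output does not exist yet, which is one way to end up with a FileNotFoundException on the intermediate file.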

2011/9/21 谭军 <tanjun_2525@163.com>

> Hi,
> I want to use 2 MR jobs sequentially.
> The first job writes its intermediate result to a temp file.
> The second job reads the result from the temp file, not from the FileInputPath.
> I tried, but a FileNotFoundException was reported.
> Then I checked the datanodes; the temp file had been created.
> The first job was executed correctly.
> Why can't the second job find the file? The file was created before the
> second job was executed.
> Thanks!
>
> --
>
> Regards!
>
> Jun Tan

-- 
Regards,
Swathi.V.