From: "Chandra Mohan, Ananda Vel Murugan" <Ananda.Murugan@honeywell.com>
To: user@hadoop.apache.org
Subject: Running map reduce programmatically is unusually slow
Date: Mon, 4 Nov 2013 14:01:59 +0000

Hi,

 

I have written a small utility to run a MapReduce job programmatically. My aim is to run my MapReduce job without using the hadoop shell script. I am planning to call this utility from another application.

 

Following is the code that runs the MapReduce job. I have bundled this Java class into a jar (remotemr.jar). The actual MapReduce job is bundled inside another jar (mapreduce.jar).

 

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SequenceFileInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.jobcontrol.Job;
import org.apache.hadoop.mapred.jobcontrol.JobControl;

public class RemoteMapreduce {

    public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {

        String inputPath = args[0];
        String outputPath = args[1];
        String specFilePath = args[2];

        // Load the cluster configuration files explicitly, since this runs
        // outside the hadoop shell script.
        Configuration config = new Configuration();
        config.addResource(new Path("/opt/hadoop-1.0.2/bin/core-site.xml"));
        config.addResource(new Path("/opt/hadoop-1.0.2/bin/hdfs-site.xml"));

        JobConf jobConf = new JobConf(config);
        jobConf.set("hadoop.tmp.dir", "/tmp/hadoop-ananda/");

        // The mapper class lives in a separate jar that is shipped with the job.
        jobConf.setJar("/home/ananda/mapreduce.jar");
        jobConf.setMapperClass(Myjob.MapClass.class);

        SequenceFileInputFormat.setInputPaths(jobConf, new Path(inputPath));
        TextOutputFormat.setOutputPath(jobConf, new Path(outputPath));

        jobConf.setMapOutputKeyClass(Text.class);
        jobConf.setMapOutputValueClass(Text.class);
        jobConf.setInputFormat(SequenceFileInputFormat.class);
        jobConf.setOutputFormat(TextOutputFormat.class);
        jobConf.setOutputKeyClass(Text.class);
        jobConf.setOutputValueClass(Text.class);

        // Job-specific parameter read by the mapper.
        jobConf.set("specPath", specFilePath);
        jobConf.setUser("ananda");

        Job job1 = new Job(jobConf);
        JobClient jc = new JobClient(jobConf);

        // Fire-and-forget submission; does not wait for completion.
        jc.submitJob(jobConf);
        /* JobControl ctrl = new JobControl("dar");
        ctrl.addJob(job1);
        ctrl.run(); */

        System.out.println("Job launched!");
    }
}
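As an aside, to see where the time actually goes, one could keep the RunningJob handle that submitJob() returns and poll it instead of the bare jc.submitJob(jobConf) call above. Just a sketch, relying on the same jc and jobConf as in the code; RunningJob is org.apache.hadoop.mapred.RunningJob:

        // Sketch only: poll the submitted job so it is visible whether the
        // extra time goes into submission itself or into the map/reduce tasks.
        org.apache.hadoop.mapred.RunningJob rj = jc.submitJob(jobConf);
        while (!rj.isComplete()) {
            System.out.printf("map %.0f%% reduce %.0f%%%n",
                    rj.mapProgress() * 100, rj.reduceProgress() * 100);
            Thread.sleep(5000);
        }
        System.out.println("Job finished, successful = " + rj.isSuccessful());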

 

 

I am running it as follows:

 

java -cp <all hadoop jars needed for the job>:/home/ananda/mapreduce.jar:/home/Ananda/remotemr.jar RemoteMapreduce <inputpath> <outputpath> <specpath>
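(I believe "bin/hadoop classpath" on Hadoop 1.x prints the classpath the shell script itself assembles, so its output could be used to fill in the <all hadoop jars needed for the job> part, though I have not verified that on 1.0.2.)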

 

It runs without any error, but it takes longer than when I run the same job using the hadoop shell script. Another thing: all three paths (input, output and spec file) need to be fully qualified HDFS paths, i.e. hdfs://<hostname>:<port>/<path>. If I give partial paths, as I do with the hadoop shell script, I get input path not found errors. Am I doing anything wrong? Please help. Thanks
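For reference, here is a quick standalone check of whether the configuration files are actually being picked up (just a sketch; the class name ConfCheck is made up). Note that the code above adds core-site.xml and hdfs-site.xml but never mapred-site.xml:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class ConfCheck {
    public static void main(String[] args) {
        // Load the same resources as RemoteMapreduce, then print the two keys
        // that decide where the job runs. The built-in defaults are
        // fs.default.name = file:/// and mapred.job.tracker = local; if those
        // are what gets printed, the job runs with the local runner against
        // the local file system, which would match both symptoms above.
        Configuration config = new Configuration();
        config.addResource(new Path("/opt/hadoop-1.0.2/bin/core-site.xml"));
        config.addResource(new Path("/opt/hadoop-1.0.2/bin/hdfs-site.xml"));
        System.out.println("fs.default.name    = " + config.get("fs.default.name"));
        System.out.println("mapred.job.tracker = " + config.get("mapred.job.tracker"));
    }
}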

 

Regards,

Anand.C
