From: praveen.peddi@nokia.com
To: mapreduce-user@hadoop.apache.org
Date: Tue, 23 Nov 2010 17:18:08 +0100
Subject: RE: Starting a Hadoop job programmatically

Hi Henning,

Adding Hadoop's conf folder didn't help fix the issue, but once I added the two properties below I was able to access the file system. I still cannot write anything, though, because the job runs as a different user. Based on these experiments I have the following questions:

1. How can I access HDFS or submit jobs as a different user than the one my Java app runs as? For example, the Hadoop cluster is set up for the "hadoop" user while my Java app runs as a different user. To run the job correctly I have to submit it as the "hadoop" user, correct? How do I achieve that programmatically? (A sketch follows below.)

2. A few of the jobs I am calling are provided by a library, which means I cannot add these two config properties myself. Is there any way around this other than replicating the library's job submission code locally?

Thanks
Praveen
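A rough sketch for question 1: run the HDFS and job-submission calls under an impersonated identity. This assumes a Hadoop release that ships UserGroupInformation.createRemoteUser/doAs (the security-enabled releases that followed 0.20.2; plain 0.20.2 reportedly used the hadoop.job.ugi property instead) and a cluster that does not enforce Kerberos authentication. The user name and path are illustrative only.

//----
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.UserGroupInformation;

public class SubmitAsHadoopUser {
    public static void main(String[] args) throws Exception {
        // Assumes fs.default.name / mapred.job.tracker already point at the
        // remote cluster (see the core-site.xml discussion further down).
        final Configuration conf = new Configuration();

        // Act as the "hadoop" user even though this JVM runs as someone else.
        UserGroupInformation hadoopUser = UserGroupInformation.createRemoteUser("hadoop");
        hadoopUser.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                FileSystem fs = FileSystem.get(conf);
                fs.mkdirs(new Path("/user/hadoop/test-output")); // illustrative path

                Job job = new Job(conf, "submitted as hadoop");
                // ... configure mapper/reducer/jar as in Henning's example below,
                // then job.submit() or job.waitForCompletion(true) ...
                return null;
            }
        });
    }
}
//----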
________________________________
From: ext Henning Blohm [mailto:henning.blohm@zfabrik.de]
Sent: Tuesday, November 23, 2010 3:24 AM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Starting a Hadoop job programmatically

Hi Praveen,

In order to submit to the cluster, you just need a core-site.xml on your classpath (or load it explicitly into your configuration object) that looks (at least) like this:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${name:port of namenode}</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>${name:port of jobtracker}</value>
  </property>
</configuration>

If you want to wait for each job's completion, you can use job.waitForCompletion(true) rather than job.submit().

Good luck,
Henning
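If shipping the XML on the classpath is not an option (for instance when the jobs come from a library, as in question 2 above), the same settings can, as Henning notes, be loaded into the configuration object explicitly. A small sketch; the file locations and host:port values are placeholders, not values from this thread:

//----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class RemoteClusterConf {
    public static Configuration create() {
        Configuration conf = new Configuration();

        // Option 1: pull in external copies of the cluster's config files.
        conf.addResource(new Path("/etc/hadoop-remote/core-site.xml"));   // placeholder path
        conf.addResource(new Path("/etc/hadoop-remote/mapred-site.xml")); // placeholder path

        // Option 2: set the two properties directly.
        conf.set("fs.default.name", "hdfs://namenode-host:9000"); // placeholder host:port
        conf.set("mapred.job.tracker", "jobtracker-host:9001");   // placeholder host:port

        return conf;
    }
}
//----

The resulting Configuration is then handed to new Job(conf, ...) so that submission targets the remote cluster instead of the local runner.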
________________________________
On Mon, 2010-11-22 at 23:40 +0100, praveen.peddi@nokia.com wrote:

Hi,

Thanks for your reply. In my case I have a Driver that calls multiple jobs one after the other. I am using the following code to submit each job, but it uses the local Hadoop jar files on the classpath; it is not submitting the job to the Hadoop cluster. I thought I would need to specify where the master Hadoop is located on the remote machine. An example command I use from the command line is below, but I need to do the same from my Java program.

$ hadoop-0.20.2/bin/hadoop jar /home/ppeddi/dev/Merchandising/RelevancyEngine/relevancy-core/dist/Relevancy4.jar -i raw-downloads-input-10K -o reco-patterns-output-10K-1S -k 100 -method mapreduce -g 500 -regex '[\ ]' -s 5

I hope I made the question clear now.
Praveen

________________________________
From: ext Henning Blohm [mailto:henning.blohm@zfabrik.de]
Sent: Monday, November 22, 2010 5:07 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Starting a Hadoop job programmatically

Hi Praveen,

We do. We are using the "new" org.apache.hadoop.mapreduce.* API in Hadoop 0.20.2. Essentially the flow is:

//----
// assuming all config is on the class path
Configuration config = new Configuration();
Job job = new Job(config, "some job name");

// set in/out types
job.setInputFormatClass(...);
job.setOutputFormatClass(...);
job.setMapOutputKeyClass(...);
job.setMapOutputValueClass(...);
job.setOutputKeyClass(...);
job.setOutputValueClass(...);

// set implementations as required
job.setMapperClass(<your mapper implementation class object>);
job.setCombinerClass(<your combiner implementation class object>);
job.setReducerClass(<your reducer implementation class object>);

// set the jar... this is often the tricky part!
job.setJarByClass(<some class that is in the job jar and not elsewhere higher up on the class path>);

job.submit();
//----

Hope I didn't forget anything.

Note: You need to give Hadoop something it can launch in a JVM that has no more than the Hadoop jars and whatever else you configured statically in your hadoop-env.sh script.

Can you describe your scenario in more detail?

Henning

On Monday, 22.11.2010, 22:39 +0100, praveen.peddi@nokia.com wrote:

Hi all,

I am trying to figure out how I can start a Hadoop job programmatically from my Java application running in an app server. I was able to run my map reduce job using the hadoop command from the Hadoop master machine, but my goal is to run the same job from my Java program (running on a different machine than the master). I googled and could not find a solution for this. All the examples I have seen so far use hadoop from the command line to start a job.

1. Has anyone called Hadoop job invocation from a Java application?
2. If so, could someone provide some sample code?

Thanks
Praveen

Henning Blohm
ZFabrik Software KG
henning.blohm@zfabrik.de
www.z2-environment.eu
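Pulling the thread together: a minimal sketch of a driver that chains two jobs, in the spirit of Henning's outline and Praveen's multi-job driver above. All class, job, and path names are hypothetical, and the default (identity) Mapper/Reducer classes stand in for real implementations:

//----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobsDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // cluster settings assumed on the classpath

        // First job: reads the raw input and writes an intermediate directory.
        Job first = new Job(conf, "step 1");
        first.setJarByClass(ChainedJobsDriver.class);
        first.setMapperClass(Mapper.class);   // identity mapper as a stand-in
        first.setReducerClass(Reducer.class); // identity reducer as a stand-in
        first.setOutputKeyClass(LongWritable.class);
        first.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(first, new Path("raw-input"));      // hypothetical path
        FileOutputFormat.setOutputPath(first, new Path("intermediate")); // hypothetical path
        if (!first.waitForCompletion(true)) { // block until done before starting the next job
            System.exit(1);
        }

        // Second job: consumes the first job's output.
        Job second = new Job(conf, "step 2");
        second.setJarByClass(ChainedJobsDriver.class);
        second.setMapperClass(Mapper.class);
        second.setReducerClass(Reducer.class);
        second.setOutputKeyClass(LongWritable.class);
        second.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(second, new Path("intermediate"));
        FileOutputFormat.setOutputPath(second, new Path("final-output")); // hypothetical path
        System.exit(second.waitForCompletion(true) ? 0 : 1);
    }
}
//----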