From: praveen.peddi@nokia.com
To: mapreduce-user@hadoop.apache.org
Date: Tue, 23 Nov 2010 17:18:08 +0100
Subject: RE: Starting a Hadoop job programmatically

Hi Henning,

Adding Hadoop's conf folder didn't help fix the issue, but once I added the two properties below I was able to access the file system. I still cannot write anything, though, because the job runs as a different user. Based on these experiments I have the following questions:

1. How can I access HDFS or submit jobs as a different user than the one my Java app runs as? For example, the Hadoop cluster is set up for the "hadoop" user while my Java app runs as a different user. To run the job correctly I have to submit it as the "hadoop" user, correct? How do I achieve that programmatically? (A sketch follows below.)

2. A few of the jobs I am calling are provided by a library, which means I cannot add these two config properties myself. Is there any way around this other than replicating the library's job submission code locally?

Thanks
Praveen
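A rough sketch for question 1: run the HDFS and job-submission calls under an impersonated identity. This assumes a Hadoop release that ships UserGroupInformation.createRemoteUser/doAs (the security-enabled releases that followed 0.20.2; plain 0.20.2 reportedly used the hadoop.job.ugi property instead) and a cluster that does not enforce Kerberos authentication. The user name and path are illustrative only.

//----
import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.security.UserGroupInformation;

public class SubmitAsHadoopUser {
    public static void main(String[] args) throws Exception {
        // Assumes fs.default.name / mapred.job.tracker already point at the
        // remote cluster (see the core-site.xml discussion further down).
        final Configuration conf = new Configuration();

        // Act as the "hadoop" user even though this JVM runs as someone else.
        UserGroupInformation hadoopUser = UserGroupInformation.createRemoteUser("hadoop");
        hadoopUser.doAs(new PrivilegedExceptionAction<Void>() {
            public Void run() throws Exception {
                FileSystem fs = FileSystem.get(conf);
                fs.mkdirs(new Path("/user/hadoop/test-output")); // illustrative path

                Job job = new Job(conf, "submitted as hadoop");
                // ... configure mapper/reducer/jar as in Henning's example below,
                // then job.submit() or job.waitForCompletion(true) ...
                return null;
            }
        });
    }
}
//----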
________________________________
From: ext Henning Blohm [mailto:henning.blohm@zfabrik.de]
Sent: Tuesday, November 23, 2010 3:24 AM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Starting a Hadoop job programmatically

Hi Praveen,

In order to submit to the cluster, you just need a core-site.xml on your classpath (or load it explicitly into your configuration object) that looks (at least) like this:

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${name:port of namenode}</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>${name:port of jobtracker}</value>
  </property>
</configuration>

If you want to wait for each job's completion, you can use job.waitForCompletion(true) rather than job.submit().

Good luck,
Henning
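If shipping the XML on the classpath is not an option (for instance when the jobs come from a library, as in question 2 above), the same settings can, as Henning notes, be loaded into the configuration object explicitly. A small sketch; the file locations and host:port values are placeholders, not values from this thread:

//----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class RemoteClusterConf {
    public static Configuration create() {
        Configuration conf = new Configuration();

        // Option 1: pull in external copies of the cluster's config files.
        conf.addResource(new Path("/etc/hadoop-remote/core-site.xml"));   // placeholder path
        conf.addResource(new Path("/etc/hadoop-remote/mapred-site.xml")); // placeholder path

        // Option 2: set the two properties directly.
        conf.set("fs.default.name", "hdfs://namenode-host:9000"); // placeholder host:port
        conf.set("mapred.job.tracker", "jobtracker-host:9001");   // placeholder host:port

        return conf;
    }
}
//----

The resulting Configuration is then handed to new Job(conf, ...) so that submission targets the remote cluster instead of the local runner.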
________________________________
On Mon, 2010-11-22 at 23:40 +0100, praveen.peddi@nokia.com wrote:

Hi,

Thanks for your reply. In my case I have a Driver that calls multiple jobs one after the other. I am using the following code to submit each job, but it uses the local Hadoop jar files on the classpath; it is not submitting the job to the Hadoop cluster. I thought I would need to specify where the master Hadoop is located on the remote machine. An example command I use from the command line is below, but I need to do the same from my Java program.

$ hadoop-0.20.2/bin/hadoop jar /home/ppeddi/dev/Merchandising/RelevancyEngine/relevancy-core/dist/Relevancy4.jar -i raw-downloads-input-10K -o reco-patterns-output-10K-1S -k 100 -method mapreduce -g 500 -regex '[\ ]' -s 5

I hope I made the question clear now.
Praveen

________________________________
From: ext Henning Blohm [mailto:henning.blohm@zfabrik.de]
Sent: Monday, November 22, 2010 5:07 PM
To: mapreduce-user@hadoop.apache.org
Subject: Re: Starting a Hadoop job programmatically

Hi Praveen,

We do. We are using the "new" org.apache.hadoop.mapreduce.* API in Hadoop 0.20.2. Essentially the flow is:

//----
// assuming all config is on the class path
Configuration config = new Configuration();
Job job = new Job(config, "some job name");

// set in/out types
job.setInputFormatClass(...);
job.setOutputFormatClass(...);
job.setMapOutputKeyClass(...);
job.setMapOutputValueClass(...);
job.setOutputKeyClass(...);
job.setOutputValueClass(...);

// set implementations as required
job.setMapperClass(<your mapper implementation class object>);
job.setCombinerClass(<your combiner implementation class object>);
job.setReducerClass(<your reducer implementation class object>);

// set the jar... this is often the tricky part!
job.setJarByClass(<some class that is in the job jar and not elsewhere higher up on the class path>);

job.submit();
//----

Hope I didn't forget anything.

Note: You need to give Hadoop something it can launch in a JVM that has no more than the Hadoop jars and whatever else you configured statically in your hadoop-env.sh script.

Can you describe your scenario in more detail?

Henning

On Monday, 22.11.2010, 22:39 +0100, praveen.peddi@nokia.com wrote:

Hi all,

I am trying to figure out how I can start a Hadoop job programmatically from my Java application running in an app server. I was able to run my map reduce job using the hadoop command from the Hadoop master machine, but my goal is to run the same job from my Java program (running on a different machine than the master). I googled and could not find a solution for this. All the examples I have seen so far use hadoop from the command line to start a job.

1. Has anyone called Hadoop job invocation from a Java application?
2. If so, could someone provide some sample code?

Thanks
Praveen

Henning Blohm
ZFabrik Software KG
henning.blohm@zfabrik.de
www.z2-environment.eu
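Pulling the thread together: a minimal sketch of a driver that chains two jobs, in the spirit of Henning's outline and Praveen's multi-job driver above. All class, job, and path names are hypothetical, and the default (identity) Mapper/Reducer classes stand in for real implementations:

//----
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ChainedJobsDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // cluster settings assumed on the classpath

        // First job: reads the raw input and writes an intermediate directory.
        Job first = new Job(conf, "step 1");
        first.setJarByClass(ChainedJobsDriver.class);
        first.setMapperClass(Mapper.class);   // identity mapper as a stand-in
        first.setReducerClass(Reducer.class); // identity reducer as a stand-in
        first.setOutputKeyClass(LongWritable.class);
        first.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(first, new Path("raw-input"));      // hypothetical path
        FileOutputFormat.setOutputPath(first, new Path("intermediate")); // hypothetical path
        if (!first.waitForCompletion(true)) { // block until done before starting the next job
            System.exit(1);
        }

        // Second job: consumes the first job's output.
        Job second = new Job(conf, "step 2");
        second.setJarByClass(ChainedJobsDriver.class);
        second.setMapperClass(Mapper.class);
        second.setReducerClass(Reducer.class);
        second.setOutputKeyClass(LongWritable.class);
        second.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(second, new Path("intermediate"));
        FileOutputFormat.setOutputPath(second, new Path("final-output")); // hypothetical path
        System.exit(second.waitForCompletion(true) ? 0 : 1);
    }
}
//----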