From: Manoj Babu <manoj444@gmail.com>
Date: Mon, 13 Aug 2012 17:50:59 +0530
Subject: Re: doubt on Hadoop job submission process
To: Harsh J
Cc: mapreduce-user@hadoop.apache.org

Then I need to submit the jar containing the non-Hadoop activity classes
and their supporting libraries to all the nodes, since I can't create two
jars. Is there any way to optimize this?

Cheers!
Manoj.

On Mon, Aug 13, 2012 at 5:20 PM, Harsh J wrote:
> Sure, you may separate the logic as you want it to be, but just ensure
> the configuration object has a proper setJar or setJarByClass done on
> it before you submit the job.
>
> On Mon, Aug 13, 2012 at 4:43 PM, Manoj Babu wrote:
> > Hi Harsh,
> >
> > Thanks for your reply.
> >
> > Consider that from my main program I am doing many non-Hadoop
> > activities (reading/writing/updating) before invoking
> > JobClient.runJob(conf);
> > Is there any way to separate the process flow programmatically
> > instead of going for a workflow engine?
> >
> > Cheers!
> > Manoj.
> >
> >
> > On Mon, Aug 13, 2012 at 4:10 PM, Harsh J wrote:
> >>
> >> Hi Manoj,
> >>
> >> Reply inline.
> >>
> >> On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu wrote:
> >> > Hi All,
> >> >
> >> > The normal Hadoop job submission process involves:
> >> >
> >> > 1. Checking the input and output specifications of the job.
> >> > 2. Computing the InputSplits for the job.
> >> > 3. Setting up the requisite accounting information for the
> >> >    DistributedCache of the job, if necessary.
> >> > 4. Copying the job's jar and configuration to the map-reduce
> >> >    system directory on the distributed file system.
> >> > 5. Submitting the job to the JobTracker and optionally
> >> >    monitoring its status.
> >> >
> >> > I have a doubt about the 4th point of the job execution flow;
> >> > could any of you explain it?
> >> >
> >> > What is the job's jar?
> >>
> >> The job.jar is the jar you supply via "hadoop jar <jar>". Technically
> >> though, it is the jar pointed to by JobConf.getJar() (set via setJar
> >> or setJarByClass calls).
> >>
> >> > Is the job's jar the one we submitted to Hadoop, or will Hadoop
> >> > build it based on the job configuration object?
> >>
> >> It is the former, as explained above.
> >>
> >> --
> >> Harsh J
> >
> >
>
> --
> Harsh J
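On the question at the top of the thread, about shipping supporting
libraries to all the nodes without building two jars: one standard
facility, not raised in the thread itself but worth checking against your
Hadoop version, is the -libjars generic option, which ships extra jars to
the task nodes via the distributed cache at submit time, provided the
driver is run through ToolRunner. A minimal sketch against the old
org.apache.hadoop.mapred API, with hypothetical class and job names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LibJarsDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects generic options such as -libjars,
        // which ToolRunner/GenericOptionsParser parsed out of args.
        JobConf conf = new JobConf(getConf(), LibJarsDriver.class);
        conf.setJobName("libjars-example");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-libjars, -files, -D ...)
        // before handing the remaining arguments to run().
        System.exit(ToolRunner.run(new Configuration(), new LibJarsDriver(), args));
    }
}

A launch would then look something like
"hadoop jar myjob.jar LibJarsDriver -libjars mylib-a.jar,mylib-b.jar /in /out",
keeping the job jar small and the supporting libraries separate.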
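To make Harsh's setJar/setJarByClass advice concrete: the non-Hadoop work
is plain client-side Java, so it can simply run in the driver before
JobClient.runJob is invoked; no workflow engine is needed. A minimal
sketch, assuming hypothetical class and job names, with
IdentityMapper/IdentityReducer standing in for real job logic:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class MyDriver {

    public static void main(String[] args) throws IOException {
        // Plain client-side Java: any non-Hadoop reading/writing/updating
        // happens here, before the job is submitted.
        prepareInputs();

        // The JobConf(Class) constructor calls setJarByClass internally,
        // so the jar containing MyDriver becomes the job.jar.
        JobConf conf = new JobConf(MyDriver.class);
        conf.setJobName("separated-flow-example");
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submission proper: the jar named by conf.getJar() and the job
        // configuration are copied to the system directory on the
        // distributed file system (step 4 of the quoted list), then the
        // job goes to the JobTracker.
        JobClient.runJob(conf);
    }

    private static void prepareInputs() {
        // e.g. pull records from a database and stage them as input files.
    }
}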
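Finally, to illustrate the job.jar answer at the bottom of the thread, a
small sketch of the two ways of pointing JobConf at the jar that step 4
copies to the map-reduce system directory (hypothetical names and paths):

import org.apache.hadoop.mapred.JobConf;

public class JarInspect {

    public static void main(String[] args) {
        // Option 1: infer the jar from a class it contains. Note that
        // getJar() may return null if the class was loaded from a plain
        // classes directory rather than from a jar.
        JobConf byClass = new JobConf();
        byClass.setJarByClass(JarInspect.class);
        System.out.println("resolved job jar: " + byClass.getJar());

        // Option 2: name the local jar explicitly (hypothetical path).
        JobConf explicit = new JobConf();
        explicit.setJar("/tmp/myjob.jar");
        System.out.println("explicit job jar: " + explicit.getJar());
    }
}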