From: Manoj Babu <manoj444@gmail.com>
Date: Mon, 13 Aug 2012 17:50:59 +0530
Subject: Re: doubt on Hadoop job submission process
To: Harsh J
Cc: mapreduce-user@hadoop.apache.org

Then I need to submit the jar containing the non-Hadoop activity classes
and their supporting libraries to all the nodes, since I can't create two
jars. Is there any way to optimize this?

Cheers!
Manoj.

On Mon, Aug 13, 2012 at 5:20 PM, Harsh J wrote:
> Sure, you may separate the logic as you want it to be, but just ensure
> the configuration object has a proper setJar or setJarByClass done on
> it before you submit the job.
>
> On Mon, Aug 13, 2012 at 4:43 PM, Manoj Babu wrote:
> > Hi Harsh,
> >
> > Thanks for your reply.
> >
> > Consider that from my main program I am doing many non-Hadoop
> > activities (reading/writing/updating) before invoking
> > JobClient.runJob(conf);
> > Is there any way to separate the process flow programmatically
> > instead of going for a workflow engine?
> >
> > Cheers!
> > Manoj.
> >
> >
> > On Mon, Aug 13, 2012 at 4:10 PM, Harsh J wrote:
> >>
> >> Hi Manoj,
> >>
> >> Reply inline.
> >>
> >> On Mon, Aug 13, 2012 at 3:42 PM, Manoj Babu wrote:
> >> > Hi All,
> >> >
> >> > The normal Hadoop job submission process involves:
> >> >
> >> > 1. Checking the input and output specifications of the job.
> >> > 2. Computing the InputSplits for the job.
> >> > 3. Setting up the requisite accounting information for the
> >> >    DistributedCache of the job, if necessary.
> >> > 4. Copying the job's jar and configuration to the map-reduce
> >> >    system directory on the distributed file system.
> >> > 5. Submitting the job to the JobTracker and optionally
> >> >    monitoring its status.
> >> >
> >> > I have a doubt about the 4th point of the job execution flow;
> >> > could any of you explain it?
> >> >
> >> > What is the job's jar?
> >>
> >> The job.jar is the jar you supply via "hadoop jar <jar>". Technically
> >> though, it is the jar pointed to by JobConf.getJar() (set via setJar
> >> or setJarByClass calls).
> >>
> >> > Is the job's jar the one we submitted to Hadoop, or will Hadoop
> >> > build it based on the job configuration object?
> >>
> >> It is the former, as explained above.
> >>
> >> --
> >> Harsh J
> >
> >
>
> --
> Harsh J
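On the question at the top of the thread, about shipping supporting
libraries to all the nodes without building two jars: one standard
facility, not raised in the thread itself but worth checking against your
Hadoop version, is the -libjars generic option, which ships extra jars to
the task nodes via the distributed cache at submit time, provided the
driver is run through ToolRunner. A minimal sketch against the old
org.apache.hadoop.mapred API, with hypothetical class and job names:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class LibJarsDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects generic options such as -libjars,
        // which ToolRunner/GenericOptionsParser parsed out of args.
        JobConf conf = new JobConf(getConf(), LibJarsDriver.class);
        conf.setJobName("libjars-example");
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips the generic options (-libjars, -files, -D ...)
        // before handing the remaining arguments to run().
        System.exit(ToolRunner.run(new Configuration(), new LibJarsDriver(), args));
    }
}

A launch would then look something like
"hadoop jar myjob.jar LibJarsDriver -libjars mylib-a.jar,mylib-b.jar /in /out",
keeping the job jar small and the supporting libraries separate.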
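To make Harsh's setJar/setJarByClass advice concrete: the non-Hadoop work
is plain client-side Java, so it can simply run in the driver before
JobClient.runJob is invoked; no workflow engine is needed. A minimal
sketch, assuming hypothetical class and job names, with
IdentityMapper/IdentityReducer standing in for real job logic:

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class MyDriver {

    public static void main(String[] args) throws IOException {
        // Plain client-side Java: any non-Hadoop reading/writing/updating
        // happens here, before the job is submitted.
        prepareInputs();

        // The JobConf(Class) constructor calls setJarByClass internally,
        // so the jar containing MyDriver becomes the job.jar.
        JobConf conf = new JobConf(MyDriver.class);
        conf.setJobName("separated-flow-example");
        conf.setMapperClass(IdentityMapper.class);
        conf.setReducerClass(IdentityReducer.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        // Submission proper: the jar named by conf.getJar() and the job
        // configuration are copied to the system directory on the
        // distributed file system (step 4 of the quoted list), then the
        // job goes to the JobTracker.
        JobClient.runJob(conf);
    }

    private static void prepareInputs() {
        // e.g. pull records from a database and stage them as input files.
    }
}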
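Finally, to illustrate the job.jar answer at the bottom of the thread, a
small sketch of the two ways of pointing JobConf at the jar that step 4
copies to the map-reduce system directory (hypothetical names and paths):

import org.apache.hadoop.mapred.JobConf;

public class JarInspect {

    public static void main(String[] args) {
        // Option 1: infer the jar from a class it contains. Note that
        // getJar() may return null if the class was loaded from a plain
        // classes directory rather than from a jar.
        JobConf byClass = new JobConf();
        byClass.setJarByClass(JarInspect.class);
        System.out.println("resolved job jar: " + byClass.getJar());

        // Option 2: name the local jar explicitly (hypothetical path).
        JobConf explicit = new JobConf();
        explicit.setJar("/tmp/myjob.jar");
        System.out.println("explicit job jar: " + explicit.getJar());
    }
}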