From: Hemanth Yamijala <hemanty@thoughtworks.com>
Date: Wed, 23 Jan 2013 10:09:13 +0530
To: user@hadoop.apache.org
Subject: Re: Where do/should .jar files live?

On top of what Bejoy said, I just wanted to add that when you submit a job
with the hadoop jar command, the jar you reference in the command on the
edge/client node is picked up by Hadoop and made available to the cluster
nodes where the mappers and reducers run.

Thanks
Hemanth

On Wed, Jan 23, 2013 at 8:24 AM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
> Hi Chris
>
> In larger clusters it is better to have an edge/client node where all the
> user jars reside, and to trigger your MR jobs from there.
>
> A client/edge node is a server with the Hadoop jars and configuration but
> hosting no daemons.
>
> In smaller clusters one DataNode might act as the client node, and you
> can execute your jars from there. The risk is that this DN fills up if
> files are copied into HDFS from it, since per the block placement policy
> one replica always lands on the local node.
>
> With Oozie you put your executables into HDFS, but Oozie comes in at an
> integration level. In the initial development phase, developers put the
> jar on the local file system of the client node, then execute and test
> their code.
>
> Regards
> Bejoy KS
>
> Sent from remote device, Please excuse typos
> ------------------------------
> From: Chris Embree <cembree@gmail.com>
> Date: Tue, 22 Jan 2013 14:24:40 -0500
> To: user@hadoop.apache.org
> Subject: Where do/should .jar files live?
>
> Hi List,
>
> This should be a simple question, I think. Disclosure: I am not a Java
> developer. ;)
>
> We're getting ready to build our Dev and Prod clusters. I'm pretty
> comfortable with HDFS and how it sits atop several local file systems on
> multiple servers, and fairly comfortable with the concept of MapReduce
> and why it's cool and we want it.
>
> Now for the question: where should my developers put and store their jar
> files? Or, asked another way, what's the best entry point for submitting
> jobs?
>
> We have separate physical systems for the NN, Checkpoint Node (formerly
> 2NN), JobTracker, and Standby NN. Should I run from the JT node? Do I
> keep all of my finished jars on the JT local file system?
> Or should I expect that jobs will be run via Oozie? Do I put jars on the
> local Oozie FS?
>
> Thanks in advance.
> Chris
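To make the edge-node workflow above concrete, here is a minimal sketch of
a job submission from a client node. All paths and the class name are
hypothetical examples, not anything from the thread; the point is only that
the jar lives on the client node's local file system while the input and
output paths are in HDFS.

```shell
#!/bin/sh
# Sketch only: assumes a configured client node. Names are illustrative.
JAR=/home/dev/jobs/wordcount.jar     # jar on the client node's local FS
MAIN=org.example.WordCount           # hypothetical driver class in the jar
IN=/user/dev/input                   # HDFS input path
OUT=/user/dev/output                 # HDFS output path (must not yet exist)

# Printed rather than executed here, since this sketch assumes no running
# cluster; on a real client node you would run the command directly.
echo hadoop jar "$JAR" "$MAIN" "$IN" "$OUT"
```

Hadoop takes care of shipping the jar to the cluster for the map and reduce
tasks, so nothing needs to be pre-installed on the worker nodes for this to
run.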