Subject: Re: Reg: Setting up Hadoop Cluster
From: Geoffry Roberts <threadedblue@gmail.com>
To: user@hadoop.apache.org
Date: Thu, 13 Mar 2014 17:37:46 -0400

Did you not populate the "slaves" file when you did your installation? In
older versions of Hadoop (< 2.0) there was a "master" file where you entered
your name node. Nowadays there can be multiple name nodes; I haven't worked
with them as of yet.
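
If it helps, a "slaves" file is just a list of worker hostnames, one per
line -- something like this (hostnames made up; in 1.x the file lives under
$HADOOP_HOME/conf, in 2.x under etc/hadoop):

  # conf/slaves -- one datanode/worker host per line
  slave1.example.com
  slave2.example.com
  slave3.example.com

The start-dfs.sh / start-mapred.sh scripts ssh to each host listed there to
bring the worker daemons up, which is how the cluster "knows" which box
plays which role.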

I installed Pig, for example, on my name node and ran it from there.
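
Running a script from there is nothing exotic -- something like this
(script name made up):

  pig -x mapreduce myscript.pig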


On Thu, Mar 13, 2014 at 5:22 PM, ados1984@gmail.com <ados1984@gmail.com> wrote:
> Thank you Geoffry,
>
> I have some fundamental questions here.
>
>    1. Once I have installed Hadoop, how can I identify which node is the
>    master and which are the slaves?
>    2. My understanding is that the master node is by default the namenode
>    and the slave nodes are datanodes, correct?
>    3. So once I have installed Hadoop, if I do not know which one is the
>    namenode and which one is the datanode, how can I go in and run my jar
>    from the namenode?
>    4. Also, when we do MapReduce programming, where do we write the
>    program: on the Hadoop server (where we have both the master/namenode
>    and the slaves/datanodes installed), or in our local system using any
>    standard IDE, then package it as a jar and deploy it to the name node?
>    But here again, how can I identify which is the name node and which is
>    the data node?
>    5. OK, assuming I have figured out which one is the data node and
>    which one is the namenode, how will my MapReduce program or Pig or
>    Hive scripts know whether to run on node 1, node 2, or node 3?
>    6. Also, where do we install Pig, Hive, and Flume: on the Hadoop
>    master/slave nodes or somewhere else? And how do we let Pig/Hive know
>    that node 1 is the master/namenode and the other nodes are
>    slaves/datanodes?
>
> I would really appreciate input on these questions, as setting up Hadoop
> is turning out to be quite a complex task from where I currently see it.
>
> Regards, Andy.


> On Thu, Mar 13, 2014 at 5:14 PM, Geoffry Roberts <threadedblue@gmail.com> wrote:
>> Andy,

>> Once you have Hadoop running, you can run your jobs from the CLI of the
>> name node. When I write a MapReduce job, I jar it up, place it in, say,
>> my home directory, and run it from there. I do the same with Pig
>> scripts. I've used neither Hive nor Cascading, but I imagine they would
>> work the same.
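>>
>> For example (jar name, main class, and paths are all made up):
>>
>>   hadoop fs -put input.txt /user/me/input/
>>   hadoop jar myjob.jar com.example.MyJob /user/me/input /user/me/output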

>> Another approach I've tried is WebHDFS. It's for manipulating HDFS via a
>> RESTful interface. It worked well enough for me. I stopped using it when
>> I discovered it didn't support MapFiles, but that's another story.
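>>
>> WebHDFS is plain HTTP, so curl is enough to poke at it -- e.g. (host and
>> paths made up; 50070 was the default name node HTTP port, and
>> dfs.webhdfs.enabled must be set to true):
>>
>>   curl -i "http://namenode.example.com:50070/webhdfs/v1/user/me?op=LISTSTATUS"
>>   curl -i -L "http://namenode.example.com:50070/webhdfs/v1/user/me/part-r-00000?op=OPEN"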


>> On Thu, Mar 13, 2014 at 5:00 PM, ados1984@gmail.com <ados1984@gmail.com> wrote:
>>> Hello Team,
>>>
>>> I have one question regarding putting data into HDFS and running
>>> MapReduce on data present in HDFS.
>>>
>>>    1. HDFS is a file system, so what kinds of clients are available to
>>>    interact with it? Also, where do we need to install those clients?
>>>    2. Regarding Pig, Hive, and MapReduce: where do we install them on
>>>    the Hadoop cluster, from where do we run all the scripts, and how
>>>    does it internally know that it needs to run on node 1, node 2, or
>>>    node 3?
>>>
>>> Any inputs here would be really helpful.
>>>
>>> Thanks, Andy.



>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts




--
There are ways and there are ways,

Geoffry Roberts