From: "Kartashov, Andy"
To: "user@hadoop.apache.org"
Subject: RE: cluster set-up / a few quick questions - SOLVED
Date: Fri, 26 Oct 2012 19:55:32 +0000
Hadoopers,

The problem was EC2 security. While I could ssh passwordlessly into the other node and back, I could not telnet to it because of the EC2 firewall. I needed to open the ports for the NN and JT. :)

Now I can see 2 DNs when running "hadoop fsck", and I can also run -ls against the NN from the slave. Sweet!!!

Is it possible to balance data across the DNs without re-copying it with the "hadoop fs -put" command? I read about bin/start-balancer.sh somewhere but cannot find it in my current Hadoop installation.

Besides, is balancing data across the DNs going to improve the performance of MR jobs?

Cheers,
Happy Hadooping.
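The fix above came down to TCP reachability: ssh (port 22) was allowed between the instances, but the NameNode and JobTracker ports were blocked by the EC2 firewall. A scripted version of the telnet test might look like the Python sketch below. It is not a Hadoop tool, just a minimal sketch; the ports 8020 (NameNode) and 8021 (JobTracker) are assumptions based on common defaults, so substitute whatever ports your fs.default.name and mapred.job.tracker settings actually use.

```python
import socket
from typing import Dict, Iterable, Tuple


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds, like a manual telnet test."""
    try:
        # create_connection resolves the host and tries each address until one connects.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts (e.g. a firewall silently dropping
        # packets) and DNS resolution failures.
        return False


def check_ports(host: str, services: Iterable[Tuple[str, int]]) -> Dict[str, bool]:
    """Map each service name to whether its port on `host` is reachable."""
    return {name: port_open(host, port) for name, port in services}


# Assumed ports: 8020 for the NameNode RPC and 8021 for the JobTracker RPC.
# Adjust them to match fs.default.name and mapred.job.tracker in your config.
HADOOP_MASTER_PORTS = [("NameNode", 8020), ("JobTracker", 8021)]
```

Run from a slave, e.g. check_ports("foo1.domain.com", HADOOP_MASTER_PORTS); any False entry points at a port that still needs to be opened in the security group, or at a daemon that is not listening.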
-----Original Message-----
From: Nitin Pawar [mailto:nitinpawar432@gmail.com]
Sent: Friday, October 26, 2012 3:18 PM
To: user@hadoop.apache.org
Subject: Re: cluster set-up / a few quick questions

Questions:
1) Have you set up passwordless ssh between both hosts for the user who owns the Hadoop processes (or root)?
2) If the answer to question 1 is yes, how did you start the NN, JT, DN and TT?
3) If you started them one by one, there is no reason to expect that running a command on one node will execute it on the other.

On Sat, Oct 27, 2012 at 12:17 AM, Kartashov, Andy wrote:
> Andy, many thanks.
>
> I am stuck here now, so please point me in the right direction.
>
> I successfully ran a job on foo1 in pseudo-distributed mode and am now trying a fully distributed cluster.
>
> a. I created another instance, foo2, on EC2, installed Hadoop on it and copied the conf/ folder from foo1 to foo2. I created the /hadoop/dfs/data folder on the local Linux filesystem on foo2.
>
> b. On foo1 I created the file conf/slaves and added:
> localhost
>
>
> At this point I cannot find an answer on what to do next.
>
> I started the NN, DN, SNN, JT and TT on foo1. After I ran "hadoop fsck /user/bar -files -blocks -locations", it showed the number of datanodes as 1. I was expecting the DN and TT on foo2 to be started by foo1, but that didn't happen, so I started them myself and tried the command again. Still one DN.
> I realise that foo2 has no data at this point, but I could not find the bin/start-balancer.sh script to help me balance data from the DN on foo1 over to foo2.
>
> What do I do next?
>
> Thanks
> AK
>
> -----Original Message-----
> From: Andy Isaacson [mailto:adi@cloudera.com]
> Sent: Friday, October 26, 2012 2:21 PM
> To: user@hadoop.apache.org
> Subject: Re: cluster set-up / a few quick questions
>
> On Fri, Oct 26, 2012 at 9:40 AM, Kartashov, Andy wrote:
>> Gents,
>
> We're not all male here. :) I prefer "Hadoopers" or "hi all,".
>
>> 1.
>> - Do you put the master node's hostname under fs.default.name in core-site.xml on the slave machines, or the slaves' hostnames?
>
> Master. I have a 4-node cluster, named foo1 - foo4. My fs.default.name is hdfs://foo1.domain.com.
>
>> - Do you need to run "sudo -u hdfs hadoop namenode -format" and create the /tmp and /var folders in HDFS on the slave machines that will run only a DN and TT? Do you still need to create the hadoop/dfs/name folder on the slaves?
>
> (The following is the simple answer, for non-HA non-federated HDFS.
> You'll want to get the simple example working before trying the
> complicated ones.)
>
> No. A cluster has one namenode, running on the machine known as the master, and the admin must run "hadoop namenode -format" on that machine only.
>
> In my example, I ran "hadoop namenode -format" on foo1.
>
>> 2.
>> In hdfs-site.xml, for the dfs.name.dir and dfs.data.dir properties we specify /hadoop/dfs/name and /hadoop/dfs/data, which are local Linux directories created by running "mkdir -p /hadoop/dfs/data",
>> but the mapred.system.dir property should point to HDFS rather than the local filesystem, since we run "sudo -u hdfs hadoop fs -mkdir /tmp/mapred/system"?
>> If so, and since both are in exactly the same format (/far/boo/baz), how does Hadoop know which directory is on the local filesystem and which is on HDFS?
>
> This is very confusing, to be sure! There are a few places where paths are implicitly known to be on HDFS rather than a Linux filesystem path. mapred.system.dir is one of those. This does mean that given a string that starts with "/tmp/" you can't necessarily know whether it's a Linux path or an HDFS path without looking at the larger context.
>
> In the case of mapred.system.dir, the docs are the place to check; according to cluster_setup.html, mapred.system.dir is "Path on the HDFS where the Map/Reduce framework stores system files".
>
> http://hadoop.apache.org/docs/r1.0.3/cluster_setup.html
>
> Hope this helps,
> -andy
> NOTICE: This e-mail message and any attachments are confidential,
> subject to copyright and may be privileged. Any unauthorized use,
> copying or disclosure is prohibited. If you are not the intended
> recipient, please delete and contact the sender immediately. Please
> consider the environment before printing this e-mail.

--
Nitin Pawar
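Andy's answers in this thread boil down to three rules: every node, slaves included, points fs.default.name at the master; dfs.name.dir and dfs.data.dir are local Linux paths on each machine; and mapred.system.dir is an HDFS path even though it looks like any other absolute path. The Python sketch below renders Hadoop 1-style *-site.xml files that follow those rules. It is only an illustration: the hostname foo1.domain.com and the directories come from this thread, while the mapred.job.tracker value foo1.domain.com:8021 is an assumption and the helper function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Master hostname taken from the thread (Andy's example cluster).
MASTER = "foo1.domain.com"


def site_xml(properties: Dict[str, str]) -> str:
    """Render a Hadoop-style <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def cluster_configs(master: str) -> Dict[str, Dict[str, str]]:
    """Return the same settings for every node, master and slaves alike."""
    return {
        # Every node points at the single NameNode running on the master.
        "core-site.xml": {"fs.default.name": f"hdfs://{master}"},
        # Local Linux directories, created with "mkdir -p" on each machine.
        # dfs.name.dir only matters on the master, where the NameNode runs.
        "hdfs-site.xml": {
            "dfs.name.dir": "/hadoop/dfs/name",
            "dfs.data.dir": "/hadoop/dfs/data",
        },
        "mapred-site.xml": {
            # Assumed JobTracker address; port 8021 is a common choice, used here for illustration.
            "mapred.job.tracker": f"{master}:8021",
            # An HDFS path, created with "hadoop fs -mkdir", not a local directory.
            "mapred.system.dir": "/tmp/mapred/system",
        },
    }


def read_property(xml_text: str, name: str) -> str:
    """Look up one property value in a rendered *-site.xml string."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value", default="")
    raise KeyError(name)
```

For example, site_xml(cluster_configs(MASTER)["core-site.xml"]) produces a core-site.xml that can be copied unchanged to foo1 through foo4. Note that nothing in the generated files marks which paths are local and which live on HDFS, which is why the docs have to be consulted for properties like mapred.system.dir.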