Subject: Re: hadoop on demand setup: Failed to retrieve 'hdfs' service address
From: Kevin Van Workum
To: general@hadoop.apache.org
Date: Mon, 12 Apr 2010 22:52:21 -0400

On Mon, Apr 12, 2010 at 8:52 PM, Boyu Zhang wrote:
> Hi Kevin,
>
> Sorry to bother again, I am wondering in order to get HOD to
> work, do we need to install all the prerequisite software, like
> passwordless ssh? Thanks a lot!

SSH is not needed for HOD; it uses pbsdsh to launch processes on the
nodes. HOD seems to be very sensitive to the Python version: 2.4 and
2.6 don't work, you need 2.5. HOD is a little more flexible with Java:
1.5 and 1.6 both seem to work for me. Also, the most recent versions of
Twisted and zope seem to be fine.

> Boyu
>
> On Tue, Apr 6, 2010 at 10:43 AM, Kevin Van Workum wrote:
>
>> Hello,
>>
>> I'm trying to set up hadoop on demand (HOD) on my cluster. I'm
>> currently unable to "allocate cluster". I'm starting hod with the
>> following command:
>>
>> /usr/local/hadoop-0.20.2/hod/bin/hod -c
>> /usr/local/hadoop-0.20.2/hod/conf/hodrc -t
>> /b/01/vanw/hod/hadoop-0.20.2.tar.gz -o "allocate ~/hod 3"
>> --ringmaster.log-dir=/tmp -b 4
>>
>> The job starts on the nodes and I see the ringmaster running on the
>> MotherSuperior. The ringmaster-main.log file is created, but is empty.
>> I don't see any associated processes running on the other 2 nodes in
>> the job.
>>
>> The critical errors are as follows:
>>
>> [2010-04-06 10:34:13,630] CRITICAL/50 hadoop:298 - Failed to retrieve
>> 'hdfs' service address.
>> [2010-04-06 10:34:13,631] DEBUG/10 hadoop:631 - Cleaning up cluster id
>> 238366.jman, as cluster could not be allocated.
>> [2010-04-06 10:34:13,632] DEBUG/10 hadoop:635 - Calling rm.stop()
>> [2010-04-06 10:34:13,639] DEBUG/10 hadoop:637 - Returning from rm.stop()
>> [2010-04-06 10:34:13,639] CRITICAL/50 hod:401 - Cannot allocate
>> cluster /b/01/vanw/hod
>> [2010-04-06 10:34:14,149] DEBUG/10 hod:597 - return code: 7
>>
>> The contents of the hodrc file is:
>>
>> [hod]
>> stream             = True
>> java-home          = /usr/local/jdk1.6.0_02
>> cluster            = orange
>> cluster-factor     = 1.8
>> xrs-port-range     = 32768-65536
>> debug              = 4
>> allocate-wait-time = 3600
>> temp-dir           = /tmp/hod
>>
>> [ringmaster]
>> register           = True
>> stream             = False
>> temp-dir           = /tmp/hod
>> http-port-range    = 8000-9000
>> work-dirs          = /tmp/hod/1,/tmp/hod/2
>> xrs-port-range     = 32768-65536
>> debug              = 4
>>
>> [hodring]
>> stream             = False
>> temp-dir           = /tmp/hod
>> register           = True
>> java-home          = /usr/local/jdk1.6.0_02
>> http-port-range    = 8000-9000
>> xrs-port-range     = 32768-65536
>> debug              = 4
>>
>> [resource_manager]
>> queue              = dque
>> batch-home         = /usr/local/torque-2.3.7
>> id                 = torque
>> env-vars           =
>> HOD_PYTHON_HOME=/usr/local/python-2.5.5/bin/python
>>
>> [gridservice-mapred]
>> external           = False
>> pkgs               = /usr/local/hadoop-0.20.2
>> tracker_port       = 8030
>> info_port          = 50080
>>
>> [gridservice-hdfs]
>> external           = False
>> pkgs               = /usr/local/hadoop-0.20.2
>> fs_port            = 8020
>> info_port          = 50070
>>
>>
>> Some other useful information:
>> Linux 2.6.18-128.7.1.el5
>> Python 2.5.5
>> Twisted 10.0.0
>> zope 3.3.0
>> java version "1.6.0_02"
>>
>> --
>> Kevin Van Workum, PhD
>> Sabalcore Computing Inc.
>> Run your code on 500 processors.
>> Sign up for a free trial account.
>> www.sabalcore.com
>> 877-492-8027 ext. 11
>>
>

--
Kevin Van Workum, PhD
Sabalcore Computing Inc.
Run your code on 500 processors.
Sign up for a free trial account.
www.sabalcore.com
877-492-8027 ext. 11
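[Archive note: since the "Failed to retrieve 'hdfs' service address" error often traces back to the [gridservice-hdfs] block of hodrc, a quick way to sanity-check a config like the one posted above is to verify that the sections and keys HOD reads are actually present. The following is a hypothetical validator sketch, not part of HOD itself; the `missing_keys` helper and the REQUIRED table are assumptions drawn from the section/key names in the posted hodrc, and it uses Python 3's configparser for illustration even though HOD itself requires Python 2.5.]

```python
# Hypothetical hodrc sanity-checker -- NOT part of HOD itself.
# Section and key names below are copied from the hodrc posted in this
# thread; a missing [gridservice-hdfs] entry is the kind of problem that
# could surface as "Failed to retrieve 'hdfs' service address".
import configparser

# Sections/keys the allocate step appears to depend on (assumption,
# based on the posted config).
REQUIRED = {
    "gridservice-hdfs": ("external", "pkgs", "fs_port", "info_port"),
    "gridservice-mapred": ("external", "pkgs", "tracker_port", "info_port"),
    "resource_manager": ("queue", "batch-home", "id"),
}

def missing_keys(hodrc_text):
    """Return missing sections or 'section.key' entries, in check order."""
    cp = configparser.ConfigParser()
    cp.read_string(hodrc_text)
    problems = []
    for section, keys in REQUIRED.items():
        if not cp.has_section(section):
            problems.append(section)
            continue
        for key in keys:
            if not cp.has_option(section, key):
                problems.append(section + "." + key)
    return problems

# A config with only the hdfs section: the other two are reported missing.
sample = """
[gridservice-hdfs]
external = False
pkgs = /usr/local/hadoop-0.20.2
fs_port = 8020
info_port = 50070
"""
print(missing_keys(sample))  # -> ['gridservice-mapred', 'resource_manager']
```

An empty list from `missing_keys` does not prove the allocation will succeed, but a non-empty one points at a section to fix before digging into the ringmaster logs.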