From: Sudhir Vallamkondu
Date: Wed, 08 Dec 2010 08:41:09 -0700
Subject: Re: Help: 1) Hadoop processes still are running after we stopped hadoop.2) How to exclude a dead node?
Reply-To: common-user@hadoop.apache.org
In-Reply-To: <1291791594.16764.ezmlm@hadoop.apache.org>

Yes.

Reference: I couldn't find an Apache Hadoop page describing this, but see the link below:
http://serverfault.com/questions/115148/hadoop-slaves-file-necessary


On 12/7/10 11:59 PM, "common-user-digest-help@hadoop.apache.org" wrote:

> From: li ping
> Date: Wed, 8 Dec 2010 14:17:40 +0800
> To:
> Subject: Re: Help: 1) Hadoop processes still are running after we stopped
> hadoop.2) How to exclude a dead node?
>
> I am not sure I have fully understood your post.
> You mean conf/slaves is only used by the stop/start scripts to start or stop
> the datanode/tasktracker?
> And conf/masters only contains the information about the secondary
> namenode?
>
> Thanks
>
> On Wed, Dec 8, 2010 at 1:44 PM, Sudhir Vallamkondu <
> Sudhir.Vallamkondu@icrossing.com> wrote:
>
>> There is a proper decommissioning process to remove dead nodes. See the FAQ
>> link here:
>>
>> http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F
>>
>> In fact, $HADOOP_HOME/conf/slaves is not used by the name node to keep
>> track of datanodes/tasktrackers. It is merely used by the stop/start hadoop
>> scripts to know on which nodes to start datanode/tasktracker services.
>> There is similar confusion about the $HADOOP_HOME/conf/masters file.
>> That file contains the details of the machine where the secondary name
>> node is running, not the name node/job tracker.
>>
>> As for not all java/hadoop processes getting killed, this may be
>> happening because hadoop is losing track of its pid files. By default the
>> pid files are configured to be created in the /tmp directory. If these
>> pid files get deleted, then the stop/start scripts cannot detect running
>> hadoop processes. I suggest changing the location of the pid files to a
>> persistent location like /var/hadoop/. The $HADOOP_HOME/conf/hadoop-env.sh
>> file has details on configuring the PID location.
>>
>> - Sudhir
>>
>>
>> On 12/7/10 5:07 PM, "common-user-digest-help@hadoop.apache.org"
>> wrote:
>>
>>> From: Tali K
>>> Date: Tue, 7 Dec 2010 10:40:16 -0800
>>> To:
>>> Subject: Help: 1) Hadoop processes still are running after we stopped
>>> hadoop.2) How to exclude a dead node?
>>>
>>>
>>> 1) When I stopped hadoop, we checked all the nodes and found that 2 or 3
>>> java/hadoop processes were still running on each node. So we went to each
>>> node and did a 'killall java'; in some cases I had to do 'killall -9 java'.
>>> My question: why is this happening, and what would you recommend to make
>>> sure that no hadoop processes are running after I stop hadoop with
>>> stop-all.sh?
>>>
>>> 2) Also, we have a dead node. We removed this node from
>>> $HADOOP_HOME/conf/slaves. This file is supposed to tell the namenode
>>> which machines are supposed to be datanodes/tasktrackers.
>>> We started hadoop again, and were surprised to see the dead node in the
>>> hadoop 'report' ("$HADOOP_HOME/bin/hadoop dfsadmin -report|less").
>>> It was only after blocking the dead node and restarting hadoop that the
>>> dead node no longer showed up in the report.
>>> Any recommendations on how to deal with dead nodes?
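
The pid-file explanation in the quoted reply can be illustrated with a small shell sketch. It is only an illustration under stated assumptions, not the code of the Hadoop stop scripts: the helper name check_pid_files is hypothetical, the hadoop-*.pid file-name pattern is an assumption about how the daemons name their pid files, and the directory defaults to HADOOP_PID_DIR (the variable documented in hadoop-env.sh) or /tmp when that is unset. For each pid file it reports whether the recorded process is still alive, which is roughly the information a stop script needs and loses when the files are deleted from /tmp.

```shell
#!/bin/sh
# Sketch: report whether each Hadoop pid file still points at a live process.
# Assumptions (not from the thread): pid files match hadoop-*.pid and live in
# the directory given as $1 (default: $HADOOP_PID_DIR, else /tmp).
# check_pid_files is a hypothetical helper name.

check_pid_files() {
    pid_dir="${1:-${HADOOP_PID_DIR:-/tmp}}"
    for pid_file in "$pid_dir"/hadoop-*.pid; do
        # If the glob matched nothing, the literal pattern comes back; skip it.
        [ -f "$pid_file" ] || continue
        pid=$(cat "$pid_file")
        # kill -0 delivers no signal; it only tests that the process exists
        # and that the caller is permitted to signal it.
        if kill -0 "$pid" 2>/dev/null; then
            echo "running $pid $pid_file"
        else
            echo "stale $pid $pid_file"
        fi
    done
}
```

After sourcing the file on a node, calling check_pid_files /var/hadoop/ (or whichever directory HADOOP_PID_DIR points to) lists one line per pid file. A daemon that is running but has no line at all is one whose pid file has gone missing, which matches the case where stop-all.sh leaves java processes behind.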