From: Sudhir Vallamkondu <Sudhir.Vallamkondu@icrossing.com>
To: common-user@hadoop.apache.org
Date: Tue, 07 Dec 2010 22:44:13 -0700
Subject: Re: Help: 1) Hadoop processes still are running after we stopped hadoop. 2) How to exclude a dead node?

There is a proper decommissioning process for removing dead nodes. See the FAQ entry here:

http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F

In fact, $HADOOP_HOME/conf/slaves is not used by the namenode to keep track of datanodes/tasktrackers at all. It is only read by the start/stop scripts to know on which machines to start the datanode and tasktracker services.

There is similar confusion about the $HADOOP_HOME/conf/masters file. That file lists the machine where the secondary namenode runs, not the namenode/jobtracker.

As for not all java/hadoop processes getting killed, this is likely happening because Hadoop lost track of its pid files. By default the pid files are created in the /tmp directory. If these pid files get deleted, the stop/start scripts can no longer detect the running Hadoop processes. I suggest changing the pid file location to a persistent directory such as /var/hadoop/. The $HADOOP_HOME/conf/hadoop-env.sh file has the setting for the pid directory.
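Something along these lines in $HADOOP_HOME/conf/hadoop-env.sh should do it (the directory below is only an example; use any persistent path that the user running the daemons can write to, and create it before restarting):

    # Directory where the daemons write their pid files (default is /tmp)
    # /var/hadoop/pids is an example path, not a requirement
    export HADOOP_PID_DIR=/var/hadoop/pids

Keep in mind that stop-all.sh looks for pid files under whatever HADOOP_PID_DIR is set to at the time, so stop the daemons before changing it, otherwise you are back to killall.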
- Sudhir


On 12/7/10 5:07 PM, "common-user-digest-help@hadoop.apache.org" wrote:

> From: Tali K
> Date: Tue, 7 Dec 2010 10:40:16 -0800
> To:
> Subject: Help: 1) Hadoop processes still are running after we stopped hadoop. 2) How to exclude a dead node?
>
> 1) When I stopped hadoop, we checked all the nodes and found that 2 or 3
> java/hadoop processes were still running on each node. So we went to each
> node and did a 'killall java' - in some cases I had to do 'killall -9 java'.
> My question: why is this happening, and what are the recommendations for
> making sure that no hadoop processes are left running after I stop hadoop
> with stop-all.sh?
>
> 2) Also, we have a dead node. We removed this node from
> $HADOOP_HOME/conf/slaves. This file is supposed to tell the namenode
> which machines are supposed to be datanodes/tasktrackers.
> We started hadoop again and were surprised to still see the dead node in the
> hadoop report ("$HADOOP_HOME/bin/hadoop dfsadmin -report|less").
> Only after blocking the dead node and restarting hadoop did it stop showing
> up in the report.
> Any recommendations on how to deal with dead nodes?
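To make the decommissioning steps from the FAQ concrete, the sequence is roughly the following (the exclude-file path and hostname below are placeholders, so adjust them to your cluster; there is a corresponding mapred.hosts.exclude property for tasktrackers):

    1. On the namenode, point dfs.hosts.exclude at an exclude file in
       conf/hdfs-site.xml:

         <property>
           <name>dfs.hosts.exclude</name>
           <!-- example path; put the exclude file wherever you like -->
           <value>/home/hadoop/conf/excludes</value>
         </property>

    2. List the hostname of each node you want to retire in that file,
       one per line.

    3. Tell the namenode to re-read the file:

         $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes

    4. Watch "hadoop dfsadmin -report" until the node shows as
       decommissioned, then shut it down and remove it from conf/slaves
       and the exclude file.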