From: Sudhir Vallamkondu <Sudhir.Vallamkondu@icrossing.com>
To: common-user@hadoop.apache.org
Date: Tue, 07 Dec 2010 22:44:13 -0700
Subject: Re: Help: 1) Hadoop processes still are running after we stopped hadoop. 2) How to exclude a dead node?

There is a proper decommissioning process for removing dead nodes. See the FAQ entry here:

http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F

In fact, $HADOOP_HOME/conf/slaves is not used by the namenode to keep track of datanodes/tasktrackers at all. It is only read by the start/stop scripts to know on which machines to start the datanode and tasktracker services.

There is similar confusion about the $HADOOP_HOME/conf/masters file. That file lists the machine where the secondary namenode runs, not the namenode/jobtracker.

As for not all java/hadoop processes getting killed, this is likely happening because Hadoop lost track of its pid files. By default the pid files are created in the /tmp directory. If these pid files get deleted, the stop/start scripts can no longer detect the running Hadoop processes. I suggest changing the pid file location to a persistent directory such as /var/hadoop/. The $HADOOP_HOME/conf/hadoop-env.sh file has the setting for the pid directory.
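Something along these lines in $HADOOP_HOME/conf/hadoop-env.sh should do it (the directory below is only an example; use any persistent path that the user running the daemons can write to, and create it before restarting):

    # Directory where the daemons write their pid files (default is /tmp)
    # /var/hadoop/pids is an example path, not a requirement
    export HADOOP_PID_DIR=/var/hadoop/pids

Keep in mind that stop-all.sh looks for pid files under whatever HADOOP_PID_DIR is set to at the time, so stop the daemons before changing it, otherwise you are back to killall.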
- Sudhir


On 12/7/10 5:07 PM, "common-user-digest-help@hadoop.apache.org" wrote:

> From: Tali K
> Date: Tue, 7 Dec 2010 10:40:16 -0800
> To:
> Subject: Help: 1) Hadoop processes still are running after we stopped hadoop. 2) How to exclude a dead node?
>
> 1) When I stopped hadoop, we checked all the nodes and found that 2 or 3
> java/hadoop processes were still running on each node. So we went to each
> node and did a 'killall java' - in some cases I had to do 'killall -9 java'.
> My question: why is this happening, and what are the recommendations for
> making sure that no hadoop processes are left running after I stop hadoop
> with stop-all.sh?
>
> 2) Also, we have a dead node. We removed this node from
> $HADOOP_HOME/conf/slaves. This file is supposed to tell the namenode
> which machines are supposed to be datanodes/tasktrackers.
> We started hadoop again and were surprised to still see the dead node in the
> hadoop report ("$HADOOP_HOME/bin/hadoop dfsadmin -report|less").
> Only after blocking the dead node and restarting hadoop did it stop showing
> up in the report.
> Any recommendations on how to deal with dead nodes?
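To make the decommissioning steps from the FAQ concrete, the sequence is roughly the following (the exclude-file path and hostname below are placeholders, so adjust them to your cluster; there is a corresponding mapred.hosts.exclude property for tasktrackers):

    1. On the namenode, point dfs.hosts.exclude at an exclude file in
       conf/hdfs-site.xml:

         <property>
           <name>dfs.hosts.exclude</name>
           <!-- example path; put the exclude file wherever you like -->
           <value>/home/hadoop/conf/excludes</value>
         </property>

    2. List the hostname of each node you want to retire in that file,
       one per line.

    3. Tell the namenode to re-read the file:

         $HADOOP_HOME/bin/hadoop dfsadmin -refreshNodes

    4. Watch "hadoop dfsadmin -report" until the node shows as
       decommissioned, then shut it down and remove it from conf/slaves
       and the exclude file.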