From: sudhakara st <sudhakara.st@gmail.com>
To: cdh-user@cloudera.org
Cc: user@hadoop.apache.org
Date: Tue, 12 Feb 2013 07:30:57 -0800 (PST)
Subject: Re: Decommissioning Nodes in Production Cluster.
The decommissioning process is controlled by an exclude file, which for
HDFS is set by the *dfs.hosts.exclude* property, and for MapReduce by the
*mapred.hosts.exclude* property. In most cases there is one shared file,
referred to as the exclude file. This exclude file name should be specified
as the configuration parameter *dfs.hosts.exclude* at namenode start-up.

To remove nodes from the cluster:

1. Add the network addresses of the nodes to be decommissioned to the
exclude file.
2. Restart the MapReduce cluster to stop the tasktrackers on the nodes
being decommissioned.
3. Update the namenode with the new set of permitted datanodes, with this
command:
% hadoop dfsadmin -refreshNodes
4. Go to the web UI and check whether the admin state has changed to
"Decommission In Progress" for the datanodes being decommissioned. They
will start copying their blocks to other datanodes in the cluster.
5. When all the datanodes report their state as "Decommissioned," all the
blocks have been replicated. Shut down the decommissioned nodes.
6. Remove the nodes from the include file, and run:
% hadoop dfsadmin -refreshNodes
7. Remove the nodes from the slaves file.

Decommissioning datanodes in small percentages (less than 2% of the
cluster) at a time does not affect the cluster. But it is better to pause
MR jobs before triggering decommissioning, to ensure no tasks are running
on the nodes being decommissioned.
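Steps 1 and 3 above can be sketched as a small shell script. The exclude-file path and the hostnames below are examples, not from the original post; the `hadoop dfsadmin` call is commented out so the sketch runs without a live cluster:

```shell
#!/bin/sh
# Sketch of decommissioning steps 1 and 3. EXCLUDE_FILE must match the
# path configured in dfs.hosts.exclude / mapred.hosts.exclude.
# The path and hostnames here are examples only.
EXCLUDE_FILE="${EXCLUDE_FILE:-./excludes}"

# Step 1: add the network addresses of the nodes to the exclude file.
for host in datanode03.example.com datanode07.example.com; do
  echo "$host" >> "$EXCLUDE_FILE"
done

# Step 3: push the new set of permitted datanodes to the namenode.
# (Commented out so the sketch runs without a cluster.)
# hadoop dfsadmin -refreshNodes

cat "$EXCLUDE_FILE"
```

On a real cluster you would uncomment the refreshNodes line and then watch the namenode web UI, as in step 4.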
If only a small percentage of tasks are running on the decommissioning
nodes, those tasks can be resubmitted to other tasktrackers; but if the
percentage of queued jobs is larger than that threshold, there is a chance
of job failure. Once you have triggered the 'hadoop dfsadmin -refreshNodes'
command and decommissioning has started, you can resume the MR jobs.

*Source: Hadoop: The Definitive Guide [Tom White]*

On Tuesday, February 12, 2013 5:20:07 PM UTC+5:30, Dhanasekaran Anbalagan
wrote:
>
> Hi Guys,
>
> Is it recommended to remove one of the datanodes in a production cluster
> by decommissioning that particular datanode? Please guide me.
>
> -Dhanasekaran,
>
> Did I learn something today? If not, I wasted it.
>
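The "wait until every node reports Decommissioned" check in steps 4-5 can be scripted by scanning `hadoop dfsadmin -report` output. The report text below is a fabricated sample so the snippet runs without a cluster; on a real cluster you would set `report=$(hadoop dfsadmin -report)` instead:

```shell
#!/bin/sh
# Sketch: count datanodes still decommissioning from dfsadmin -report
# output. The report text is a fabricated two-node sample.
report='Name: 10.0.0.3:50010
Decommission Status : Decommission in progress
Name: 10.0.0.7:50010
Decommission Status : Decommissioned'

# grep -c counts matching lines; zero means decommissioning is complete.
in_progress=$(printf '%s\n' "$report" | grep -c "Decommission in progress")
echo "nodes still decommissioning: $in_progress"
```

Looping on this count with a `sleep` between polls gives a simple way to know when it is safe to shut the nodes down (step 5) and resume the MR jobs.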