From: sudhakara st <sudhakara.st@gmail.com>
To: cdh-user@cloudera.org
Cc: user@hadoop.apache.org
Date: Tue, 12 Feb 2013 07:30:57 -0800 (PST)
Subject: Re: Decommissioning Nodes in Production Cluster.
The decommissioning process is controlled by an exclude file, which for
HDFS is set by the *dfs.hosts.exclude* property, and for MapReduce by the
*mapred.hosts.exclude* property. In most cases there is one shared file,
referred to as the exclude file. This exclude file name should be specified
as the configuration parameter *dfs.hosts.exclude* at namenode start-up.

To remove nodes from the cluster:

1. Add the network addresses of the nodes to be decommissioned to the
exclude file.
2. Restart the MapReduce cluster to stop the tasktrackers on the nodes
being decommissioned.
3. Update the namenode with the new set of permitted datanodes, with this
command:
% hadoop dfsadmin -refreshNodes
4. Go to the web UI and check whether the admin state has changed to
"Decommission In Progress" for the datanodes being decommissioned. They
will start copying their blocks to other datanodes in the cluster.
5. When all the datanodes report their state as "Decommissioned," all the
blocks have been replicated. Shut down the decommissioned nodes.
6. Remove the nodes from the include file, and run:
% hadoop dfsadmin -refreshNodes
7. Remove the nodes from the slaves file.

Decommissioning datanodes in small percentages (less than 2% of the
cluster) at a time does not affect the cluster. But it is better to pause
MR jobs before triggering decommissioning, to ensure no tasks are running
on the nodes being decommissioned.
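Steps 1 and 3 above can be sketched as a small shell script. The exclude-file path and the hostnames below are examples, not from the original post; the `hadoop dfsadmin` call is commented out so the sketch runs without a live cluster:

```shell
#!/bin/sh
# Sketch of decommissioning steps 1 and 3. EXCLUDE_FILE must match the
# path configured in dfs.hosts.exclude / mapred.hosts.exclude.
# The path and hostnames here are examples only.
EXCLUDE_FILE="${EXCLUDE_FILE:-./excludes}"

# Step 1: add the network addresses of the nodes to the exclude file.
for host in datanode03.example.com datanode07.example.com; do
  echo "$host" >> "$EXCLUDE_FILE"
done

# Step 3: push the new set of permitted datanodes to the namenode.
# (Commented out so the sketch runs without a cluster.)
# hadoop dfsadmin -refreshNodes

cat "$EXCLUDE_FILE"
```

On a real cluster you would uncomment the refreshNodes line and then watch the namenode web UI, as in step 4.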
If only a small percentage of tasks are running on the decommissioning
nodes, those tasks can be resubmitted to other tasktrackers; but if the
percentage of queued jobs is larger than that threshold, there is a chance
of job failure. Once you have triggered the 'hadoop dfsadmin -refreshNodes'
command and decommissioning has started, you can resume the MR jobs.

*Source: Hadoop: The Definitive Guide [Tom White]*

On Tuesday, February 12, 2013 5:20:07 PM UTC+5:30, Dhanasekaran Anbalagan
wrote:
>
> Hi Guys,
>
> Is it recommended to remove one of the datanodes in a production cluster
> by decommissioning that particular datanode? Please guide me.
>
> -Dhanasekaran,
>
> Did I learn something today? If not, I wasted it.
>
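The "wait until every node reports Decommissioned" check in steps 4-5 can be scripted by scanning `hadoop dfsadmin -report` output. The report text below is a fabricated sample so the snippet runs without a cluster; on a real cluster you would set `report=$(hadoop dfsadmin -report)` instead:

```shell
#!/bin/sh
# Sketch: count datanodes still decommissioning from dfsadmin -report
# output. The report text is a fabricated two-node sample.
report='Name: 10.0.0.3:50010
Decommission Status : Decommission in progress
Name: 10.0.0.7:50010
Decommission Status : Decommissioned'

# grep -c counts matching lines; zero means decommissioning is complete.
in_progress=$(printf '%s\n' "$report" | grep -c "Decommission in progress")
echo "nodes still decommissioning: $in_progress"
```

Looping on this count with a `sleep` between polls gives a simple way to know when it is safe to shut the nodes down (step 5) and resume the MR jobs.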