Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of philippe.signoret@gmail.com
 designates 209.85.128.44 as permitted sender)
MIME-Version: 1.0
From: Philippe Signoret <philippe.signoret@gmail.com>
Date: Sat, 25 May 2013 17:13:18 +0200
Message-ID: 
 <CA+JmpbWcYBMRRU+Sz_BfJ71ukoPAOKfFFkOZdhUK5-qa+nwn0g@mail.gmail.com>
Subject: Nicely removing and adding nodes
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=001a11c3b142283b0f04dd8c5c58

--001a11c3b142283b0f04dd8c5c58
Content-Type: text/plain; charset=ISO-8859-1

I'm running Hadoop 1.1.2 on a cluster of about 10 computers. I would like to
add and remove nodes "nicely", for both HDFS and MapReduce.

I've noticed that the *datanode* process exits on its own once
decommissioning is complete, so I don't stop it explicitly. This is what I
do to remove a node:

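   - Add the node's hostname to *mapred.exclude* (the file named by *mapred.hosts.exclude*)
   - Add the node's hostname to *hdfs.exclude* (the file named by *dfs.hosts.exclude*)
   - $ hadoop mradmin -refreshNodes
   - $ hadoop dfsadmin -refreshNodes
   - $ hadoop-daemon.sh stop tasktracker (on the node being removed)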
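A minimal sketch of the removal steps above as a shell function, assuming Hadoop 1.x commands on the PATH, passwordless ssh from the master to each worker, and hypothetical default exclude-file paths (override `MAPRED_EXCLUDE` and `HDFS_EXCLUDE` to match whatever *mapred.hosts.exclude* and *dfs.hosts.exclude* point to). The names `decommission_node`, `exclude_host`, `run` and `DRY_RUN` are illustrative, not part of Hadoop. Because `hadoop-daemon.sh` only acts on the machine it runs on, the TaskTracker is stopped over ssh.

```shell
# Sketch only: helper names and default paths are hypothetical.
MAPRED_EXCLUDE="${MAPRED_EXCLUDE:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}/mapred.exclude}"
HDFS_EXCLUDE="${HDFS_EXCLUDE:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}/hdfs.exclude}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Append a hostname to an exclude file unless it is already listed.
exclude_host() {
  local host="$1" file="$2"
  touch "$file"
  grep -qxF "$host" "$file" || echo "$host" >> "$file"
}

# The removal steps listed above, for one worker hostname.
decommission_node() {
  local host="$1"
  exclude_host "$host" "$MAPRED_EXCLUDE"
  exclude_host "$host" "$HDFS_EXCLUDE"
  # Make the JobTracker and NameNode re-read their exclude files.
  run hadoop mradmin -refreshNodes
  run hadoop dfsadmin -refreshNodes
  # The DataNode is not stopped here: per the observation above, it exits
  # by itself once decommissioning finishes.
  run ssh "$host" hadoop-daemon.sh stop tasktracker
}
```

For example, `DRY_RUN=1 decommission_node worker07` updates both exclude files and prints the three commands instead of running them. Nothing here waits for HDFS decommissioning to complete; the per-node status can be checked with `hadoop dfsadmin -report`.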

To add the node back in (assuming it was removed as above):

   - Remove the hostname from *mapred.exclude*
   - Remove the hostname from *hdfs.exclude*
   - $ hadoop mradmin -refreshNodes
   - $ hadoop dfsadmin -refreshNodes
   - $ hadoop-daemon.sh start tasktracker (on the node being added back)
   - $ hadoop-daemon.sh start datanode (on the node being added back)
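The re-adding steps can be sketched the same way, under the same assumptions: Hadoop 1.x commands on the PATH, passwordless ssh to the worker, and hypothetical default exclude-file paths that should be overridden to match *mapred.hosts.exclude* and *dfs.hosts.exclude*. `recommission_node`, `include_host`, `run` and `DRY_RUN` are illustrative names, not Hadoop commands.

```shell
# Sketch only: helper names and default paths are hypothetical.
MAPRED_EXCLUDE="${MAPRED_EXCLUDE:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}/mapred.exclude}"
HDFS_EXCLUDE="${HDFS_EXCLUDE:-${HADOOP_CONF_DIR:-/etc/hadoop/conf}/hdfs.exclude}"

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Drop a hostname from an exclude file, keeping every other entry.
include_host() {
  local host="$1" file="$2"
  [ -f "$file" ] || return 0
  # grep -v exits non-zero when no lines remain; that is not an error here.
  grep -vxF "$host" "$file" > "$file.tmp" || true
  mv "$file.tmp" "$file"
}

# The add-back steps listed above, for one worker hostname.
recommission_node() {
  local host="$1"
  include_host "$host" "$MAPRED_EXCLUDE"
  include_host "$host" "$HDFS_EXCLUDE"
  run hadoop mradmin -refreshNodes
  run hadoop dfsadmin -refreshNodes
  # hadoop-daemon.sh acts locally, so both daemons are started on the worker.
  run ssh "$host" hadoop-daemon.sh start tasktracker
  run ssh "$host" hadoop-daemon.sh start datanode
}
```

Running `DRY_RUN=1 recommission_node worker07` edits both exclude files and prints the remaining commands instead of executing them.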

Is this the correct way to scale up and down "nicely"?

By "nicely", I mean without data loss and without killing tasks running on
the nodes I'm removing. (That is, I'm assuming that *$ hadoop-daemon.sh
stop tasktracker* lets the TaskTracker finish any currently running tasks
before exiting.)

Thanks,
Philippe

--001a11c3b142283b0f04dd8c5c58--