Mailing-List: contact hdfs-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-user@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of saint.ack@gmail.com
 designates 209.85.214.49 as permitted sender)
MIME-Version: 1.0
Sender: saint.ack@gmail.com
Date: Wed, 30 Jan 2013 22:34:57 -0800
Message-ID: 
 <CADcMMgH1pfvz1vi6a=xX8kafOT=M8gQP=CiR85zqu5gQyN47Vg@mail.gmail.com>
Subject: How to remove three disks from three different nodes in a ten node
 cluster in less than an hour without losing replicas?
From: Stack <stack@duboce.net>
To: hdfs-user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=0015174c1d845b374204d48fd36f

--0015174c1d845b374204d48fd36f
Content-Type: text/plain; charset=UTF-8

Here is a little puzzle.

An admin works for a cash-strapped, popular web shop.  At the datacenter
she has a ten-node cluster that is heavily used.  It runs hot all day long,
and decommissioning a node, with its background replicating of 12 disks'
worth of data, messes up the workload she has on top of it and makes her
clients very unhappy.  Replicating the data of one node takes at least an
hour.  This cluster has three bad disks in three different nodes
(replication factor is 3).  The admin lives an hour from the datacenter.
She can't afford a cage monkey and so must replace the disks herself.

If she left home at 2pm and had to be back by 6pm, before the kids came home
from school, how would she replace the three disks and be sure of not losing
a replica?
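
To make the time pressure concrete, here is a rough sketch of the arithmetic in Python.  It assumes the three decommissions would run one after another and uses only the numbers given above; the calendar date and the helper name onsite_window are placeholders invented for illustration.

```python
from datetime import datetime, timedelta


def onsite_window(leave: datetime, back: datetime, one_way_drive: timedelta) -> timedelta:
    """Time left at the datacenter after subtracting the drive there and back."""
    return (back - leave) - 2 * one_way_drive


# Arbitrary date; only the 2pm and 6pm clock times come from the puzzle.
leave = datetime(2013, 1, 30, 14, 0)
back = datetime(2013, 1, 30, 18, 0)

window = onsite_window(leave, back, one_way_drive=timedelta(hours=1))

# "At least an hour" per node, so this is a lower bound for three nodes,
# assuming the decommissions are done sequentially.
sequential_decommission = 3 * timedelta(hours=1)

fits = sequential_decommission <= window
print(window, sequential_decommission, fits)  # prints "2:00:00 3:00:00 False"
```

Under those assumptions she has two hours on site, while decommissioning three whole nodes in sequence needs at least three.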

Is the only answer to remove one, wait for a clean fsck run, then remove the
next one?
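
For reference, the "wait for a clean fsck run" step could be checked mechanically along these lines.  This is only a sketch: the function names are made up for illustration, and it assumes the fsck summary contains the "Under-replicated blocks", "Missing replicas" and "Corrupt blocks" counters plus an "is HEALTHY" line, which may differ between Hadoop versions.

```python
import re
import subprocess


def fsck_is_clean(report: str) -> bool:
    """Return True only if the report says HEALTHY and every counter is zero.

    A counter that cannot be found is treated as "not clean", so an
    unexpected report format errs on the side of waiting.
    """
    if "is HEALTHY" not in report:
        return False
    for label in ("Under-replicated blocks", "Missing replicas", "Corrupt blocks"):
        match = re.search(re.escape(label) + r":\s*(\d+)", report)
        if match is None or int(match.group(1)) != 0:
            return False
    return True


def run_fsck(path: str = "/") -> str:
    """Run fsck and return its stdout (assumes `hadoop` is on the PATH).

    `hadoop fsck <path>` is the Hadoop 1.x form; newer releases use
    `hdfs fsck <path>`.
    """
    result = subprocess.run(
        ["hadoop", "fsck", path], capture_output=True, text=True, check=False
    )
    return result.stdout
```

Used as `fsck_is_clean(run_fsck("/"))` between disk swaps, it would answer whether the cluster is back to full replication before the next disk comes out.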

Thanks,
St.Ack

--0015174c1d845b374204d48fd36f--