hadoop-hdfs-user mailing list archives

From Colin McCabe <cmcc...@alumni.cmu.edu>
Subject Re: How to remove three disks from three different nodes in a ten node cluster in less than an hour without losing replicas?
Date Mon, 04 Feb 2013 22:53:58 GMT
It sounds like what you would like is a way to decommission just one
storage directory on the DataNode. We don't currently support that.

You might be able to get something approaching this result with
"chmod 000 $storage_directory_root".  That would at least prevent new
blocks from being created on the disk you no longer trust.  It
would also cause the existing blocks to be re-replicated once the
DirectoryScanner reran and noticed it couldn't get to them.  Note that I
haven't actually tested the chmod solution, though, so your mileage may vary.
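A minimal sketch of that untested workaround. On a real DataNode the target would be one entry of its configured data directories (e.g. a path like /data/1/dfs/dn, which is hypothetical here); a scratch directory is used instead so the sketch runs standalone:

```shell
# Stand-in for the suspect storage directory; on a real node this
# would be the storage root of the failing disk.
dir=$(mktemp -d)

# Revoke all permissions so the DataNode can no longer create or
# read blocks there; the DirectoryScanner should eventually notice
# the blocks are unreadable and trigger re-replication.
chmod 000 "$dir"

stat -c '%a' "$dir"       # prints: 0

# After the disk is swapped, restore permissions and clean up
# (700 is a typical mode for DataNode storage directories).
chmod 700 "$dir"
rmdir "$dir"
```

Again, this is untested against a live DataNode; whether re-replication fires, and how quickly, depends on the DirectoryScanner's scan interval.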


On Wed, Jan 30, 2013 at 10:34 PM, Stack <stack@duboce.net> wrote:

> Here is a little puzzle.
> An admin works for a cash-strapped, popular web shop.  At the datacenter
> she has a ten node cluster that is heavily used.  It runs hot all day long
> and decommissioning a node with its background replicating of 12 disks
> worth of data messes up the work load she has on top of it and makes her
> clients very unhappy.  Replicating the data of one node takes at least an
> hour.  This cluster has three bad disks in three different nodes
> (replication factor is 3).  The admin lives an hour from the datacenter.
>  She can't afford a cage monkey and so must replace the disks herself.
> If she left home at 2pm and had to be back by 6pm before the kids came
> home from school, how would she replace the three disks without for sure
> losing a replica?
> Is the only answer remove one, wait on clean fsck run, remove the next one?
> Thanks,
> St.Ack
