hadoop-common-user mailing list archives

From "David B. Ritch" <david.ri...@gmail.com>
Subject Re: Decommissioning Individual Disks
Date Fri, 11 Sep 2009 03:06:49 GMT
Thank you both.  That's what we did today.  It seems fairly reasonable
when a node has a few disks, say 3-5.  However, at some sites, with
larger nodes, it seems more awkward.  When a node has a dozen or more
disks (as used in the larger terasort benchmarks), migrating the data
off all the disks is likely to be more of an issue.  I hope that there
is a better solution to this before my client moves to much larger
nodes!  ;-)


On 9/10/2009 10:07 PM, Amandeep Khurana wrote:
> I think decommissioning the node and replacing the disk is a cleaner
> approach. That's what I'd recommend doing as well.
> On 9/10/09, Alex Loddengaard <alex@cloudera.com> wrote:
>> Hi David,
>> Unfortunately there's really no way to do what you're hoping to do in an
>> automatic way.  You can move the block files (including their .meta files)
>> from one disk to another.  Do this when the datanode daemon is stopped.
>> Then, when you start the datanode daemon, it will scan dfs.data.dir and be
>> totally happy if blocks have moved hard drives.  I've never tried to do this
>> myself, but others on the list have suggested this technique for "balancing
>> disks."
>> You could also change your process around a little.  It's not too crazy to
>> decommission an entire node, replace one of its disks, then bring it back
>> into the cluster.  Seems to me that this is a much saner approach: your ops
>> team will tell you which disk needs replacing.  You decommission the node,
>> they replace the disk, you add the node back to the pool.  Your call I
>> guess, though.
>> Hope this was helpful.
>> Alex
>> On Thu, Sep 10, 2009 at 6:30 PM, David B. Ritch
>> <david.ritch@gmail.com> wrote:
>>> What do you do with the data on a failing disk when you replace it?
>>> Our support person comes in occasionally, and often replaces several
>>> disks when he does.  These are disks that have not yet failed, but
>>> firmware indicates that failure is imminent.  We need to be able to
>>> migrate our data off these disks before replacing them.  If we were
>>> replacing entire servers, we would decommission them - but we have 3
>>> data disks per server.  If we were replacing one disk at a time, we
>>> wouldn't worry about it (because of redundancy).  We can decommission
>>> the servers, but moving all the data off of all their disks is a waste.
>>> What's the best way to handle this?
>>> Thanks!
>>> David
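
The manual technique Alex describes (moving block files together with their .meta files to another disk while the datanode daemon is stopped) could be scripted roughly as below. This is only a sketch under assumptions not stated in the thread: that block files and their .meta companions all have names starting with "blk_", that keeping the same relative subdirectory on the destination disk is acceptable to the datanode, and that the function name and paths are hypothetical. It defaults to a dry run, and it should only ever be run with the datanode stopped.

```python
import shutil
from pathlib import Path


def move_block_files(src_dir, dst_dir, dry_run=True):
    """Plan (and optionally perform) a move of block files between data dirs.

    Assumption: block files and their .meta companions are named "blk_*".
    Relative paths are preserved, so a block and its .meta file end up in
    the same directory on the destination disk.
    Returns a list of (source, destination) Path pairs.
    """
    src_root = Path(src_dir)
    dst_root = Path(dst_dir)

    # Build the full plan first so nothing is moved if a conflict is found.
    moves = []
    for path in sorted(src_root.rglob("blk_*")):
        if not path.is_file():
            continue
        target = dst_root / path.relative_to(src_root)
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        moves.append((path, target))

    if not dry_run:
        for path, target in moves:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(path), str(target))

    return moves


if __name__ == "__main__":
    # Hypothetical paths; substitute two directories listed in dfs.data.dir.
    for src, dst in move_block_files("/data/1/dfs/data", "/data/2/dfs/data"):
        print(f"{src} -> {dst}")
```

Running it as-is only prints the planned moves; passing dry_run=False performs them. Files that do not match "blk_*" are left where they are.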