hadoop-common-user mailing list archives

From James Seigel <ja...@tynt.com>
Subject Re: decommissioning node woes
Date Fri, 18 Mar 2011 16:08:18 GMT
Just a note.  If you just shut the node off, the blocks will replicate faster.

James.


On 2011-03-18, at 10:03 AM, Ted Dunning wrote:

> If nobody else more qualified is willing to jump in, I can at least provide
> some pointers.
> 
> What you describe is a bit surprising.  I have zero experience with any 0.21
> version, but decommissioning was working well
> in much older versions, so this would be a surprising regression.
> 
> Not all of your observations are inconsistent with how decommissioning
> should work.  The fact that your nodes still appear up after
> decommissioning starts isn't so strange.  The idea is that no new data
> will be put on the node, nor should it be counted as a replica, but it
> will still help in reading data.
> 
> So that isn't such a big worry.
> 
> The fact that it takes forever and a day, however, is a big worry.  I cannot
> provide any help there offhand.
> 
> What happens when a datanode goes down?  Do you see under-replicated files?
> Does the number of such files decrease over time?
> 
> On Fri, Mar 18, 2011 at 4:23 AM, Rita <rmorgan466@gmail.com> wrote:
> 
>> Any help?
>> 
>> 
>> On Wed, Mar 16, 2011 at 9:36 PM, Rita <rmorgan466@gmail.com> wrote:
>> 
>>> Hello,
>>> 
>>> I have been struggling with decommissioning data nodes. I have a 50+ data
>>> node cluster (no MR) with each server holding about 2TB of storage. I split
>>> the nodes into 2 racks.
>>> 
>>> 
>>> I edit the 'exclude' file and then do a -refreshNodes. I see the node
>>> immediately in 'Decommissioned node' and I also see it as a 'live' node!
>>> Even though I wait 24+ hours it's still like this. I suspect it's a bug
>>> in my version.  The data node process is still running on the node I am
>>> trying to decommission. So, sometimes I kill -9 the process and I see the
>>> 'under replicated' blocks...this can't be the normal procedure.
>>> 
>>> There were even times that I had corrupt blocks because I was impatient --
>>> waited 24-34 hours
>>> 
>>> I am using the 23 August, 2010 release, 0.21.0
>>> <http://hadoop.apache.org/hdfs/releases.html#23+August%2C+2010%3A+release+0.21.0+available>.
>>> 
>>> Is this a known bug? Is there anything else I need to do to decommission a
>>> node?
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> --
>>> --- Get your facts first, then you can distort them as you please.--
>>> 
>> 
>> 
>> 
>> --
>> --- Get your facts first, then you can distort them as you please.--
>> 
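
For readers following the thread: the first half of the procedure Rita describes (edit the 'exclude' file, then run -refreshNodes) can be sketched roughly as below. This is only an illustration, not anything posted by the participants. The function name add_to_exclude, the file path, and the hostname are hypothetical; the sketch assumes the exclude file is a plain list of one hostname per line.

```python
from pathlib import Path


def add_to_exclude(exclude_path, hostname):
    """Append hostname to the exclude file unless it is already listed.

    Returns True if the file was changed, False if the host was already there.
    Assumes one hostname per line (an assumption, not confirmed in the thread).
    """
    path = Path(exclude_path)
    existing = path.read_text().splitlines() if path.exists() else []
    hosts = [line.strip() for line in existing if line.strip()]
    if hostname in hosts:
        return False
    hosts.append(hostname)
    path.write_text("\n".join(hosts) + "\n")
    return True


# Hypothetical usage; path and hostname are placeholders:
#   add_to_exclude("/path/to/exclude", "datanode17.example.com")
# The NameNode only re-reads the file after the dfsadmin -refreshNodes
# step mentioned in the thread, which this sketch does not run.
```

Making the edit idempotent means re-running it for the same node does not add duplicate entries to the file.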

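Ted's two questions (are there under-replicated files, and does their number decrease over time?) could be checked with something like the following sketch. It assumes the summary text comes from an fsck run and contains a line of the form "Under-replicated blocks: <count>"; that format, and the helper names under_replicated_blocks and is_draining, are assumptions made for illustration and should be checked against the output of the version in use.

```python
import re


def under_replicated_blocks(fsck_output):
    """Return the under-replicated block count found in the text, or None.

    Assumes a summary line like 'Under-replicated blocks: 1234 (5.1 %)'.
    """
    match = re.search(r"Under-replicated blocks:\s*(\d+)", fsck_output)
    return int(match.group(1)) if match else None


def is_draining(counts):
    """True if successive samples never increase and the last is below the first."""
    if len(counts) < 2:
        return False
    never_increases = all(b <= a for a, b in zip(counts, counts[1:]))
    return never_increases and counts[-1] < counts[0]
```

Sampling the count every few minutes and passing the list to is_draining gives a rough answer to whether re-replication is making progress or is stuck.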
