hadoop-hdfs-user mailing list archives

From Allen Wittenauer <awittena...@linkedin.com>
Subject Re: decommissioning nodes help
Date Tue, 13 Jul 2010 22:43:41 GMT

Do you have a topology defined?

On Jul 13, 2010, at 2:46 PM, Arun Ramakrishnan wrote:

> I don't know where the problem was. J-D said somewhere that the decommissioning process is well tested and less likely to have bugs.
> 
> Anyways, I just resorted to killing 2 nodes. Wait till fsck reports 100% replication at factor 3. Kill 2 more nodes ... and so on.
> Worked fine.
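The "wait till fsck reports full replication" check above can be sketched as a small shell helper. This is a minimal sketch, assuming the classic `hadoop fsck /` summary format with an "Under-replicated blocks: N (...)" line; the polling loop itself needs a live cluster, so it is only shown as a comment.

```shell
# fsck_fully_replicated reads a `hadoop fsck` report on stdin and succeeds
# only when the summary reports zero under-replicated blocks.
fsck_fully_replicated() {
  # Pull N out of the "Under-replicated blocks: N (...)" line.
  under=$(awk -F'[:(]' '/Under-replicated blocks/ {gsub(/[ \t]/, "", $2); print $2}')
  # Default to 1 (not done) if the line was not found at all.
  [ "${under:-1}" -eq 0 ]
}

# Hypothetical polling loop (requires a live cluster):
#   while ! hadoop fsck / | fsck_fully_replicated; do sleep 60; done
```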
> 
> Thanks
> Arun
> 
> -----Original Message-----
> From: Varene Olivier [mailto:varene@echo.fr] 
> Sent: Tuesday, July 13, 2010 1:32 AM
> To: hdfs-user@hadoop.apache.org
> Subject: Re: decommissioning nodes help
> 
> Are your datanodes dual-attached to the network ?
> If so, you can indeed see your datanodes as duplicate entries.
> You should also check that your DNS resolution matches the
> hostnames of your datanodes.
> 
> 
> To solve your issue, you can switch off one datanode at a time (by
> killing the process).
> The master should notice that and take action to maintain the
> replication level.
> Do it slowly :) (or you might lose some data)
> You can tell the process is over when the I/O from block
> re-replication has stopped
> 
> Cheers
> 
> 
Arun Ramakrishnan wrote:
>> That's what I thought.
>> 
>> But this is what I see in -report for the excluded nodes.
>> 
>> **************
>> Decommission Status : Normal
>> Configured Capacity: 0 (0 KB)
>> DFS Used: 0 (0 KB)
>> Non DFS Used: 0 (0 KB)
>> DFS Remaining: 0(0 KB)
>> DFS Used%: 100%
>> DFS Remaining%: 0%
>> Last contact: Wed Dec 31 16:00:00 PST 1969
>> ***************
>> 
>> In the UI, the excluded nodes show up in both live and dead nodes. And it's been several hours now. The block counts across the nodes are exactly the same.
>> The cluster is not accessed by any clients; it's not busy at all.
>> 
>> And I have set dfs.balance.bandwidthPerSec = 2000000 in hdfs-site.xml
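For reference, the setting mentioned above lives in hdfs-site.xml. As I understand it, dfs.balance.bandwidthPerSec throttles the per-datanode bandwidth used by the balancer, in bytes per second, so 2000000 is roughly 2 MB/s; whether it also governs decommission re-replication traffic is not something this thread confirms. A sketch of the property, using the value from the mail above:

```xml
<!-- hdfs-site.xml: per-datanode bandwidth cap for balancing traffic,
     in bytes per second. 2000000 is roughly 2 MB/s. -->
<property>
  <name>dfs.balance.bandwidthPerSec</name>
  <value>2000000</value>
</property>
```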
>> 
>> Anyway, I think I am lost here. I'm just resorting to killing 2 nodes at a time, a sorta backwards strategy. At least I know it works.
>> 
>> Thanks
>> Arun
>> 
>> -----Original Message-----
>> From: Varene Olivier [mailto:varene@echo.fr] 
>> Sent: Friday, July 09, 2010 7:44 AM
>> To: hdfs-user@hadoop.apache.org
>> Subject: Re: decommissioning nodes help
>> 
>> Hello,
>> 
>> you should see in the Web interface
>> 
>> http://yourDatanodeMaster:50070/
>> your node's status change to Decommissioning;
>> when done, it is removed from the list of active nodes
>> 
>> With huge bandwidth to perform the sync, the process is very fast,
>> so, to answer your other mail, it might already be done.
>> 
>> You can also check the status of your node via the CLI:
>> 
>> # hadoop dfsadmin -report
>> 
>> Name : ...
>> Decommission Status : <StatusOfYourNode>
>> ...
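The CLI check above can be sketched as a one-line helper. This is a minimal sketch, assuming the report prints a "Decommission Status : Decommission in progress" line for nodes that are still draining (the states I know of are "Normal", "Decommission in progress", and "Decommissioned"); the polling loop needs a live cluster, so it is shown as a comment.

```shell
# decom_in_progress reads `hadoop dfsadmin -report` output on stdin and
# succeeds if any node is still decommissioning.
decom_in_progress() {
  grep -q 'Decommission Status : Decommission in progress'
}

# Hypothetical polling loop (requires a live cluster):
#   while hadoop dfsadmin -report | decom_in_progress; do sleep 60; done
#   echo "all excluded nodes decommissioned"
```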
>> 
>> 
>> Hope it helps
>> 
>> 
>> 
>> Arun Ramakrishnan wrote:
>>> Hi guys
>>> 
>>> I am stuck in my attempt to remove nodes from HDFS.
>>> 
>>> I followed the steps in https://issues.apache.org/jira/browse/HDFS-1125
>>> 
>>> a)     add node to dfs.hosts.exclude
>>> 
>>> b)      dfsadmin -refreshNodes
>>> 
>>> c)      wait for decom to finish
>>> 
>>> d)     remove node from both dfs.hosts and dfs.hosts.exclude
>>> 
>>> 
>>> 
>>> But after steps a) and b), how do I know the decommission is complete?
>>> 
>>> I am in the process of decommissioning 6 nodes and don't want to lose 
>>> any blocks ( rep factor is 3 ) with a restart.
>>> 
>>> 
>>> 
>>> I also opened https://issues.apache.org/jira/browse/HDFS-1290 if anyone 
>>> is interested.
>>> 
>>> 
>>> 
>>> Thanks
>>> 
>>> Arun
>>> 
>>> 
>>> 

