From: Espen Amble Kolstad
Organization: T-Rank
To: hadoop-user@lucene.apache.org
Subject: Re: Decommission in hadoop-0.12.2
Date: Tue, 27 Mar 2007 10:22:36 +0200

On Tuesday 27 March 2007 10:03:41 Andrzej Bialecki wrote:
> Espen Amble Kolstad wrote:
> > On Tuesday 27 March 2007 09:27:58 Andrzej Bialecki wrote:
> >> Espen Amble Kolstad wrote:
> >>> Hi,
> >>>
> >>> I'm trying to decommission a node with hadoop-0.12.2.
> >>> I use the property dfs.hosts.exclude, since the command hadoop
> >>> dfsadmin -decommission seems to be gone.
> >>> I then start the cluster with an empty exclude-file, add the name of
> >>> the node to decommission and run hadoop dfsadmin -refreshNodes.
> >>> The log then says:
> >>> 2007-03-27 08:42:59,168 INFO fs.FSNamesystem - Start Decommissioning
> >>> node 81.93.168.215:50010
> >>>
> >>> But nothing happens.
> >>> I've left it in this state overnight, but still nothing.
> >>>
> >>> Am I missing something?
> >>
> >> What does dfsadmin -report say about this node? It takes time to
> >> ensure that all blocks are replicated from this node to other nodes.
> >
> > Hi,
> >
> > dfsadmin -report:
> >
> > Name: 81.93.168.215:50010
> > State : Decommission in progress
> > Total raw bytes: 1438871724032 (1.30 TB)
> > Used raw bytes: 270070137404 (0.24 TB)
> > % used: 18.76%
> > Last contact: Tue Mar 27 09:42:26 CEST 2007
> >
> > In the web-interface (dfshealth.jsp) no change can be seen in % or the
> > number of blocks on any of the nodes.
>
> You may want to check the datanode logs if there are any exceptions
> reported. Also, things are taking time - I believe the datanodes
> synchronize their block information piecewise, so that they don't
> overwhelm the namenode. It surely takes some time in my case, even
> though the disk size per node that I use is much smaller.
>
> Regarding the number of blocks - if all blocks are already present on
> other datanodes at least in 1 copy, then no new blocks need to be
> created - I'm not sure when the namenode decides that these blocks
> should get additional replicas: during the decommissioning or after it's
> complete ...
>
> It would be nice to have a progress meter on the decommissioning
> process, though.

Hi,

I have replication set to 1 for the whole HDFS, so there should not be any
other replicas. I can't find any errors in my logs.
And the namenode-log looks like this (at INFO level):

2007-03-27 08:42:59,168 INFO fs.FSNamesystem - Start Decommissioning node 81.93.168.215:50010
2007-03-27 09:04:48,831 INFO fs.FSNamesystem - Roll Edit Log
2007-03-27 09:04:49,500 INFO fs.FSNamesystem - Roll FSImage
2007-03-27 10:04:50,221 INFO fs.FSNamesystem - Roll Edit Log
2007-03-27 10:04:50,360 INFO fs.FSNamesystem - Roll FSImage

- Espen
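
Until there is a real progress meter, one rough way to watch a decommission is to save the text of dfsadmin -report at intervals and compare the fields between runs. The sketch below is only an illustration and not part of Hadoop: the function names parse_report and used_bytes are made up here, and the field labels ("Name", "State", "Used raw bytes") are assumed to match the report layout quoted in this thread, which may differ in other releases.

```python
import re

# Sample stanza copied from the dfsadmin -report output quoted in this thread.
SAMPLE_REPORT = """\
Name: 81.93.168.215:50010
State : Decommission in progress
Total raw bytes: 1438871724032 (1.30 TB)
Used raw bytes: 270070137404 (0.24 TB)
% used: 18.76%
Last contact: Tue Mar 27 09:42:26 CEST 2007
"""


def parse_report(text):
    """Split report text into one dict of fields per datanode, keyed by its Name."""
    nodes = {}
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank lines and anything that is not "label: value"
        key, value = line.split(":", 1)  # split on the first colon only
        key, value = key.strip(), value.strip()
        if key == "Name":
            # A "Name" line starts a new datanode stanza.
            current = {}
            nodes[value] = current
        elif current is not None:
            current[key] = value
    return nodes


def used_bytes(node):
    """Return the leading integer of the 'Used raw bytes' field."""
    match = re.match(r"(\d+)", node["Used raw bytes"])
    return int(match.group(1))


nodes = parse_report(SAMPLE_REPORT)
node = nodes["81.93.168.215:50010"]
print(node["State"], used_bytes(node))
```

Run against two saved reports, the parsed State of the excluded node and the used bytes of the remaining nodes can be compared to see whether anything is moving. With replication set to 1, the blocks would presumably have to be copied elsewhere, so the other nodes' usage should grow if the decommission is making progress.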