Subject: Re: Rebalancing Hadoop Cluster running 15.3
From: Tom White
To: core-user@hadoop.apache.org
Date: Thu, 25 Jun 2009 11:29:06 +0100

You can change the value of hadoop.root.logger in conf/log4j.properties to change the log level globally.
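In 0.15 that file looks roughly like the following; this is a sketch (exact defaults vary by release), and the per-class override shown uses the logger name taken from the WARN lines quoted later in this thread:

```properties
# conf/log4j.properties (sketch; exact contents vary by Hadoop release)

# Global default log level and appender for the Hadoop daemons
hadoop.root.logger=INFO,console
log4j.rootLogger=${hadoop.root.logger}

# Per-component override: raise the threshold for the class that emits
# the "Not able to place enough replicas" WARNs, so only ERRORs are logged
log4j.logger.org.apache.hadoop.fs.FSNamesystem=ERROR
```

A level can also be changed temporarily, without a restart, with something like `hadoop daemonlog -setlevel namenode-host:50070 org.apache.hadoop.fs.FSNamesystem ERROR` (the host:port and the command's availability in 0.15 are assumptions).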
See also the section "Custom Logging levels" in the same file to set
levels on a per-component basis. You can also use hadoop daemonlog to
set log levels on a temporary basis (they are reset on restart). I'm
not sure if this was in Hadoop 0.15.

Cheers,
Tom

On Thu, Jun 25, 2009 at 11:12 AM, Usman Waheed wrote:
> Hi Tom,
>
> Thanks for the trick :).
>
> I tried setting the replication to 3 in hadoop-default.xml, but then
> the namenode log file in /var/log/hadoop started filling up with the
> messages marked in bold:
>
> 2009-06-24 14:39:06,338 INFO org.apache.hadoop.dfs.StateChange: STATE* SafeModeInfo.leave: Safe mode is OFF.
> 2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE* Network topology has 1 racks and 3 datanodes
> 2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE* UnderReplicatedBlocks has 48545 blocks
> 2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4602580985572290582 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4602036196619511999 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601863051065326105 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601770656364938220 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,829 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.20.11.44:50010 is added to blk_-4601770656364938220
> 2009-06-24 14:39:10,832 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4601706607039808418 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,833 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4601652202073012439 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601470720696217621 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601267705629076611 to datanode(s) 10.20.11.44:50010
> *2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1*
>
> It is a very small cluster with limited disk space. The disk was
> getting full because of all these extra messages that were being
> written to the logfile. Eventually the file system would fill up and
> Hadoop would hang. This happened when I set dfs.replication = 3 in
> hadoop-default.xml and restarted the cluster.
>
> Is there a way I can turn off these WARN messages, which are filling
> up the file system? I can run the command on the command line like you
> advised with replication set to 3 and then, once done, set it back to 2.
> Currently dfs.replication is set to 2 in hadoop-default.xml.
>
> Thanks,
> Usman
>
>> Hi Usman,
>>
>> Before the rebalancer was introduced, one trick people used was to
>> increase the replication on all the files in the system, wait for
>> re-replication to complete, then decrease the replication to the
>> original level. You can do this using hadoop fs -setrep.
>>
>> Cheers,
>> Tom
>>
>> On Thu, Jun 25, 2009 at 10:33 AM, Usman Waheed wrote:
>>>
>>> Hi,
>>>
>>> One of our test clusters is running Hadoop 15.3 with the replication
>>> level set to 2. The datanodes are not balanced at all:
>>>
>>> Datanode_1: 52%
>>> Datanode_2: 82%
>>> Datanode_3: 30%
>>>
>>> 15.3 does not have the rebalancer capability; we are planning to
>>> upgrade, but not for now.
>>>
>>> If I take out Datanode_1 from the cluster (decommission it for some
>>> time), will Hadoop balance so that Datanode_2 and Datanode_3 even
>>> out to 56%? Then I can re-introduce Datanode_1 back into the cluster.
>>>
>>> Comments/Suggestions please?
>>>
>>> Thanks,
>>> Usman
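For reference, the setrep trick discussed in this thread can be run roughly as follows. This is a sketch: the root path, the replication levels, and whether these exact flags were already present in 0.15 are assumptions.

```shell
# Raise replication on the whole filesystem to 3; the namenode will
# schedule the extra replicas, favouring the under-used datanode
hadoop fs -setrep -R 3 /

# Check progress: fsck reports the number of under-replicated blocks
hadoop fsck /

# Once re-replication has completed, drop back to the original level
hadoop fs -setrep -R 2 /
```

Note that the decommission approach asked about in the original message would also trigger re-replication, but only of the decommissioned node's blocks; the setrep round trip touches every file.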