Subject: Re: Rebalancing Hadoop Cluster running 15.3
From: Tom White
To: core-user@hadoop.apache.org
Date: Thu, 25 Jun 2009 11:29:06 +0100

You can change the value of hadoop.root.logger in conf/log4j.properties to change the log level globally.
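In 0.15 that file looks roughly like the following; this is a sketch (exact defaults vary by release), and the per-class override shown uses the logger name taken from the WARN lines quoted later in this thread:

```properties
# conf/log4j.properties (sketch; exact contents vary by Hadoop release)

# Global default log level and appender for the Hadoop daemons
hadoop.root.logger=INFO,console
log4j.rootLogger=${hadoop.root.logger}

# Per-component override: raise the threshold for the class that emits
# the "Not able to place enough replicas" WARNs, so only ERRORs are logged
log4j.logger.org.apache.hadoop.fs.FSNamesystem=ERROR
```

A level can also be changed temporarily, without a restart, with something like `hadoop daemonlog -setlevel namenode-host:50070 org.apache.hadoop.fs.FSNamesystem ERROR` (the host:port and the command's availability in 0.15 are assumptions).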
See also the section "Custom Logging levels" in the same file to set
levels on a per-component basis. You can also use hadoop daemonlog to
set log levels on a temporary basis (they are reset on restart). I'm
not sure if this was in Hadoop 0.15.

Cheers,
Tom

On Thu, Jun 25, 2009 at 11:12 AM, Usman Waheed wrote:
> Hi Tom,
>
> Thanks for the trick :).
>
> I tried setting the replication to 3 in hadoop-default.xml, but then
> the namenode log file in /var/log/hadoop started filling up with the
> messages marked in bold:
>
> 2009-06-24 14:39:06,338 INFO org.apache.hadoop.dfs.StateChange: STATE* SafeModeInfo.leave: Safe mode is OFF.
> 2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE* Network topology has 1 racks and 3 datanodes
> 2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE* UnderReplicatedBlocks has 48545 blocks
> 2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4602580985572290582 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4602036196619511999 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601863051065326105 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601770656364938220 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,829 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 10.20.11.44:50010 is added to blk_-4601770656364938220
> 2009-06-24 14:39:10,832 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4601706607039808418 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,833 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate blk_-4601652202073012439 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601470720696217621 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK* NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate blk_-4601267705629076611 to datanode(s) 10.20.11.44:50010
> *2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1
> 2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to place enough replicas, still in need of 1*
>
> It is a very small cluster with limited disk space. The disk was
> getting full because of all these extra messages that were being
> written to the logfile. Eventually the file system would fill up and
> Hadoop would hang. This happened when I set dfs.replication = 3 in
> hadoop-default.xml and restarted the cluster.
>
> Is there a way I can turn off these WARN messages, which are filling
> up the file system? I can run the command on the command line like you
> advised with replication set to 3 and then, once done, set it back to 2.
> Currently dfs.replication is set to 2 in hadoop-default.xml.
>
> Thanks,
> Usman
>
>> Hi Usman,
>>
>> Before the rebalancer was introduced, one trick people used was to
>> increase the replication on all the files in the system, wait for
>> re-replication to complete, then decrease the replication to the
>> original level. You can do this using hadoop fs -setrep.
>>
>> Cheers,
>> Tom
>>
>> On Thu, Jun 25, 2009 at 10:33 AM, Usman Waheed wrote:
>>>
>>> Hi,
>>>
>>> One of our test clusters is running Hadoop 15.3 with the replication
>>> level set to 2. The datanodes are not balanced at all:
>>>
>>> Datanode_1: 52%
>>> Datanode_2: 82%
>>> Datanode_3: 30%
>>>
>>> 15.3 does not have the rebalancer capability; we are planning to
>>> upgrade, but not for now.
>>>
>>> If I take out Datanode_1 from the cluster (decommission it for some
>>> time), will Hadoop balance so that Datanode_2 and Datanode_3 even
>>> out to 56%? Then I can re-introduce Datanode_1 back into the cluster.
>>>
>>> Comments/Suggestions please?
>>>
>>> Thanks,
>>> Usman
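For reference, the setrep trick discussed in this thread can be run roughly as follows. This is a sketch: the root path, the replication levels, and whether these exact flags were already present in 0.15 are assumptions.

```shell
# Raise replication on the whole filesystem to 3; the namenode will
# schedule the extra replicas, favouring the under-used datanode
hadoop fs -setrep -R 3 /

# Check progress: fsck reports the number of under-replicated blocks
hadoop fsck /

# Once re-replication has completed, drop back to the original level
hadoop fs -setrep -R 2 /
```

Note that the decommission approach asked about in the original message would also trigger re-replication, but only of the decommissioned node's blocks; the setrep round trip touches every file.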