hadoop-common-user mailing list archives

From Tom White <...@cloudera.com>
Subject Re: Rebalancing Hadoop Cluster running 15.3
Date Thu, 25 Jun 2009 10:29:06 GMT
You can change the value of hadoop.root.logger in
conf/log4j.properties to change the log level globally. See also the
section "Custom Logging levels" in the same file to set levels on a
per-component basis.
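Below is a minimal sketch of how a per-component level overrides the global one. It is written in Python rather than Java and assumes simplified log4j 1.x semantics: a logger's effective level comes from its nearest dotted ancestor that has a level set, and otherwise from the root logger. The properties text is an illustrative fragment, not a copy of any release's conf/log4j.properties. The `log4j.logger.org.apache.hadoop.fs.FSNamesystem=ERROR` line is a hypothetical addition that uses the logger name appearing in the WARN lines quoted further down.

```python
# Minimal sketch of log4j 1.x level inheritance (assumed, simplified semantics):
# a logger's effective level is that of its nearest dotted ancestor with a
# configured level, falling back to the root logger.
LEVELS = ["TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL"]

# Illustrative fragment only; a real conf/log4j.properties has many more entries.
PROPERTIES = """
hadoop.root.logger=INFO,console
log4j.rootLogger=${hadoop.root.logger}
# Hypothetical custom level: hide WARN messages from this one component
log4j.logger.org.apache.hadoop.fs.FSNamesystem=ERROR
"""


def parse_properties(text):
    """Parse key=value lines, skip comments, and expand ${name} references."""
    raw = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        raw[key.strip()] = value.strip()
    expanded = {}
    for key, value in raw.items():
        for name, replacement in raw.items():
            value = value.replace("${" + name + "}", replacement)
        expanded[key] = value
    return expanded


def effective_level(props, logger):
    """Walk from the logger up through its dotted ancestors, then to the root."""
    name = logger
    while name:
        setting = props.get("log4j.logger." + name)
        if setting:
            # The level is the first comma-separated token; appenders follow it.
            return setting.split(",")[0].strip()
        name = name.rpartition(".")[0]
    return props["log4j.rootLogger"].split(",")[0].strip()


def is_logged(props, logger, level):
    """True if a message at `level` passes the logger's effective threshold."""
    return LEVELS.index(level) >= LEVELS.index(effective_level(props, logger))
```

With this fragment, WARN messages from org.apache.hadoop.fs.FSNamesystem are suppressed while INFO messages from org.apache.hadoop.dfs.StateChange still get through. Be aware that this hides every FSNamesystem warning, not just the replica-placement one. There is also a caveat, again assuming log4j 1.x behaviour: `${...}` variables are resolved against JVM system properties before the file itself. If the start-up scripts pass `-Dhadoop.root.logger=...`, that value takes precedence over the one in conf/log4j.properties. The sketch ignores system properties.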

You can also use hadoop daemonlog to set log levels on a temporary
basis (they are reset on restart). I'm not sure if this was in Hadoop
0.15.
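As far as I know, `hadoop daemonlog` is a thin client for a `/logLevel` page served on each daemon's HTTP port. `-getlevel host:port name` and `-setlevel host:port name level` turn into requests carrying `log` and `level` query parameters. The sketch below builds those URLs and can fetch one. The host name `namenode.example` is hypothetical. Port 50070 is assumed to be the namenode's web port, as it was by default in releases of that era. Whether the page exists in 0.15 is exactly what is uncertain, so check your version first.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def log_level_url(host_port, logger, level=None):
    """Build the /logLevel URL: omit `level` to query, pass it to change it."""
    params = {"log": logger}
    if level is not None:
        params["level"] = level
    return "http://%s/logLevel?%s" % (host_port, urlencode(params))


def set_level(host_port, logger, level):
    """Request the page; the new level lasts only until the daemon restarts."""
    with urlopen(log_level_url(host_port, logger, level)) as response:
        return response.read().decode("utf-8", errors="replace")
```

For example, `log_level_url("namenode.example:50070", "org.apache.hadoop.fs.FSNamesystem", "ERROR")` corresponds to what `hadoop daemonlog -setlevel namenode.example:50070 org.apache.hadoop.fs.FSNamesystem ERROR` would request, assuming the servlet is present.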

Cheers,
Tom

On Thu, Jun 25, 2009 at 11:12 AM, Usman Waheed <usmanw@opera.com> wrote:
> Hi Tom,
>
> Thanks for the trick :).
>
> I tried setting the replication to 3 in hadoop-default.xml, but then the
> namenode log file in /var/log/hadoop started filling up with the messages
> marked in bold:
>
> 2009-06-24 14:39:06,338 INFO org.apache.hadoop.dfs.StateChange: STATE*
> SafeModeInfo.leave: Safe mode is OFF.
> 2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE*
> Network topology has 1 racks and 3 datanodes
> 2009-06-24 14:39:06,339 INFO org.apache.hadoop.dfs.StateChange: STATE*
> UnderReplicatedBlocks has 48545 blocks
> 2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
> blk_-4602580985572290582 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,655 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
> blk_-4602036196619511999 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
> blk_-4601863051065326105 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:07,666 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
> blk_-4601770656364938220 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,829 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.addStoredBlock: blockMap updated: 10.20.11.44:50010 is added to
> blk_-4601770656364938220
> 2009-06-24 14:39:10,832 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
> blk_-4601706607039808418 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,833 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.45:50010 to replicate
> blk_-4601652202073012439 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
> blk_-4601470720696217621 to datanode(s) 10.20.11.44:50010
> 2009-06-24 14:39:10,834 INFO org.apache.hadoop.dfs.StateChange: BLOCK*
> NameSystem.pendingTransfer: ask 10.20.11.43:50010 to replicate
> blk_-4601267705629076611 to datanode(s) 10.20.11.44:50010
> *2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,899 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,900 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1
> 2009-06-24 14:39:13,901 WARN org.apache.hadoop.fs.FSNamesystem: Not able to
> place enough replicas, still in need of 1*
>
> It is a very small cluster with limited disk space. The disk was filling up
> because of all the extra messages being written to the log file.
> Eventually the file system fills up and Hadoop hangs. This happened when I
> set dfs.replication = 3 in hadoop-default.xml and restarted the cluster.
>
> Is there a way I can turn off these WARN messages that are filling up the
> file system? Then I can run the command on the command line as you advised,
> with replication set to 3, and once it is done, set it back to 2.
> Currently dfs.replication is set to 2 in hadoop-default.xml.
>
> Thanks,
> Usman
>
>> Hi Usman,
>>
>> Before the rebalancer was introduced, one trick people used was to
>> increase the replication on all the files in the system, wait for
>> re-replication to complete, then decrease the replication to the
>> original level. You can do this using hadoop fs -setrep.
>>
>> Cheers,
>> Tom
>>
>> On Thu, Jun 25, 2009 at 10:33 AM, Usman Waheed <usmanw@opera.com> wrote:
>>
>>>
>>> Hi,
>>>
>>> One of our test clusters is running Hadoop 0.15.3 with the replication
>>> level set to 2. The datanodes are not balanced at all:
>>>
>>> Datanode_1: 52%
>>> Datanode_2: 82%
>>> Datanode_3: 30%
>>>
>>> 0.15.3 does not have the rebalancer. We are planning to upgrade, but not
>>> for now.
>>>
>>> If I take Datanode_1 out of the cluster (decommission it for some time),
>>> will Hadoop rebalance so that Datanode_2 and Datanode_3 even out to 56%?
>>> Then I can reintroduce Datanode_1 into the cluster.
>>>
>>> Comments/Suggestions please?
>>>
>>> Thanks,
>>> Usman
>>>
>>>
>
>
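This is a rough back-of-the-envelope sketch of the `hadoop fs -setrep` trick on the three-node cluster described in this thread. It rests on several assumptions: all three disks have equal capacity, 1% of a disk equals one block, and non-DFS usage is ignored. It also assumes that when replication is lowered, the namenode deletes each excess replica from the node with the least free space. That last point is my understanding of how excess replicas were chosen, not something stated in the thread. With three datanodes, replication 3 places every block on every node, so the sketch also reports that transient peak.

```python
def setrep_trick(usage_pct):
    """Simulate 'setrep 3, wait, setrep 2' on a 3-node, replication-2 cluster.

    usage_pct maps datanode name -> percent used, with 1% == one block
    (assumption). Returns (peak usage after setrep 3, usage after setrep 2).
    """
    nodes = sorted(usage_pct)
    if len(nodes) != 3:
        raise ValueError("sketch assumes exactly three datanodes")
    replicas = sum(usage_pct.values())
    if replicas % 2:
        raise ValueError("replication 2 implies an even number of replicas")
    unique_blocks = replicas // 2
    # setrep 3 on three nodes: every node holds a replica of every block.
    peak = {node: unique_blocks for node in nodes}
    usage = dict(peak)
    # setrep 2: each block loses one replica, taken (by assumption) from the
    # node with the least free space, i.e. the highest usage right now.
    for _ in range(unique_blocks):
        victim = max(nodes, key=lambda node: usage[node])
        usage[victim] -= 1
    return peak, usage
```

With the figures from the original question (52%, 82%, 30%), the sketch ends with the nodes within one block of each other, at about 55% each. It also shows the catch: for a while, every node must hold all of the data (82% each here), which a cluster short on disk space may not be able to absorb.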

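The 56% in the original question is the plain average of Datanode_2 and Datanode_3. This rough sketch shows why decommissioning Datanode_1 probably would not produce that result. It assumes equal disk capacities and ignores non-DFS usage. With replication 2 and only two live datanodes, every block needs a replica on both of them, so each would end up holding all of the unique data rather than half of their current combined usage.

```python
def after_decommission(usage_pct, leaving):
    """Estimate usage after decommissioning one node of a replication-2 cluster.

    Assumes equal disk capacities and ignores non-DFS usage. With two live
    nodes left and replication 2, each must hold a replica of every block.
    """
    remaining = sorted(node for node in usage_pct if node != leaving)
    if len(remaining) != 2:
        raise ValueError("sketch assumes exactly two datanodes remain")
    unique_data = sum(usage_pct.values()) / 2  # every block is stored twice
    naive_average = sum(usage_pct[node] for node in remaining) / 2
    estimate = {node: unique_data for node in remaining}
    return naive_average, estimate
```

Under these assumptions the naive average is 56%, but both remaining nodes would head towards 82%. In other words, Datanode_3 would fill up to roughly Datanode_2's current level instead of the two meeting in the middle.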