hadoop-user mailing list archives

From Kamil Rogoń <kamil.ro...@cantstopgames.com>
Subject Under-Replicated Blocks
Date Thu, 16 Aug 2012 09:06:23 GMT
Hello

Sometimes I get a small glitch with replication between HDFS nodes.
The datanodes are online, but one of them is hanging.

Default replication factor:   3
Average block replication:    2.9940512
Corrupt blocks:               0
Missing replicas:             163 (0.19868357 %)
Number of data-nodes:         3
Number of racks:              1
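
The figures above can be cross-checked against each other. This is only a rough sketch: it assumes the cluster holds 27401 blocks in total (the count from the hdfs3 block report further down; the summary itself does not state the total) and that the percentage is computed against the replicas that currently exist.

```python
# Figures copied from the summary above.
replication_factor = 3
missing_replicas = 163

# Assumption: total block count, taken from the hdfs3 block report in the logs.
total_blocks = 27401

# Replicas that should exist versus replicas that actually exist.
expected_replicas = replication_factor * total_blocks
live_replicas = expected_replicas - missing_replicas

# Average replication = existing replicas per block.
average_replication = live_replicas / total_blocks

# Missing replicas as a percentage of the existing replicas (assumed formula).
missing_percent = 100.0 * missing_replicas / live_replicas

print(f"Average block replication: {average_replication:.7f}")
print(f"Missing replicas: {missing_replicas} ({missing_percent:.8f} %)")
```

Under those assumptions both values come out very close to the reported ones, which suggests that exactly 163 blocks have two replicas instead of three.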

Node   Last Contact  Admin State  Blocks  Failed Volumes
hdfs1  0             In Service   27471   0
hdfs2  2             In Service   27305   0
hdfs3  2             In Service   27401   0


As you can see, the number of blocks is not equal across the nodes.

Generally, the datanodes are working:

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Scheduling block 
blk_8238726012137032582_1388695 file 
/home/hdfs/3/data/current/subdir62/subdir31/blk_8238726012137032582 for 
deletion
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleted block 
blk_8238726012137032582_1388695 at file 
/home/hdfs/3/data/current/subdir62/subdir31/blk_8238726012137032582

INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Starting 
asynchronous block report scan
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Finished 
asynchronous block report scan in 72ms
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconciled 
asynchronous block report against current state in 8 ms
INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 
27401 blocks got processed in 72 msecs
INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: 
Verification succeeded for blk_-8356262741701254215_854916

In the namenode logs I only see many lines like this:

WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to 
place enough replicas, still in need of 1(excluded: 192.168.0.101:50010, 
192.168.0.102:50010, 192.168.0.103:50010)
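
The warning can also be read mechanically. A minimal sketch, assuming the three excluded addresses correspond to the three datanodes listed above (the hostname-to-IP mapping is not stated here): it extracts the number of replicas still needed and the excluded nodes, then checks whether any datanode is left as a candidate. It does not show why each node was excluded.

```python
import re

# The namenode warning quoted above, joined onto one line.
warning = (
    "WARN org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Not able to "
    "place enough replicas, still in need of 1(excluded: 192.168.0.101:50010, "
    "192.168.0.102:50010, 192.168.0.103:50010)"
)

# "Number of data-nodes" from the summary above.
number_of_datanodes = 3

# Pull out the replica shortfall and the comma-separated excluded list.
match = re.search(r"still in need of (\d+)\s*\(excluded: ([^)]*)\)", warning)
still_needed = int(match.group(1))
excluded = [node.strip() for node in match.group(2).split(",")]

# If every datanode is excluded, the namenode has no target left to choose.
no_candidates_left = len(set(excluded)) >= number_of_datanodes

print("replicas still needed:", still_needed)
print("excluded nodes:", excluded)
print("no candidate datanodes left:", no_candidates_left)
```

With only three datanodes and a replication factor of 3, every node has to be an acceptable target for each block, so a single node being rejected is enough to leave a block under-replicated.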


Restarting the datanodes helps, but what is the underlying cause?

Thanks for any tips,
k.
