hbase-user mailing list archives

From Tuan Nguyen <tua...@gmail.com>
Subject DataNode stop reclaim the deleting block under heavy write
Date Sat, 20 Mar 2010 11:01:19 GMT
Hi,

We are running a stress test to evaluate HBase. The test runs fine and
completes, but we have a problem with one node. Here are our
configuration and the problem:

1. We have 1 master and 4 slaves. The master serves as both the NameNode and
the HBase master. Each slave runs both a DataNode and a region server.
2. We have set the xceiver limit (dfs.datanode.max.xcievers) to 8192 and
enabled LZO compression.
3. From another machine, we run 8 threads that write data into the
cluster; each record is about 5 KB to 100 KB.
4. The test runs fine for the first 2 to 3 hours, but then one of the nodes
logs the following warning:

2010-03-19 20:26:22,814 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deleting block blk_9088042710721149043_145344 file /mnt/moom/hadoop/0.20.1/dfs/data/current/subdir6/subdir33/blk_9088042710721149043
2010-03-19 20:26:22,846 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Error processing datanode Command
java.io.IOException: Error in deleting blocks.
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.invalidate(FSDataset.java:1361)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:868)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.processCommand(DataNode.java:830)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.offerService(DataNode.java:710)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.run(DataNode.java:1186)
        at java.lang.Thread.run(Thread.java:619)

5. After the warning, I no longer see any "Deleting block
blk_xxxxxxxxxxxxxxxxxxxxxxxxx" INFO messages on this node, and we lose
disk space very fast on this DataNode. My guess is that HBase compacts the
regions and deletes the old region files, but the DataNode is unable to
reclaim the freed blocks.

6. After 5 to 6 hours, the DataNode runs completely out of space, but
the test continues at a slower insert rate.
7. The entire test finishes after 14 hours.
8. Right after the test finishes, this DataNode resumes reclaiming the
blocks marked for deletion.
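
To pin down exactly when block deletion stopped and whether it resumed
(steps 4, 5 and 8), the DataNode log can be scanned for the two messages
shown above. The following is only a minimal sketch: it assumes the log
lines look like the excerpt in step 4, and the function name
summarize_deletions is hypothetical, not part of Hadoop.

```python
import re
from datetime import datetime

# Timestamp layout taken from the log excerpt, e.g. "2010-03-19 20:26:22,814".
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")


def summarize_deletions(lines):
    """Return (last_delete_before_error, first_error, deletes_after_error).

    Hypothetical helper: timestamps are datetimes (or None if never seen),
    deletes_after_error counts "Deleting block" lines after the first error.
    """
    last_delete = None
    first_error = None
    deletes_after_error = 0
    current_ts = None
    for line in lines:
        match = TS_RE.match(line)
        if match:
            # Stack-trace lines carry no timestamp, so remember the last one seen.
            current_ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        if "Deleting block" in line:
            if first_error is None:
                last_delete = current_ts
            else:
                deletes_after_error += 1
        elif "Error in deleting blocks" in line and first_error is None:
            first_error = current_ts
    return last_delete, first_error, deletes_after_error
```

Passing an open log file to summarize_deletions (a file object iterates
line by line) gives the time of the last successful deletion, the time of
the first error, and how many deletions were logged afterwards. A count of
zero for the duration of the test would match what we observed.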

We ran the test twice and the same problem occurred on the same node. I
wonder what could cause this problem and whether there is any
configuration parameter we can tune to fix it.
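
For anyone who wants to reproduce the write load from step 3, its shape
can be approximated as below. This is a sketch under assumptions, not our
actual test harness: write_record is a hypothetical callback standing in
for whatever client call stores a row, sizes are assumed to be in
kilobytes, and the random payload will not compress the way real data does
under LZO.

```python
import os
import random
import threading

MIN_SIZE = 5 * 1024     # assumed lower bound: 5 KB
MAX_SIZE = 100 * 1024   # assumed upper bound: 100 KB


def run_writers(write_record, num_threads=8, records_per_thread=1000, seed=0):
    """Run num_threads writers; each calls write_record(key, payload).

    write_record is a hypothetical callback supplied by the caller and must
    be safe to call from several threads at once.
    """
    def worker(thread_id):
        # Per-thread generator so record sizes are reproducible.
        rng = random.Random(seed + thread_id)
        for i in range(records_per_thread):
            size = rng.randint(MIN_SIZE, MAX_SIZE)
            key = f"t{thread_id:02d}-r{i:08d}".encode()
            write_record(key, os.urandom(size))

    threads = [
        threading.Thread(target=worker, args=(t,)) for t in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
```

The callback keeps the sketch independent of any particular client
library; in a real run it would wrap the client's put call.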

Thanks for your help!
Tuan Nguyen.
