hadoop-hdfs-user mailing list archives

From Jean-Pierre OCALAN <jpoca...@gmail.com>
Subject blocksBeingWritten content consume a lot of disk space
Date Thu, 30 Jun 2011 18:38:25 GMT

Every day, the map/reduce processes I schedule on my cluster leave files
behind on all the DataNodes, in a directory named blocksBeingWritten. After
one week, the files left behind reach 70 GB in the blocksBeingWritten
directory on each DataNode.
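To quantify the growth, I have been checking the directory size with a small script like the one below. The data directory path is just an example from my setup; it corresponds to whatever dfs.data.dir is set to in hdfs-site.xml on your DataNodes.

```shell
#!/bin/sh
# Report the on-disk size of the blocksBeingWritten directory on a DataNode.
# The default path is an example; pass your dfs.data.dir as the first argument.
DATA_DIR="${1:-/data/hdfs/dfs/data}"
du -sh "${DATA_DIR}/blocksBeingWritten" 2>/dev/null || echo "directory not found"
```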

I have noticed that once I restart a DataNode, this directory is cleaned up.

Can someone please help me understand what exactly these files are and why
the DataNodes only seem to delete them when they are restarted?

Below is an example of the files I see in the
blocksBeingWritten directory:
-rw-r--r-- 1 hdfs hadoop  2.0K Jun 14 14:24
-rw-r--r-- 1 hdfs hadoop  254K Jun 14 14:24 blk_2226351414820476901
-rw-r--r-- 1 hdfs hadoop   26K Jun 14 14:25
-rw-r--r-- 1 hdfs hadoop  3.2M Jun 14 14:25 blk_651476714389509127
-rw-r--r-- 1 hdfs hadoop  182K Jun 14 14:58
-rw-r--r-- 1 hdfs hadoop   23M Jun 14 14:58 blk_1727419676952982071
-rw-r--r-- 1 hdfs hadoop  447K Jun 14 14:59
-rw-r--r-- 1 hdfs hadoop   56M Jun 14 14:59 blk_687415755671726127
-rw-r--r-- 1 hdfs hadoop  476K Jun 14 15:02
-rw-r--r-- 1 hdfs hadoop   60M Jun 14 15:02 blk_-1767796325092574815

Thank you,

