hadoop-hdfs-user mailing list archives

From "Kester, Scott" <SKes...@weather.com>
Subject Rapid growth in Non DFS Used disk space
Date Fri, 13 May 2011 17:40:49 GMT
We have an 11-node Hadoop cluster running 0.20.2 that has been in production for 15 months now.
 The system is used to process log files that are ingested daily, and the oldest files in
the HDFS are deleted to free up space as needed, typically when the free space is less than
10% (the delete is done using 'hadoop fs -rmr' on the parent directory of the files to be
deleted).  When the HDFS was originally built it had 1TB of 'Non DFS' space out of the 20TB
total.  This 1TB stayed constant for at least the first year the system has been in use.

However over the last few weeks I have seen the 'Non DFS Used' as reported by the NameNode
dfshealth.jsp page grow to 2TB and rising.  The total number of files/directories and blocks
in use has remained fairly constant over this time.  I am concerned that the Non DFS Used
is going to consume more and more of the HDFS if left unchecked.  Running fsck gave "The filesystem
under path '/' is HEALTHY".


1) What exactly is Hadoop reporting as 'Non DFS Used', and how is it calculated?  Are these
files on the same partition(s) as the HDFS files, but not actually part of the HDFS?

2) Any ideas on what is driving the growth in Non DFS Used space?   I looked for things like
growing log files on the datanodes but didn't find anything.
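
For reference on question 1, here is a minimal sketch of how the figure is generally
understood to be derived: configured capacity minus DFS used minus remaining space,
floored at zero.  This is an assumption based on the usual DataNode reporting, not
something confirmed in this thread.  The function name and the sample totals are
hypothetical and only illustrate the arithmetic.

```python
TB = 1024 ** 4  # bytes per tebibyte


def non_dfs_used(capacity: int, dfs_used: int, remaining: int) -> int:
    """Estimate bytes on the DataNode volumes taken by anything other than HDFS blocks.

    Assumed formula: capacity - dfs_used - remaining, never reported below zero.
    """
    return max(capacity - dfs_used - remaining, 0)


# Hypothetical cluster-wide totals, chosen to match the 20TB / 1TB figures above.
capacity = 20 * TB
dfs_used = 17 * TB
remaining = 2 * TB

print(non_dfs_used(capacity, dfs_used, remaining) / TB)  # prints 1.0
```

If that formula holds, anything that fills the same partitions without being an HDFS
block (logs, temporary files, other applications' data) would show up in this number.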

