hadoop-common-user mailing list archives

From Saumitra <saumitra.offic...@gmail.com>
Subject HDFS file system size issue
Date Sun, 13 Apr 2014 19:54:00 GMT

We are running HDFS on a 9-node Hadoop cluster, Hadoop version 1.2.1, with the default
HDFS block size.

We have noticed that the disks on the slave nodes are almost full. On the NameNode status page
(namenode:50070), the disks of the live nodes show as 90% full, and DFS Used in the cluster
summary is ~1TB.

However, hadoop dfs -dus / reports a file system size of only 38GB. The 38GB figure looks
correct, because we keep only a few Hive tables and Hadoop's /tmp (distributed cache and
job outputs) in HDFS; all other data is cleaned up. I cross-checked this with hadoop dfs -ls.
I also do not think there is internal fragmentation, because the files in our Hive tables
are split into chunks of ~50MB. Here are the last few lines of hadoop fsck / -files -blocks:

 Total size:	38086441332 B
 Total dirs:	232
 Total files:	802
 Total blocks (validated):	796 (avg. block size 47847288 B)
 Minimally replicated blocks:	796 (100.0 %)
 Over-replicated blocks:	0 (0.0 %)
 Under-replicated blocks:	6 (0.75376886 %)
 Mis-replicated blocks:		0 (0.0 %)
 Default replication factor:	2
 Average block replication:	3.0439699
 Corrupt blocks:		0
 Missing replicas:		6 (0.24762692 %)
 Number of data-nodes:		9
 Number of racks:		1
FSCK ended at Sun Apr 13 19:49:23 UTC 2014 in 135 milliseconds
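
As a rough sanity check on these numbers, the raw disk space the blocks should occupy is about the total size times the average block replication. Below is a minimal sketch in Python using the values from the fsck summary above. The function name is made up for illustration, and ~1TB is taken as 10^12 bytes, which is an assumption about how the status page rounds.

```python
# Values copied from the fsck summary above.
TOTAL_SIZE_BYTES = 38086441332
AVG_BLOCK_REPLICATION = 3.0439699

# Assumption: the "~1TB" on the status page is treated as 10**12 bytes.
REPORTED_DFS_USED_BYTES = 10**12


def expected_raw_bytes(total_size_bytes, avg_replication):
    """Estimate raw disk usage as logical size times average replication."""
    return total_size_bytes * avg_replication


if __name__ == "__main__":
    expected = expected_raw_bytes(TOTAL_SIZE_BYTES, AVG_BLOCK_REPLICATION)
    unexplained = REPORTED_DFS_USED_BYTES - expected
    print("expected raw usage: %.1f GB" % (expected / 1e9))
    print("unexplained usage:  %.1f GB" % (unexplained / 1e9))
```

The estimate comes to about 116 GB across the whole cluster, far below the ~1TB shown on the cluster summary page, so replication alone does not account for the difference.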

My question is: why are the disks of the slave nodes filling up even though there are only a few
files in DFS?