hadoop-common-user mailing list archives

From "Nick Bailey" <ni...@mailtrust.com>
Subject Re: Hadoop dfs usage and actual size discrepancy
Date Wed, 09 Dec 2009 21:25:08 GMT
Output from bottom of fsck report:

 Total size:    8711239576255 B (Total open files size: 3571494 B)
 Total dirs:    391731
 Total files:   2612976 (Files currently being written: 3)
 Total blocks (validated):      2274747 (avg. block size 3829542 B) (Total open file blocks (not validated): 1)
 Minimally replicated blocks:   2274747 (100.0 %)
 Over-replicated blocks:        75491 (3.3186548 %)
 Under-replicated blocks:       36945 (1.6241367 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.017153
 Corrupt blocks:                0
 Missing replicas:              36945 (0.53830105 %)
 Number of data-nodes:          25
 Number of racks:               1

Output from top of dfsadmin -report:

Total raw bytes: 110689488793600 (100.67 TB)
Remaining raw bytes: 46994184353977 (42.74 TB)
Used raw bytes: 55511654282643 (50.49 TB)
% used: 50.15%

Total effective bytes: 0 (0 KB)
Effective replication multiplier: Infinity

Not sure what the last two lines of the dfsadmin report mean, but we have a negligible number
of over-replicated blocks according to fsck.  The rest of the dfsadmin report confirms what
the web interface says: the nodes hold far more data than 8.6TB * 3.
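
To put numbers on that gap, here is a rough sketch in Python using only the figures pasted above. It assumes that fsck's "Total size" multiplied by "Average block replication" approximates the raw bytes HDFS should occupy, and that the "TB" in the dfsadmin report means 1024^4 bytes; the variable names are just for illustration.

```python
# Figures copied from the fsck and dfsadmin output in this message.
total_size = 8711239576255        # fsck "Total size", in bytes
avg_replication = 3.017153        # fsck "Average block replication"
raw_total = 110689488793600       # dfsadmin "Total raw bytes"
raw_remaining = 46994184353977    # dfsadmin "Remaining raw bytes"
raw_used = 55511654282643         # dfsadmin "Used raw bytes"

TIB = 1024 ** 4  # assumption: the report's "TB" is 2^40 bytes

# Raw bytes we would expect HDFS to occupy if fsck's totals are the whole story.
expected_raw = total_size * avg_replication

# Raw bytes reported as used beyond that expectation.
unexplained = raw_used - expected_raw

# Capacity that is neither "used" nor "remaining" in the dfsadmin report.
unaccounted_capacity = raw_total - raw_remaining - raw_used

print(f"expected raw usage:   {expected_raw / TIB:.2f} TB")
print(f"reported raw usage:   {raw_used / TIB:.2f} TB")
print(f"unexplained usage:    {unexplained / TIB:.2f} TB")
print(f"used / expected:      {raw_used / expected_raw:.2f}x")
print(f"unaccounted capacity: {unaccounted_capacity / TIB:.2f} TB")
```

By this estimate the reported usage is a little over twice what the fsck totals predict. The last figure is simply total minus remaining minus used; whether it corresponds to non-HDFS data on the datanode disks is an assumption, not something the report states.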


-----Original Message-----
From: "Brian Bockelman" <bbockelm@cse.unl.edu>
Sent: Wednesday, December 9, 2009 3:35pm
To: common-user@hadoop.apache.org
Cc: core-user@hadoop.apache.org
Subject: Re: Hadoop dfs usage and actual size discrepancy

Hey Nick,


hadoop fsck /
hadoop dfsadmin -report

Should give you information about, for example, the non-HDFS data and the average replication.

Or is this how you determined you had a replication factor of 3?


On Dec 9, 2009, at 9:33 PM, Nick Bailey wrote:

> We have a hadoop cluster with a 100TB capacity, and according to the dfs web interface
> we are using 50% of our capacity (50TB).  However doing 'hadoop fs -dus /' says the total
> size of everything is about 8.6TB.  Everything has a replication factor of 3 so we should
> only be using around 26TB of our cluster.
> I've verified the replication factors and I've also checked the datanode machines to
> see if something non hadoop related is accidentally being stored on the drives hadoop is using
> for storage, but nothing is.
> Has anyone had a similar problem and have any debugging suggestions?
> Thanks,
> Nick Bailey
