hadoop-common-user mailing list archives

From "Nick Bailey" <ni...@mailtrust.com>
Subject Re: Hadoop dfs usage and actual size discrepancy
Date Wed, 09 Dec 2009 22:44:41 GMT
Well for that specific machine, du pretty much matches the report.  Not all of our nodes are
at 4.11TB; that one is actually overloaded, and we are currently running the balancer to fix
it.

Restarting the datanode on that machine didn't seem to clear out any data.  I'll probably
go ahead and restart all the datanodes, but I'm not hopeful that will clear out all the data.

Thanks for helping out though. Any other ideas out there?

-Nick
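
A rough sketch of the per-node comparison described above, assuming a hypothetical
dfs.data.dir of /data/hadoop/dfs/data (the real path is whatever hadoop-site.xml
configures):

# bytes actually on disk under the datanode's storage directory
du -sb /data/hadoop/dfs/data

# what the namenode believes this node is using; the hostname match is approximate
hadoop dfsadmin -report | grep -A 4 "Name: $(hostname)"

If the two figures agree, the space really is consumed by block files, so the
discrepancy lives in HDFS's own accounting rather than in stray non-Hadoop data.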

-----Original Message-----
From: "Brian Bockelman" <bbockelm@cse.unl.edu>
Sent: Wednesday, December 9, 2009 4:57pm
To: common-user@hadoop.apache.org
Cc: core-user@hadoop.apache.org
Subject: Re: Hadoop dfs usage and actual size discrepancy

Hey Nick,

Non-DFS Used must be something new in 19.x, I guess.

What happens if you do "du -hs" on the datanode directory?  Are they all approximately 4.11TB?
 What happens after you restart a datanode?  Does it clean out a bunch of data?
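
One way to test that, assuming the stock daemon scripts and the same hypothetical
storage path (adjust both to your install); note that any cleanup of stale blocks
may only happen a little while after the datanode's next block report:

# record usage, bounce the datanode, wait a bit, and compare
du -sb /data/hadoop/dfs/data
bin/hadoop-daemon.sh stop datanode
bin/hadoop-daemon.sh start datanode
du -sb /data/hadoop/dfs/data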

Never seen this locally, and we beat the bejesus out of our cluster...

Brian

On Dec 9, 2009, at 10:54 PM, Nick Bailey wrote:

> Brian,
> 
> Hadoop version 18.3, more specifically Cloudera's version.  Our dfsadmin -report doesn't
> contain any lines with "Non DFS Used", so that grep won't work. Here is an example of the
> report for one of the nodes:
> 
> 
> Name: XXXXXXXXXXXXX
> State          : In Service
> Total raw bytes: 4919829360640 (4.47 TB)
> Remaining raw bytes: 108009550121 (100.59 GB)
> Used raw bytes: 4520811248473 (4.11 TB)
> % used: 91.89%
> Last contact: Wed Dec 09 16:50:10 EST 2009
> 
> Besides what I already posted, the rest of the report is just a repeat of that for every
> node.
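
Since the 0.18 report prints per-node "Used raw bytes" lines instead, a rough
equivalent of that sum might be (the first match is the cluster-wide summary,
so it is skipped):

hadoop dfsadmin -report | awk '/^Used raw bytes/ { if (n++) sum += $4 } END { printf "%.2f TB\n", sum/1024^4 }'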
> 
> Nick
> 
> -----Original Message-----
> From: "Brian Bockelman" <bbockelm@cse.unl.edu>
> Sent: Wednesday, December 9, 2009 4:48pm
> To: common-user@hadoop.apache.org
> Cc: core-user@hadoop.apache.org
> Subject: Re: Hadoop dfs usage and actual size discrepancy
> 
> Hey Nick,
> 
> What's the output of this:
> 
> hadoop dfsadmin -report | grep "Non DFS Used" | grep -v "0 KB" | awk '{sum += $4} END {print sum}'
> 
> What version of Hadoop is this?
> 
> Brian
> 
> On Dec 9, 2009, at 10:25 PM, Nick Bailey wrote:
> 
>> Output from bottom of fsck report:
>> 
>> Total size:    8711239576255 B (Total open files size: 3571494 B)
>> Total dirs:    391731
>> Total files:   2612976 (Files currently being written: 3)
>> Total blocks (validated):      2274747 (avg. block size 3829542 B) (Total open file blocks (not validated): 1)
>> Minimally replicated blocks:   2274747 (100.0 %)
>> Over-replicated blocks:        75491 (3.3186548 %)
>> Under-replicated blocks:       36945 (1.6241367 %)
>> Mis-replicated blocks:         0 (0.0 %)
>> Default replication factor:    3
>> Average block replication:     3.017153
>> Corrupt blocks:                0
>> Missing replicas:              36945 (0.53830105 %)
>> Number of data-nodes:          25
>> Number of racks:               1
>> 
>> 
>> 
>> Output from top of dfsadmin -report:
>> 
>> Total raw bytes: 110689488793600 (100.67 TB)
>> Remaining raw bytes: 46994184353977 (42.74 TB)
>> Used raw bytes: 55511654282643 (50.49 TB)
>> % used: 50.15%
>> 
>> Total effective bytes: 0 (0 KB)
>> Effective replication multiplier: Infinity
>> 
>> 
>> Not sure what the last two lines of the dfsadmin report mean, but we have a negligible
>> amount of over-replicated blocks according to fsck.  The rest of the dfsadmin report confirms
>> what the web interface says, in that the nodes have way more data than 8.6TB * 3.
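
A quick worked check of that claim, using the fsck and dfsadmin numbers quoted
above (the "Infinity" multiplier is presumably just used raw bytes divided by
"Total effective bytes", which is zero here):

awk 'BEGIN {
  expected = 8711239576255 * 3.017153   # fsck "Total size" x average block replication
  reported = 55511654282643             # "Used raw bytes" from dfsadmin -report
  printf "expected %.2f TB, reported %.2f TB\n", expected/1024^4, reported/1024^4
}'
# prints: expected 23.90 TB, reported 50.49 TB -- roughly double the expectation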
>> 
>> Thoughts?
>> 
>> 
>> 
>> -----Original Message-----
>> From: "Brian Bockelman" <bbockelm@cse.unl.edu>
>> Sent: Wednesday, December 9, 2009 3:35pm
>> To: common-user@hadoop.apache.org
>> Cc: core-user@hadoop.apache.org
>> Subject: Re: Hadoop dfs usage and actual size discrepancy
>> 
>> Hey Nick,
>> 
>> Try:
>> 
>> hadoop fsck /
>> hadoop dfsadmin -report
>> 
>> Should give you information about, for example, the non-HDFS data and the average
>> replication factor.
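
One way to pull out just the headline figures from those two commands, given the
output formats quoted earlier in the thread:

hadoop fsck / | grep -E "Total size|Average block replication"
hadoop dfsadmin -report | grep -E "^(Total|Used|Remaining) raw bytes"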
>> 
>> Or is this how you determined you had a replication factor of 3?
>> 
>> Brian
>> 
>> On Dec 9, 2009, at 9:33 PM, Nick Bailey wrote:
>> 
>>> We have a Hadoop cluster with a 100TB capacity, and according to the dfs web
>>> interface we are using 50% of our capacity (50TB).  However, doing 'hadoop fs -dus /' says
>>> the total size of everything is about 8.6TB.  Everything has a replication factor of 3, so
>>> we should only be using around 26TB of our cluster.
>>> 
>>> I've verified the replication factors, and I've also checked the datanode machines
>>> to see if something non-Hadoop-related is accidentally being stored on the drives Hadoop is
>>> using for storage, but nothing is.
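
A sketch of one way to double-check those replication factors, counting blocks by
the "repl=" annotation in fsck's per-block output (this walks the whole namespace,
so it can take a while on a large cluster):

hadoop fsck / -files -blocks | grep -o "repl=[0-9]*" | sort | uniq -c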
>>> 
>>> Has anyone had a similar problem and have any debugging suggestions?
>>> 
>>> Thanks,
>>> Nick Bailey
>>> 
>> 
>> 
> 
> 



