hadoop-common-user mailing list archives

From "Nick Bailey" <ni...@mailtrust.com>
Subject Re: Hadoop dfs usage and actual size discrepancy
Date Wed, 09 Dec 2009 23:57:33 GMT
Actually, it looks like restarting has helped. DFS used has gone down from 50TB to 43TB and appears
to still be going down.

I don't know what was wrong with the DataNode process; possibly a Cloudera problem. Thanks
for the help, Brian.

-Nick



-----Original Message-----
From: "Nick Bailey" <nickb@mailtrust.com>
Sent: Wednesday, December 9, 2009 5:55pm
To: common-user@hadoop.apache.org
Cc: common-user@hadoop.apache.org, common-user@hadoop.apache.org, core-user@hadoop.apache.org
Subject: Re: Hadoop dfs usage and actual size discrepancy

One interesting thing is the output of the command to restart the datanode.

$ sudo service hadoop-datanode restart
Stopping Hadoop datanode daemon (hadoop-datanode): no datanode to stop
                                                           [  OK  ]
Starting Hadoop datanode daemon (hadoop-datanode): starting datanode, logging to /log/location
                                                           [  OK  ]

Notice that when stopping the datanode it says 'no datanode to stop'. It says this even though
the datanode is definitely running. Also, there is only one datanode process, and it isn't getting
stopped by this command, so I didn't actually restart anything. I checked, and at
least a few of the other nodes are also exhibiting this behavior.

I don't know if it's related, but after killing the process and actually restarting the datanode,
it still doesn't appear to be clearing out any extra data. I'll manually restart the datanodes
by killing the processes for now and see if that helps.
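A possible explanation for the 'no datanode to stop' message, assuming Cloudera's init script hands off to hadoop-daemon.sh: in the Hadoop daemon scripts I know of, the stop branch only signals the daemon if its PID file exists and `kill -0` succeeds on the PID inside it, and otherwise prints "no datanode to stop". A missing or stale PID file would therefore produce exactly the output above while the real process keeps running. The Python sketch below performs the same check; the PID file path in it is hypothetical (hadoop-daemon.sh builds the name from HADOOP_PID_DIR and the daemon user), so substitute your own.

```python
import os


def pid_file_status(pid_file: str) -> str:
    """Classify a daemon PID file the way a `kill -0 $(cat pidfile)` check would.

    Returns "missing", "invalid", "stale" or "running". Only "running" would let
    a PID-file-based stop command actually signal the process.
    """
    try:
        with open(pid_file) as f:
            pid = int(f.read().strip())
    except FileNotFoundError:
        return "missing"
    except ValueError:
        return "invalid"  # empty or non-numeric contents
    if pid <= 0:
        return "invalid"  # 0 and negative values address process groups, not one PID
    try:
        os.kill(pid, 0)  # signal 0 only checks existence/permission; nothing is delivered
    except ProcessLookupError:
        return "stale"  # the file exists but no process has that PID
    except PermissionError:
        # The process exists but we may not signal it (a non-root `kill -0` fails here).
        return "running"
    return "running"


if __name__ == "__main__":
    # Hypothetical path: substitute your HADOOP_PID_DIR and daemon user.
    print(pid_file_status("/var/run/hadoop/hadoop-hadoop-datanode.pid"))
```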

-Nick


-----Original Message-----
From: "Nick Bailey" <nickb@mailtrust.com>
Sent: Wednesday, December 9, 2009 5:44pm
To: common-user@hadoop.apache.org
Cc: common-user@hadoop.apache.org, core-user@hadoop.apache.org
Subject: Re: Hadoop dfs usage and actual size discrepancy

Well, for that specific machine, du pretty much matches the report. Not all of our nodes are
at 4.11TB; that one is actually overloaded, and we are currently running the balancer to fix
it.

Restarting the datanode on that machine didn't seem to clear out any data. I'll probably
go ahead and restart all the datanodes, but I'm not hopeful that it will clear out all the data.

Thanks for helping out though. Any other ideas out there?

-Nick

-----Original Message-----
From: "Brian Bockelman" <bbockelm@cse.unl.edu>
Sent: Wednesday, December 9, 2009 4:57pm
To: common-user@hadoop.apache.org
Cc: core-user@hadoop.apache.org
Subject: Re: Hadoop dfs usage and actual size discrepancy

Hey Nick,

Non-DFS Used must be something new in 19.x, I guess.

What happens if you do "du -hs" on the datanode directory? Are they all approximately 4.11TB?
What happens after you restart a datanode? Does it clean out a bunch of data?
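To make the comparison Brian suggests on each node, here is a rough Python equivalent of `du -s` on a datanode storage directory, whose result can be set beside that node's "Used raw bytes". It is a sketch under assumptions: it counts allocated 512-byte blocks (`st_blocks`, as on Linux), counts hard-linked files once like du does, and does not follow symlinks. The directory in the usage line is hypothetical; use whatever your dfs.data.dir points to.

```python
import os


def du_bytes(root: str) -> int:
    """Approximate `du -s --block-size=1 root` (allocated space, not file lengths)."""
    seen = set()  # (device, inode) pairs already counted, so hard links count once
    total = 0

    def add(path: str) -> None:
        nonlocal total
        st = os.lstat(path)  # lstat: a symlink is measured itself, never followed
        key = (st.st_dev, st.st_ino)
        if key not in seen:
            seen.add(key)
            total += st.st_blocks * 512  # st_blocks is in 512-byte units on Linux

    add(root)
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            add(os.path.join(dirpath, name))
    return total


if __name__ == "__main__":
    data_dir = "/data/dfs/data"  # hypothetical dfs.data.dir entry
    if os.path.isdir(data_dir):
        # Divide by 2**40 to match the "TB" labels printed by dfsadmin -report.
        print(f"{du_bytes(data_dir) / 2**40:.2f} TB")
```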

Never seen this locally, and we beat the bejesus out of our cluster...

Brian

On Dec 9, 2009, at 10:54 PM, Nick Bailey wrote:

> Brian,
> 
> Hadoop version 18.3, more specifically Cloudera's version. Our dfsadmin -report doesn't
> contain any lines with "Non DFS Used", so that grep won't work. Here is an example of the
> report for one of the nodes:
> 
> 
> Name: XXXXXXXXXXXXX
> State          : In Service
> Total raw bytes: 4919829360640 (4.47 TB)
> Remaining raw bytes: 108009550121(100.59 GB)
> Used raw bytes: 4520811248473 (4.11 TB)
> % used: 91.89%
> Last contact: Wed Dec 09 16:50:10 EST 2009
> 
> Besides what I already posted, the rest of the report is just a repeat of that for every
> node.
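The per-node entry above can be checked with simple arithmetic. This assumes the 0.18 report's "Used raw bytes" counts only HDFS block data; the later "Non DFS Used" field Brian mentions is, as far as I know, computed as capacity minus DFS used minus remaining, so the same subtraction shows how much of this node's disk is counted as neither used nor remaining. The numbers are copied from the report; the variable names are illustrative.

```python
GB = 2 ** 30  # the report's "GB" label works out to a power of 1024

node_total_raw = 4_919_829_360_640  # "Total raw bytes"
node_remaining_raw = 108_009_550_121  # "Remaining raw bytes"
node_used_raw = 4_520_811_248_473  # "Used raw bytes"

pct_used = 100 * node_used_raw / node_total_raw
# Space counted as neither DFS data nor free space (the later "Non DFS Used").
other = node_total_raw - node_used_raw - node_remaining_raw

print(f"% used: {pct_used:.2f}%")
print(f"capacity - used - remaining: {other} B ({other / GB:.2f} GB)")
```

This reproduces the report's 91.89% and leaves 291008562046 B (about 271.02 GB) on that node that is counted as neither used nor remaining.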
> 
> Nick
> 
> -----Original Message-----
> From: "Brian Bockelman" <bbockelm@cse.unl.edu>
> Sent: Wednesday, December 9, 2009 4:48pm
> To: common-user@hadoop.apache.org
> Cc: core-user@hadoop.apache.org
> Subject: Re: Hadoop dfs usage and actual size discrepancy
> 
> Hey Nick,
> 
> What's the output of this:
> 
> hadoop dfsadmin -report | grep "Non DFS Used" | grep -v "0 KB" | awk '{sum += $4} END {print sum}'
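For reports from 0.19 or later, which carry a "Non DFS Used" line per node, the pipeline above can be mirrored in Python. The sketch assumes lines shaped like `Non DFS Used: 12345 (12.06 KB)`, so the byte count is the fourth whitespace-separated field, just as `$4` is in awk. The `grep -v "0 KB"` step only removes zero entries, which add nothing to the sum, so the sketch skips it (as a substring match it would also drop values such as "10 KB"). On a 0.18 report, where no line matches, the function returns 0.

```python
def sum_non_dfs_used(report: str) -> int:
    """Sum the byte counts on "Non DFS Used" lines of `hadoop dfsadmin -report` output."""
    total = 0
    for line in report.splitlines():
        if "Non DFS Used" not in line:
            continue
        fields = line.split()
        # fields: "Non" "DFS" "Used:" <bytes> ... ; index 3 is awk's $4.
        total += int(fields[3])
    return total
```

To use it, pass the report text, for example the output of `hadoop dfsadmin -report` read from standard input.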
> 
> What version of Hadoop is this?
> 
> Brian
> 
> On Dec 9, 2009, at 10:25 PM, Nick Bailey wrote:
> 
>> Output from bottom of fsck report:
>> 
>> Total size:    8711239576255 B (Total open files size: 3571494 B)
>> Total dirs:    391731
>> Total files:   2612976 (Files currently being written: 3)
>> Total blocks (validated):      2274747 (avg. block size 3829542 B) (Total open file blocks (not validated): 1)
>> Minimally replicated blocks:   2274747 (100.0 %)
>> Over-replicated blocks:        75491 (3.3186548 %)
>> Under-replicated blocks:       36945 (1.6241367 %)
>> Mis-replicated blocks:         0 (0.0 %)
>> Default replication factor:    3
>> Average block replication:     3.017153
>> Corrupt blocks:                0
>> Missing replicas:              36945 (0.53830105 %)
>> Number of data-nodes:          25
>> Number of racks:               1
>> 
>> 
>> 
>> Output from top of dfsadmin -report:
>> 
>> Total raw bytes: 110689488793600 (100.67 TB)
>> Remaining raw bytes: 46994184353977 (42.74 TB)
>> Used raw bytes: 55511654282643 (50.49 TB)
>> % used: 50.15%
>> 
>> Total effective bytes: 0 (0 KB)
>> Effective replication multiplier: Infinity
>> 
>> 
>> Not sure what the last two lines of the dfsadmin report mean, but we have a negligible
>> amount of over-replicated blocks according to fsck. The rest of the dfsadmin report confirms
>> what the web interface says: the nodes have way more data than 8.6TB * 3.
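The discrepancy can be put in numbers using only the figures quoted above. The sketch multiplies the fsck "Total size" by the fsck "Average block replication" to estimate the raw space the files should occupy, then compares that with dfsadmin's "Used raw bytes". Sizes are divided by 2^40, which is what the report's "TB" labels work out to (110689488793600 B is printed as 100.67 TB). The "Effective replication multiplier: Infinity" line is presumably used raw bytes divided by the reported "Total effective bytes" of 0; the last ratio below uses the fsck total as the denominator instead.

```python
TB = 2 ** 40  # dfsadmin's "TB" labels are powers of 1024

fsck_total_size = 8_711_239_576_255  # fsck: "Total size"
avg_replication = 3.017153  # fsck: "Average block replication"
used_raw = 55_511_654_282_643  # dfsadmin: "Used raw bytes"

expected_raw = fsck_total_size * avg_replication  # raw space the files should need
unexplained = used_raw - expected_raw  # raw space fsck does not account for

print(f"expected raw usage: {expected_raw / TB:.2f} TB")
print(f"reported raw usage: {used_raw / TB:.2f} TB")
print(f"unaccounted for:    {unexplained / TB:.2f} TB")
print(f"used raw / fsck total: {used_raw / fsck_total_size:.2f}x "
      f"(vs {avg_replication:.2f}x expected)")
```

With these inputs it reports about 23.90 TB expected against 50.49 TB used, leaving roughly 26.58 TB unaccounted for, or an apparent 6.37x multiplier against an average replication of 3.02.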
>> 
>> Thoughts?
>> 
>> 
>> 
>> -----Original Message-----
>> From: "Brian Bockelman" <bbockelm@cse.unl.edu>
>> Sent: Wednesday, December 9, 2009 3:35pm
>> To: common-user@hadoop.apache.org
>> Cc: core-user@hadoop.apache.org
>> Subject: Re: Hadoop dfs usage and actual size discrepancy
>> 
>> Hey Nick,
>> 
>> Try:
>> 
>> hadoop fsck /
>> hadoop dfsadmin -report
>> 
>> That should give you information about, for example, the non-HDFS data and the average
>> replication factor.
>> 
>> Or is this how you determined you had a replication factor of 3?
>> 
>> Brian
>> 
>> On Dec 9, 2009, at 9:33 PM, Nick Bailey wrote:
>> 
>>> We have a Hadoop cluster with a 100TB capacity, and according to the DFS web
>>> interface we are using 50% of our capacity (50TB). However, doing 'hadoop fs -dus /' says
>>> the total size of everything is about 8.6TB. Everything has a replication factor of 3, so
>>> we should only be using around 26TB of our cluster.
>>> 
>>> I've verified the replication factors, and I've also checked the datanode machines
>>> to see if something non-Hadoop-related is accidentally being stored on the drives Hadoop is
>>> using for storage, but nothing is.
>>> 
>>> Has anyone had a similar problem, or does anyone have any debugging suggestions?
>>> 
>>> Thanks,
>>> Nick Bailey
>>> 
>> 
>> 
> 
> 







