hadoop-mapreduce-user mailing list archives

From Tapas Sarangi <tapas.sara...@gmail.com>
Subject disk used percentage is not symmetric on datanodes (balancer)
Date Mon, 18 Mar 2013 21:01:56 GMT

I am using an old legacy version (0.20) of Hadoop on our cluster. We have scheduled an
upgrade to a newer version within a couple of months, but I would like to understand
a couple of things before moving ahead with the upgrade plan.

We have about 200 datanodes, and some of them have larger storage than others. Storage
per datanode ranges from 12 TB to 72 TB.

We found that the disk-used percentage is not uniform across the datanodes. On the
larger-storage nodes, the percentage of disk space used is much lower than on the nodes with
smaller storage. On the larger nodes the used percentage varies, but averages about
30-50%. On the smaller nodes it is as high as 99.9%. Is this expected? If so, then we
are not using a lot of the disk space effectively. Is this solved in a later release?

If not, I would like to know whether there are any checks or debugging steps that could
improve this on the current version, or whether upgrading Hadoop should solve the problem.
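
One check of this kind, as a minimal sketch and not anything Hadoop itself ships: compare
each datanode's used percentage with the cluster-wide used percentage and flag nodes that
deviate by more than a threshold. This mirrors, in simplified form, the comparison the HDFS
balancer makes against its threshold (10 percent by default). The function name, node names,
and numbers below are hypothetical; real capacity and usage figures would have to come from
something like the output of `hadoop dfsadmin -report`.

```python
# Illustrative sketch only: node names and figures are made up, not real cluster data.

def utilization_report(nodes, threshold=10.0):
    """Compare per-node disk utilization against cluster-wide utilization.

    nodes: list of (name, capacity_tb, used_tb) tuples.
    threshold: allowed deviation from the cluster-wide percentage, in points.
    Returns (cluster_pct, rows), where each row is
    (name, node_pct, deviation, status) and status is "over", "under" or "within".
    """
    total_capacity = sum(capacity for _, capacity, _ in nodes)
    total_used = sum(used for _, _, used in nodes)
    cluster_pct = 100.0 * total_used / total_capacity

    rows = []
    for name, capacity, used in nodes:
        node_pct = 100.0 * used / capacity
        deviation = node_pct - cluster_pct
        if deviation > threshold:
            status = "over"
        elif deviation < -threshold:
            status = "under"
        else:
            status = "within"
        rows.append((name, node_pct, deviation, status))
    return cluster_pct, rows


if __name__ == "__main__":
    # Hypothetical mix of one small and one large datanode.
    sample = [("small-node", 12.0, 11.9), ("large-node", 72.0, 28.8)]
    cluster_pct, rows = utilization_report(sample)
    print("cluster: %.1f%% used" % cluster_pct)
    for name, node_pct, deviation, status in rows:
        print("%-12s %6.1f%%  %+6.1f  %s" % (name, node_pct, deviation, status))
```

Nodes reported as "over" or "under" would be the ones a balancer run is expected to move
blocks away from or towards.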

I am happy to provide additional information if needed.

Thanks for any help.

