hadoop-hdfs-issues mailing list archives

From "Scott Carey (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1312) Re-balance disks within a Datanode
Date Thu, 27 Sep 2012 23:33:09 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13465219#comment-13465219 ]

Scott Carey commented on HDFS-1312:

Isn't the datanode internal block placement policy an easier/simpler solution?

IMO, if you simply placed blocks on disks weighted by available free space, this would not
be a big issue.  All drives would approach full capacity at roughly the same time.  The
drawback would be write-performance bottlenecks in the more extreme cases.

If you were 90% full on 11 drives and had one completely empty drive, then ~50% of new blocks
would go to the empty drive (though few reads would hit it).  That is not ideal for performance,
but it is not a big problem either, since the disks should rapidly become more balanced.
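The free-space-weighted placement described above amounts to a weighted random choice across volumes. A minimal sketch (the class and method names here are illustrative, not the actual DataNode API):

```java
import java.util.Random;

// Hypothetical sketch of free-space-weighted volume selection: each volume
// is chosen with probability proportional to its free space.
public class FreeSpaceWeightedChooser {
    private final Random random;

    public FreeSpaceWeightedChooser(Random random) {
        this.random = random;
    }

    /** Pick a volume index with probability proportional to freeBytes[i]. */
    public int chooseVolume(long[] freeBytes) {
        long totalFree = 0;
        for (long f : freeBytes) {
            totalFree += f;
        }
        // Draw a point in [0, totalFree) and find which volume's slice it lands in.
        long pick = (long) (random.nextDouble() * totalFree);
        long cumulative = 0;
        for (int i = 0; i < freeBytes.length; i++) {
            cumulative += freeBytes[i];
            if (pick < cumulative) {
                return i;
            }
        }
        return freeBytes.length - 1; // guard against floating-point edge cases
    }
}
```

With 11 drives at 90% full and one empty drive, this chooser sends roughly half of new blocks to the empty drive, matching the figure above.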

In most situations we would be talking about systems with 3 to 11 drives that are 50% to 70%
full, plus one empty drive.  Free-space weighting would send between ~17% and ~55% of writes
to the empty drive, versus the 8% or 25% it would receive under round-robin placement.
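The percentages above follow directly from weighting by free space: with n used drives of equal size, each a fraction `full` full, the empty drive holds 1 unit of free space against n*(1-full) units elsewhere. A small illustrative helper (not HDFS code) to check the arithmetic:

```java
// Illustrative helper for the write-share arithmetic; not part of HDFS.
public class WriteShareMath {
    // Expected share of new writes landing on the single empty drive when
    // n other equal-sized drives are each `full` fraction full, under
    // free-space-weighted placement.
    public static double emptyDriveShare(int n, double full) {
        double otherFree = n * (1.0 - full); // free space on the n used drives
        return 1.0 / (1.0 + otherFree);      // the empty drive contributes 1 unit
    }

    // Round-robin share for comparison: one of (n + 1) drives.
    public static double roundRobinShare(int n) {
        return 1.0 / (n + 1);
    }
}
```

For example, `emptyDriveShare(11, 0.9)` is about 0.48 (the "~50%" case above), `emptyDriveShare(3, 0.7)` about 0.53, while round-robin gives 1/12 or 1/4.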

IMO the default datanode block placement should be weighted toward disks with less used space.
There are cases other than disk failure that can lead to imbalanced space usage, including
heterogeneous partition sizes.  Weighted placement would reduce the need for any complicated
background rebalance tasks.

Perhaps on start-up a datanode could optionally do some local rebalancing before joining the cluster.
> Re-balance disks within a Datanode
> ----------------------------------
>                 Key: HDFS-1312
>                 URL: https://issues.apache.org/jira/browse/HDFS-1312
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: data-node
>            Reporter: Travis Crawford
> Filing this issue in response to ``full disk woes`` on hdfs-user.
> Datanodes fill their storage directories unevenly, leading to situations where certain
> disks are full while others are significantly less used. Users at many different sites have
> experienced this issue, and HDFS administrators are taking steps like:
> - Manually rebalancing blocks in storage directories
> - Decommissioning nodes & later re-adding them
> There's a tradeoff between making use of all available spindles and filling disks at
> roughly the same rate. Possible solutions include:
> - Weighting less-used disks more heavily when placing new blocks on the datanode. In write-heavy
> environments this will still make use of all spindles, equalizing disk use over time.
> - Rebalancing blocks locally. This would help equalize disk use as disks are added/replaced
> in older cluster nodes.
> Datanodes should actively manage their local disk so operator intervention is not needed.
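
The "rebalancing blocks locally" option in the quoted description could be sketched as a greedy planner: repeatedly move one block's worth of data from the fullest volume to the emptiest until the usage spread is acceptable. A hypothetical sketch assuming equal-capacity volumes (names and thresholds are illustrative, not HDFS code):

```java
// Hypothetical greedy sketch of local (intra-datanode) rebalancing.
public class LocalRebalancePlanner {
    // usedBytes[i]: bytes used on volume i (volumes assumed equal capacity).
    // Plans blockSize-sized moves from the fullest to the emptiest volume
    // until the max-min spread is at most maxSpread, mutating usedBytes to
    // reflect the planned moves. Returns the number of moves planned.
    public static int planMoves(long[] usedBytes, long blockSize, long maxSpread) {
        int moves = 0;
        while (true) {
            int fullest = 0, emptiest = 0;
            for (int i = 0; i < usedBytes.length; i++) {
                if (usedBytes[i] > usedBytes[fullest]) fullest = i;
                if (usedBytes[i] < usedBytes[emptiest]) emptiest = i;
            }
            long spread = usedBytes[fullest] - usedBytes[emptiest];
            // Stop when balanced enough, or when a move would overshoot
            // (moving a whole block can only help while blockSize < spread).
            if (spread <= maxSpread || blockSize >= spread) {
                break;
            }
            usedBytes[fullest] -= blockSize;
            usedBytes[emptiest] += blockSize;
            moves++;
        }
        return moves;
    }
}
```

For instance, with two volumes at 900 and 100 units, a 100-unit block size, and a 200-unit tolerated spread, the planner schedules three moves, ending at 600/400.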

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
