hadoop-hdfs-issues mailing list archives

From "Steve Loughran (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1120) Make DataNode's block-to-device placement policy pluggable
Date Tue, 04 May 2010 09:44:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12863740#action_12863740 ]

Steve Loughran commented on HDFS-1120:

I think the probability of imbalance gets larger the more disks there are per server, and now
that 12-HDD units are coming out, you can plan to see it some time after you spec out your
next datacentre.

# Deletion of files with a large block size can leave a disk unbalanced.
# MR temp space on the same disks can fill the disks up and then free them again.
# Replacement of a failed HDD leaves the new disk permanently underutilised.

The third one is new. On a 12-disk server, with most of all 12 disks allocated to HDFS, one
block in 12 goes to any specific disk. If one disk is replaced, it still only gets 1/12
of the blocks, even though it's the disk with the most free space if all the other disks are
70-80% full. The disks would only become balanced if the new disk got more of the writes (which
could have adverse consequences for future IO rates), or if some rebalancing on a single machine
moved data from one disk to another (or, to be precise, copied the block, validated its
checksums, then deleted the original).
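
To make the 1/12 point concrete, here is a rough simulation sketch. It is not the DataNode's
code: the function names, the 10,000-block capacity and the 75% fill level are all
assumptions picked for illustration. It compares plain round-robin with a hypothetical
policy that picks a disk with probability proportional to its free space.

```python
import random

def round_robin(free, state):
    """Pick disks in fixed rotation, ignoring how much space each has."""
    i = state["next"]
    state["next"] = (i + 1) % len(free)
    return i

def weighted_by_free_space(free, state):
    """Hypothetical alternative: pick a disk with probability
    proportional to its free blocks."""
    return state["rng"].choices(range(len(free)), weights=free)[0]

def simulate(policy, free, writes, seed=0):
    """Place `writes` blocks and return how many landed on each disk.
    Assumes `writes` is small enough that no disk fills up."""
    free = list(free)
    state = {"next": 0, "rng": random.Random(seed)}
    counts = [0] * len(free)
    for _ in range(writes):
        i = policy(free, state)
        free[i] -= 1
        counts[i] += 1
    return counts

# Assumed layout: 12 disks of 10,000 blocks each. Disk 0 was just
# replaced (empty); the other 11 are 75% full (2,500 blocks free).
free_blocks = [10_000] + [2_500] * 11

rr = simulate(round_robin, free_blocks, writes=1200)
weighted = simulate(weighted_by_free_space, free_blocks, writes=1200)

print("round-robin, new disk share:", rr[0] / 1200)
print("weighted,    new disk share:", weighted[0] / 1200)
```

Round-robin gives the new disk exactly 1/12 of the writes regardless of its free space. The
weighted variant sends it a larger share, which is also where the IO-rate concern comes
from: that one disk takes a disproportionate amount of the write load.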

I actually think HDFS-1121 should come first: provide a way of measuring the distribution
on disks on a single server. Once we have the data we can start worrying about ways to correct
any distribution issues.
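
As a sketch of what "measuring the distribution" could look like on one server, the snippet
below summarises per-disk utilisation. It is only an illustration, not what HDFS-1121
proposes: the `/data/N` mount points and the choice of statistics are assumptions, and
`shutil.disk_usage` reports whole-filesystem usage, not just the space taken by HDFS blocks.

```python
import shutil
from statistics import mean, pstdev

def volume_usage(paths):
    """Return {path: fraction of the filesystem used} for each mount point."""
    usage = {}
    for p in paths:
        du = shutil.disk_usage(p)  # named tuple: total, used, free (bytes)
        usage[p] = du.used / du.total
    return usage

def imbalance_report(usage):
    """Summarise how unevenly the disks are filled."""
    fractions = list(usage.values())
    return {
        "mean": mean(fractions),
        "stdev": pstdev(fractions),
        "spread": max(fractions) - min(fractions),
        "emptiest": min(usage, key=usage.get),
    }

# Hand-written example data for the scenario above (hypothetical paths):
# eleven disks at 75% and one freshly replaced, empty disk.
example = {f"/data/{i}": 0.75 for i in range(1, 12)}
example["/data/12"] = 0.0
report = imbalance_report(example)
print(report)

# On a real machine, something like:
#   imbalance_report(volume_usage(["/data/1", "/data/2"]))
```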

> Make DataNode's block-to-device placement policy pluggable
> ----------------------------------------------------------
>                 Key: HDFS-1120
>                 URL: https://issues.apache.org/jira/browse/HDFS-1120
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Jeff Hammerbacher
> As discussed on the mailing list, as the number of disk drives per server increases,
> it would be useful to allow the DataNode's policy for new block placement to grow in sophistication
> from the current round-robin strategy.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
