hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eli Collins (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2472) Extend UnderReplicatedBlocks queue to give blocks whose existing copies are all on a single rack priority over multi-rack blocks
Date Wed, 19 Oct 2011 18:19:10 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13130846#comment-13130846

Eli Collins commented on HDFS-2472:

Note that the code isn't toally un-rack aware, UnderReplicatedBlocks#getPriority gives priority
to blocks with a single replica and to blocks w replicas getting decommissioned. 

> Extend UnderReplicatedBlocks queue to give blocks whose existing copies are all on a
single rack priority over multi-rack blocks
> --------------------------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-2472
>                 URL: https://issues.apache.org/jira/browse/HDFS-2472
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Minor
> Google's Availability in Globally Distributed Storage Systems paper [http://research.google.com/pubs/archive/36737.pdf]
 shows that storage node failures are often correlated with other failures on the same rack.
This could be due to rack-local failures: switches, power, etc, or operations actions (take
down a whole rack for a rolling OS upgrade). Whatever the root cause, the paper argues (section
5.2) that rack-aware placement would increase the MTTF of a single block by a factor of three
(typically). Some decisions can be made a block placement time, but that would be a separate
> Here I propose giving priority to blocks that are under replicated and where all blocks
are on the same rack above those blocks that are under-replicated and the remaining blocks
are on different racks.
> # Provided the failure does not take down the entire rack in one go (e.g. switch failure),
this policy would decrease the time in which all blocks would be on a single rack, so reduce
the consequences of a rack failure. 
> # On a single-rack system, all under-replicated blocks would go into this queue, so the
state would effectively be that of today's system.
> This may make the demand for off-rack bandwidth ramp up immediately, because priority
will be given to blocks that must be replicated off rack. I am not sure that it will, however,
as multi-rack replication policies would generate the same amount of traffic to re-replicate
the blocks anyway. 
> The main barrier to implementing this feature is that currently the UnderReplicatedBlocks
queues do not get provided with any information about block locality. We'd need to change
the add() method to take information about where the current replicas are, and the BlockManager
would have to provide some information as to whether the block was rack-local, not-rack local
or unknown; the latter because the PendingReplicationBlocks structure does not know where
things come from. "unknown" items would have to be given the same priority as rack-only blocks
(pessimistic) or the same priority as multi-rack blocks (optimistic). I would be biased towards
the pessimistic approach as it would ensure that on single-rack systems there would be no
obvious change in behaviour. On a multi-rack system it would give priority to timed out PendingReplication
blocks ahead of multi-rack under-replicated blocks. That may be a good thing in itself

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message