hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-15) All replicas of a block end up on only 1 rack
Date Wed, 29 Dec 2010 23:31:48 GMT

     [ https://issues.apache.org/jira/browse/HDFS-15?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Eli Collins updated HDFS-15:
----------------------------

    Attachment: hdfs-15-b20-1.patch

Here's a patch that applies against branch 20.  The original patch was done after the block
management refactoring, so it doesn't apply straightforwardly to 20; I've written a
new patch, but the fix is in the same spirit as the code in trunk.

The bug is hard to reproduce: you need a failure or decommission of the only cross-rack
replica of a block during the window in which that block is over-replicated.

The patch adds the following new tests, which cover rack policy violations not covered by the
existing tests. Some of them fail when looped repeatedly without the fix (after commenting out
the asserts that check neededReplications, which always fail without the fix). I'll forward-port
these tests to trunk in another jira.

* Test that blocks that have a sufficient number of total replicas, but are not replicated
cross rack, get replicated cross rack when a rack becomes available.
* Test that new blocks for an underreplicated file will get replicated cross rack.
* Mark a block as corrupt and test that when it is re-replicated it is still replicated
across racks.
* Reduce the replication factor of a file, making sure that the only cross-rack replica of a
block is not removed when deleting replicas.
* Test that when a block is re-replicated because a replica is lost due to host failure, the
rack policy is preserved.
* Test that when the excess replicas of a block are reduced due to a node re-joining the cluster,
the rack policy is not violated.
* Test that rack policy is still respected when blocks are replicated due to node decommissioning.
* Test that rack policy is still respected when blocks are replicated due to node decommissioning,
even when the blocks are over-replicated.
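The rack-policy condition these tests exercise can be sketched as follows. This is a hypothetical standalone helper, not the actual BlockManager code; the method name and rack-string representation are assumptions for illustration:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

public class RackPolicyCheck {
    // Hypothetical helper: HDFS-15's rack policy requires that a block with
    // replication factor > 1 has replicas on at least two distinct racks.
    static boolean satisfiesRackPolicy(List<String> replicaRacks) {
        if (replicaRacks.size() <= 1) {
            return true; // a single replica cannot span racks
        }
        // Count distinct racks among the replica locations.
        return new HashSet<>(replicaRacks).size() >= 2;
    }

    public static void main(String[] args) {
        // Three replicas, all on /rack1: the situation fsck flags in this bug.
        System.out.println(satisfiesRackPolicy(
                Arrays.asList("/rack1", "/rack1", "/rack1"))); // false
        // Replicas spread across two racks: policy satisfied.
        System.out.println(satisfiesRackPolicy(
                Arrays.asList("/rack1", "/rack2", "/rack1"))); // true
    }
}
```

The point of the fix is that replica deletion and re-replication must re-check this condition, not just the total replica count.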

> All replicas of a block end up on only 1 rack
> ---------------------------------------------
>
>                 Key: HDFS-15
>                 URL: https://issues.apache.org/jira/browse/HDFS-15
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 0.20.3
>            Reporter: Hairong Kuang
>            Assignee: Jitendra Nath Pandey
>            Priority: Critical
>             Fix For: 0.20.3, 0.21.0
>
>         Attachments: hdfs-15-b20-1.patch, HDFS-15.4.patch, HDFS-15.5.patch, HDFS-15.6.patch,
HDFS-15.patch, HDFS-15.patch.2, HDFS-15.patch.3
>
>
> The HDFS replica placement strategy guarantees that the replicas of a block exist on at
least two racks when its replication factor is greater than one. But fsck still reports that
the replicas of some blocks end up on one rack.
> The cause of the problem is that decommission and corruption handling only check the
block's replication factor, not the rack requirement. When an over-replicated block loses
a replica due to decommission, corruption, or a lost heartbeat, the namenode does not take any
action to guarantee that the remaining replicas are on different racks.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

