hadoop-hdfs-issues mailing list archives

From "Jitendra Nath Pandey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-15) All replicas of a block end up on only 1 rack
Date Wed, 19 Aug 2009 21:07:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-15?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745196#action_12745196 ]

Jitendra Nath Pandey commented on HDFS-15:

Updated proposal:
   A separate queue (neededReplicationsForRacks) is maintained for blocks that are not spread
across enough racks. Blocks in this queue are handled with lower priority than the blocks
in neededReplications, so the priority logic is the same as in comment #3. The reasons for
this change:
  1. The semantics of the neededReplications queue remain unchanged: it contains only those
blocks that are really under-replicated.
  2. It keeps the code cleaner, since blocks without enough racks live in a separate queue.
No code change is needed in UnderReplicatedBlocks.java. The new queue can be implemented as
just a TreeSet<Block>, so no new class needs to be implemented.
  Point #3 in the previous comment remains the same, except that a block is added to
neededReplicationsForRacks if it doesn't have enough racks.
  Point #4 in the previous comment remains unchanged.
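
A rough sketch of how the two queues could interact is given below. It is only an illustration of the proposal, not the namenode code: the class RackAwareReplicationQueues, its methods update() and pollNext(), the helper hasEnoughRacks(), the use of plain long block IDs in place of Block, and the plain TreeSet standing in for UnderReplicatedBlocks are all assumptions made for the example. The "at least two racks when replication is greater than one" rule is taken from the issue description.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch of the two-queue idea; all names here are hypothetical. */
class RackAwareReplicationQueues {
    // Blocks with fewer live replicas than their replication factor.
    // A plain TreeSet stands in for the existing UnderReplicatedBlocks structure.
    private final TreeSet<Long> neededReplications = new TreeSet<>();

    // Blocks that have enough replicas but are not spread across enough racks.
    private final TreeSet<Long> neededReplicationsForRacks = new TreeSet<>();

    /** Assumed rule: a block with replication factor > 1 needs replicas on at least two racks. */
    static boolean hasEnoughRacks(int expectedReplicas, Set<String> racks) {
        return expectedReplicas <= 1 || racks.size() >= 2;
    }

    /** Re-evaluate a block, e.g. after a replica is decommissioned, corrupted, or lost. */
    void update(long blockId, int expectedReplicas, int liveReplicas, Set<String> racks) {
        // Drop any stale entry before deciding where the block belongs now.
        neededReplications.remove(blockId);
        neededReplicationsForRacks.remove(blockId);
        if (liveReplicas < expectedReplicas) {
            neededReplications.add(blockId);          // really under-replicated
        } else if (!hasEnoughRacks(expectedReplicas, racks)) {
            neededReplicationsForRacks.add(blockId);  // enough replicas, too few racks
        }
    }

    /** Under-replicated blocks are served first; rack-deficient blocks only afterwards. */
    Long pollNext() {
        Long next = neededReplications.pollFirst();
        return next != null ? next : neededReplicationsForRacks.pollFirst();
    }

    public static void main(String[] args) {
        RackAwareReplicationQueues queues = new RackAwareReplicationQueues();
        Set<String> oneRack = new HashSet<>(Arrays.asList("/rackA"));
        Set<String> twoRacks = new HashSet<>(Arrays.asList("/rackA", "/rackB"));
        queues.update(1L, 3, 3, oneRack);   // fully replicated, but on a single rack
        queues.update(2L, 3, 2, twoRacks);  // under-replicated
        System.out.println(queues.pollNext());
        System.out.println(queues.pollNext());
    }
}
```

In this sketch a block that is both under-replicated and on a single rack goes only into neededReplications, which keeps that queue's meaning unchanged as the proposal intends.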

> All replicas of a block end up on only 1 rack
> ---------------------------------------------
>                 Key: HDFS-15
>                 URL: https://issues.apache.org/jira/browse/HDFS-15
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Hairong Kuang
>            Assignee: Jitendra Nath Pandey
>            Priority: Critical
> The HDFS replica placement strategy guarantees that the replicas of a block exist on at
> least two racks when its replication factor is greater than one. But fsck still reports that
> the replicas of some blocks end up on one rack.
> The cause of the problem is that decommission and corruption handling check only the
> block's replication factor, not the rack requirement. When an over-replicated block loses
> a replica due to decommission, corruption, or a lost heartbeat, the namenode does not take any
> action to guarantee that the remaining replicas are on different racks.

