hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4477) All replicas of a block end up on only 1 rack
Date Thu, 23 Oct 2008 21:00:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642279#action_12642279 ]

Hairong Kuang commented on HADOOP-4477:

My proposal is to include both under-replicated blocks and blocks that do not satisfy the rack
requirement in the neededReplication queue. The neededReplication queue supports four priorities:
Priority 0: Blocks that have only one replica;
Priority 1: Blocks whose replicas are on only one rack;
Priority 2: Blocks whose number of replicas is no greater than 1/3 of their replication factor;
Priority 3: All other under-replicated blocks.

In general we should have a priority 4 that includes those blocks that do not belong to priorities
0-3 and do not satisfy the HDFS rack requirement. Currently HDFS provides only a two-rack guarantee,
so priority 1 covers all cases where the rack requirement is violated.
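
As an illustration of the four priorities listed above, here is a minimal sketch of how a block
could be mapped to a queue priority. This is not the actual FSNamesystem code: the class name,
method name, and parameters are hypothetical, and the sketch assumes the block has at least one
live replica, since the proposal does not say how blocks with zero replicas are handled.

```java
// Hypothetical sketch of the proposed priority assignment; not the HDFS implementation.
final class ReplicationPrioritySketch {
    /** Returned when a block is neither under-replicated nor confined to one rack. */
    static final int NOT_NEEDED = -1;

    /**
     * @param curReplicas      number of live replicas (assumed >= 1)
     * @param expectedReplicas the block's replication factor
     * @param numRacks         number of distinct racks holding a live replica
     */
    static int getPriority(int curReplicas, int expectedReplicas, int numRacks) {
        if (curReplicas < 1) {
            throw new IllegalArgumentException("sketch assumes at least one live replica");
        }
        boolean underReplicated = curReplicas < expectedReplicas;
        // The two-rack guarantee only applies when the replication factor is greater than one.
        boolean singleRack = expectedReplicas > 1 && numRacks < 2;
        if (!underReplicated && !singleRack) {
            return NOT_NEEDED;
        }
        if (curReplicas == 1) {
            return 0; // only one replica left
        }
        if (singleRack) {
            return 1; // all replicas on one rack
        }
        if (curReplicas * 3 <= expectedReplicas) {
            return 2; // no more than 1/3 of the replication factor
        }
        return 3;     // all other under-replicated blocks
    }
}
```

Note that a fully replicated block whose replicas all sit on one rack still gets priority 1 in
this sketch, which is the case the current code misses.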

In the methods addStoredBlock, removeStoredBlock, startDecomission, and markBlockAsCorrupt
in FSNamesystem, put both under-replicated and one-rack blocks into the neededReplication queue.
The replicator will in addition create one more replica for blocks that are on only one rack but not under-replicated.
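
The replicator rule above can be sketched as a small helper that computes how many extra replicas
to schedule. The class and method names are hypothetical, and the sketch only counts replicas; it
does not choose targets, which would be the job of the placement policy.

```java
// Hypothetical sketch of the replica count the replicator would schedule; not the HDFS code.
final class ReplicationWorkSketch {
    /**
     * @param curReplicas      number of live replicas
     * @param expectedReplicas the block's replication factor
     * @param numRacks         number of distinct racks holding a live replica
     * @return how many additional replicas to create
     */
    static int additionalReplicasNeeded(int curReplicas, int expectedReplicas, int numRacks) {
        int missing = Math.max(0, expectedReplicas - curReplicas);
        boolean singleRack = expectedReplicas > 1 && numRacks < 2;
        if (missing == 0 && singleRack) {
            // Fully replicated but on one rack: create one more replica,
            // which is expected to land on a different rack.
            return 1;
        }
        // Under-replicated: create the missing replicas. If the block is also on a
        // single rack, at least one of them is expected to go to another rack.
        return missing;
    }
}
```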

> All replicas of a block end up on only 1 rack
> ---------------------------------------------
>                 Key: HADOOP-4477
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4477
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Critical
>             Fix For: 0.20.0
> The HDFS replica placement strategy guarantees that the replicas of a block exist on at
> least two racks when its replication factor is greater than one. But fsck still reports that
> the replicas of some blocks end up on one rack.
> The cause of the problem is that decommission and corruption handling check only the
> block's replication factor but not the rack requirement. When an over-replicated block loses
> a replica due to decommission, corruption, or a lost heartbeat, the namenode does not take any action
> to guarantee that the remaining replicas are on different racks.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
