hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4477) All replicas of a block end up on only 1 rack
Date Thu, 23 Oct 2008 21:00:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12642279#action_12642279 ]

Hairong Kuang commented on HADOOP-4477:
---------------------------------------

My proposal is to include both under-replicated blocks and blocks that do not satisfy the rack
requirement in the neededReplication queue. The neededReplication queue supports four priorities:
Priority 0: Blocks that have only one replica;
Priority 1: Blocks whose replicas are on only one rack;
Priority 2: Blocks whose number of replicas is no greater than 1/3 of their replication factor;
Priority 3: All other under-replicated blocks.

In general we should have a priority 4 that includes those blocks that do not belong to priorities
0-3 and do not satisfy the HDFS rack requirement. Currently HDFS provides only a two-rack guarantee,
so priority 1 covers all cases in which the rack requirement is violated.
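
To make the ordering of the four priorities concrete, here is a minimal sketch in Java. It is not the FSNamesystem code: the class name, method name, parameters, and the NOT_QUEUED sentinel are hypothetical. It also assumes that a block with no live replicas is not queued (there is nothing to copy from) and that the rack requirement applies only when the replication factor is greater than one.

```java
public class ReplicationPriority {
    /** Hypothetical sentinel: the block does not belong in the queue. */
    public static final int NOT_QUEUED = -1;

    /**
     * Maps a block's state to one of the four proposed priorities.
     * All parameter names are illustrative assumptions.
     */
    public static int priorityOf(int liveReplicas, int racks, int replicationFactor) {
        boolean underReplicated = liveReplicas < replicationFactor;
        // The two-rack guarantee applies only when the replication factor exceeds one.
        boolean onOneRack = replicationFactor > 1 && racks < 2;

        // Assumption: no live replica means no source to copy from, so skip the block.
        if (liveReplicas <= 0 || (!underReplicated && !onOneRack)) {
            return NOT_QUEUED;
        }
        if (liveReplicas == 1) {
            return 0; // only one replica left
        }
        if (onOneRack) {
            return 1; // all replicas on a single rack
        }
        if (liveReplicas * 3 <= replicationFactor) {
            return 2; // no more than 1/3 of the replication factor
        }
        return 3;     // any other under-replicated block
    }

    public static void main(String[] args) {
        System.out.println(priorityOf(1, 1, 3)); // one replica
        System.out.println(priorityOf(3, 1, 3)); // fully replicated, single rack
        System.out.println(priorityOf(2, 2, 6)); // at most 1/3 of the factor
        System.out.println(priorityOf(2, 2, 3)); // other under-replicated block
        System.out.println(priorityOf(3, 2, 3)); // healthy block
    }
}
```

Note that a fully replicated block whose replicas all sit on one rack still lands in priority 1, which is the case the current queue misses.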

In the methods addStoredBlock, removeStoredBlock, startDecomission, and markBlockAsCorrupt
in FSNamesystem, put both under-replicated blocks and blocks on only one rack into the
neededReplication queue. In addition, the replicator will create one more replica for blocks
that are on only one rack but are not under-replicated.
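
The replicator rule above can be sketched as a small Java helper that decides how many replicas to schedule. This is an illustration only: the class, method, and parameter names are hypothetical, and the handling of blocks that are under-replicated and on one rack at the same time (filling the shortfall) is an assumption rather than something stated in this proposal.

```java
public class ReplicationWork {
    /**
     * Number of additional replicas to schedule for a block.
     * All names are illustrative assumptions, not FSNamesystem APIs.
     */
    public static int additionalReplicas(int liveReplicas, int racks, int replicationFactor) {
        if (liveReplicas < replicationFactor) {
            // Under-replicated: fill the shortfall (assumed behavior).
            return replicationFactor - liveReplicas;
        }
        // Not under-replicated: one extra replica if every replica is on one rack.
        boolean onOneRack = replicationFactor > 1 && racks < 2;
        return onOneRack ? 1 : 0;
    }

    public static void main(String[] args) {
        System.out.println(additionalReplicas(2, 1, 3)); // under-replicated
        System.out.println(additionalReplicas(3, 1, 3)); // single rack only
        System.out.println(additionalReplicas(3, 2, 3)); // nothing to do
    }
}
```

For the extra replica to repair the rack violation, the placement target would have to be on a different rack from the existing replicas; the sketch does not model target selection.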

> All replicas of a block end up on only 1 rack
> ---------------------------------------------
>
>                 Key: HADOOP-4477
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4477
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Critical
>             Fix For: 0.20.0
>
>
> The HDFS replica placement strategy guarantees that the replicas of a block exist on at
> least two racks when its replication factor is greater than one. But fsck still reports that
> the replicas of some blocks end up on one rack.
> The cause of the problem is that decommission and corruption handling check only the
> block's replication factor but not the rack requirement. When an over-replicated block loses
> a replica due to decommission, corruption, or a lost heartbeat, the namenode does not take any
> action to guarantee that the remaining replicas are on different racks.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

