From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1172) Blocks in newly completed files are considered under-replicated too quickly
Date Mon, 28 Sep 2015 18:49:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14933772#comment-14933772 ]

Jing Zhao commented on HDFS-1172:
---------------------------------

bq. BlockManager#hasEnoughEffectiveReplicas added by HDFS-8938 takes pending replicas into
account. numCurrentReplica in BlockManager#addStoredBlock was fixed to take pending replicas
into account by HDFS-8623.

These two jiras are mainly code refactoring; the logic itself has been there for a while.
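
For context, the effective-replica accounting referenced above boils down to counting replicas already scheduled in {{pendingReplications}} toward the target. A minimal sketch with hypothetical names, not the actual {{BlockManager}} code:

{code:java}
// Hypothetical helper illustrating "effective" replicas: replicas that
// DNs have already reported (live) plus replicas already scheduled for
// transfer (pending) both count toward the replication factor, so a
// freshly completed block is not re-queued while its pipeline DNs have
// yet to report.
static boolean hasEnoughEffectiveReplicas(int liveReplicas,
    int pendingReplicas, int requiredReplication) {
  return liveReplicas + pendingReplicas >= requiredReplication;
}
{code}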

bq. I think it is better to leave BlockManager#checkReplication as is here. Though it may
add a block having pending replicas to neededReplications, the replication will not be
scheduled as long as the replica is in pendingReplications, because
BlockManager#hasEnoughEffectiveReplicas takes it into account.

The question is: if we expect the replication monitor to later remove the block from
{{neededReplications}}, why add it in the first place? Also, if a block's effective replica
count (including pending replicas) is >= its replication factor, the block should not be in
{{neededReplications}} at all. Skipping it up front is more consistent with the current logic.
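
In code terms, the proposed behavior at file-completion time would be roughly the following (a hypothetical sketch with illustrative names, not a patch against the actual {{checkReplication}}):

{code:java}
// Hypothetical sketch of the proposed check (names are illustrative,
// not the actual BlockManager fields): queue the block only when even
// the pending replicas cannot bring it up to the replication factor.
static void checkReplicationSketch(java.util.Queue<Long> neededReplications,
    long blockId, int liveReplicas, int pendingReplicas, int required) {
  if (liveReplicas + pendingReplicas < required) {
    neededReplications.add(blockId); // genuinely under-replicated
  }
  // otherwise: do nothing; the pending transfers will satisfy the
  // target, so the replication monitor never has to dequeue the block
}
{code}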



> Blocks in newly completed files are considered under-replicated too quickly
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-1172
>                 URL: https://issues.apache.org/jira/browse/HDFS-1172
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 0.21.0
>            Reporter: Todd Lipcon
>         Attachments: HDFS-1172-150907.patch, HDFS-1172.008.patch, HDFS-1172.009.patch,
>                      HDFS-1172.patch, hdfs-1172.txt, hdfs-1172.txt, replicateBlocksFUC.patch,
>                      replicateBlocksFUC1.patch, replicateBlocksFUC1.patch
>
>
> I've seen this for a long time, and imagine it's a known issue, but couldn't find an
> existing JIRA. It often happens that we see the NN schedule replication on the last block
> of files very quickly after they're completed, before the other DNs in the pipeline have a
> chance to report the new block. This results in a lot of extra replication work on the
> cluster, as we replicate the block and then end up with multiple excess replicas which are
> very quickly deleted.


