hadoop-hdfs-issues mailing list archives

From "HuangTao (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14847) Blocks are over-replicated while EC decommissioning
Date Sat, 14 Sep 2019 07:47:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16929715#comment-16929715 ]

HuangTao commented on HDFS-14847:

I just verified this with [~ferhui]'s UT and my snippet, and it failed.

I will file a new issue to record my scenario later.

> Blocks are over-replicated while EC decommissioning
> ---------------------------------------------------
>                 Key: HDFS-14847
>                 URL: https://issues.apache.org/jira/browse/HDFS-14847
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.2.0, 3.0.3, 3.1.2, 3.3.0
>            Reporter: Fei Hui
>            Assignee: Fei Hui
>            Priority: Critical
>         Attachments: HDFS-14847.001.patch, HDFS-14847.002.patch
> Found that some blocks are over-replicated during EC decommissioning. The log shows messages like the following:
> {quote}
> INFO BlockStateChange: Block: blk_-9223372035714984112_363779142, Expected Replicas:
9, live replicas: 8, corrupt replicas: 0, decommissioned replicas: 0, decommissioning replicas:
3, maintenance replicas: 0, live entering maintenance replicas: 0, excess replicas: 0, Is
Open File: false, Datanodes having this block: , Current Datanode:, Is current datanode decommissioning:
true, Is current datanode entering maintenance: false
> {quote}
> Decommissions hang for a long time.
> Digging into the code, I found that there is a problem in ErasureCodingWork.java.
> For example, suppose there are 2 nodes (dn0, dn1) being decommissioned and an EC block group with blocks on both nodes. After creating an ErasureCodingWork to reconstruct, it will create 2 replication
> If dn0 replicates successfully and dn1 fails to replicate, then it will always create replication work for dn0. The block on dn0 is over-replicated and the block on dn1 will never
> Here is the initial patch for this.
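
The failure mode described in the quoted issue can be illustrated with a toy simulation. This is only a sketch under stated assumptions and not the actual ErasureCodingWork logic: the class `DecommissionSketch`, the method `simulate`, and the `skipCopied` flag are hypothetical names, and "always pick the first decommissioning node" is just one assumed way the reported behavior could arise.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model only: hypothetical names, not Hadoop code.
final class DecommissionSketch {

    /**
     * Returns how many copies were made of the block held by each
     * decommissioning node.
     *
     * @param nodes             decommissioning nodes, in selection order
     * @param failsFirstAttempt nodes whose first replication task fails
     * @param laterRounds       maximum number of retry rounds after the first pass
     * @param skipCopied        if true, skip nodes whose block was already copied
     */
    static Map<String, Integer> simulate(List<String> nodes,
                                         Set<String> failsFirstAttempt,
                                         int laterRounds,
                                         boolean skipCopied) {
        Map<String, Integer> copies = new LinkedHashMap<>();

        // First pass: one replication task per decommissioning node.
        for (String dn : nodes) {
            copies.put(dn, failsFirstAttempt.contains(dn) ? 0 : 1);
        }

        // Retry rounds: run while some node's block still has no copy.
        for (int round = 0; round < laterRounds && copies.containsValue(0); round++) {
            String source = null;
            for (String dn : nodes) {
                if (skipCopied && copies.get(dn) > 0) {
                    continue; // assumed fix: this block is already safe
                }
                source = dn;  // assumed bug: the first node always wins
                break;
            }
            if (source == null) {
                break; // defensive: no candidate left
            }
            copies.merge(source, 1, Integer::sum); // retries succeed in this model
        }
        return copies;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("dn0", "dn1");
        // dn0 is copied again every round while dn1 is never copied.
        System.out.println(simulate(nodes, Set.of("dn1"), 3, false)); // {dn0=4, dn1=0}
        // Skipping already-copied nodes lets dn1 be retried.
        System.out.println(simulate(nodes, Set.of("dn1"), 3, true));  // {dn0=1, dn1=1}
    }
}
```

In this model, the first run reproduces the reported symptom (dn0's block over-replicated, dn1's block never copied, so decommissioning cannot finish), while the second run shows the outcome when already-copied nodes are skipped.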

This message was sent by Atlassian Jira

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
