hadoop-hdfs-issues mailing list archives

From "Ayush Saxena (Jira)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-14699) Erasure Coding: Can NOT trigger the reconstruction when have the dup internal blocks and missing one internal block
Date Mon, 02 Sep 2019 13:45:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920873#comment-16920873 ]

Ayush Saxena commented on HDFS-14699:
-------------------------------------

Let me try once more to explain:
Say I apply your patch, keep the UT, and remove the fix you made. The UT should then fail,
but it doesn't.

In other words:
If you run just this UT locally without your fix, it passes, when ideally it should fail.

> Erasure Coding: Can NOT trigger the reconstruction when have the dup internal blocks
and missing one internal block
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14699
>                 URL: https://issues.apache.org/jira/browse/HDFS-14699
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>    Affects Versions: 3.2.0, 3.1.1, 3.3.0
>            Reporter: Zhao Yi Ming
>            Assignee: Zhao Yi Ming
>            Priority: Critical
>              Labels: patch
>         Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, HDFS-14699.02.patch, HDFS-14699.03.patch,
HDFS-14699.04.patch, image-2019-08-20-19-58-51-872.png, image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the same scenario
as described in https://issues.apache.org/jira/browse/HDFS-8881. Our testing steps follow;
we hope they are helpful. (The DNs mentioned below hold the internal blocks under test.)
>  # We customized a new 10-2-1024k policy and applied it to a path. Now we have 12 internal
blocks (12 live blocks).
>  # Decommission one DN. After the decommission completes, we have 13 internal blocks (12
live blocks and 1 decommissioned block).
>  # Then shut down one DN that does not hold the same block ID as the decommissioned block.
Now we have 12 internal blocks (11 live blocks and 1 decommissioned block).
>  # After waiting about 600s (before the heartbeat arrives), recommission the decommissioned
DN. Now we have 12 internal blocks (11 live blocks and 1 duplicate block).
>  # EC then does not reconstruct the missing block.
> We think this is a critical issue for using the EC function in a production environment.
Could you help? Thanks a lot!
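
The end state of the steps above (12 reporting storages, but only 11 distinct internal
blocks) can be sketched as follows. This is a minimal illustration, not the NameNode's
actual code: the class StripedGroupCheck, the method missingIndices, and the index values
3 and 7 are all hypothetical. It only shows that a raw count of reporting storages still
reads 12 when one internal-block index is duplicated, while a per-index check shows that
one internal block is missing.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop.
class StripedGroupCheck {

    /** Returns the internal-block indices (0..total-1) that no storage reports. */
    static List<Integer> missingIndices(int[] reportedIndices, int totalInternalBlocks) {
        BitSet seen = new BitSet(totalInternalBlocks);
        for (int idx : reportedIndices) {
            seen.set(idx); // a duplicate index sets the same bit again
        }
        List<Integer> missing = new ArrayList<>();
        for (int i = 0; i < totalInternalBlocks; i++) {
            if (!seen.get(i)) {
                missing.add(i);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // 10-2 policy: 12 internal blocks per group.
        // Assumed end state: index 3 is reported twice, index 7 by no storage.
        int[] reported = {0, 1, 2, 3, 3, 4, 5, 6, 8, 9, 10, 11};
        System.out.println("storages reporting: " + reported.length);
        System.out.println("missing indices: " + missingIndices(reported, 12));
    }
}
```

With the assumed input, the array holds 12 entries, yet the per-index check reports index 7
as missing, which is the situation where reconstruction would be expected.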



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


