hadoop-hdfs-issues mailing list archives

From "Zhao Yi Ming (Jira)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-14699) Erasure Coding: Can NOT trigger the reconstruction when have the dup internal blocks and missing one internal block
Date Mon, 02 Sep 2019 09:52:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-14699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16920737#comment-16920737 ]

Zhao Yi Ming edited comment on HDFS-14699 at 9/2/19 9:51 AM:
-------------------------------------------------------------

[~ayushtkn] Thanks for your comments! I am not sure what you mean by "Why can't
we just pull the whole if part up, rather than just pulling half part?" Do you mean the PR,
the code base, or something else? In my understanding, you just need to apply the patch in
your dev env and then run the UT. For the Jira, I am not sure why patch 03 did NOT run the
UT, so I created patch 04 and hope it runs the UT successfully.

BTW: I checked the patch 02 UT history; the UT ran and succeeded.

!image-2019-09-02-17-51-46-742.png!



> Erasure Coding: Can NOT trigger the reconstruction when have the dup internal blocks
and missing one internal block
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-14699
>                 URL: https://issues.apache.org/jira/browse/HDFS-14699
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ec
>    Affects Versions: 3.2.0, 3.1.1, 3.3.0
>            Reporter: Zhao Yi Ming
>            Assignee: Zhao Yi Ming
>            Priority: Critical
>              Labels: patch
>         Attachments: HDFS-14699.00.patch, HDFS-14699.01.patch, HDFS-14699.02.patch, HDFS-14699.03.patch,
HDFS-14699.04.patch, image-2019-08-20-19-58-51-872.png, image-2019-09-02-17-49-24-286.png,
image-2019-09-02-17-51-46-742.png
>
>
> We tried the EC function on an 80-node cluster with Hadoop 3.1.1 and hit the same scenario described in https://issues.apache.org/jira/browse/HDFS-8881. Following are our testing steps; hope they are helpful. (The following DNs have the testing internal blocks.)
>  # We customized a new 10-2-1024k policy and used it on a path; now we have 12 internal blocks (12 live blocks)
>  # Decommission one DN; after the decommission completes, we have 13 internal blocks (12 live blocks and 1 decommissioned block)
>  # Then shut down one DN which does not hold the same block id as the decommissioned block; now we have 12 internal blocks (11 live blocks and 1 decommissioned block)
>  # After waiting about 600s (before the heartbeat arrives), recommission the decommissioned DN; now we have 12 internal blocks (11 live blocks and 1 duplicate block)
>  # At this point, EC does not reconstruct the missing block
> We think this is a critical issue for using the EC function in a production env. Could you help? Thanks a lot!
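The masking effect in the steps above can be sketched with a simplified model (hypothetical names and block indices, not actual HDFS code): if a redundancy check compares the total replica count against the expected number of internal blocks instead of counting distinct block indices, a duplicate replica hides a missing index and reconstruction is never triggered.

```python
# Simplified model of the scenario above (illustrative only, not HDFS source).
# A 10-2 EC policy expects 12 distinct internal block indices (0..11).

DATA_BLOCKS = 10
PARITY_BLOCKS = 2
EXPECTED = DATA_BLOCKS + PARITY_BLOCKS  # 12 internal blocks

# After the steps above: one index is missing, another is duplicated
# (indices 5 and 3 chosen arbitrarily for illustration).
live_replica_indices = [0, 1, 2, 3, 3, 4, 6, 7, 8, 9, 10, 11]

def needs_reconstruction_buggy(indices):
    # Flawed check: the raw replica count looks sufficient (12 >= 12),
    # so the duplicate masks the missing internal block.
    return len(indices) < EXPECTED

def needs_reconstruction_fixed(indices):
    # Counting distinct internal block indices exposes the gap (11 < 12).
    return len(set(indices)) < EXPECTED

print(needs_reconstruction_buggy(live_replica_indices))   # False: missed
print(needs_reconstruction_fixed(live_replica_indices))   # True: triggers
```

This mirrors the reported symptom: 12 replicas are present, but only 11 distinct internal blocks are live, and reconstruction should fire for the missing one.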



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


