Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 30 Oct 2015 03:38:27 +0000 (UTC)
From: "Walter Su (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12906483.1445421223000.103801.1446176307856@Atlassian.JIRA>
In-Reply-To: <JIRA.12906483.1445421223000@Atlassian.JIRA>
References: <JIRA.12906483.1445421223000@Atlassian.JIRA>
 <JIRA.12906483.1445421223481@arcas>
Subject: [jira] [Updated] (HDFS-9275) Wait previous ErasureCodingWork to
 finish before schedule another one
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-9275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Walter Su updated HDFS-9275:
----------------------------
    Attachment: HDFS-9275.05.patch

Thanks, [~hitliuyi].
bq. Also check this in scheduleRecovery to avoid unnecessary choose targets.
Good idea. I moved the check into {{scheduleRecovery}}.
bq. move the block group to end of queue of same priority in neededReplications, otherwise it's chosen first again next time.
That isn't needed. {{UnderReplicatedBlocks}} keeps an internal bookmark, so the next round does not start from the same block group again.

Uploaded the 05 patch.

> Wait previous ErasureCodingWork to finish before schedule another one
> ---------------------------------------------------------------------
>
>                 Key: HDFS-9275
>                 URL: https://issues.apache.org/jira/browse/HDFS-9275
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Walter Su
>            Assignee: Walter Su
>         Attachments: HDFS-9275.01.patch, HDFS-9275.02.patch, HDFS-9275.03.patch, HDFS-9275.04.patch, HDFS-9275.05.patch
>
>
> In {{ErasureCodingWorker}}, for the same block group, one task doesn't know which internal blocks are being recovered by other tasks. We could end up recovering two identical blocks with the same index. So {{ReplicationMonitor}} should wait for the previous work to finish before scheduling another one.
> This is related to the occasional failure of {{TestRecoverStripedFile}}.
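
For illustration only, the idea in the description (do not schedule new recovery work for a block group while earlier work on it is still pending) can be sketched as below. This is a minimal sketch and not the code in the attached patch: the class {{PendingRecoveryTracker}}, its methods, and the use of a plain long as the block group ID are all hypothetical names chosen for the example.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch, not HDFS code: remembers which block groups
 * already have recovery work in flight, so a scheduler can skip them.
 */
class PendingRecoveryTracker {
    // IDs of block groups whose recovery work has not finished yet.
    private final Set<Long> pendingBlockGroups = new HashSet<>();

    /**
     * Tries to schedule recovery for a block group.
     * Returns true if the work was accepted, false if earlier work
     * for the same block group is still pending.
     */
    synchronized boolean scheduleRecovery(long blockGroupId) {
        // Set.add returns false when the ID is already present.
        return pendingBlockGroups.add(blockGroupId);
    }

    /** Marks the recovery work for a block group as finished. */
    synchronized void recoveryFinished(long blockGroupId) {
        pendingBlockGroups.remove(blockGroupId);
    }

    /** Reports whether a block group still has pending recovery work. */
    synchronized boolean hasPendingRecovery(long blockGroupId) {
        return pendingBlockGroups.contains(blockGroupId);
    }
}
```

With such a guard, a second attempt to schedule the same block group is rejected until {{recoveryFinished}} is called, which is what prevents two tasks from recovering the same internal block index at once.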

