Date: Tue, 1 Aug 2017 23:03:01 +0000 (UTC)
From: "Lukas Majercak (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.13058983.1490380693000.67511.1501628581053@Atlassian.JIRA>
In-Reply-To: <JIRA.13058983.1490380693000@Atlassian.JIRA>
References: <JIRA.13058983.1490380693000@Atlassian.JIRA> <JIRA.13058983.1490380693637@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HDFS-11576) Block recovery will fail
 indefinitely if recovery time > heartbeat interval
archived-at: Tue, 01 Aug 2017 23:03:06 -0000


    [ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109960#comment-16109960 ] 

Lukas Majercak commented on HDFS-11576:
---------------------------------------

Hi guys,

Sorry for the delay again.
[~shv], thanks for the feedback. I've:
1. Renamed UnderRecoveryBlocks to PendingRecoveryBlocks.
2. Made the timeout multiplier a config option instead (it needs to be configurable for the tests), with a default of 60x.
3. Removed one method from the PendingRecoveryBlocks class and renamed the remaining methods to add/remove/contains.
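
For anyone following along without the patch open, here is a rough sketch of the idea behind an add/remove/contains tracker with a timeout. It is not the code from the patch: the class name PendingRecoverySketch, the constructor parameters, and the assumption that the multiplier scales the heartbeat interval are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch, not the class from the patch: remembers which blocks
 * already have a recovery in flight, so a later heartbeat does not trigger a
 * second recovery command until the first attempt has timed out.
 */
final class PendingRecoverySketch {
  // Block ID -> time (ms) after which the pending recovery counts as expired.
  private final Map<Long, Long> deadlineByBlockId = new HashMap<>();
  private final long timeoutMs;

  // Assumption: timeout = heartbeat interval * multiplier.
  PendingRecoverySketch(long heartbeatIntervalMs, int timeoutMultiplier) {
    this.timeoutMs = heartbeatIntervalMs * timeoutMultiplier;
  }

  /** Returns true if a recovery may start now, false if one is still pending. */
  synchronized boolean add(long blockId, long nowMs) {
    if (contains(blockId, nowMs)) {
      return false;
    }
    deadlineByBlockId.put(blockId, nowMs + timeoutMs);
    return true;
  }

  /** Called once recovery has finished, whether it succeeded or failed. */
  synchronized void remove(long blockId) {
    deadlineByBlockId.remove(blockId);
  }

  /** True while a recovery for this block is pending and has not timed out. */
  synchronized boolean contains(long blockId, long nowMs) {
    Long deadline = deadlineByBlockId.get(blockId);
    return deadline != null && nowMs < deadline;
  }
}
```

With an illustrative 3000 ms heartbeat and the 60x multiplier, a second add for the same block is refused until 180000 ms have passed or remove is called.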

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11576
>                 URL: https://issues.apache.org/jira/browse/HDFS-11576
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs, namenode
>    Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>            Priority: Critical
>         Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. DN calls commitBlockSynchronization on the NN after the first recovery succeeds; the call fails because X < X+1
> ... 
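
The scenario quoted above can be reproduced in miniature. The following is a toy model only, not HDFS code: the class RecoveryIdRaceSketch and its methods are hypothetical, and it assumes the NN issues a new recovery ID on every heartbeat and rejects any commit carrying an older ID.

```java
/**
 * Toy model of the recovery ID race (hypothetical, not HDFS code).
 * The "NN" bumps the recovery ID on each heartbeat and rejects any
 * commit whose ID is older than the latest one it has issued.
 */
final class RecoveryIdRaceSketch {
  private long latestRecoveryId;

  /** NN side: a heartbeat arrives, so issue a recovery command with a new ID. */
  long onHeartbeat() {
    return ++latestRecoveryId;
  }

  /** NN side: the commit fails if its recovery ID is stale (X < X+1). */
  boolean commit(long recoveryId) {
    return recoveryId >= latestRecoveryId;
  }

  /**
   * Runs several recovery attempts, each lasting the given number of
   * heartbeats, and counts how many commits the NN accepts.
   */
  static int successfulCommits(int attempts, int heartbeatsPerRecovery) {
    RecoveryIdRaceSketch nn = new RecoveryIdRaceSketch();
    int accepted = 0;
    for (int i = 0; i < attempts; i++) {
      long id = nn.onHeartbeat();            // steps 1-3: DN starts recovery with ID X
      for (int h = 1; h < heartbeatsPerRecovery; h++) {
        nn.onHeartbeat();                    // steps 4-5: NN issues X+1 mid-recovery
      }
      if (nn.commit(id)) {                   // step 6: DN commits with ID X
        accepted++;
      }
    }
    return accepted;
  }
}
```

In this model every commit is accepted when a recovery fits within one heartbeat, and none are accepted once a recovery spans two or more heartbeats, however many attempts are made.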

