Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Sun, 17 Sep 2017 19:52:03 +0000 (UTC)
From: "Brahma Reddy Battula (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.13058983.1490380693000.132400.1505677923073@Atlassian.JIRA>
In-Reply-To: <JIRA.13058983.1490380693000@Atlassian.JIRA>
References: <JIRA.13058983.1490380693000@Atlassian.JIRA> <JIRA.13058983.1490380693637@jira-lw-us.apache.org>
Subject: [jira] [Commented] (HDFS-11576) Block recovery will fail
 indefinitely if recovery time > heartbeat interval
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Sun, 17 Sep 2017 19:52:13 -0000


    [ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16169398#comment-16169398 ] 

Brahma Reddy Battula commented on HDFS-11576:
---------------------------------------------

[~lukmajercak] thanks for reporting and working on this issue.

The latest patch LGTM. [~shv], do you have any comments on the latest patch?

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11576
>                 URL: https://issues.apache.org/jira/browse/HDFS-11576
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs, namenode
>    Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>            Priority: Critical
>         Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008.patch, HDFS-11576.009.patch, HDFS-11576.010.patch, HDFS-11576.011.patch, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. After the first recovery succeeds, DN calls commitBlockSynchronization on NN, which fails because X < X+1
> ... 
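
The scenario above can be reproduced with a small tick-based simulation. This is only a sketch under stated assumptions: ToyNameNode, onHeartbeat, and countSuccessfulCommits are hypothetical names for illustration, not HDFS classes or APIs, and the toy NN simply issues a new recovery ID on every heartbeat and rejects a commit carrying an older ID, as the steps describe.

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class RecoveryRaceSketch {

    /** Toy NameNode (hypothetical, not the HDFS class). */
    static final class ToyNameNode {
        private long latestRecoveryId = 0;
        private boolean recovered = false;

        /** Issues a fresh recovery ID on each heartbeat, or -1 once the block is recovered. */
        long onHeartbeat() {
            if (recovered) {
                return -1;
            }
            return ++latestRecoveryId;
        }

        /** Rejects the commit if a newer recovery ID has been issued since. */
        boolean commitBlockSynchronization(long recoveryId) {
            if (recovered || recoveryId < latestRecoveryId) {
                return false;
            }
            recovered = true;
            return true;
        }
    }

    /** Counts commits accepted by the toy NN over ticks 0..totalTicks. */
    static int countSuccessfulCommits(int heartbeatInterval, int recoveryDuration, int totalTicks) {
        ToyNameNode nn = new ToyNameNode();
        // Each entry is {finishTick, recoveryId}; FIFO because every attempt takes the same time.
        Deque<long[]> inFlight = new ArrayDeque<>();
        int successes = 0;
        for (int tick = 0; tick <= totalTicks; tick++) {
            // Recovery attempts finishing at this tick try to commit first.
            while (!inFlight.isEmpty() && inFlight.peekFirst()[0] == tick) {
                long recoveryId = inFlight.pollFirst()[1];
                if (nn.commitBlockSynchronization(recoveryId)) {
                    successes++;
                }
            }
            // Then the DN heartbeats and may receive a new recovery command.
            if (tick % heartbeatInterval == 0) {
                long recoveryId = nn.onHeartbeat();
                if (recoveryId >= 0) {
                    inFlight.addLast(new long[] {tick + recoveryDuration, recoveryId});
                }
            }
        }
        return successes;
    }

    public static void main(String[] args) {
        // Recovery (5 ticks) outlasts the heartbeat interval (3 ticks): no commit is ever accepted.
        System.out.println(countSuccessfulCommits(3, 5, 30)); // prints 0
        // Recovery (2 ticks) fits inside the heartbeat interval: the first commit is accepted.
        System.out.println(countSuccessfulCommits(3, 2, 30)); // prints 1
    }
}
```

In this model every attempt that takes longer than one heartbeat interval is already stale by the time it commits, so the number of accepted commits stays at zero no matter how long the simulation runs.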

