hadoop-hdfs-issues mailing list archives

From "Lukas Majercak (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11576) Block recovery will fail indefinitely if recovery time > heartbeat interval
Date Tue, 01 Aug 2017 23:03:01 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16109960#comment-16109960 ]

Lukas Majercak commented on HDFS-11576:
---------------------------------------

Hi guys,

Sorry for the delay again.
[~shv], thanks for the feedback. I have:
1. Renamed UnderRecoveryBlocks to PendingRecoveryBlocks.
2. Created a config for the timeout multiplier instead (it needs to be configurable for the tests), with a default of 60x.
3. Removed one method from the PendingRecoveryBlocks class and renamed the remaining methods to add/remove/contains.
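
For readers without the patch at hand, here is a minimal sketch of what a tracker with the add/remove/contains shape described above might look like. It is not the code from the attached patches: the class name PendingRecoverySketch, the String block IDs, the explicit nowMs parameter, and the assumption that the timeout equals the heartbeat interval times the multiplier are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch only; not the PendingRecoveryBlocks class from the patch.
 * Assumes timeout = heartbeat interval * multiplier, which the comment above
 * does not state explicitly.
 */
class PendingRecoverySketch {
    // Block ID -> time (ms) at which the pending recovery attempt expires.
    private final Map<String, Long> expiryByBlock = new HashMap<>();
    private final long timeoutMs;

    PendingRecoverySketch(long heartbeatIntervalMs, int timeoutMultiplier) {
        this.timeoutMs = heartbeatIntervalMs * timeoutMultiplier;
    }

    /**
     * Registers a recovery attempt. Returns true if a new recovery command
     * should be issued, false if an earlier attempt is still within its timeout.
     */
    synchronized boolean add(String blockId, long nowMs) {
        Long expiry = expiryByBlock.get(blockId);
        if (expiry != null && nowMs < expiry) {
            return false; // still pending, do not issue a new recovery ID
        }
        expiryByBlock.put(blockId, nowMs + timeoutMs);
        return true;
    }

    /** Forgets the block, for example once recovery has been committed. */
    synchronized void remove(String blockId) {
        expiryByBlock.remove(blockId);
    }

    /** True if a recovery attempt for the block is pending and not yet timed out. */
    synchronized boolean contains(String blockId, long nowMs) {
        Long expiry = expiryByBlock.get(blockId);
        return expiry != null && nowMs < expiry;
    }
}
```

With a 3000 ms heartbeat interval and the 60x multiplier, this sketch would suppress repeated recovery commands for the same block for 180000 ms, which gives a slow recovery time to commit before a newer recovery ID supersedes it.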

> Block recovery will fail indefinitely if recovery time > heartbeat interval
> ---------------------------------------------------------------------------
>
>                 Key: HDFS-11576
>                 URL: https://issues.apache.org/jira/browse/HDFS-11576
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode, hdfs, namenode
>    Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha1, 3.0.0-alpha2
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>            Priority: Critical
>         Attachments: HDFS-11576.001.patch, HDFS-11576.002.patch, HDFS-11576.003.patch, HDFS-11576.004.patch, HDFS-11576.005.patch, HDFS-11576.006.patch, HDFS-11576.007.patch, HDFS-11576.008, HDFS-11576.repro.patch
>
>
> Block recovery will fail indefinitely if the time to recover a block is always longer than the heartbeat interval. Scenario:
> 1. DN sends heartbeat 
> 2. NN sends a recovery command to DN, recoveryID=X
> 3. DN starts recovery
> 4. DN sends another heartbeat
> 5. NN sends a recovery command to DN, recoveryID=X+1
> 6. After succeeding with the first recovery, DN calls commitBlockSynchronization on NN, which fails because X < X+1
> ... 
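
The scenario above can be illustrated with a small, deliberately simplified simulation. It is a sketch, not HDFS code: the class and method names are hypothetical, it assumes the NN hands out a new recovery ID on every heartbeat, that the DN works on one recovery at a time, and that a commit is accepted only when its ID matches the latest one issued.

```java
/** Hypothetical model of the recovery ID race; not taken from HDFS. */
class RecoveryRaceSketch {
    /**
     * Returns true if any commit is accepted within the given number of
     * heartbeats, false if every commit carries a stale recovery ID.
     */
    static boolean recoveryEverCommits(long heartbeatMs, long recoveryMs, int heartbeats) {
        long nnRecoveryId = 0;   // latest recovery ID the NN has issued
        long dnRecoveryId = -1;  // ID the DN is working on, -1 when idle
        long dnFinishAt = 0;     // time at which the DN's current recovery ends

        for (int i = 0; i < heartbeats; i++) {
            long now = i * heartbeatMs;

            // The DN reports a finished recovery before this heartbeat is handled.
            if (dnRecoveryId >= 0 && dnFinishAt <= now) {
                if (dnRecoveryId == nnRecoveryId) {
                    return true;  // commit accepted: ID is still the latest
                }
                dnRecoveryId = -1; // commit rejected: a newer ID was issued meanwhile
            }

            // On every heartbeat the NN issues a recovery command with a new ID.
            nnRecoveryId++;
            if (dnRecoveryId < 0) {
                dnRecoveryId = nnRecoveryId;
                dnFinishAt = now + recoveryMs;
            }
        }
        return false;
    }
}
```

In this model, a recovery that finishes within one heartbeat interval commits on the first attempt, while a recovery that takes longer than the interval never commits, no matter how many heartbeats are simulated.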


