hadoop-hdfs-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11499) Decommissioning stuck because of failing recovery
Date Sun, 05 Mar 2017 20:12:33 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896532#comment-15896532 ]

ASF GitHub Bot commented on HDFS-11499:
---------------------------------------

GitHub user lukmajercak opened a pull request:

    https://github.com/apache/hadoop/pull/199

    HDFS-11499 Decommissioning stuck because of failing recovery

    

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lukmajercak/hadoop HDFS-11499

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/hadoop/pull/199.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #199
    
----
commit 3609b1353e64a24dee4746b8fa23ed7547768d68
Author: Lukas Majercak <lumajerc@microsoft.com>
Date:   2017-03-05T20:04:06Z

    HDFS-11499 add TestDecommission.testDecommissionWithOpenFileAndDatanodeFailing for testing recovery

commit 3f97d89f75d8a20f878da8c438141f9b6adf7da0
Author: Lukas Majercak <lumajerc@microsoft.com>
Date:   2017-03-05T20:05:08Z

    HDFS-11499 count decommissioning replicas when completing last block in BlockManager.commitOrCompleteLastBlock

----


> Decommissioning stuck because of failing recovery
> -------------------------------------------------
>
>                 Key: HDFS-11499
>                 URL: https://issues.apache.org/jira/browse/HDFS-11499
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs, namenode
>    Affects Versions: 2.7.1, 2.7.2, 2.7.3, 3.0.0-alpha2
>            Reporter: Lukas Majercak
>            Assignee: Lukas Majercak
>
> Block recovery will fail to finalize the file if the locations of the last, incomplete
> block are being decommissioned. Conversely, decommissioning will be stuck waiting for
> the last block to be completed.
> {code:java}
> org.apache.hadoop.ipc.RemoteException(java.lang.IllegalStateException): Failed to finalize
> INodeFile testRecoveryFile since blocks[255] is non-complete, where blocks=[blk_1073741825_1001,
> blk_1073741826_1002...
> {code}
> The fix is to count replicas on decommissioning nodes when completing the last block in
> BlockManager.commitOrCompleteLastBlock, as we know that the DecommissionManager will not
> decommission a node that still has under-construction (UC) blocks.





