Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Mon, 23 Nov 2015 21:00:12 +0000 (UTC)
From: "Kihwal Lee (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12840908.1435336329000.165458.1448312412038@Atlassian.JIRA>
In-Reply-To: <JIRA.12840908.1435336329000@Atlassian.JIRA>
References: <JIRA.12840908.1435336329000@Atlassian.JIRA>
 <JIRA.12840908.1435336329622@arcas>
Subject: [jira] [Commented] (HDFS-8676) Delayed rolling upgrade finalization
 can cause heartbeat expiration and write failures
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-8676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15023052#comment-15023052 ] 

Kihwal Lee commented on HDFS-8676:
----------------------------------

bq. Does this issue exist in 2.6.x? Should this be backported to branch-2.6?
Yes to both, imo.

> Delayed rolling upgrade finalization can cause heartbeat expiration and write failures
> --------------------------------------------------------------------------------------
>
>                 Key: HDFS-8676
>                 URL: https://issues.apache.org/jira/browse/HDFS-8676
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Kihwal Lee
>            Assignee: Walter Su
>            Priority: Critical
>             Fix For: 3.0.0, 2.7.2
>
>         Attachments: HDFS-8676.01.patch, HDFS-8676.02.patch
>
>
> In large, busy clusters with a high deletion rate, many blocks can pile up in the datanode trash directories until an upgrade is finalized. When the upgrade is eventually finalized, the trash is deleted synchronously in the service actor thread's context. This blocks the heartbeat and can cause heartbeat expiration.
> We have seen a namenode lose hundreds of nodes after a delayed upgrade finalization. The deletion of trash directories should be made asynchronous.
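
For illustration only, the asynchronous deletion asked for in the description above could look roughly like the sketch below. This is not the attached patch and not actual DataNode code: the class AsyncTrashCleaner and its methods are hypothetical names, and it assumes the trash is an ordinary local directory tree. The point it shows is that the caller (for example a heartbeat loop) only queues the work and returns immediately, while a separate daemon thread does the slow recursive delete.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical sketch; not the DataNode implementation or the HDFS-8676 patch. */
class AsyncTrashCleaner {
    // One daemon thread: deletions run one at a time and never keep the JVM alive.
    private final ExecutorService cleaner = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "trash-cleaner");
        t.setDaemon(true);
        return t;
    });

    /** Queues the deletion and returns at once, so the calling thread is not blocked. */
    Future<?> clearTrashAsync(Path trashDir) {
        return cleaner.submit(() -> deleteRecursively(trashDir));
    }

    /** Deletes a directory tree; runs on the cleaner thread, not on the caller's thread. */
    static void deleteRecursively(Path root) {
        if (!Files.exists(root)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(root)) {
            // Reverse order puts children before their parent directories.
            List<Path> paths = walk.sorted(Comparator.reverseOrder())
                                   .collect(Collectors.toList());
            for (Path p : paths) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stops accepting new work; already queued deletions still run. */
    void shutdown() {
        cleaner.shutdown();
    }
}
```

The returned Future lets a caller check for completion or failure later without waiting on it in the heartbeat path. A real fix would also need to decide how to handle errors and restarts during a pending deletion, which this sketch leaves out.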

