Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Mon, 27 Jan 2014 22:07:40 +0000 (UTC)
From: "Kihwal Lee (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12689348.1389905458822.21741.1390860460764@arcas>
In-Reply-To: <JIRA.12689348.1389905458822@arcas>
References: <JIRA.12689348.1389905458822@arcas>
Subject: [jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow
 when many leases need recovery
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883409#comment-13883409 ] 

Kihwal Lee commented on HDFS-5790:
----------------------------------

I wondered why commitBlockSynchronization() sometimes takes so long, and this jira explains it.  When the original lease holders disappear, the lease holder is changed to the namenode for block recovery. So if a lot of files are abandoned at around the same time, the NN becomes a single writer holding a large number of open files.

The patch looks good. The paths managed by LeaseManager are supposed to be updated on deletions and renames, so there is no point in searching there when the reference to the inode is already known. For all user-initiated calls, the inode is obtained using the user-supplied path, and checkLease() is then called before findPath(). So if something is going to fail in findPath(), it should already have failed earlier in the code path. The patch seems fine in terms of both consistency and correctness.
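
To illustrate the difference in cost, here is a rough, self-contained sketch. It is not the patch and not HDFS code: Node, resolve(), findPathByScan() and fullPath() are hypothetical stand-ins I made up for the inode tree, path resolution, the scan over a writer's lease paths, and a path built from a known inode reference.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LeasePathSketch {
    /** Hypothetical stand-in for an inode: a name, a parent pointer, and children. */
    static final class Node {
        final String name;
        final Node parent;
        final Map<String, Node> children = new HashMap<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) {
                parent.children.put(name, this);
            }
        }

        /** Builds the full path by walking parent pointers; cost depends only on depth. */
        String fullPath() {
            if (parent == null) {
                return "/";
            }
            String prefix = parent.fullPath();
            return prefix.endsWith("/") ? prefix + name : prefix + "/" + name;
        }
    }

    /** Resolves a path from the root, one component at a time. */
    static Node resolve(Node root, String path) {
        Node cur = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // skip the empty component before the leading slash
            }
            cur = cur.children.get(part);
            if (cur == null) {
                return null;
            }
        }
        return cur;
    }

    /** Scan-style lookup: resolve each path held by the writer until one matches the inode. */
    static String findPathByScan(Node root, List<String> leasePaths, Node target) {
        for (String p : leasePaths) {
            if (resolve(root, p) == target) {
                return p;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node root = new Node("", null);
        Node dir = new Node("data", root);
        List<String> leasePaths = new ArrayList<>();
        Node last = null;
        // One writer holding many open files, as when the NN takes over the leases.
        for (int i = 0; i < 10000; i++) {
            last = new Node("file" + i, dir);
            leasePaths.add(last.fullPath());
        }
        // Both lookups give the same path; the scan resolves up to 10000 paths first.
        System.out.println(findPathByScan(root, leasePaths, last));
        System.out.println(last.fullPath());
    }
}
```

In this toy model the scan does one tree traversal per path held by the writer, while the lookup from the inode reference walks only the parents of that one inode. That is the reason the number of open files per writer matters for the former and not for the latter.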

+1

> LeaseManager.findPath is very slow when many leases need recovery
> -----------------------------------------------------------------
>
>                 Key: HDFS-5790
>                 URL: https://issues.apache.org/jira/browse/HDFS-5790
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, performance
>    Affects Versions: 2.4.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-5790.txt, hdfs-5790.txt
>
>
> We recently saw an issue where the NN restarted while tens of thousands of files were open. The NN then ended up spending multiple seconds for each commitBlockSynchronization() call, spending most of its time inside LeaseManager.findPath(). findPath currently works by looping over all files held for a given writer, and traversing the filesystem for each one. This takes way too long when tens of thousands of files are open by a single writer.

