Date: Mon, 27 Jan 2014 22:07:40 +0000 (UTC)
From: "Kihwal Lee (JIRA)"
To: hdfs-issues@hadoop.apache.org
Reply-To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery

    [ https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883409#comment-13883409 ]

Kihwal Lee commented on HDFS-5790:
----------------------------------

I wondered why commitBlockSynchronization() sometimes takes long and this jira explains why. When the original lease holders disappear, the lease holders are changed to namenode for block recovery. So if a lot of files get abandoned at around the same time, NN will be that writer with a large number of open files. The patch looks good.
The paths managed by LeaseManager are supposed to be updated on deletions and renames, so there is no point in searching there when the reference to inode is already known. For all user-initiated calls, the inode is obtained using the user-supplied path and then checkLease() is called before calling findPath(). So if something is to fail in findPath(), it should fail earlier in the code path. The patch seems fine in terms of both consistency and correctness. +1

> LeaseManager.findPath is very slow when many leases need recovery
> -----------------------------------------------------------------
>
>                 Key: HDFS-5790
>                 URL: https://issues.apache.org/jira/browse/HDFS-5790
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, performance
>    Affects Versions: 2.4.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-5790.txt, hdfs-5790.txt
>
>
> We recently saw an issue where the NN restarted while tens of thousands of files were open. The NN then ended up spending multiple seconds for each commitBlockSynchronization() call, spending most of its time inside LeaseManager.findPath(). findPath currently works by looping over all files held for a given writer, and traversing the filesystem for each one. This takes way too long when tens of thousands of files are open by a single writer.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
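The cost described in the issue can be illustrated with a small model. This is a minimal Python sketch, not the Hadoop code (which is Java) and not the attached patch: the names INode, resolve, find_path_by_scan and full_path are hypothetical, and the tree is a toy stand-in for the namespace. It contrasts scanning every path held by one writer, resolving each from the root, with deriving the path directly from an inode reference that is already known, which is the situation the comment above describes.

```python
class INode:
    """Hypothetical inode: a name, a parent pointer, and named children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = {}
        if parent is not None:
            parent.children[name] = self


def resolve(root, path):
    """Walk the tree from the root, one path component at a time."""
    node = root
    for part in path.strip("/").split("/"):
        if not part:
            continue  # empty component, e.g. when path is "/"
        node = node.children.get(part)
        if node is None:
            return None
    return node


def find_path_by_scan(root, lease_paths, target):
    """Slow model: resolve every path held by the writer until one matches.

    Cost grows with (number of open files) x (path depth) per call.
    """
    for path in lease_paths:
        if resolve(root, path) is target:
            return path
    return None


def full_path(inode):
    """Fast model: build the path from the inode's own parent pointers.

    Cost depends only on the depth of this one inode.
    """
    parts = []
    node = inode
    while node.parent is not None:
        parts.append(node.name)
        node = node.parent
    return "/" + "/".join(reversed(parts))
```

With tens of thousands of entries in lease_paths for a single writer, each find_path_by_scan call may resolve most of them before finding a match, while full_path touches only the ancestors of the one inode in question.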