hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery
Date Mon, 27 Jan 2014 22:07:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883409#comment-13883409 ]

Kihwal Lee commented on HDFS-5790:

I had wondered why commitBlockSynchronization() sometimes takes a long time, and this jira
explains it. When the original lease holders disappear, the lease holder is changed to the
namenode for block recovery. So if a lot of files are abandoned at around the same time, the
NN becomes a single writer holding a large number of open files.

The patch looks good. The paths managed by LeaseManager are supposed to be updated on deletions
and renames, so there is no point in searching there when a reference to the inode is already
known. For all user-initiated calls, the inode is obtained using the user-supplied path and
then checkLease() is called before calling findPath(). So anything that would fail in findPath()
should already have failed earlier in the code path. The patch seems fine in terms of both
consistency and correctness.
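
To make the cost difference concrete, here is a minimal toy sketch in Java. It is not the
attached patch and does not use Hadoop's classes: FindPathSketch, Inode, Namespace,
findPathByScan() and findPathFromInode() are all hypothetical names invented for illustration.
It assumes only what the issue describes, namely that the old lookup loops over every path
held by a writer and resolves each one against the namespace, whereas a known inode can
give its path directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only; none of these types are Hadoop APIs.
final class FindPathSketch {

    /** Toy inode that knows its name and parent, so its path can be rebuilt by walking up. */
    static final class Inode {
        final String name;
        final Inode parent;

        Inode(String name, Inode parent) {
            this.name = name;
            this.parent = parent;
        }

        String fullPath() {
            if (parent == null) {
                return ""; // the root contributes no path component in this toy model
            }
            return parent.fullPath() + "/" + name;
        }
    }

    /** Toy namespace that counts how many path resolutions it has been asked to do. */
    static final class Namespace {
        final Map<String, Inode> byPath = new HashMap<>();
        int resolutions = 0;

        Inode resolve(String path) {
            resolutions++;
            return byPath.get(path);
        }
    }

    /** Scan-style lookup: resolve every path in the lease until one maps to the target inode. */
    static String findPathByScan(List<String> leasePaths, Namespace ns, Inode target) {
        for (String path : leasePaths) {
            if (ns.resolve(path) == target) {
                return path;
            }
        }
        return null;
    }

    /** Direct lookup: the inode is already known, so no search over the lease is needed. */
    static String findPathFromInode(Inode target) {
        return target.fullPath();
    }

    public static void main(String[] args) {
        Namespace ns = new Namespace();
        Inode root = new Inode("", null);
        Inode dir = new Inode("data", root);
        List<String> leasePaths = new ArrayList<>();
        Inode last = null;
        // One writer (think: the NN after taking over leases) holding many open files.
        for (int i = 0; i < 50_000; i++) {
            Inode file = new Inode("file" + i, dir);
            ns.byPath.put(file.fullPath(), file);
            leasePaths.add(file.fullPath());
            last = file;
        }
        System.out.println("scan:   " + findPathByScan(leasePaths, ns, last));
        System.out.println("resolutions during scan: " + ns.resolutions);
        System.out.println("direct: " + findPathFromInode(last));
    }
}
```

In the scan variant the work per call grows with the number of files held by the writer,
which is the situation described in this issue when the NN holds tens of thousands of
leases. The direct variant depends only on the depth of the file in the tree.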


> LeaseManager.findPath is very slow when many leases need recovery
> -----------------------------------------------------------------
>                 Key: HDFS-5790
>                 URL: https://issues.apache.org/jira/browse/HDFS-5790
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, performance
>    Affects Versions: 2.4.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-5790.txt, hdfs-5790.txt
> We recently saw an issue where the NN restarted while tens of thousands of files were
> open. The NN then ended up spending multiple seconds for each commitBlockSynchronization()
> call, spending most of its time inside LeaseManager.findPath(). findPath currently works by
> looping over all files held for a given writer, and traversing the filesystem for each one.
> This takes way too long when tens of thousands of files are open by a single writer.

This message was sent by Atlassian JIRA
