hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12278) LeaseManager#removeLease operation is inefficient in 2.8.
Date Wed, 09 Aug 2017 16:41:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16120217#comment-16120217 ]

Daryn Sharp commented on HDFS-12278:

For context regarding the impact of the change to a priority queue: hours after a 2.8 upgrade,
average RPC processing time increased from sub-ms to 21ms. RPC queue time was multiple seconds.
Killing large jobs only made it worse. The fair call queue was completely overflowing for
~5h. I haven't seen anything this horrific in many years.

While the NN log was spewing messages about skipping calls from timed-out clients, we noticed lease
monitor recovery log messages ~5-12ms apart, during which time the lease monitor holds the
write lock. Killing jobs made it worse because it created more orphaned leases.

> LeaseManager#removeLease operation is inefficient in 2.8.
> ---------------------------------------------------------
>                 Key: HDFS-12278
>                 URL: https://issues.apache.org/jira/browse/HDFS-12278
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.8.0
>            Reporter: Rushabh S Shah
>            Assignee: Rushabh S Shah
>            Priority: Blocker
> After HDFS-6757, LeaseManager#removeLease became expensive.
> HDFS-6757 changed the {{sortedLeases}} object from a TreeSet to a PriorityQueue.
> Previously the {{remove(Object)}} operation on {{sortedLeases}} was {{O(log n)}}, but
> after the change it became {{O(n)}} since it has to find the object first.
> Recently we had an incident in one of our production clusters just hours after we upgraded
> from 2.7 to 2.8.
> The {{sortedLeases}} object had approximately 100,000 items in it.
> Removing a lease acquires the LeaseManager lock, which also slows down
> lease lookups.
> HDFS-6757 is a good improvement, which replaced the path with the inode id.
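
The complexity difference described in the issue can be reproduced outside Hadoop with plain java.util collections. The following is a minimal sketch, not the LeaseManager code and not the fix for HDFS-12278: the Lease class, the comparator, and the helper names (makeLeases, newestFirst, timeRemovals) are hypothetical stand-ins invented for illustration. It relies only on the documented behavior that TreeSet#remove is logarithmic while PriorityQueue#remove(Object) must scan the backing array linearly to find the element.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.TreeSet;

public class LeaseRemovalSketch {

    /** Hypothetical stand-in for a lease; not the real LeaseManager.Lease. */
    static final class Lease {
        final String holder;
        final long lastUpdate;

        Lease(String holder, long lastUpdate) {
            this.holder = holder;
            this.lastUpdate = lastUpdate;
        }
    }

    /** Oldest lease first; the holder name breaks ties so a TreeSet keeps distinct leases. */
    static final Comparator<Lease> OLDEST_FIRST =
            Comparator.comparingLong((Lease l) -> l.lastUpdate)
                      .thenComparing((Lease l) -> l.holder);

    /** Builds n leases with increasing timestamps. */
    static List<Lease> makeLeases(int n) {
        List<Lease> leases = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            leases.add(new Lease("client-" + i, i));
        }
        return leases;
    }

    /** Returns the last 'count' leases, newest first (bad case for a linear scan). */
    static List<Lease> newestFirst(List<Lease> leases, int count) {
        List<Lease> sample =
                new ArrayList<>(leases.subList(leases.size() - count, leases.size()));
        Collections.reverse(sample);
        return sample;
    }

    /** Removes each lease from the collection and returns the elapsed nanoseconds. */
    static long timeRemovals(Collection<Lease> sorted, List<Lease> toRemove) {
        long start = System.nanoTime();
        for (Lease lease : toRemove) {
            sorted.remove(lease);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        // Roughly the collection size reported in the issue description.
        List<Lease> leases = makeLeases(100_000);

        TreeSet<Lease> tree = new TreeSet<>(OLDEST_FIRST);
        tree.addAll(leases);
        PriorityQueue<Lease> heap = new PriorityQueue<>(OLDEST_FIRST);
        heap.addAll(leases);

        // Remove only a sample; removing everything from the heap would be quadratic.
        List<Lease> sample = newestFirst(leases, 2_000);

        long treeNanos = timeRemovals(tree, sample);
        long heapNanos = timeRemovals(heap, sample);

        System.out.printf("TreeSet removals:       %d us%n", treeNanos / 1_000);
        System.out.printf("PriorityQueue removals: %d us%n", heapNanos / 1_000);
    }
}
```

The sample deliberately removes the newest leases, which sit near the end of the heap's backing array, so each PriorityQueue removal scans almost the whole collection. Absolute timings depend on the JVM and hardware; only the gap between the two numbers is of interest, and in the NameNode that extra time is spent while a lock is held.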

