hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor
Date Thu, 12 May 2016 15:10:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15281592#comment-15281592
] 

Kihwal Lee commented on HDFS-10220:
-----------------------------------

The throughput is pathetic, but it seems in the ballpark of what I have seen. In my experience,
the {{commitBlockSynchronization()}} load generated by lease recovery also affects performance
greatly.  The lease recovery may fill up the edit buffer and cause an auto-sync. Depending on
the speed of edit syncing, a massive lease recovery can overwhelm the edit buffering and I/O.
 It would be nice to find out what the actual bottleneck is, so that we can improve the performance.

The average RPC time of {{commitBlockSynchronization()}} I have observed lately is around 600us.
 After releasing 1K paths, there can be 1K {{commitBlockSynchronization()}} calls in the
worst case. That translates to 600ms, so overall about 800ms will be spent in the namespace
write lock. Since the lease manager sleeps for 2 seconds, the NN will be spending about 0.8/2.2
= 36% of its time exclusively on lease/block recovery.
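
To spell out that arithmetic, here is a rough back-of-the-envelope calculation (the ~200ms for
the lease release itself under the lock is an assumption on my part; the other numbers are from above):

{code:java}
public class LeaseRecoveryLockEstimate {
  public static void main(String[] args) {
    long cbsCalls = 1000;                    // worst case: one commitBlockSynchronization() per released path
    double cbsMs = cbsCalls * 600 / 1000.0;  // ~600ms at ~600us per RPC
    double releaseMs = 200;                  // assumed cost of releasing 1K leases under the lock
    double lockMs = cbsMs + releaseMs;       // ~800ms in the namespace write lock per cycle
    double cycleMs = 2000 + releaseMs;       // LeaseManager.Monitor sleeps 2 seconds between checks
    System.out.printf("write-lock share ~ %.0f%%%n", 100 * lockMs / cycleMs);  // ~36%
  }
}
{code}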

This might only be acceptable for lightly loaded namenodes. Setting the limit to 100ms would
lower it to 24%, and 50ms would make it 12%.  We also have to weigh how important these lease
recoveries are.  I don't think this kind of mass lease recovery is normal. It is usually
caused by faulty user code (e.g. not closing files before committing), and it should not penalize
other users by greatly degrading NN performance.  So I lean toward something like 50ms or
shorter. 
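
For concreteness, something along these lines is the kind of time-bounded release I have in mind
(a minimal sketch only, not this patch or the real {{LeaseManager}} code; the lock, queue, and
names below are stand-ins):

{code:java}
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class BoundedLeaseRelease {
  private static final long MAX_LOCK_HOLD_MS = 50;   // candidate limit discussed above
  private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();
  private final Deque<String> expiredLeases = new ArrayDeque<>();

  /** One Monitor iteration: release expired leases, but give the write lock back after ~50ms. */
  void checkLeases() {
    long start = System.nanoTime();
    namesystemLock.writeLock().lock();
    try {
      while (!expiredLeases.isEmpty()
          && (System.nanoTime() - start) / 1_000_000 < MAX_LOCK_HOLD_MS) {
        String path = expiredLeases.poll();
        // internalReleaseLease() for this path would go here; anything left over
        // is picked up on the next Monitor cycle, after the 2-second sleep.
      }
    } finally {
      namesystemLock.writeLock().unlock();
    }
  }
}
{code}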

I want to hear what others think.

> Namenode failover due to too long locking in LeaseManager.Monitor
> ----------------------------------------------------------------
>
>                 Key: HDFS-10220
>                 URL: https://issues.apache.org/jira/browse/HDFS-10220
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Nicolas Fraison
>            Assignee: Nicolas Fraison
>            Priority: Minor
>         Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, HADOOP-10220.003.patch,
HADOOP-10220.004.patch, HADOOP-10220.005.patch, HADOOP-10220.006.patch, threaddump_zkfc.txt
>
>
> I faced a namenode failover due to an unresponsive namenode detected by the zkfc, with
lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks
are COMPLETE, lease removed, file closed._
> In the threaddump taken by the zkfc there are lots of threads blocked on a lock.
> Looking at the code, a lock is taken by the LeaseManager.Monitor when some leases
must be released. Due to the very large number of leases to be released, the namenode took
too long to release them, blocking all other tasks and making the zkfc think that
the namenode was unavailable/stuck.
> The idea of this patch is to limit the number of leases released each time we check for
leases, so that the lock won't be held for too long a period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


