hadoop-hdfs-issues mailing list archives

From "Yiqun Lin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-13946) Log longest FSN write/read lock held stack trace
Date Tue, 09 Oct 2018 13:39:00 GMT

     [ https://issues.apache.org/jira/browse/HDFS-13946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yiqun Lin updated HDFS-13946:
-----------------------------
    Status: Patch Available  (was: Open)

Attached the patch. To avoid log flooding, I refactored the output and removed the current
thread's stack trace from the log report.
 [~xkrogen], would you mind taking a look? I know you have done some work in this area and
may be more familiar with it. :)

> Log longest FSN write/read lock held stack trace
> ------------------------------------------------
>
>                 Key: HDFS-13946
>                 URL: https://issues.apache.org/jira/browse/HDFS-13946
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 3.1.1
>            Reporter: Yiqun Lin
>            Assignee: Yiqun Lin
>            Priority: Minor
>         Attachments: HDFS-13946.001.patch
>
>
> The FSN write/read lock log statement only prints the longest lock-held interval observed
> during the warning-suppression interval, not the stack trace of that hold. Only the current
> thread's stack trace is printed, which is not very useful. When the NN slows down, what we
> care about most is which operation held the lock the longest.
> The following log is printed by the current logic:
> {noformat}
> 2018-09-30 13:56:06,700 INFO [IPC Server handler 119 on 8020] org.apache.hadoop.hdfs.server.namenode.FSNamesystem: FSNamesystem write lock held for 11 ms via
> java.lang.Thread.getStackTrace(Thread.java:1589)
> org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:945)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystemLock.writeUnlock(FSNamesystemLock.java:198)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.writeUnlock(FSNamesystem.java:1688)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.removeBlocks(FSNamesystem.java:4281)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:4247)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4183)
> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4167)
> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:848)
> org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:311)
> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:625)
> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2222)
> java.security.AccessController.doPrivileged(Native Method)
> javax.security.auth.Subject.doAs(Subject.java:415)
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
> org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
>         Number of suppressed write-lock reports: 14
>         Longest write-lock held interval: 70
> {noformat}
> It would also be helpful for troubleshooting.
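The proposal boils down to remembering where the longest hold in a suppression window came from, not just how long it lasted. The following is a minimal sketch of that bookkeeping, assuming a hypothetical `LongestLockHoldTracker` class; it is not Hadoop's `FSNamesystemLock` code nor the attached patch, and its threshold and suppression-interval parameters merely stand in for whatever configuration the NameNode actually uses. Timestamps are passed in explicitly so the logic is easy to test, and, in line with the patch comment above, the report omits the current thread's stack and prints only the stack of the longest hold.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Hypothetical sketch, not Hadoop's FSNamesystemLock or the HDFS-13946 patch.
 * Tracks lock holds that exceed a reporting threshold, emits at most one
 * report per suppression window, and remembers the stack trace of the
 * LONGEST hold seen in that window instead of only its duration.
 */
final class LongestLockHoldTracker {
    private final long thresholdMs;        // holds shorter than this are ignored
    private final long suppressIntervalMs; // minimum gap between two reports

    private boolean reportedBefore = false;
    private long lastReportMs;
    private int suppressedCount;
    private long longestHeldMs;
    private String longestStackTrace;

    LongestLockHoldTracker(long thresholdMs, long suppressIntervalMs) {
        this.thresholdMs = thresholdMs;
        this.suppressIntervalMs = suppressIntervalMs;
    }

    /**
     * Called when the lock is released. Returns a report to log now, or empty
     * if the hold was too short or the report is being suppressed. The stack
     * trace supplier is only invoked when this hold becomes the new longest
     * one, because capturing a stack trace is relatively expensive.
     */
    synchronized Optional<String> onUnlock(long heldMs, long nowMs,
                                           Supplier<String> stackTrace) {
        if (heldMs < thresholdMs) {
            return Optional.empty();
        }
        // Remember the slowest hold of this window and where it came from.
        if (heldMs > longestHeldMs) {
            longestHeldMs = heldMs;
            longestStackTrace = stackTrace.get();
        }
        // Still inside the suppression window: count it, don't log it.
        if (reportedBefore && nowMs - lastReportMs < suppressIntervalMs) {
            suppressedCount++;
            return Optional.empty();
        }
        String report = "Lock held for " + heldMs + " ms\n"
            + "\tNumber of suppressed lock reports: " + suppressedCount + "\n"
            + "\tLongest lock held interval: " + longestHeldMs + " ms via\n"
            + longestStackTrace;
        // Start a fresh suppression window.
        reportedBefore = true;
        lastReportMs = nowMs;
        suppressedCount = 0;
        longestHeldMs = 0;
        longestStackTrace = null;
        return Optional.of(report);
    }

    /** Current thread's stack, one frame per line. */
    static String currentStackTrace() {
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            sb.append(frame).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        LongestLockHoldTracker tracker = new LongestLockHoldTracker(10, 5000);
        long now = System.currentTimeMillis();
        tracker.onUnlock(11, now, LongestLockHoldTracker::currentStackTrace)
               .ifPresent(System.out::println);
    }
}
```

In a real lock, `heldMs` would be the difference between the acquire and release timestamps. The suppression window still keeps the log from flooding, but the one report that does get written now points at the slowest operation of the window rather than at whichever thread happened to trigger it.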



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
