hadoop-hdfs-issues mailing list archives

From 邓飞 (JIRA) <j...@apache.org>
Subject [jira] [Commented] (HDFS-9293) FSEditLog's 'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which may cause standby NN too busy to communicate
Date Fri, 23 Oct 2015 09:37:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14970720#comment-14970720
] 

邓飞 commented on HDFS-9293:
--------------------------

Thanks, Walter. It was my mistake; that was fixed in 2.7.1.

> FSEditLog's  'OpInstanceCache' instance of threadLocal cache exists dirty 'rpcId',which
may cause standby NN too busy  to communicate 
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-9293
>                 URL: https://issues.apache.org/jira/browse/HDFS-9293
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>    Affects Versions: 2.2.0, 2.7.1
>            Reporter: 邓飞
>            Assignee: 邓飞
>
>   In our cluster (Hadoop 2.2.0 HA, 700+ DNs), we found that the standby NN tails the edit log slowly
and holds the FSNamesystem write lock while doing so, which blocks the DNs' heartbeat/block report IPC
requests. As a result, the active NN removes stale DNs that cannot send heartbeats because they are
blocked while processing the standby NN's register command (FIXED in 2.7.1).
>   Below is the standby NN  stack:
> "Edit log tailer" prio=10 tid=0x00007f28fcf35800 nid=0x1a7d runnable [0x00007f0dd1d76000]
>    java.lang.Thread.State: RUNNABLE
> 	at java.util.PriorityQueue.remove(PriorityQueue.java:360)
> 	at org.apache.hadoop.util.LightWeightCache.put(LightWeightCache.java:217)
> 	at org.apache.hadoop.ipc.RetryCache.addCacheEntry(RetryCache.java:270)
> 	- locked <0x00007f12817714b8> (a org.apache.hadoop.ipc.RetryCache)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.addCacheEntry(FSNamesystem.java:724)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.applyEditLogOp(FSEditLogLoader.java:406)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:199)
> 	at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:112)
> 	at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:733)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:227)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:321)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:279)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:296)
> 	at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:456)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:292)
>    
>     When applying an edit log op, if an IPC retry cache entry is found, the previous entry has to be
removed from the priority queue (O(N)). UpdateBlocksOp does not need to record an rpcId in the edit log,
except for a client 'updatePipeline' request, but we found many 'UpdateBlocksOp' entries with a repeated rpcId.
>      
>   
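
The quoted description attributes the repeated rpcId to a per-thread cache of reusable op instances. A minimal Java sketch of that failure mode follows. It is not the actual FSEditLog code: the `Op` class, the `logOp` helper, the `resetFirst` flag, and the sentinel value are hypothetical names chosen for illustration. It only shows how a field left over from an earlier call can leak into a later op when the cached instance is reused without being reset.

```java
// Illustrative sketch only; all names here are hypothetical, not Hadoop APIs.
class DirtyRpcIdSketch {
    // Assumed sentinel meaning "this op carries no RPC id".
    static final int INVALID_CALL_ID = -2;

    // Stand-in for a reusable edit log op.
    static final class Op {
        int rpcCallId = INVALID_CALL_ID;
        String payload;

        // Clears per-call state so a reused instance starts clean.
        void reset() {
            rpcCallId = INVALID_CALL_ID;
            payload = null;
        }
    }

    // One cached op instance per thread, reused across calls.
    static final ThreadLocal<Op> CACHE = ThreadLocal.withInitial(Op::new);

    // Fills the cached op and returns the rpc id that would be written to the log.
    // rpcCallId is null for ops that should not record an RPC id.
    static int logOp(String payload, Integer rpcCallId, boolean resetFirst) {
        Op op = CACHE.get();
        if (resetFirst) {
            op.reset();
        }
        op.payload = payload;
        if (rpcCallId != null) {
            op.rpcCallId = rpcCallId;
        }
        return op.rpcCallId;
    }

    public static void main(String[] args) {
        // A client request records its RPC id on the cached op.
        System.out.println(logOp("client updatePipeline", 7, true));
        // A later op with no RPC id reuses the instance without a reset
        // and inherits the stale id from the previous call.
        System.out.println(logOp("internal block update", null, false));
        // Resetting before reuse restores the sentinel.
        System.out.println(logOp("internal block update", null, true));
    }
}
```

In this sketch the second call reports the id left behind by the first, which is the kind of repeated rpcId described above. Each such op would then trigger a retry cache update when the standby applies it.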



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
