hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-9659) EditLogTailerThread to Active Namenode RPC should timeout
Date Mon, 01 Feb 2016 08:31:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-9659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15125933#comment-15125933 ]

Vinayakumar B commented on HDFS-9659:
-------------------------------------

Oh! I have now seen the first patch, which is the same as the latest one.

Retries on initial connect exceptions are required when the Active NN machine is down and only
the Standby NN has been started.

Since the SNN is a daemon, {{EditLogTailer}} needs to keep running regardless of connect exceptions
until an explicit shutdown or failover.
[~drankye], do you agree with this?
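
To illustrate the retry behaviour described above, here is a minimal sketch in plain Java. It is not the code from the patch: the class {{TailerRetrySketch}}, the method {{runWithRetries}} and the {{shouldRun}} flag are hypothetical names, and the {{Callable}} stands in for the RPC to the Active NN. The loop keeps retrying on {{ConnectException}} and exits only on success or an explicit stop.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch, not the actual EditLogTailer implementation.
class TailerRetrySketch {
    // Cleared by stop(); models an explicit shutdown or failover.
    private final AtomicBoolean shouldRun = new AtomicBoolean(true);

    void stop() {
        shouldRun.set(false);
    }

    /**
     * Calls work until it succeeds or stop() has been called.
     * A ConnectException is treated as "remote node is down, try again later".
     * Returns the number of attempts that were made.
     */
    int runWithRetries(Callable<Void> work, long sleepMillis) throws Exception {
        int attempts = 0;
        while (shouldRun.get()) {
            attempts++;
            try {
                work.call();
                return attempts;
            } catch (ConnectException e) {
                // The remote node may simply not be started yet; back off and retry.
                Thread.sleep(sleepMillis);
            }
        }
        return attempts;
    }

    public static void main(String[] args) throws Exception {
        TailerRetrySketch sketch = new TailerRetrySketch();
        int[] calls = {0};
        // Stand-in for the RPC: fails twice with ConnectException, then succeeds.
        int attempts = sketch.runWithRetries(() -> {
            if (calls[0]++ < 2) {
                throw new ConnectException("active NN not reachable");
            }
            return null;
        }, 10);
        System.out.println(attempts); // prints 3
    }
}
```

Only {{ConnectException}} is retried in this sketch; any other exception propagates to the caller, which is an assumption made for brevity rather than a statement about the patch.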

+1 on latest patch

I will commit tomorrow if there are no objections.

> EditLogTailerThread to Active Namenode RPC should timeout
> ---------------------------------------------------------
>
>                 Key: HDFS-9659
>                 URL: https://issues.apache.org/jira/browse/HDFS-9659
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha, namenode
>    Affects Versions: 3.0.0
>            Reporter: Surendra Singh Lilhore
>            Assignee: Surendra Singh Lilhore
>            Priority: Critical
>         Attachments: HDFS-9659.01.patch, HDFS-9659.02.patch, HDFS-9659.patch
>
>
> The {{EditLogTailerThread}} to Active {{Namenode}} RPC has no timeout; the timeout was removed in HDFS-6440.
> When slow disk I/O is injected and system I/O is consumed on the active NameNode, the nameservice cannot fail over because the SNN is unable to stop {{EditLogTailerThread}}.
> *Thread dump from SNN*
> {noformat}
> "IPC Server handler 33 on 25000" #118 daemon prio=5 os_prio=0 tid=0x00007f2384409800 nid=0x26c89 in Object.wait() [0x00007f2376ac7000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	at java.lang.Thread.join(Thread.java:1245)
> 	- locked <0x00000006d517f538> (a org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread)
> 	at java.lang.Thread.join(Thread.java:1319)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.stop(EditLogTailer.java:183)
> 	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.stopStandbyServices(FSNamesystem.java:1284)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.stopStandbyServices(NameNode.java:1852)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.exitState(StandbyState.java:72)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.HAState.setStateInternal(HAState.java:62)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.setState(StandbyState.java:49)
> 	at org.apache.hadoop.hdfs.server.namenode.NameNode.transitionToActive(NameNode.java:1684)
> {noformat}
> *Thread dump for {{EditLogTailerThread}}*: it is stuck in the {{NamenodeProtocolTranslatorPB.rollEditLog()}} RPC call.
> {noformat}
> "Edit log tailer" #150 prio=5 os_prio=0 tid=0x00007f2395569800 nid=0x26cac in Object.wait() [0x00007f2374aa7000]
>    java.lang.Thread.State: WAITING (on object monitor)
> 	at java.lang.Object.wait(Native Method)
> 	at java.lang.Object.wait(Object.java:502)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1503)
> 	- locked <0x00000006d581bb90> (a org.apache.hadoop.ipc.Client$Call)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1448)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy16.rollEditLog(Unknown Source)
> 	at org.apache.hadoop.hdfs.protocolPB.NamenodeProtocolTranslatorPB.rollEditLog(NamenodeProtocolTranslatorPB.java:148)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:301)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$2.doWork(EditLogTailer.java:298)
> 	at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$MultipleNameNodeProxy.call(EditLogTailer.java:420)
> {noformat}
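
The two thread dumps above show a stop path blocked in {{Thread.join()}} while the tailer thread waits indefinitely inside an RPC. As a rough illustration of why a bounded wait helps, the following plain-JDK sketch puts a deadline on a blocking call using {{Future.get}} with a timeout. It does not use Hadoop's RPC API and is not how the patch is implemented; {{BoundedCallSketch}} and {{callWithTimeout}} are hypothetical names, and the sleeping task merely stands in for an unresponsive remote node.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch using only the JDK; not Hadoop's RPC client.
class BoundedCallSketch {
    /**
     * Runs call on the executor and waits at most timeoutMillis for its result.
     * On overrun the task is cancelled (interrupting the worker thread) and
     * the TimeoutException is rethrown to the caller.
     */
    static <T> T callWithTimeout(ExecutorService executor, Callable<T> call,
                                 long timeoutMillis) throws Exception {
        Future<T> future = executor.submit(call);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for an RPC to a node that never answers.
            callWithTimeout(executor, () -> {
                Thread.sleep(60_000);
                return null;
            }, 100);
        } catch (TimeoutException e) {
            System.out.println("call timed out; the caller is free to stop");
        } finally {
            executor.shutdownNow();
        }
    }
}
```

With a bounded wait like this, the calling thread returns after the deadline, so a thread that is waiting to join it is not blocked forever.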



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)