hadoop-mapreduce-issues mailing list archives

From "Ashwin Shankar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6107) Job history server becomes unresponsive due to stuck thread in epollWait
Date Wed, 11 Feb 2015 22:36:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317121#comment-14317121 ]

Ashwin Shankar commented on MAPREDUCE-6107:
-------------------------------------------

[~suma.shivaprasad], we haven't deployed the image with HDFS-7005 yet, so I can't comment.
Also, we haven't seen this issue come up again in the image without the fix.

> Job history server becomes unresponsive due to stuck thread in epollWait
> ------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6107
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6107
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 2.4.0
>            Reporter: Ashwin Shankar
>         Attachments: jstack.log
>
>
> About once a week, the job history server becomes unresponsive on one of our 2000-node
> Hadoop clusters. Looking at the thread dump, I see that multiple threads are blocked on
> locks held by a couple of threads, which in turn are stuck indefinitely in epollWait while
> talking to HDFS to fetch a history file.
> When the number of blocked threads reaches the thread pool size, JHS becomes unresponsive
> to new client requests.
> Thread dump attached.
> Has anyone seen this before?
> Here is the thread stuck in epollWait:
> {code}
> "IPC Server handler 4 on 10020" daemon prio=10 tid=0x00007f7eb10f5000 nid=0x144d runnable [0x00007f7e9108d000]
>    java.lang.Thread.State: RUNNABLE
>         at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
>         at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269)
>         at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79)
>         at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87)
>         - locked <0x00000006c89d3240> (a sun.nio.ch.Util$2)
>         - locked <0x00000006c89d3228> (a java.util.Collections$UnmodifiableSet)
>         - locked <0x00000006bb32f8b8> (a sun.nio.ch.EPollSelectorImpl)
> {code}
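
The pattern described in the report (many handler threads blocked on monitors held by a few threads that are themselves stuck in I/O) can also be spotted from inside a JVM without reading a full jstack dump. The following is only a diagnostic sketch using the standard java.lang.management API. It is not part of Hadoop or the job history server, and the class name BlockedThreadReport, the method names, and the thread names in the demo are hypothetical. Note that a thread sitting in epollWait is reported as RUNNABLE, as in the dump above, so the sketch counts the BLOCKED threads and groups them by the thread that owns the contended lock.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;

// Hypothetical helper, not part of Hadoop: groups BLOCKED threads by lock owner.
public class BlockedThreadReport {

    /** Returns, per lock-owning thread name, how many threads are BLOCKED on a monitor it holds. */
    public static Map<String, Integer> blockedCountByOwner() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new HashMap<>();
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null
                    && info.getThreadState() == Thread.State.BLOCKED
                    && info.getLockOwnerName() != null) {
                counts.merge(info.getLockOwnerName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Simulates one thread holding a lock indefinitely while two others block on it. */
    public static int demo() {
        final Object lock = new Object();
        final CountDownLatch held = new CountDownLatch(1);
        final CountDownLatch release = new CountDownLatch(1);
        try {
            // Stands in for the thread stuck in I/O while holding the lock.
            Thread owner = new Thread(() -> {
                synchronized (lock) {
                    held.countDown();
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }
            }, "stuck-owner");
            owner.setDaemon(true);
            owner.start();
            held.await();

            // Stand in for handler threads that need the same lock.
            Thread[] waiters = new Thread[2];
            for (int i = 0; i < waiters.length; i++) {
                waiters[i] = new Thread(() -> {
                    synchronized (lock) {
                        // Nothing to do; the point is only to contend for the lock.
                    }
                }, "waiter-" + i);
                waiters[i].setDaemon(true);
                waiters[i].start();
            }
            // Wait until both waiters are actually blocked on the monitor.
            for (Thread w : waiters) {
                while (w.getState() != Thread.State.BLOCKED) {
                    Thread.sleep(10);
                }
            }

            int blocked = blockedCountByOwner().getOrDefault("stuck-owner", 0);

            // Let everything finish cleanly.
            release.countDown();
            for (Thread w : waiters) {
                w.join();
            }
            owner.join();
            return blocked;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("demo interrupted", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Threads blocked on stuck-owner: " + demo());
    }
}
```

In a real service, something like blockedCountByOwner() could be sampled periodically and compared against the handler pool size. That use is an assumption for illustration, not something the reporter describes doing.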



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
