hadoop-mapreduce-issues mailing list archives

From "xieguiming (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2386) TT jetty server stuck in tight loop around epoll_wait
Date Sun, 27 May 2012 04:47:25 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284104#comment-13284104 ]

xieguiming commented on MAPREDUCE-2386:
---------------------------------------

Hi,
One TaskTracker (TT) on my cluster is also stuck. It is not responding to any HTTP connections.

1> The thread stack info:

"1989360587@qtp-1863318328-0 - Acceptor0 SelectChannelConnector@0.0.0.0:10060" prio=10 tid=0x00007fb9fc2a6800 nid=0x612e runnable [0x00007fba0015b000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x00007fba14758c70> (a sun.nio.ch.Util$1)
	- locked <0x00007fba14758c58> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00007fba124d8aa8> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
	at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:652)
	at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
	at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
	at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2> I used netstat to check the state of port 50060 and found 83 connections in the CLOSE_WAIT or SYN_RECV state:
tcp        0      0 172.16.4.7:50060        172.16.4.6:52526        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:41380        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.5:41908        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:52495        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.8:39167        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.8:38799        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:52416        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:47010        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.5:42449        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.2:50107        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:52558        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:52402        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.6:52085        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.2:45092        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:41542        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:55977        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.4:43743        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.5:42118        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.2:44535        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:41890        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:56001        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.5:42057        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.3:56121        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.8:39173        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.8:38937        SYN_RECV    
tcp        0      0 172.16.4.7:50060        172.16.4.2:44992        SYN_RECV    
tcp      129      0 :::50060                :::*                    LISTEN      
tcp      243      0 172.16.4.7:50060        172.16.4.7:35878        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:50557        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33735        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:40670        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45702        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50653        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50538        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:48535        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52049        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45529        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:38282        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:51933        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33008        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:50188        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:47068        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50638        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:50629        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50676        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.4:45076        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:37301        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:35873        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33733        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45487        CLOSE_WAIT  
tcp        1      0 172.16.4.7:50060        172.16.4.8:47078        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:51939        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50578        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:50630        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.1:35526        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.1:57037        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:52755        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.1:51096        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:50207        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:51951        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:35876        CLOSE_WAIT  
tcp        1      0 172.16.4.7:50060        172.16.4.4:42804        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:52771        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52110        CLOSE_WAIT  
tcp        1      0 172.16.4.7:50060        172.16.4.4:42686        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45688        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50590        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:48497        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:37370        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33010        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:51908        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33003        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45469        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33002        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33737        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:50198        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:52746        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:47067        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:37300        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50705        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:38319        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:47550        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.1:56333        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52004        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:47065        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:52814        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33739        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33734        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:47069        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:47063        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:38392        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:50716        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.4:45128        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:38317        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33007        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33006        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.8:33736        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:49722        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:50185        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:52820        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45273        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:49730        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:49957        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.6:47477        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45720        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52011        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52079        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.3:50583        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.7:52037        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.5:45437        CLOSE_WAIT  
tcp      243      0 172.16.4.7:50060        172.16.4.2:50168        CLOSE_WAIT  
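
A listing this long is easier to read as a per-state tally. The following is a minimal sketch, not part of the original report, that counts TCP states for one local port. It assumes the six-column netstat layout shown above (proto, Recv-Q, Send-Q, local address, remote address, state). The function name tally_states and the four-line sample are illustrative only.

```python
from collections import Counter

def tally_states(netstat_text, port=50060):
    """Count TCP states for sockets whose local address ends in :port."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Assumed columns: proto, Recv-Q, Send-Q, local, remote, state
        if len(fields) != 6 or not fields[0].startswith("tcp"):
            continue
        local, state = fields[3], fields[5]
        if local.endswith(":%d" % port):
            states[state] += 1
    return states

# Four lines copied from the listing above, used only as sample input.
sample = """\
tcp        0      0 172.16.4.7:50060        172.16.4.6:52526        SYN_RECV
tcp      129      0 :::50060                :::*                    LISTEN
tcp      243      0 172.16.4.7:50060        172.16.4.7:35878        CLOSE_WAIT
tcp        1      0 172.16.4.7:50060        172.16.4.8:47078        CLOSE_WAIT
"""

counts = tally_states(sample)
for state, n in sorted(counts.items()):
    print(state, n)
```

Run against the full netstat output, this gives one count per state. Many CLOSE_WAIT sockets with a non-zero Recv-Q would be consistent with a server that is no longer reading from or closing its accepted connections.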

                
> TT jetty server stuck in tight loop around epoll_wait
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2386
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2386
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.23.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>
> In some load testing, I got a TaskTracker into a state where its Jetty server is in a tight loop calling epoll_wait, which is returning EINVAL:
> [pid 19573] epoll_wait(157, 40829000, 8192, 0) = -1 EINVAL (Invalid argument)
> It's not responding to any HTTP connections - connections are accepted and then just hang.
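
One way to confirm the tight loop from a captured trace is to count failing epoll_wait calls per file descriptor. This is a rough sketch, not from the original report, and it assumes strace output in the same form as the line quoted above. The names EPOLL_RE and count_epoll_errors are hypothetical, and the successful call in the sample is made up for contrast.

```python
import re

# Matches lines such as:
# [pid 19573] epoll_wait(157, 40829000, 8192, 0) = -1 EINVAL (Invalid argument)
EPOLL_RE = re.compile(r"epoll_wait\((\d+),.*\)\s*=\s*(-?\d+)(?:\s+(\w+))?")

def count_epoll_errors(strace_text):
    """Return {(epoll_fd, errno_name): count} for epoll_wait calls that returned -1."""
    errors = {}
    for line in strace_text.splitlines():
        m = EPOLL_RE.search(line)
        if m and m.group(2) == "-1":
            key = (int(m.group(1)), m.group(3))
            errors[key] = errors.get(key, 0) + 1
    return errors

# The failing line is taken from the report; the successful call is illustrative.
sample = """\
[pid 19573] epoll_wait(157, 40829000, 8192, 0) = -1 EINVAL (Invalid argument)
[pid 19573] epoll_wait(157, 40829000, 8192, 0) = -1 EINVAL (Invalid argument)
[pid 19574] epoll_wait(12, [], 8192, 0) = 0
"""

errors = count_epoll_errors(sample)
print(errors)
```

A large and fast-growing count for a single descriptor in a short capture would match the behavior described in the report.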

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
