Date: Sun, 27 May 2012 04:47:25 +0000 (UTC)
From: "xieguiming (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Message-ID: <1493117475.6599.1338094045366.JavaMail.jiratomcat@issues-vm>
In-Reply-To: <1550908976.5965.1300251449584.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAPREDUCE-2386) TT jetty server stuck in tight loop around epoll_wait

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13284104#comment-13284104 ]

xieguiming commented on MAPREDUCE-2386:
---------------------------------------

Hi,

One TaskTracker (TT) on my cluster is also stuck. It is not responding to any HTTP connections.

1> Thread stack info:

"1989360587@qtp-1863318328-0 - Acceptor0 SelectChannelConnector@0.0.0.0:10060" prio=10 tid=0x00007fb9fc2a6800 nid=0x612e runnable [0x00007fba0015b000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPollArrayWrapper.epollWait(Native Method)
	at sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:210)
	at sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:65)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:69)
	- locked <0x00007fba14758c70> (a sun.nio.ch.Util$1)
	- locked <0x00007fba14758c58> (a java.util.Collections$UnmodifiableSet)
	- locked <0x00007fba124d8aa8> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.selectNow(SelectorImpl.java:88)
	at org.mortbay.io.nio.SelectorManager$SelectSet.doSelect(SelectorManager.java:652)
	at org.mortbay.io.nio.SelectorManager.doSelect(SelectorManager.java:192)
	at org.mortbay.jetty.nio.SelectChannelConnector.accept(SelectChannelConnector.java:124)
	at org.mortbay.jetty.AbstractConnector$Acceptor.run(AbstractConnector.java:708)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)

2> I used netstat to check the state of port 50060 and found 83 connections in the CLOSE_WAIT or SYN_RECV state:

tcp        0      0 172.16.4.7:50060    172.16.4.6:52526    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.3:41380    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.5:41908    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.6:52495    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.8:39167    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.8:38799    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.6:52416    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.6:47010    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.5:42449    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.2:50107    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.6:52558    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.6:52402    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.6:52085    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.2:45092    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.3:41542    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.3:55977    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.4:43743    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.5:42118    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.2:44535    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.3:41890    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.3:56001    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.5:42057    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.3:56121    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.8:39173    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.8:38937    SYN_RECV
tcp        0      0 172.16.4.7:50060    172.16.4.2:44992    SYN_RECV
tcp      129      0 :::50060            :::*                LISTEN
tcp      243      0 172.16.4.7:50060    172.16.4.7:35878    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:50557    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33735    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.6:40670    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.5:45702    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.3:50653    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.3:50538    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.6:48535    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:52049    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.5:45529    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:38282    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:51933    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33008    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.2:50188    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:47068    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.3:50638    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:50629    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.3:50676    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.4:45076    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:37301    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:35873    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33733    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.5:45487    CLOSE_WAIT
tcp        1      0 172.16.4.7:50060    172.16.4.8:47078    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:51939    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.3:50578    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:50630    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.1:35526    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.1:57037    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.6:52755    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.1:51096    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.2:50207    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:51951    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:35876    CLOSE_WAIT
tcp        1      0 172.16.4.7:50060    172.16.4.4:42804    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.6:52771    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:52110    CLOSE_WAIT
tcp        1      0 172.16.4.7:50060    172.16.4.4:42686    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.5:45688    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.3:50590    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.6:48497    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:37370    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33010    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:51908    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33003    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.5:45469    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33002    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33737    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.2:50198    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.6:52746    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:47067    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:37300    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.3:50705    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:38319    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.6:47550    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.1:56333    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:52004    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:47065    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.6:52814    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33739    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33734    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:47069    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:47063    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:38392    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:50716    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.4:45128    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:38317    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33007    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33006    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.8:33736    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.2:49722    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.2:50185    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.6:52820    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.5:45273    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.2:49730    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.3:49957    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.6:47477    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.5:45720    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:52011    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:52079    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.3:50583    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.7:52037    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.5:45437    CLOSE_WAIT
tcp      243      0 172.16.4.7:50060    172.16.4.2:50168    CLOSE_WAIT

> TT jetty server stuck in tight loop around epoll_wait
> -----------------------------------------------------
>
>                 Key: MAPREDUCE-2386
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2386
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.23.0
>         Environment: RHEL 6.0 "Santiago"
>            Reporter: Todd Lipcon
>
> In some load testing, I got a TaskTracker into a state where its Jetty server is in a tight loop calling epoll_wait, which is returning EINVAL:
> [pid 19573] epoll_wait(157, 40829000, 8192, 0) = -1 EINVAL (Invalid argument)
> It's not responding to any HTTP connections - connections are accepted and then just hang.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
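
The netstat check described in item 2> of the comment can be repeated with a small script that tallies socket states for one local port. This is only a minimal sketch: it assumes Linux netstat-style rows (protocol, Recv-Q, Send-Q, local address, foreign address, state), and the function name count_tcp_states is hypothetical, not part of Hadoop, Jetty, or the original report. The sample rows are copied from the output quoted in the comment.

```python
from collections import Counter


def count_tcp_states(netstat_output, local_port):
    """Tally TCP states for sockets whose local address ends in :local_port.

    Assumes each row has the columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    suffix = ":" + str(local_port)
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and non-TCP rows.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # fields[3] is the local address, fields[5] is the socket state.
        if fields[3].endswith(suffix):
            states[fields[5]] += 1
    return states


# A few rows taken from the netstat output in the comment.
sample = """\
tcp 0 0 172.16.4.7:50060 172.16.4.6:52526 SYN_RECV
tcp 129 0 :::50060 :::* LISTEN
tcp 243 0 172.16.4.7:50060 172.16.4.7:35878 CLOSE_WAIT
tcp 1 0 172.16.4.7:50060 172.16.4.8:47078 CLOSE_WAIT
"""

if __name__ == "__main__":
    for state, count in sorted(count_tcp_states(sample, 50060).items()):
        print(state, count)
```

Feeding the script the full netstat output for the TaskTracker host, rather than the four sample rows, would give per-state totals to compare against the figure quoted in the comment.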