Date: Thu, 16 Feb 2012 23:22:58 +0000 (UTC)
From: "Todd Lipcon (Commented) (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Message-ID: <1829183694.48770.1329434578041.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <688141835.26222.1328912219431.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAPREDUCE-3851) Allow more aggressive action on detection of the jetty issue

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209881#comment-13209881 ]

Todd Lipcon commented on MAPREDUCE-3851:
----------------------------------------

A ratio over the last N requests sounds good. I was worried about false triggering: e.g., if you get one random exception every 100000 calls but your TT is up for months, it will eventually crash with the other approach.

> Allow more aggressive action on detection of the jetty issue
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-3851
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3851
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Thomas Graves
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: MAPREDUCE-3851.patch
>
>
> MAPREDUCE-2529 added a useful failure detection mechanism. In this jira, I propose we add a periodic check inside the TT and a configurable action to self-destruct. Blacklisting helps but is not enough. A hung jetty still accepts connections, and it takes a very long time for clients to fail out. Short jobs are delayed for hours because of this. This feature will be a nice companion to MAPREDUCE-3184.
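
To illustrate the "ratio over the last N requests" idea from the comment, here is a minimal Java sketch. It is not taken from MAPREDUCE-3851.patch; the class name `WindowedFailureRatio`, its methods, and the window and threshold values are all hypothetical. It assumes "the other approach" means a counter that only ever grows over the process lifetime, which the comment does not state explicitly.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical sketch: tracks the failure ratio over the last N requests,
 * so that rare, isolated exceptions age out instead of accumulating forever.
 */
class WindowedFailureRatio {
    private final int windowSize;      // N: number of recent requests considered
    private final double threshold;    // failure ratio at which to take action
    private final Deque<Boolean> window = new ArrayDeque<>();
    private int failures = 0;          // failures currently inside the window

    WindowedFailureRatio(int windowSize, double threshold) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        this.windowSize = windowSize;
        this.threshold = threshold;
    }

    /** Records the outcome of one request; true means it failed. */
    synchronized void record(boolean failed) {
        window.addLast(failed);
        if (failed) {
            failures++;
        }
        // Evict the oldest outcome once the window exceeds N entries.
        if (window.size() > windowSize && window.removeFirst()) {
            failures--;
        }
    }

    /** True only when the window is full and the failure ratio meets the threshold. */
    synchronized boolean shouldTrigger() {
        return window.size() == windowSize
                && (double) failures / windowSize >= threshold;
    }
}
```

With a window of 100 and a threshold of 0.5, one exception every 100000 calls never comes close to triggering, no matter how long the process runs, while a sustained burst of failures does. Requiring a full window before triggering avoids acting on the first few requests after startup.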