Message-ID: <31384899.1187881111119.JavaMail.jira@brutus>
Date: Thu, 23 Aug 2007 07:58:31 -0700 (PDT)
From: "Enis Soztutar (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1158) JobTracker should collect statistics of failed map output fetches, and take decisions to reexecute map tasks and/or restart the (possibly faulty) Jetty server on the TaskTracker
In-Reply-To: <7318675.1174848452133.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/HADOOP-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522165 ]

Enis Soztutar commented on HADOOP-1158:
---------------------------------------

Here is the code in Jetty that prints the above warning:

{code}
log.info("LOW ON THREADS (("+getMaxThreads()+"-"+getThreads()+"+"+getIdleThreads()+")<"+getMinThreads()+") on "+ this);
{code}

It seems Jetty is configured with max threads = 40. Isn't that insufficient?

> JobTracker should collect statistics of failed map output fetches, and take decisions to reexecute map tasks and/or restart the (possibly faulty) Jetty server on the TaskTracker
> ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1158
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-1158_20070702_1.patch, HADOOP-1158_2_20070808.patch, HADOOP-1158_3_20070809.patch, HADOOP-1158_4_20070817.patch, HADOOP-1158_5_20070823.patch
>
>
> The JobTracker should keep track (with feedback from Reducers) of how many times a fetch for a particular map output failed. If this exceeds a certain threshold, then that map should be declared as lost, and should be reexecuted elsewhere. Based on the number of such complaints from Reducers, the JobTracker can blacklist the TaskTracker. This will make the framework reliable - it will take care of (faulty) TaskTrackers that sometimes always fail to serve up map outputs (for which exceptions are not properly raised/handled, e.g., if the exception/problem happens in the Jetty server).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.