Message-ID: <20705481.1180722436054.JavaMail.jira@brutus>
Date: Fri, 1 Jun 2007 11:27:16 -0700 (PDT)
From: "Doug Cutting (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1431) Map tasks can't timeout for failing to call progress
In-Reply-To: <13907972.1180066936133.JavaMail.jira@brutus>

    [ 
https://issues.apache.org/jira/browse/HADOOP-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500800 ]

Doug Cutting commented on HADOOP-1431:
--------------------------------------

So, to be conservative, let's get rid of the join and just go with interrupt. Thus, the method should look something like:

{code}
private void sortWithProgress() {
  Thread progress = createProgressThread(umbilical);
  try {
    sortAndSpillToDisk();
  } finally {
    progress.interrupt();
  }
}
{code}

We need not define a nested class, we can re-use the existing createProgressThread() method without alteration.

Does that sound good?

> Map tasks can't timeout for failing to call progress
> ----------------------------------------------------
>
>                 Key: HADOOP-1431
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1431
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.13.0
>            Reporter: Owen O'Malley
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1431_1_20070525.patch, HADOOP-1431_2_20070530.patch, HADOOP-1431_3_20070601.patch
>
>
> Currently the map task runner creates a thread that calls progress every second to keep the system from killing the map if the sort takes too long. This is the wrong approach, because it will cause stuck tasks to not be killed. The right solution is to have the sort call progress as it actually makes progress.
> This is part of what is going on in HADOOP-1374. A map gets stuck at 100% progress, but not done.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
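
For readers who want to try the interrupt-without-join pattern from the comment above outside Hadoop, here is a minimal, self-contained sketch. It is not the patch attached to the issue. The ProgressReporter interface, the class name, and the 10 ms interval are hypothetical stand-ins for the umbilical and the one-second period, and this createProgressThread() is an illustrative rewrite, not Hadoop's existing method.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class SortWithProgressSketch {

    /** Hypothetical stand-in for the task umbilical; not Hadoop's actual interface. */
    public interface ProgressReporter {
        void progress();
    }

    /** Starts a daemon thread that reports progress until it is interrupted. */
    public static Thread createProgressThread(ProgressReporter reporter, long intervalMillis) {
        Thread t = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    reporter.progress();
                    Thread.sleep(intervalMillis);
                }
            } catch (InterruptedException e) {
                // interrupt() arrived during (or just before) sleep: exit quietly.
            }
        }, "progress-sketch");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /**
     * Runs the given work while the progress thread is alive, then interrupts
     * the progress thread in a finally block. There is deliberately no join(),
     * so the caller never blocks waiting for the helper thread.
     * The thread is returned only so the demo can observe it.
     */
    public static Thread sortWithProgress(ProgressReporter reporter, Runnable sortAndSpillToDisk) {
        Thread progress = createProgressThread(reporter, 10);
        try {
            sortAndSpillToDisk.run();
        } finally {
            progress.interrupt();
        }
        return progress;
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger calls = new AtomicInteger();
        Thread progress = sortWithProgress(calls::incrementAndGet, () -> {
            try {
                Thread.sleep(100); // pretend to sort and spill
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        progress.join(1000); // demo only: wait so we can report the helper's state
        System.out.println("progress calls: " + calls.get()
                + ", helper alive: " + progress.isAlive());
    }
}
```

Because the interrupt sits in a finally block, the helper thread is also stopped when the sort throws. The sketch only illustrates the thread lifecycle; it does not address the issue's underlying point that the sort itself should report progress as it advances.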