Message-ID: <3588072.1146605388767.JavaMail.jira@brutus>
Date: Tue, 2 May 2006 21:29:48 +0000 (GMT)
From: "stack@archive.org (JIRA)" <jira@apache.org>
To: hadoop-dev@lucene.apache.org
Subject: [jira] Created: (HADOOP-190) Job fails though task succeeded if we fail to exit

Job fails though task succeeded if we fail to exit
--------------------------------------------------

         Key: HADOOP-190
         URL: http://issues.apache.org/jira/browse/HADOOP-190
     Project: Hadoop
        Type: Bug
    Reporter: stack@archive.org

This is an odd case.
The main cause will be programmer error, but I suppose it could happen during normal processing. Either way, it would be grand if Hadoop were better able to deal with it.

My map task completed 'successfully', but because I had started threads inside my task that were not set to be of daemon type, and that under certain circumstances were left running, my child process stuck around after reporting 'done' -- the JVM won't go down while non-daemon threads are still running. After ten minutes, the TaskTracker steps in, kills the child, and cleans up the successful output. Because the JobTracker has been told the task completed successfully, reducers keep showing up looking for the now-removed output -- until the job fails.

Below is an illustration of the problem using log output:

....
060501 090401 task_0001_m_000798_0 0.99491096% adding http://www.score.umd.edu/a um.jpg 24891 image/jpeg
060501 090401 task_0001_m_000798_0 1.0% adding http://www.score.umd.edu/album.jp 24891 image/jpeg
060501 090401 Task task_0001_m_000798_0 is done.
...
060501 091410 task_0001_m_000798_0: Task failed to report status for 608 seconds. Killing.
....
060501 091410 Calling cleanup because was killed or FAILED task_0001_m_000798_0
060501 091410 task_0001_m_000798_0 done; removing files.

Then, subsequently....

060501 091422 SEVERE Can't open map output:/1/hadoop/tmp/task_0001_m_000798_0/pa -12.out java.io.FileNotFoundException: LocalFS

... and on and on.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
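For context, here is a minimal Java sketch of the mechanism described above (the class and thread names are illustrative, not from the actual job): a user-spawned worker thread keeps the JVM alive after the task's main method returns unless it is marked as a daemon before being started.

```java
public class DaemonThreadExample {
    public static void main(String[] args) {
        // A background worker like the ones the map task spawned.
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(60_000); // stands in for long-running background work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        // Without this line the JVM cannot exit while the worker is still
        // running -- the situation that kept the 'done' child around until
        // the TaskTracker killed it.
        worker.setDaemon(true);
        worker.start();
        System.out.println("worker.isDaemon() = " + worker.isDaemon());
        // main returns here; because the worker is a daemon, the JVM exits
        // immediately instead of waiting out the sleep.
    }
}
```

With setDaemon(true) the child process would have exited promptly after reporting 'done', and the TaskTracker timeout/cleanup path would never have fired.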