Return-Path:
Delivered-To: apmail-lucene-hadoop-dev-archive@locus.apache.org
Received: (qmail 37893 invoked from network); 7 May 2007 13:52:44 -0000
Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2)
    by minotaur.apache.org with SMTP; 7 May 2007 13:52:44 -0000
Received: (qmail 20204 invoked by uid 500); 7 May 2007 13:52:43 -0000
Delivered-To: apmail-lucene-hadoop-dev-archive@lucene.apache.org
Received: (qmail 20173 invoked by uid 500); 7 May 2007 13:52:42 -0000
Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hadoop-dev@lucene.apache.org
Delivered-To: mailing list hadoop-dev@lucene.apache.org
Received: (qmail 20146 invoked by uid 99); 7 May 2007 13:52:42 -0000
Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 07 May 2007 06:52:42 -0700
X-ASF-Spam-Status: No, hits=-100.0 required=10.0
    tests=ALL_TRUSTED
X-Spam-Check-By: apache.org
Received: from [140.211.11.4] (HELO brutus.apache.org) (140.211.11.4)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 07 May 2007 06:52:35 -0700
Received: from brutus (localhost [127.0.0.1])
    by brutus.apache.org (Postfix) with ESMTP id 8080C714065
    for ; Mon, 7 May 2007 06:52:15 -0700 (PDT)
Message-ID: <17960210.1178545935523.JavaMail.jira@brutus>
Date: Mon, 7 May 2007 06:52:15 -0700 (PDT)
From: "Hadoop QA (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1252) Disk problems should be handled better by the MR framework
In-Reply-To: <32440618.1176383252235.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-Virus-Checked: Checked by ClamAV on apache.org


    [ https://issues.apache.org/jira/browse/HADOOP-1252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12494042 ]

Hadoop QA commented on HADOOP-1252:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12356859/1252.may7.patch applied and successfully tested against trunk revision r534975.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/121/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/121/console

> Disk problems should be handled better by the MR framework
> ----------------------------------------------------------
>
>                 Key: HADOOP-1252
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1252
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.12.3
>            Reporter: Devaraj Das
>         Assigned To: Devaraj Das
>             Fix For: 0.13.0
>
>         Attachments: 1252.may7.patch, 1252.new.patch, 1252.patch, 1252.patch
>
>
> The MR framework should recover from disk failure problems without causing jobs to hang. Note that this issue is about a short-term solution to the problem: for example, looking at the code and improving the exception handling (to better detect faulty disks and missing files). The long-term approach might be to have an FS layer that takes care of failed disks and makes it transparent to the tasks. That will be a separate issue by itself.
> Some of the issues that have been reported are HADOOP-1087 and a comment by Koji on HADOOP-1200 (not sure whether those are all). Please add to this issue as much detail as possible on disk failures leading to hung jobs (details like relevant exception traces, ways to reproduce, etc.).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.