Date: Mon, 22 Aug 2011 16:01:30 +0000 (UTC)
From: "Chris Nauroth (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Reply-To: mapreduce-issues@hadoop.apache.org
Message-ID: <636262514.1498.1314028890652.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <896422957.53363.1313772987484.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (MAPREDUCE-2862) Infinite loop in CombineFileInputFormat#getMoreSplits(), with missing blocks

[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088783#comment-13088783 ]

Chris Nauroth commented on MAPREDUCE-2862:
------------------------------------------

Sadayuki, thank you for submitting a patch on this. I've been bitten by this one too.

This patch would log warnings about "corrupted files". Is it really true that this indicates corruption? In my experience, this happens when CombineFileInputFormat tries to read newly written files that have not yet had their first block flushed. This isn't really corruption, so I'm wondering if logging warnings about corrupt files would give a user the wrong impression that the cluster is suffering from corruption.

To work around this, I've been running my jobs with a private patch of CombineFileInputFormat that adds this to the constructor for OneFileInfo:

    // Bail out if the block has no locations. This guards against an
    // infinite loop in getMoreSplits. This change is not present in open
    // source Hadoop.
    if (oneblock.length <= 0)
      continue;

That prevents these blocks from ever entering the getMoreSplits logic in the first place. If you're interested in that approach instead, let me know, and I'll put the patch together. I'd still need to add a unit test for it too.

Thanks again,
--Chris

> Infinite loop in CombineFileInputFormat#getMoreSplits(), with missing blocks
> ----------------------------------------------------------------------------
>
> Key: MAPREDUCE-2862
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-2862
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Reporter: Kazuki Ohta
> Attachments: MAPREDUCE-2862-warn-and-ignore-corrupted-blocks.patch
>
>
> Hi, we hit an infinite loop in CombineFileInputFormat#getMoreSplits().
> At first, we lost some blocks through an operational mistake :-(. Then a job tried to use these missing blocks, and getMoreSplits() went into an infinite loop.
> From our investigation, this List can be empty:
>
> https://github.com/apache/hadoop-mapreduce/blob/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java#L363
> The 'for' loop just after that line then does nothing, so the entry is never removed from 'blockToNodes'.
> As a result, the loop at this line never terminates:
>
> https://github.com/apache/hadoop-mapreduce/blob/trunk/src/java/org/apache/hadoop/mapreduce/lib/input/CombineFileInputFormat.java#L348
> We're now creating a patch for this problem...
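
The failure mode in the quoted report, and the idea of filtering such blocks out before the loop, can be sketched with a small stand-alone model. This is not the Hadoop source: the class SplitLoopSketch, the Block type, and the methods drain and dropUnlocated are hypothetical names invented for illustration, and the maxPasses cap exists only so the sketch terminates. Also note that the guard here tests for an empty host list, which is the condition the report describes, whereas the private patch quoted above tests oneblock.length.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of the grouping loop; not the actual Hadoop code.
class SplitLoopSketch {

    /** Hypothetical stand-in for a block: an id plus the hosts that store it. */
    static final class Block {
        final String id;
        final List<String> hosts;

        Block(String id, List<String> hosts) {
            this.id = id;
            this.hosts = hosts;
        }
    }

    /**
     * Removes blocks from the pending map by walking the per-host lists,
     * repeating until nothing is pending or maxPasses is reached.
     * Returns the number of blocks that were never removed.
     */
    static int drain(List<Block> blocks, int maxPasses) {
        Map<String, Block> pending = new LinkedHashMap<>();
        Map<String, List<Block>> hostToBlocks = new HashMap<>();
        for (Block b : blocks) {
            pending.put(b.id, b);
            for (String host : b.hosts) {
                hostToBlocks.computeIfAbsent(host, k -> new ArrayList<>()).add(b);
            }
        }

        int passes = 0;
        // Without the pass cap, a block with no hosts keeps this loop alive forever:
        // it sits in 'pending' but is reachable from no per-host list.
        while (!pending.isEmpty() && passes < maxPasses) {
            for (List<Block> onHost : hostToBlocks.values()) {
                for (Block b : onHost) {
                    pending.remove(b.id);
                }
            }
            passes++;
        }
        return pending.size();
    }

    /** Guard: drop blocks that report no hosts before they enter the loop. */
    static List<Block> dropUnlocated(List<Block> blocks) {
        List<Block> kept = new ArrayList<>();
        for (Block b : blocks) {
            if (!b.hosts.isEmpty()) {
                kept.add(b);
            }
        }
        return kept;
    }
}
```

With one block on a host and one block with no hosts, drain leaves the second block pending no matter how many passes it is given; after dropUnlocated, the same input drains completely. Whether such blocks should be skipped silently or logged is the open question raised in the comment above.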