Date: Wed, 12 Feb 2014 21:58:21 +0000 (UTC)
From: "Jason Dere (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-5756) FileInputFormat.listStatus() including directories in its results

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899641#comment-13899641 ]

Jason Dere commented on MAPREDUCE-5756:
---------------------------------------

In the 2.x code, isn't that what the recursive flag is there for (mapreduce.input.fileinputformat.input.dir.recursive), to recurse into directories if needed? If the generated input splits include a directory, it looks like this causes the job to fail because it's expecting a file as opposed to a directory.
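As long as listStatus() can hand back directories, a caller could work around the failure by dropping them before splits are built. The following is only a minimal, self-contained sketch of that workaround, not Hadoop code: Status and CallerSideFilter are hypothetical names, Status stands in for Hadoop's FileStatus (only isDirectory() is modeled), and the paths are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for org.apache.hadoop.fs.FileStatus: only the
// path and the directory flag matter for this sketch.
final class Status {
    final String path;
    final boolean directory;

    Status(String path, boolean directory) {
        this.path = path;
        this.directory = directory;
    }

    boolean isDirectory() {
        return directory;
    }
}

final class CallerSideFilter {
    // Returns a new list containing only the non-directory entries, so
    // that nothing but plain files would be turned into input splits.
    static List<Status> filesOnly(List<Status> listing) {
        List<Status> files = new ArrayList<>();
        for (Status stat : listing) {
            if (!stat.isDirectory()) {
                files.add(stat);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        // Hypothetical listing: one data file plus one subdirectory.
        List<Status> listing = List.of(
            new Status("/warehouse/t/part-00000", false),
            new Status("/warehouse/t/subdir", true));
        for (Status s : filesOnly(listing)) {
            System.out.println(s.path); // prints /warehouse/t/part-00000
        }
    }
}
```

The drawback is that every caller has to remember to do this, which is why fixing it inside listStatus() looks preferable.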
Is the onus then on the caller of listStatus() to go through the file list and remove any directories that were included? Looks like the recursive stuff (with lots of discussion) was added in MAPREDUCE-3193.

> FileInputFormat.listStatus() including directories in its results
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5756
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jason Dere
>
> Trying to track down HIVE-6401, where we see some "is not a file" errors because getSplits() is giving us directories. I believe the culprit is FileInputFormat.listStatus():
> {code}
>           if (recursive && stat.isDirectory()) {
>             addInputPathRecursively(result, fs, stat.getPath(),
>                 inputFilter);
>           } else {
>             result.add(stat);
>           }
> {code}
> Which seems to be allowing directories to be added to the results if recursive is false. Is this meant to return directories? If not, I think it should look like this:
> {code}
>           if (stat.isDirectory()) {
>             if (recursive) {
>               addInputPathRecursively(result, fs, stat.getPath(),
>                   inputFilter);
>             }
>           } else {
>             result.add(stat);
>           }
> {code}
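To make the difference between the two quoted branchings concrete, here is a minimal in-memory simulation. It is a sketch under stated assumptions, not the actual FileInputFormat code: Node and ListStatusSketch are hypothetical names, the tree stands in for a FileSystem, addRecursively only approximates addInputPathRecursively, and the inputFilter (PathFilter) check is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory directory tree standing in for a FileSystem.
final class Node {
    final String path;
    final List<Node> children; // null for a plain file

    private Node(String path, List<Node> children) {
        this.path = path;
        this.children = children;
    }

    static Node file(String path) {
        return new Node(path, null);
    }

    static Node dir(String path, Node... kids) {
        return new Node(path, List.of(kids));
    }

    boolean isDirectory() {
        return children != null;
    }
}

final class ListStatusSketch {
    // Rough analogue of addInputPathRecursively (no PathFilter):
    // descend into subdirectories and collect only files.
    static void addRecursively(List<String> result, Node dir) {
        for (Node stat : dir.children) {
            if (stat.isDirectory()) {
                addRecursively(result, stat);
            } else {
                result.add(stat.path);
            }
        }
    }

    // Branching as quoted in the issue: with recursive == false a
    // directory falls through to the else branch and is returned.
    static List<String> original(Node inputDir, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Node stat : inputDir.children) {
            if (recursive && stat.isDirectory()) {
                addRecursively(result, stat);
            } else {
                result.add(stat.path);
            }
        }
        return result;
    }

    // Proposed branching: a directory is expanded when recursive is
    // true and skipped otherwise, so it is never returned itself.
    static List<String> proposed(Node inputDir, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Node stat : inputDir.children) {
            if (stat.isDirectory()) {
                if (recursive) {
                    addRecursively(result, stat);
                }
            } else {
                result.add(stat.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node input = Node.dir("/in",
            Node.file("/in/a"),
            Node.dir("/in/sub", Node.file("/in/sub/b")));
        System.out.println(original(input, false)); // [/in/a, /in/sub]
        System.out.println(proposed(input, false)); // [/in/a]
        System.out.println(proposed(input, true));  // [/in/a, /in/sub/b]
    }
}
```

With recursive set to false, the original branching returns the subdirectory /in/sub as if it were an input file, which matches the "is not a file" symptom; the proposed branching skips it.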