hadoop-mapreduce-issues mailing list archives

From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5756) FileInputFormat.listStatus() including directories in its results
Date Wed, 12 Feb 2014 21:58:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13899641#comment-13899641 ]

Jason Dere commented on MAPREDUCE-5756:
---------------------------------------

In the 2.x code, isn't that what the recursive flag (mapreduce.input.fileinputformat.input.dir.recursive)
is for, i.e. to recurse into directories when needed?
If the generated input splits include a directory, the job appears to fail because it expects
a file rather than a directory. Is the onus then on the caller of listStatus() to go through
the returned list and remove any directories it contains?

Looks like the recursive stuff (with lots of discussion) was added in MAPREDUCE-3193.
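If the answer is that callers must clean up the list themselves, the filtering is a simple pass over the returned statuses. The sketch below is a self-contained illustration, not Hadoop code: `Entry` is a hypothetical stand-in for Hadoop's `FileStatus` (whose real check is `isDirectory()`), and `filesOnly` is a hypothetical helper name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListStatusFilter {
    // Hypothetical stand-in for Hadoop's FileStatus; only the path and the
    // directory flag matter for this sketch.
    static final class Entry {
        final String path;
        final boolean isDirectory;

        Entry(String path, boolean isDirectory) {
            this.path = path;
            this.isDirectory = isDirectory;
        }
    }

    // Returns a new list holding only the non-directory entries, i.e. the
    // clean-up a caller of listStatus() would need if directories can show
    // up in its results.
    static List<Entry> filesOnly(List<Entry> statuses) {
        List<Entry> files = new ArrayList<>();
        for (Entry e : statuses) {
            if (!e.isDirectory) {
                files.add(e);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<Entry> listed = Arrays.asList(
            new Entry("/warehouse/t/part-00000", false),
            new Entry("/warehouse/t/subdir", true),
            new Entry("/warehouse/t/part-00001", false));
        // Prints the two part files; the subdirectory is dropped.
        for (Entry e : filesOnly(listed)) {
            System.out.println(e.path);
        }
    }
}
```

Doing this in every caller duplicates logic that arguably belongs in listStatus() itself, which is what the issue below proposes.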

> FileInputFormat.listStatus() including directories in its results
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-5756
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5756
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Jason Dere
>
> Trying to track down HIVE-6401, where we see some "is not a file" errors because getSplits()
> is giving us directories. I believe the culprit is FileInputFormat.listStatus():
> {code}
>                 if (recursive && stat.isDirectory()) {
>                   addInputPathRecursively(result, fs, stat.getPath(),
>                       inputFilter);
>                 } else {
>                   result.add(stat);
>                 }
> {code}
> This seems to allow directories to be added to the results when recursive is false.
> Is this meant to return directories? If not, I think it should look like this:
> {code}
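>                 if (stat.isDirectory()) {
>                   // Expand directories only when recursive is set;
>                   // otherwise skip them so only files are returned.
>                   if (recursive) {
>                     addInputPathRecursively(result, fs, stat.getPath(),
>                         inputFilter);
>                   }
>                 } else {
>                   result.add(stat);
>                 }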
> {code}
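To see the behavioral difference between the two branch shapes, here is a self-contained simulation over a tiny in-memory tree. It is a sketch under stated assumptions: `Node` is a hypothetical stand-in for a file system entry (not a Hadoop type), `addRecursively` is loosely modeled on what addInputPathRecursively() is described as doing (descend and collect files), and `original`/`proposed` mirror only the quoted if/else logic, not the full listStatus() method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListStatusBranches {
    // Hypothetical in-memory stand-in for a file system entry.
    static final class Node {
        final String path;
        final List<Node> children; // null means this node is a plain file

        private Node(String path, List<Node> children) {
            this.path = path;
            this.children = children;
        }

        static Node file(String path) {
            return new Node(path, null);
        }

        static Node dir(String path, Node... kids) {
            return new Node(path, Arrays.asList(kids));
        }

        boolean isDirectory() {
            return children != null;
        }
    }

    // Loosely modeled on addInputPathRecursively(): descend into
    // subdirectories and collect only files.
    static void addRecursively(List<String> result, Node dir) {
        for (Node child : dir.children) {
            if (child.isDirectory()) {
                addRecursively(result, child);
            } else {
                result.add(child.path);
            }
        }
    }

    // Branch shape quoted in the issue: with recursive == false, a
    // directory falls through to the else branch and is added as-is.
    static List<String> original(Node inputDir, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Node stat : inputDir.children) {
            if (recursive && stat.isDirectory()) {
                addRecursively(result, stat);
            } else {
                result.add(stat.path);
            }
        }
        return result;
    }

    // Proposed branch shape: directories are expanded when recursive is
    // true and skipped otherwise, so only files reach the result.
    static List<String> proposed(Node inputDir, boolean recursive) {
        List<String> result = new ArrayList<>();
        for (Node stat : inputDir.children) {
            if (stat.isDirectory()) {
                if (recursive) {
                    addRecursively(result, stat);
                }
            } else {
                result.add(stat.path);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Node input = Node.dir("/in",
            Node.file("/in/a"),
            Node.dir("/in/sub", Node.file("/in/sub/b")));
        System.out.println("original, recursive=false: " + original(input, false));
        System.out.println("proposed, recursive=false: " + proposed(input, false));
        System.out.println("proposed, recursive=true:  " + proposed(input, true));
    }
}
```

Running it prints `[/in/a, /in/sub]` for the original shape with recursive=false (the directory leaks into the results), `[/in/a]` for the proposed shape with recursive=false, and `[/in/a, /in/sub/b]` for the proposed shape with recursive=true. With recursive=true both shapes agree; they differ only in whether a non-recursive listing returns or skips subdirectories.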



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
