hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9981) Listing in RawLocalFileSystem is inefficient
Date Wed, 25 Sep 2013 17:54:05 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13777805#comment-13777805 ]

Colin Patrick McCabe commented on HADOOP-9981:

bq. lines longer than 80 chars


bq. Let's use @Ignore annotations on the tests instead of removing them, I assume we want
to add them back in eventually?

Good idea.

bq. Our symlink resolution right now is inconsistent: listStatus does not resolve results,
getFileStatus does...

Let's tackle the symlink semantics for globStatus as part of HADOOP-9984. This JIRA is just
about the efficiency concerns.

bq. Also noticed that we have a little duplication in TestGlobPaths: trueFilter is the same
as AcceptAllFilter. AcceptPathsEndingInZ is also only used in the removed test.

I added the tests back with {{@Ignore}}, and made {{trueFilter}} an instance of {{AcceptAllFilter}}.

> Listing in RawLocalFileSystem is inefficient
> --------------------------------------------
>                 Key: HADOOP-9981
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9981
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Kihwal Lee
>            Assignee: Colin Patrick McCabe
>            Priority: Critical
>         Attachments: HADOOP-9981.001.patch, HADOOP-9981.002.patch
> After HADOOP-9652, listStatus() or globStatus() calls against a local file system directory
> are very slow. A user was loading data from the local file system into Hive and it took about 30
> seconds. The same operation took less than a second before HADOOP-9652.
> The input path had many other files besides the input files, and strace showed a fork
> and exec of stat for each and every one of them. jstack confirmed that this was being
> done from getNativeFileLinkStatus().
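
The cost described in the issue summary comes from launching one external stat process per directory entry. As a rough illustration only, the sketch below lists a directory and reads each entry's attributes in-process using the standard java.nio.file API, so no child process is started per file. This is not the HADOOP-9981 patch and not how RawLocalFileSystem is implemented; the class InProcessListing and its Entry type are hypothetical names invented for this example.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of Hadoop or of the HADOOP-9981 patch.
class InProcessListing {

    /** Hypothetical value type holding the few fields a listing typically needs. */
    static final class Entry {
        final String name;
        final long length;
        final boolean isDirectory;
        final boolean isSymlink;

        Entry(String name, long length, boolean isDirectory, boolean isSymlink) {
            this.name = name;
            this.length = length;
            this.isDirectory = isDirectory;
            this.isSymlink = isSymlink;
        }
    }

    /**
     * Lists the children of dir, reading attributes in-process.
     * NOFOLLOW_LINKS reports a symlink itself rather than its target.
     */
    static List<Entry> list(Path dir) throws IOException {
        List<Entry> entries = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path child : stream) {
                BasicFileAttributes attrs = Files.readAttributes(
                        child, BasicFileAttributes.class, LinkOption.NOFOLLOW_LINKS);
                entries.add(new Entry(
                        child.getFileName().toString(),
                        attrs.size(),
                        attrs.isDirectory(),
                        attrs.isSymbolicLink()));
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Build a small temporary directory and list it.
        Path dir = Files.createTempDirectory("listing-demo");
        Files.write(dir.resolve("a.txt"), new byte[] {1, 2, 3});
        Files.createDirectory(dir.resolve("sub"));
        for (Entry e : list(dir)) {
            System.out.println(e.name + " length=" + e.length + " dir=" + e.isDirectory);
        }
    }
}
```

The point of the sketch is only the shape of the loop: one directory scan plus one in-process attribute read per entry, instead of one fork and exec per entry. Fields such as owner, group, and permissions that a Hadoop FileStatus carries are left out here.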

