hadoop-common-issues mailing list archives

From "Andrew Wang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-9981) Listing in RawLocalFileSystem is inefficient
Date Thu, 19 Sep 2013 22:37:53 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-9981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772403#comment-13772403 ]

Andrew Wang commented on HADOOP-9981:

Hey Kihwal,

We could definitely use the {{lstat(2)}} system call instead of forking {{stat(1)}}; it just
requires a bit more work. {{lstat(2)}} only provides the numeric uid and gid, so these would need
to be translated into string names via {{getpwuid}} and {{getgrgid}}. It'd also require calling
{{readlink(2)}} to get the link target if the path is a symlink. I agree we can keep the existing
behavior as a fallback for when native code is not available.
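
To make the sequence of calls concrete, here is a minimal sketch in Python, whose {{os}}, {{pwd}} and {{grp}} modules are thin wrappers over the same libc calls. It is only an illustration of the steps, not the proposed native patch; the function name {{lstat_with_names}} and the returned dictionary layout are made up for this example.

```python
import grp
import os
import pwd
import stat
import tempfile


def lstat_with_names(path):
    """Describe `path` without following a trailing symlink (hypothetical helper)."""
    st = os.lstat(path)  # lstat(2): returns numeric uid/gid only

    # Translate the numeric ids into names, falling back to the number
    # when the passwd/group database has no matching entry.
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name  # getpwuid(3)
    except KeyError:
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name  # getgrgid(3)
    except KeyError:
        group = str(st.st_gid)

    # Only symlinks need the extra readlink(2) call.
    is_link = stat.S_ISLNK(st.st_mode)
    target = os.readlink(path) if is_link else None

    return {
        "owner": owner,
        "group": group,
        "mode": stat.S_IMODE(st.st_mode),
        "is_link": is_link,
        "target": target,
    }


if __name__ == "__main__":
    # Small demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        data = os.path.join(tmp, "data.txt")
        open(data, "w").close()
        link = os.path.join(tmp, "link")
        os.symlink(data, link)
        for entry in (data, link):
            print(entry, lstat_with_names(entry))
```

Everything here happens in-process, so listing a directory costs a few system calls per entry rather than a fork and exec per entry.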

As a side note, all of this should really be a fallback for JDK7's lstat equivalent in
{{java.nio.file}}, which we theoretically should be allowed to use since JDK6 is EOL.

> Listing in RawLocalFileSystem is inefficient
> --------------------------------------------
>                 Key: HADOOP-9981
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9981
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.3.0
>            Reporter: Kihwal Lee
>            Priority: Critical
> After HADOOP-9652, listStatus() or globStatus() calls against a local file system directory
> are very slow. A user was loading data from the local file system into Hive and it took about 30
> seconds. The same operation took less than a second pre-HADOOP-9652.
> The input path had many other files besides the input files, and strace showed a fork
> and exec of stat against each and every one of them. jstack confirmed that this was being
> done from getNativeFileLinkStatus().
