hadoop-common-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10942) Globbing optimizations and regression fix
Date Wed, 13 Aug 2014 18:18:11 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095847#comment-14095847 ]

Colin Patrick McCabe commented on HADOOP-10942:

bq. For the immediate file status, the prior code used to loop over the path components even
if there are globs. In this patch, it does an immediate file status on the full path. This
reduces the overhead for FsShell commands.

You always need to loop when there are globs.  You need to see which children match the glob
and which don't.  I think what you meant to write is "the prior code used to loop over the
path components even if there are *not* globs".

Looping is not a problem, though.  Calling {{listStatus}} or {{getFileStatus}} is what generates
RPCs.  And the existing globber code doesn't do that unless it needs to.

A simple way of seeing this is to add a LOG.info statement to {{Globber#listStatus}} and {{Globber#getFileStatus}},
and then try {{hadoop fs -ls}} on a path without globs.  The only output you will see is
a single call to {{getFileStatus}}, because that's the only call that's needed.  The internal
looping that it does inside the function is not important because most loop iterations don't
generate an RPC.
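
To illustrate the point, here is a minimal Java sketch.  It is not the actual {{Globber}} code: {{GlobRpcSketch}}, {{CountingFs}}, {{sampleFs}} and this {{glob}} method are hypothetical names, the "filesystem" is an in-memory set of paths, and only {{*}} wildcards are handled.  It assumes an absolute pattern.  The sketch loops over every path component, but only the two methods that stand in for RPCs print a line and bump a counter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class GlobRpcSketch {

    /** Hypothetical in-memory filesystem; each method call stands in for one RPC. */
    static class CountingFs {
        private final Set<String> paths = new TreeSet<>();
        int getFileStatusCalls = 0;
        int listStatusCalls = 0;

        void add(String path) {
            paths.add(path);
        }

        /** Simulated RPC: reports whether the path exists. */
        boolean getFileStatus(String path) {
            getFileStatusCalls++;
            System.out.println("RPC getFileStatus(" + path + ")");
            return paths.contains(path);
        }

        /** Simulated RPC: returns the direct children of dir. */
        List<String> listStatus(String dir) {
            listStatusCalls++;
            System.out.println("RPC listStatus(" + dir + ")");
            String prefix = dir.equals("/") ? "/" : dir + "/";
            List<String> children = new ArrayList<>();
            for (String path : paths) {
                boolean under = path.startsWith(prefix) && path.length() > prefix.length();
                if (under && path.indexOf('/', prefix.length()) < 0) {
                    children.add(path);
                }
            }
            return children;
        }
    }

    /** Small sample tree used by main. */
    static CountingFs sampleFs() {
        CountingFs fs = new CountingFs();
        fs.add("/user");
        fs.add("/user/alice");
        fs.add("/user/alice/data.txt");
        fs.add("/user/bob");
        return fs;
    }

    /** Expands an absolute pattern, looping over every component. */
    static List<String> glob(CountingFs fs, String pattern) {
        List<String> candidates = new ArrayList<>();
        candidates.add(""); // the root, written as an empty prefix
        boolean lastWasLiteral = false;
        for (String component : pattern.substring(1).split("/")) {
            List<String> next = new ArrayList<>();
            lastWasLiteral = !component.contains("*");
            for (String parent : candidates) {
                if (lastWasLiteral) {
                    // Literal component: extend the path locally, no RPC.
                    next.add(parent + "/" + component);
                } else {
                    // Glob component: the children must be listed and filtered.
                    String regex = "\\Q" + component.replace("*", "\\E.*\\Q") + "\\E";
                    for (String child : fs.listStatus(parent.isEmpty() ? "/" : parent)) {
                        if (child.substring(child.lastIndexOf('/') + 1).matches(regex)) {
                            next.add(child);
                        }
                    }
                }
            }
            candidates = next;
        }
        if (!lastWasLiteral) {
            return candidates; // already confirmed by the listing
        }
        // Paths built locally are unverified; confirm each with one status call.
        List<String> results = new ArrayList<>();
        for (String candidate : candidates) {
            if (fs.getFileStatus(candidate)) {
                results.add(candidate);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        CountingFs fs = sampleFs();
        System.out.println(glob(fs, "/user/alice/data.txt"));
        System.out.println("getFileStatus=" + fs.getFileStatusCalls
                + " listStatus=" + fs.listStatusCalls);
    }
}
```

With the pattern in {{main}}, which has no globs, the loop runs three times but only the final {{getFileStatus}} call is logged.  Changing the pattern to {{/user/*/data.txt}} adds one {{listStatus}} call for the glob component and one {{getFileStatus}} call per surviving candidate.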

> Globbing optimizations and regression fix
> -----------------------------------------
>                 Key: HADOOP-10942
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10942
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HADOOP-10942.patch
> When globbing was commonized to support both filesystem and filecontext, it regressed
> a fix that prevents an intermediate glob that matches a file from throwing a confusing permissions
> exception.  The HDFS traverse check requires the exec bit, which a file does not have.
> Additional optimizations to reduce RPCs actually increase them if directories contain
> 1 item.

This message was sent by Atlassian JIRA
