drill-dev mailing list archives

From "Padma Penumarthy (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.
Date Wed, 02 Nov 2016 20:53:59 GMT
Padma Penumarthy created DRILL-4990:
---------------------------------------

             Summary: Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.
                 Key: DRILL-4990
                 URL: https://issues.apache.org/jira/browse/DRILL-4990
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 1.8.0
            Reporter: Padma Penumarthy
            Assignee: Padma Penumarthy
             Fix For: 1.9.0


For every query, we build the schema tree (runSQL->getPlan->getNewDefaultSchema->getRootSchema).
All workspaces in all storage plugins are checked and added to the schema tree if they
are accessible to the user who initiated the query. For the file system plugin, the listStatus
API is used to check whether the workspace is accessible to the user
(WorkspaceSchemaFactory.accessible). The idea seems to be that if the user does not have access
to the file(s) in the workspace, listStatus will throw an exception and we return false. But
listStatus (which lists all the entries of a directory) is an expensive operation when there
are a large number of files in the directory. Hadoop 2.6 added a new API called access
(HDFS-6570), which provides the ability to check whether the user has permissions on a
file/directory. Use this new API instead of listStatus. For a directory with 256k+ files, an
improvement of up to 10 sec in planning time was observed when using the new API vs. the old
way of listStatus.
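
The difference between the two checks can be sketched as follows. This is only an illustration, not the actual Drill patch: WorkspaceFs and FakeFs are hypothetical stand-ins so the example runs without a Hadoop cluster, and the paths are made up. In real code the two calls would presumably be Hadoop's FileSystem.listStatus(Path) and FileSystem.access(Path, FsAction), the latter signalling a denied user by throwing an exception.

```java
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

final class WorkspaceAccessSketch {

    /** Hypothetical stand-in for the two file system calls discussed above. */
    interface WorkspaceFs {
        /** Lists every entry of a directory; cost grows with the entry count. */
        String[] listStatus(String path) throws IOException;

        /** Checks permission only; throws if the user is denied. */
        void access(String path) throws IOException;
    }

    /** Old approach: enumerate the directory and treat any exception as "no access". */
    static boolean accessibleViaListStatus(WorkspaceFs fs, String path) {
        try {
            fs.listStatus(path);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** Proposed approach: ask for the permission directly, without listing entries. */
    static boolean accessibleViaAccess(WorkspaceFs fs, String path) {
        try {
            fs.access(path);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** In-memory fake (an assumption for this sketch) that counts listed entries. */
    static final class FakeFs implements WorkspaceFs {
        private final Set<String> readable = new HashSet<>();
        private final int entriesPerDir;
        long entriesListed = 0;

        FakeFs(int entriesPerDir) {
            this.entriesPerDir = entriesPerDir;
        }

        void allow(String path) {
            readable.add(path);
        }

        @Override
        public String[] listStatus(String path) throws IOException {
            access(path);
            entriesListed += entriesPerDir; // models the per-entry work of a listing
            return new String[entriesPerDir];
        }

        @Override
        public void access(String path) throws IOException {
            if (!readable.contains(path)) {
                throw new IOException("Permission denied: " + path);
            }
        }
    }

    public static void main(String[] args) {
        FakeFs fs = new FakeFs(256_000);
        fs.allow("/ws/ok");
        System.out.println(accessibleViaAccess(fs, "/ws/ok"));
        System.out.println(accessibleViaAccess(fs, "/ws/denied"));
        System.out.println(fs.entriesListed);
    }
}
```

Both helpers give the same answer, but only the listStatus variant does work proportional to the number of files in the workspace directory, which is where the planning-time difference would come from.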



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
