drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4990) Use new HDFS API access instead of listStatus to check if users have permissions to access workspace.
Date Mon, 06 Nov 2017 04:31:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16239893#comment-16239893
] 

ASF GitHub Bot commented on DRILL-4990:
---------------------------------------

Github user sohami commented on a diff in the pull request:

    https://github.com/apache/drill/pull/652#discussion_r148991210
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/store/dfs/WorkspaceSchemaFactory.java
---
    @@ -151,17 +152,32 @@ public WorkspaceSchemaFactory(
        */
       public boolean accessible(final String userName) throws IOException {
         final FileSystem fs = ImpersonationUtil.createFileSystem(userName, fsConf);
    +    boolean tryListStatus = false;
         try {
    -      // We have to rely on the listStatus as a FileSystem can have complicated controls
such as regular unix style
    -      // permissions, Access Control Lists (ACLs) or Access Control Expressions (ACE).
Hadoop 2.7 version of FileSystem
    -      // has a limited private API (FileSystem.access) to check the permissions directly
    -      // (see https://issues.apache.org/jira/browse/HDFS-6570). Drill currently relies
on Hadoop 2.5.0 version of
    -      // FileClient. TODO: Update this when DRILL-3749 is fixed.
    -      fs.listStatus(wsPath);
    +      // access API checks if a user has certain permissions on a file or directory.
    +      // returns normally if requested permissions are granted and throws an exception
    +      // if access is denied. This API was added in HDFS 2.6 (see HDFS-6570).
    +      // It is less expensive (than listStatus which was being used before) and hides
the
    +      // complicated access control logic underneath.
    +      fs.access(wsPath, FsAction.READ);
         } catch (final UnsupportedOperationException e) {
    -      logger.trace("The filesystem for this workspace does not support this operation.",
e);
    +      logger.debug("The filesystem for this workspace does not support access operation.",
e);
    +      tryListStatus = true;
         } catch (final FileNotFoundException | AccessControlException e) {
    -      return false;
    +      logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e);
    +      tryListStatus = true;
    +    }
    +
    +    // if fs.access fails for some reason, fall back to listStatus.
    +    if (tryListStatus) {
    +      try {
    +        fs.listStatus(wsPath);
    +      } catch (final UnsupportedOperationException e) {
    +        logger.debug("The filesystem for this workspace does not support listStatus operation.",
e);
    +      } catch (final FileNotFoundException | AccessControlException e) {
    +        logger.debug("file {} not found or cannot be accessed", wsPath.toString(), e);
    --- End diff --
    
    let's add `using listStatus` in the log statement to differentiate with previous case


> Use new HDFS API access instead of listStatus to check if users have permissions to access
workspace.
> -----------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4990
>                 URL: https://issues.apache.org/jira/browse/DRILL-4990
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.8.0
>            Reporter: Padma Penumarthy
>            Assignee: Padma Penumarthy
>
> For every query, we build the schema tree (runSQL->getPlan->getNewDefaultSchema->getRootSchema).
All workspaces in all storage plugins are checked and are added to the schema tree if they
are accessible by the user who initiated the query.  For file system plugin, listStatus API
is used to check if  the workspace is accessible or not (WorkspaceSchemaFactory.accessible)
by the user.  The idea seem to be if the user does not have access to file(s) in the workspace,
listStatus will generate an exception and we return false. But, listStatus (which lists all
the entries of a directory) is an expensive operation when there are large number of files
in the directory. A new API is added in Hadoop 2.6 called access (HDFS-6570) which provides
the ability to check if the user has permissions on a file/directory.  Use this new API instead
of listStatus. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message