hadoop-mapreduce-issues mailing list archives

From "Daryn Sharp (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3825) Need generalized multi-token filesystem support
Date Fri, 10 Feb 2012 15:58:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13205518#comment-13205518 ]

Daryn Sharp commented on MAPREDUCE-3825:
----------------------------------------

I'm open to alternatives, but eliminating dups is actually pretty simple:
{code}
  static void obtainTokensForNamenodesInternal(Credentials credentials,
       Path[] ps, Configuration conf) throws IOException {
    // --- start new code ---
    // use 2 passes to avoid redundant calls to the same filesystems
    // start by getting unique set of filesystems for all paths
    Set<FileSystem> pathFsSet = new HashSet<FileSystem>();
    for (Path p : ps) {
      pathFsSet.add(p.getFileSystem(conf));
    }
    // get the unique set of leaf filesystems
    Set<FileSystem> tokenFsSet = new HashSet<FileSystem>();
    for (FileSystem fs : pathFsSet) {
      tokenFsSet.addAll(fs.getFileSystems());
    }
    // --- end new code ---
    // get all the tokens from the now flattened list of leaf filesystems
    for (FileSystem fs : tokenFsSet) {
      obtainTokensForNamenodesPrivate(fs, credentials, conf);
    }
  }
{code}

Without the dedup, if many files are in the same filesystem, then a lot of unnecessary
processing occurs, especially in the case of viewfs.
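
To make the two-pass idea concrete without a Hadoop dependency, here is a minimal, self-contained sketch. Everything in it is a stand-in for illustration: {{Fs}} is a hypothetical class (not Hadoop's {{FileSystem}}), {{getLeafFileSystems()}} plays the role of the proposed {{getFileSystems()}}, and the filesystem names used with it are made up. It also assumes the same instance is always returned for the same filesystem, so plain identity-based set membership is enough to remove dups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class TokenDedupSketch {

  /** Hypothetical stand-in for a filesystem; NOT the Hadoop FileSystem class. */
  static class Fs {
    final String name;
    final List<Fs> mounts; // empty for a leaf filesystem, non-empty for a viewfs-like one

    Fs(String name, Fs... mounts) {
      this.name = name;
      this.mounts = Arrays.asList(mounts);
    }

    /** Returns this filesystem if it is a leaf, otherwise the leaves of all its mounts. */
    Set<Fs> getLeafFileSystems() {
      Set<Fs> leaves = new LinkedHashSet<>();
      if (mounts.isEmpty()) {
        leaves.add(this);
      } else {
        for (Fs mount : mounts) {
          leaves.addAll(mount.getLeafFileSystems());
        }
      }
      return leaves;
    }
  }

  /**
   * Takes the filesystem of each input path (one entry per path, dups allowed)
   * and returns the names of the leaf filesystems to contact, each exactly once.
   */
  static List<String> tokenTargets(List<Fs> filesystemPerPath) {
    // pass 1: unique set of filesystems across all paths
    Set<Fs> pathFsSet = new LinkedHashSet<>(filesystemPerPath);

    // pass 2: unique set of leaf filesystems behind them
    Set<Fs> tokenFsSet = new LinkedHashSet<>();
    for (Fs fs : pathFsSet) {
      tokenFsSet.addAll(fs.getLeafFileSystems());
    }

    // a real implementation would obtain a token per entry here
    List<String> names = new ArrayList<>();
    for (Fs fs : tokenFsSet) {
      names.add(fs.name);
    }
    return names;
  }
}
```

With a viewfs-like {{Fs}} that mounts two leaves, passing it several times along with one of its own leaves still yields each leaf once, which is the saving described above.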

I may be misunderstanding this variation, but acquiring tokens via recursive calls
will require more changes that may break non-Hadoop distributed filesystems.  I think it will
require duplicating the code of the default {{getDelegationTokens(renewer, creds)}}, or a new
API that overrides of this method can use to avoid getting dups.  The proposed default implementation
of {{FileSystem#getDelegationTokens(renewer, creds)}} simply iterates {{this.getFileSystems()}}
too.  I'll write something up and then we can discuss a little more.
                
> Need generalized multi-token filesystem support
> -----------------------------------------------
>
>                 Key: MAPREDUCE-3825
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3825
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: MAPREDUCE-3825.patch
>
>
> This is the counterpart to HADOOP-7967.  The token cache currently tries to assume a
> filesystem's token service key.  The assumption generally worked while there was a one to
> one mapping of filesystem to token.  With the advent of multi-token filesystems like viewfs,
> the token cache will try to use a service key (ie. for viewfs) that will never exist (because
> it really gets the mounted fs tokens).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
