hadoop-mapreduce-issues mailing list archives

From "Daryn Sharp (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3825) Need generalized multi-token filesystem support
Date Tue, 07 Feb 2012 21:41:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13202824#comment-13202824 ]

Daryn Sharp commented on MAPREDUCE-3825:

This jira was filed after discussions with Sid on other token/viewfs jiras.  There are multiple
problems that this jira and the linked jira are together trying to address:
# {{FileSystem#getDelegationTokens(String, Credentials)}} will fetch duplicate tokens.
# {{ViewFileSystem#getDelegationTokens(String)}} will fetch duplicate tokens, i.e., a token
for every mount point even if the filesystems are identical.
# {{ViewFileSystem#getDelegationTokens(String, Credentials)}} will skip filesystems without a
service name.  A null service name does mean that the filesystem itself has no token, but the
filesystem may be filtering another filesystem that does have tokens.
# {{ViewFileSystem#getDelegationTokens(String, Credentials)}} calls {{targetFileSystem.getDelegationTokens(String)}},
which may acquire duplicate tokens, even for services whose tokens viewfs thinks it has already acquired.
# {{ViewFileSystem#getDelegationTokens(String, Credentials)}} will acquire multiple duplicate
tokens for a filtered filesystem and its contained filesystem because it checks the service
of the filtered fs, not the contained fs.  A duplicate token will be acquired for every path.
Although not yet implemented, unionfs would exacerbate the duplicate-token problem.
# {{TokenCache}} treats the viewfs authority as a service name, so it tries to resolve it as
a hostname:port pair and fails.
# {{TokenCache}} assumes a one-to-one mapping between a filesystem's service and its token, which
breaks for a filesystem that returns many tokens.  This causes {{TokenCache}} to repeatedly fetch
tokens from a multi-token filesystem because it never finds a token with the expected service.
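A minimal, hypothetical model of the last item may help.  {{FakeFs}}, {{fetchTokens}} and {{obtainTokens}} are invented names, not Hadoop's {{TokenCache}} API, and the service strings are placeholders.  The cache skips a fetch only when it already holds a token keyed by the filesystem's own service; a viewfs-like filesystem hands back tokens keyed by its mounted filesystems, so that lookup never hits and the filesystem is queried on every call.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a token cache that assumes one token per filesystem,
// stored under the filesystem's own service key. Not Hadoop's API.
public class OneToOneTokenCacheDemo {

    /** A stand-in filesystem: its own service plus the services it issues tokens for. */
    static final class FakeFs {
        final String service;
        final List<String> issuedServices;
        int fetches = 0; // number of times tokens were requested from this filesystem

        FakeFs(String service, List<String> issuedServices) {
            this.service = service;
            this.issuedServices = issuedServices;
        }

        List<String> fetchTokens() {
            fetches++;
            return issuedServices;
        }
    }

    /** Skips the fetch only if a token keyed by the filesystem's own service is present. */
    static void obtainTokens(FakeFs fs, Map<String, String> credentials) {
        if (credentials.containsKey(fs.service)) {
            return;
        }
        for (String svc : fs.fetchTokens()) {
            credentials.put(svc, "token-for-" + svc);
        }
    }

    public static void main(String[] args) {
        Map<String, String> credentials = new HashMap<>();
        // Single-token filesystem: its token is keyed by its own service.
        FakeFs hdfs = new FakeFs("nn1:8020", List.of("nn1:8020"));
        // Multi-token filesystem: its tokens are keyed by the mounted filesystems.
        FakeFs viewfs = new FakeFs("viewfs-mount-table", List.of("nn1:8020", "nn2:8020"));

        for (int i = 0; i < 3; i++) {
            obtainTokens(hdfs, credentials);
            obtainTokens(viewfs, credentials);
        }
        System.out.println("hdfs fetches: " + hdfs.fetches);     // 1
        System.out.println("viewfs fetches: " + viewfs.fetches); // 3
    }
}
```

Running it prints {{hdfs fetches: 1}} and {{viewfs fetches: 3}}: the single-token filesystem is fetched once, while the multi-token one is fetched on every pass even though its tokens are already in the credentials.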

Those are the issues that I can recall off the top of my head.  The approach I've taken is:
* Allow retrieval of the unique set of filesystems in use
* Query each filesystem only once
* Never retrieve duplicate tokens, because the list is unique
* Solve the one-to-many problem in {{TokenCache}}
* Fix the filtered filesystem issue by querying its underlying filesystem
* Fix the viewfs mount table problem by requesting tokens only from the mounted filesystems
* Maintain complete backwards compatibility with the existing API
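The approach above can be sketched, under heavily simplified and hypothetical assumptions, as a walk that flattens layered filesystems into a unique set and then asks each one for its token at most once.  {{FsNode}}, {{flatten}} and {{collectTokens}} are illustrative names, not the real {{FileSystem}}/{{ViewFileSystem}} API; a {{null}} service stands for a filesystem with no token of its own, such as a mount table or a filter wrapping another filesystem.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: flatten layered filesystems into a unique set, then
// query each one at most once. Names are illustrative, not Hadoop's API.
public class FlattenedTokenCollection {

    /** A filesystem that may mount or filter other filesystems. */
    static final class FsNode {
        final String service;        // null: this filesystem has no token of its own
        final List<FsNode> children; // mounted or wrapped filesystems
        int queries = 0;             // number of times a token was requested

        FsNode(String service, List<FsNode> children) {
            this.service = service;
            this.children = children;
        }

        String getToken() {
            queries++;
            return "token-for-" + service;
        }
    }

    /** Depth-first walk that adds each filesystem instance to the set once. */
    static void flatten(FsNode fs, Set<FsNode> unique) {
        if (!unique.add(fs)) {
            return; // already visited, e.g. the same filesystem under two mount points
        }
        for (FsNode child : fs.children) {
            flatten(child, unique);
        }
    }

    /** Queries each unique filesystem once, skipping null services and services already held. */
    static Map<String, String> collectTokens(FsNode root, Map<String, String> credentials) {
        Set<FsNode> unique = new LinkedHashSet<>(); // FsNode uses identity equality
        flatten(root, unique);
        for (FsNode fs : unique) {
            if (fs.service == null || credentials.containsKey(fs.service)) {
                continue;
            }
            credentials.put(fs.service, fs.getToken());
        }
        return credentials;
    }

    public static void main(String[] args) {
        FsNode nn1 = new FsNode("nn1:8020", List.of());
        FsNode nn2 = new FsNode("nn2:8020", List.of());
        // A wrapper with no service of its own around nn2.
        FsNode filtered = new FsNode(null, List.of(nn2));
        // A viewfs-like root with no service, mounting nn1 twice plus the wrapper.
        FsNode viewfs = new FsNode(null, List.of(nn1, nn1, filtered));

        Map<String, String> credentials = collectTokens(viewfs, new LinkedHashMap<>());
        System.out.println(credentials.keySet());            // [nn1:8020, nn2:8020]
        System.out.println(nn1.queries + " " + nn2.queries); // 1 1
    }
}
```

Running it prints {{[nn1:8020, nn2:8020]}} and {{1 1}}: nn1 is mounted twice but queried once, and nn2 is reached through its wrapper even though the wrapper has no service.  Skipping services already present in the credentials also keeps a repeated call from fetching again.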

The current model is too complex and won't scale.  It arguably "works" in simple cases, but
it acquires duplicate tokens, errors out if the authority isn't a service, violates the contract
that a null service means no token, and won't work with more complex layering of filesystems.
By flattening out the list of filesystems, the {{ViewFileSystem}} implementation becomes dramatically
simpler, and it handles all types of filesystem layering without acquiring duplicate tokens.

bq. When we changed from getDelegationToken() to getDelegationTokens() we had dismissed the
alternate you are proposing since we needed a method to get delegation token from a file system

I'm not sure what this means.  If you are referring to token renewal, this jira is completely
orthogonal and is not trying to implement any of those proposals.  (Ironically, at
that time I wanted to implement {{getDelegationTokens}} but was told there was no use case.)
Again, though, I believe I've maintained complete backwards compatibility.

> Need generalized multi-token filesystem support
> -----------------------------------------------
>                 Key: MAPREDUCE-3825
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3825
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
> This is the counterpart to HADOOP-7967.  The token cache currently assumes a
filesystem's token service key.  The assumption generally worked while there was a one-to-one
mapping of filesystem to token.  With the advent of multi-token filesystems like viewfs,
the token cache will try to use a service key (i.e., for viewfs) that will never exist (because
it really gets the mounted filesystems' tokens).


