hadoop-mapreduce-issues mailing list archives

From "Daryn Sharp (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3825) MR should not be getting duplicate tokens for a MR Job.
Date Mon, 13 Feb 2012 17:58:59 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207012#comment-13207012 ]

Daryn Sharp commented on MAPREDUCE-3825:

bq. The purpose of this JIRA is to get delegation tokens without duplication for MR. All the
additional requirements that you state with "I want" "I want" "I want" belong in a different

It's true that the conversation has recently morphed into that, but the original description
is more broad.

>I want this filesystem's delegation token - getDelegationToken(renewer)
I already argued above that I disagree that this is a reasonable API, since a multi-filesystem
cannot implement it, and furthermore I disagree that a multi-filesystem like viewfs should return
The proposal I'm putting forth draws this distinction: a filesystem _should not_ implement
{{getDelegationToken}} unless the filesystem _itself_ has a token, i.e. it needs to make an
authorized connection.  {{ViewFileSystem}} does not make a connection, thus it does not have
a distinct token.  It does, however, contain filesystems that make connections, so those are
what {{getDelegationTokens}} operates on.  That method should be common, i.e. a filesystem
_will not_ implement it itself.

If there isn't a {{getDelegationToken}}, then I believe we are forced into a tightly coupled
design, i.e. every filesystem needs to bracket the code that obtains its token with copy-n-paste
code.  Is this desirable?  If there is a primitive {{getDelegationToken}}, then the standard/common
code ({{getDelegationTokens}}) that acquires tokens will perform the bracketing, thus confining
the logic to _one_ place where it can be changed in the future.
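
A minimal sketch of the split described above, not the actual Hadoop code.  Every name here
({{TokenSketch}}, {{FileSystemSketch}}, {{ConnectingFs}}, {{MountTableFs}}, {{getChildFileSystems}},
the simplified {{Token}} and {{Credentials}}) is a hypothetical stand-in, and returning only the
newly acquired tokens is an assumption made for illustration.  The point is that the "do we already
hold this token?" bracketing lives only in the common method, while a filesystem that makes a
connection overrides just the primitive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TokenSketch {
    /** Hypothetical stand-in for a delegation token, identified by its service key. */
    static final class Token {
        final String service;
        final String renewer;
        Token(String service, String renewer) { this.service = service; this.renewer = renewer; }
    }

    /** Hypothetical stand-in for a credentials store: service key -> token. */
    static final class Credentials {
        private final Map<String, Token> tokens = new LinkedHashMap<>();
        Token get(String service) { return tokens.get(service); }
        void add(Token t) { tokens.put(t.service, t); }
        int size() { return tokens.size(); }
    }

    abstract static class FileSystemSketch {
        /** Service key, or null when this filesystem makes no connection of its own. */
        protected String getServiceName() { return null; }

        /** Primitive: overridden only by a filesystem that itself has a token. */
        protected Token getDelegationToken(String renewer) { return null; }

        /** Embedded filesystems (e.g. mount points); none by default. */
        protected List<FileSystemSketch> getChildFileSystems() { return new ArrayList<>(); }

        /** Common code, never overridden. Returns only the tokens acquired by this call. */
        public final List<Token> getDelegationTokens(String renewer, Credentials creds) {
            List<Token> acquired = new ArrayList<>();
            collect(renewer, creds, acquired);
            return acquired;
        }

        private void collect(String renewer, Credentials creds, List<Token> acquired) {
            String service = getServiceName();
            // The bracketing: ask for a token only if none is held for this service.
            if (service != null && creds.get(service) == null) {
                Token t = getDelegationToken(renewer);
                if (t != null) {
                    creds.add(t);
                    acquired.add(t);
                }
            }
            for (FileSystemSketch child : getChildFileSystems()) {
                child.collect(renewer, creds, acquired);
            }
        }
    }

    /** A filesystem that makes a connection, so it implements the primitive. */
    static final class ConnectingFs extends FileSystemSketch {
        private final String service;
        int requests = 0; // how many times a token was actually requested
        ConnectingFs(String service) { this.service = service; }
        @Override protected String getServiceName() { return service; }
        @Override protected Token getDelegationToken(String renewer) {
            requests++;
            return new Token(service, renewer);
        }
    }

    /** A viewfs-like filesystem: no connection, no token of its own, only children. */
    static final class MountTableFs extends FileSystemSketch {
        private final List<FileSystemSketch> mounts;
        MountTableFs(List<FileSystemSketch> mounts) { this.mounts = mounts; }
        @Override protected List<FileSystemSketch> getChildFileSystems() { return mounts; }
    }

    public static void main(String[] args) {
        ConnectingFs nn1 = new ConnectingFs("nn1:8020");
        ConnectingFs nn2 = new ConnectingFs("nn2:8020");
        // nn1 is mounted twice; nn2's token is already in the credentials.
        FileSystemSketch view = new MountTableFs(Arrays.<FileSystemSketch>asList(nn1, nn2, nn1));
        Credentials creds = new Credentials();
        creds.add(new Token("nn2:8020", "jobtracker"));

        List<Token> fresh = view.getDelegationTokens("jobtracker", creds);
        System.out.println("newly acquired: " + fresh.size() + ", held: " + creds.size());
    }
}
```

In this sketch the mount table never asks for a token of its own, the doubly mounted {{nn1}} is
asked exactly once, and the caller still learns which tokens were new so it can log or post-process them.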

> I want all delegation tokens used by this filesystem - getDelegationTokens(renewer)
Pass in an empty credentials to addDTs(renewer, emptyCredentials).

That's what my patch does, but let's say I have a credentials object that already holds some
tokens.  I want to acquire any missing tokens, yet I want to know what those tokens are so I can
log that I got them and/or do some special processing on them, i.e. TokenCache.  How would I do

bq.  I agree that addDTs(renewer, cred) is a weird API, but your getDelegationTokens(renewer,
creds) is equally weird.

It's not mine, it was already there. :)  I argued against it in an earlier JIRA, but after
further thought, it seems reasonable.

What if we make {{getDelegationToken}} a protected method to avoid external calls?  The only
public-facing API would then be {{getDelegationTokens(renewer, creds)}}.

> MR should not be getting duplicate tokens for a MR Job.
> -------------------------------------------------------
>                 Key: MAPREDUCE-3825
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3825
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: security
>    Affects Versions: 0.23.1, 0.24.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>         Attachments: MAPREDUCE-3825.patch, TokenCache.pdf
> This is the counterpart to HADOOP-7967.  
> MR gets tokens for all input, output and the default filesystem when a MR job is submitted.

> The APIs in FileSystem make it challenging to avoid duplicate tokens when there are file
> systems that have embedded filesystems.
> Here is the original description that Daryn wrote: 
> The token cache currently tries to assume a filesystem's token service key.  The assumption
> generally worked while there was a one to one mapping of filesystem to token.  With the advent
> of multi-token filesystems like viewfs, the token cache will try to use a service key (ie.
> for viewfs) that will never exist (because it really gets the mounted fs tokens).
> The descriop


