hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2764) Fix renewal of dfs delegation tokens
Date Wed, 31 Aug 2011 18:10:10 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094738#comment-13094738

Daryn Sharp commented on MAPREDUCE-2764:

Yes, the patch was hurried to be a demonstration of the latest approach I am pursuing.   
As posted, tests also don't pass.

bq. When delegation tokens were added, there was agreement that adding those methods to FileSystem
wasn't appropriate. We can re-open that discussion, but it needs to be discussed more widely.

I removed {{FileSystem.getDelegationToken}} after you expressed displeasure with it ever being
there.  I can re-add it for now if you feel a larger debate is required?

bq. I also assume that the change to fixFontPaths is accidental.

Yes, I didn't mean to include it in the patch.  It allows forrest to run on java 6.

bq. I also like the isRenewable method, although it also controls whether it is cancelable
so probably needs a better name like isManaged.


bq. Please don't remove InterruptedException from APIs. At some point we'll fix RPC to properly
throw InterruptedException when it is waiting.


bq. Changing the semantics of JobClient.addAll isn't ok.

Granted, I forget to use the boolean replace field, but it's not changing the semantics of
the existing method.  The new {{addAll(Credentials, boolean)}} method prevents overwriting
existing credentials.  This fixes an edge case in the JT where the client has already acquired
valid renewable tokens for a service, but they are replaced by bad credentials on disk.

bq. We've been burned badly before by the idiom of using static blocks to register types.
The ServiceLookup is much better.

I was concerned about that too, so I made the operation be a simple map insertion.  I found
a number of other places that have similar static blocks for registering things so I thought
it might be ok.  I'd like to get the rest of the implementation nailed down, then switch it
to a SL.  Is that ok?

bq. Adding a transient field that is lost when going over RPC is problematic from a maintenance
point of view.

The issuer is intentionally transient so: a) no binary compat issues b) it's only relevant
to the client that requested the token.  The NN, DN, etc that uses the token doesn't care
how the client acquired the token.

bq. It also doesn't feel justified, just so that we can continue to intermingle HFTP delegation
tokens with HFTP delegation tokens.

The issuer is much larger than that.  Http(s) is merely the transport used to acquire/renew/cancel
the token.  However, the token acquired is not authorizing future http(s) connections.  The
token authorizes hdfs rpc connections.  +The protocol to manage the token is completely orthogonal
to what the token is authorizing+.  Having the client stomp on the kind and service fields
completely changes the token's semantics and renders it useless unless the original kind and
service fields are swapped back.

The options are then:
* (your proposal) The client does hack-ish swapping of kind and service within the token so
it knows the transport used to manage the token.  The client is no longer allowed to use the
token for what it is actually authorizing.  Ex. The token now looks like it's for http (which
it's not!), so the client can't use it for rpc.  The original values will need to be marshaled
into existing fields during serialization, or the token becomes useless.

* (my early approach) The client creates an hftp token that is a wrapper for the real token
that is acquired.  This prevents the swapping of kind and service within the token, but I
found it unexpectedly difficult to implement.  Note that the client still cannot directly
use the token for what it authorizes, which was an earlier concern by Jitendra.

* (my current approach) The client tracks how it acquired the token via an issuer field. 
The issuer field is only persistent within the client's credentials.  The client uses the
issuer to determine the transport to manage the token.  Unlike the other approaches, the token
is completely binary compatible and, the client can use the token for what it authorizes,
and there's no guesswork in how to renew it!

> Fix renewal of dfs delegation tokens
> ------------------------------------
>                 Key: MAPREDUCE-2764
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2764
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>             Fix For:
>         Attachments: MAPREDUCE-2764-2.patch, MAPREDUCE-2764.patch, delegation.patch
> The JT may have issues renewing hftp tokens which disrupt long distcp jobs.  The problem
is the JT's delegation token renewal code is built on brittle assumptions.  The token's service
field contains only the "ip:port" pair.  The renewal process assumes that the scheme must
be hdfs.  If that fails due to a {{VersionMismatchException}}, it tries https based on another
assumption that it must be hftp if it's not hdfs.  A number of other exceptions, most commonly
{{IOExceptions}}, can be generated which fouls up the renewal since it won't fallback to https.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message