hadoop-mapreduce-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2764) Fix renewal of dfs delegation tokens
Date Sat, 06 Aug 2011 01:09:27 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080320#comment-13080320 ]

Daryn Sharp commented on MAPREDUCE-2764:

I don't think the goal should be to minimize change in the hopes of reducing risk, but rather
to improve the code base to reduce risk. First, a brief(ish) and polite rebuttal:
# Changes to namenode and rpc client were trivial.  Token producers should not each need to
know how to properly encode the socket address into a service string.
-          String s = NameNode.getAddress(conf).getAddress().getHostAddress()
-                     + ":" + NameNode.getAddress(conf).getPort();
-          token.setService(new Text(s));
+          token.setService(NameNode.getAddress(conf));
# Changes to rpc client are trivial.  Just like the namenode changes, the encoding of the
service is abstracted.
-          InetSocketAddress addr = remoteId.getAddress();
-          token = tokenSelector.selectToken(new Text(addr.getAddress()
-              .getHostAddress() + ":" + addr.getPort()), 
-              ticket.getTokens());
+          token = tokenSelector.selectToken(server, ticket.getTokens());
# The refactor of the token selectors is essentially creating a base class to eliminate all
the copy-n-paste in the individual selectors.  There's not much risk there.  Adding URI support
was very simple.
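
To illustrate the three points above, here is a minimal, self-contained Java sketch. It is not the code from the patch: {{SketchToken}}, {{ServiceSketch}} and {{BaseSelector}} are hypothetical stand-ins for Hadoop's token and selector classes, and the kind string is only an example value. It shows the "ip:port" encoding living in one helper, and a shared base selector that matches on kind and service so individual selectors no longer repeat that loop.

```java
import java.net.InetSocketAddress;
import java.util.Collection;

// Hypothetical stand-in for a delegation token; not Hadoop's Token class.
class SketchToken {
    final String kind;
    String service = "";

    SketchToken(String kind) {
        this.kind = kind;
    }
}

class ServiceSketch {
    // The only place that knows how a socket address becomes a service string.
    static String buildService(InetSocketAddress addr) {
        return addr.getAddress().getHostAddress() + ":" + addr.getPort();
    }

    // Token producers pass the address and never format the string themselves.
    static void setService(SketchToken token, InetSocketAddress addr) {
        token.service = buildService(addr);
    }
}

// Shared selection logic; a concrete selector would only supply its token kind.
class BaseSelector {
    private final String kind;

    BaseSelector(String kind) {
        this.kind = kind;
    }

    // Returns the first token whose kind and service both match, or null.
    SketchToken selectToken(InetSocketAddress server, Collection<SketchToken> tokens) {
        String service = ServiceSketch.buildService(server);
        for (SketchToken t : tokens) {
            if (kind.equals(t.kind) && service.equals(t.service)) {
                return t;
            }
        }
        return null;
    }
}
```

With this shape, a caller writes {{ServiceSketch.setService(token, addr)}} on the producing side and {{selector.selectToken(addr, tokens)}} on the consuming side, so both ends agree on the encoding by construction.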

Additional cons of the counter-proposal:
* The token renewer will be hardcoded such that mapreduce will need to be recompiled when
another filesystem is added.
* Mapred will require a mapping of token type to schemes.
* Can't simply subclass an existing filesystem with a new scheme w/o recompilation.
* The tokens and selectors are generics-based.  The type is used for unchecked casts,
so adding the ability to change the type is playing with fire.
* HftpFilesystem should not be guessing the rpc port when the rpc port is in the original
* Is not a sustainable design pattern.

IMHO, the token renewer should be "dumb" and not require knowledge of every filesystem.  Ergo,
all filesystem tokens should have the same type.  All filesystem tokens should be routed to
their filesystem object.  The filesystem object handles renewal.  To solve the hftp/remote-hdfs
token issue, I'd prefer for the hftp token to simply wrap/contain the remote dfs token instead
of twiddling its fields.
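
A minimal sketch of that routing, under stated assumptions: {{FsToken}}, {{RenewingFileSystem}} and {{DumbRenewer}} are hypothetical names, not Hadoop classes, and the scheme registry merely stands in for whatever lookup resolves a service URI to its filesystem object (the intent being that this comes from configuration rather than from code compiled into mapreduce). The hftp-like filesystem unwraps the contained dfs token and hands it to the remote filesystem.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical token: one shared type for every filesystem, service is a full URI.
class FsToken {
    final URI service;
    final FsToken wrapped; // non-null when this token contains another token

    FsToken(URI service, FsToken wrapped) {
        this.service = service;
        this.wrapped = wrapped;
    }
}

// Each filesystem knows how to renew its own tokens.
interface RenewingFileSystem {
    long renew(FsToken token);
}

// The renewer knows nothing about individual filesystems; it only routes.
class DumbRenewer {
    private final Map<String, RenewingFileSystem> byScheme = new HashMap<>();

    // Assumption: stands in for a configuration-driven scheme lookup.
    void register(String scheme, RenewingFileSystem fs) {
        byScheme.put(scheme, fs);
    }

    // Routes the token to the filesystem named by its service URI scheme.
    long renew(FsToken token) {
        RenewingFileSystem fs = byScheme.get(token.service.getScheme());
        if (fs == null) {
            throw new IllegalArgumentException("no filesystem for " + token.service);
        }
        return fs.renew(token);
    }
}

// An hftp-like filesystem renews by delegating the wrapped token to the remote one.
class WrappingFileSystem implements RenewingFileSystem {
    private final RenewingFileSystem remote;

    WrappingFileSystem(RenewingFileSystem remote) {
        this.remote = remote;
    }

    @Override
    public long renew(FsToken token) {
        return remote.renew(token.wrapped);
    }
}
```

Adding a filesystem in this sketch means registering one more scheme; {{DumbRenewer}} itself does not change, which is the property argued for above.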

Pros:
* Overall simple and clean(er) design
* No editing of a token type to scheme mapping for new filesystems
* No recompilation of mapreduce to add a filesystem 
* Eliminates brittle & risky copy-n-paste in the token producers
* Prepares the code to be more flexible and extensible with future service types
* Does not guess the remote rpc port

Cons:
* Slices across multiple components
* It's a bit more work, mainly due to cleanup of existing code

The "nice thing" about this type of low-level change is it will break immediately if implemented

> Fix renewal of dfs delegation tokens
> ------------------------------------
>                 Key: MAPREDUCE-2764
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2764
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>             Fix For:
>         Attachments: MAPREDUCE-2764.patch
> The JT may have issues renewing hftp tokens, which disrupts long distcp jobs.  The problem
> is that the JT's delegation token renewal code is built on brittle assumptions.  The token's
> service field contains only the "ip:port" pair.  The renewal process assumes that the scheme
> must be hdfs.  If that fails due to a {{VersionMismatchException}}, it tries https based on
> another assumption that it must be hftp if it's not hdfs.  A number of other exceptions, most
> commonly {{IOExceptions}}, can be generated which foul up the renewal since it won't fall
> back to https.
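
The failure mode described above can be reproduced in a few lines. This is a sketch of the described behaviour only, not the JT's actual code: {{VersionMismatchSketch}}, {{SchemeRenewer}} and {{BrittleRenewal}} are hypothetical names. The fallback to https is tied to one exception type, so any other {{IOException}} from the hdfs attempt propagates and https is never tried.

```java
import java.io.IOException;

// Hypothetical stand-in for the version mismatch named in the description.
class VersionMismatchSketch extends IOException {
    VersionMismatchSketch(String msg) {
        super(msg);
    }
}

// One renewal attempt against a given "ip:port" service using one scheme.
interface SchemeRenewer {
    long renew(String service) throws IOException;
}

class BrittleRenewal {
    // Assume hdfs first; fall back to https only on a version mismatch.
    static long renew(String service, SchemeRenewer hdfs, SchemeRenewer https)
            throws IOException {
        try {
            return hdfs.renew(service);
        } catch (VersionMismatchSketch e) {
            // The only path that ever reaches https.
            return https.renew(service);
        }
        // Any other IOException from the hdfs attempt escapes here unhandled.
    }
}
```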


