hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
Date Tue, 28 Nov 2017 22:19:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16269577#comment-16269577
] 

Daryn Sharp commented on HADOOP-15059:
--------------------------------------

*Long pondering:* The credentials format is not something that should have been trifled without
careful compatibility considerations.  A major compatibility break like this should have been
proceeded with a bridge release – ie. proactively add v0/v1 support to popular 2.x releases
but continue to use the v0 format.  Flip the 3.x default format to v1 and everything works.
 But it's too late.

I'm concerned the compatibility issues may not be confined to just yarn.  Perhaps a bit contrived
but I've seen enough absurd UGI/token handling that little ceases to amaze me anymore.
# 2.x/3.x code rewrites the creds, only updates the v0-file creds because that's all that
existed since the dawn of security, leading to:
# If 2.x rewrites creds, 2.x subprocess works.
# If 2.x or 3.x rewrites creds, 3.x subprocess is oblivious to intended changes – it found
the unchanged v1-file.
# If 3.x rewrites creds, 2.x subprocess blows up because the v0-file was incidentally changed
to the v1 format.

For the sake of discussion: the ugi methods to read/write the creds would be the better place
to transparently handle the dual-version files instead of all the yarn changes.  It won't
avoid the aforementioned pitfalls, and it doesn't fix the stream methods.  Not to mention
we are now stuck supporting two files.  If the format changes again, do we have to support
3 files?  The whole point of writing the version into the file is to support multiple versions.
––
*Short answer:*  I think 3.0 has to be the bridge release and continue writing the v0 format
but supporting reading v1 format.  3.1 or 3.2 can flip the default format to v1.  Changing
the file format appears to be cosmetic so I don't see any harm in waiting for v1 to be the
default.

> 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-15059
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15059
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>            Reporter: Junping Du
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, HADOOP-15059.003.patch
>
>
> I tried to deploy 3.0 cluster with 2.9 MR tar ball. The MR job is failed because following
error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created
MRAppMaster for application appattempt_1511295641738_0003_000001
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to
load native-hadoop library for your platform... using builtin-java classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster:
Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
> 	at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
> 	at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
> 	at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
> 	at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_000001/container_tokens
> 	at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
> 	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
> 	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
> 	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
> 	at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
> 	... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
> 	at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
> 	at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
> 	... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status
1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to token incompatiblity change between 2.9 and 3.0. As we claim "rolling
upgrade" is supported in Hadoop 3, we should fix this before we ship 3.0 otherwise all MR
running applications will get stuck during/after upgrade.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org


Mime
View raw message