hadoop-common-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
Date Wed, 22 Nov 2017 15:44:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16262809#comment-16262809 ]

Jason Lowe commented on HADOOP-15059:
-------------------------------------

bq. Are we going to keep binary compatibility across hadoop-2.x and hadoop-3.x?

Wire compatibility between 2.x clients and 3.x servers is a prerequisite to supporting a rolling
upgrade from 2.x to 3.x, but I do not think everyone realizes wire compatibility between a
3.x client and a 2.x server is also very important to many of our users.  There are many cases
where more than one cluster is involved in a workflow.  Requiring that all clusters upgrade
from 2.x to 3.x simultaneously is a huge hurdle for adoption, and most users will upgrade
them one at a time.  As individual clusters upgrade there will be clients/jobs on a newly
upgraded 3.x cluster trying to interact with an older 2.x cluster.

Back to the issue of launching jobs using an incompatible token format -- here are a couple
of options we could consider:

1) YARN nodemanager writes out *two* token credential files, the original 2.x file for backwards
compatibility and a new 3.x file.  The 3.x UGI code looks for the new file and falls back
to the old one if it cannot find it.  The 2.x code will simply load the old format from the
original filename as it does today.

2) Application submission context contains information on which version of credentials to
use for an application.  This gets transferred to the container launch context for each container,
and the nodemanager writes out the appropriate credentials version based on what was specified
in the container launch context.  In other words, the nodemanager knows which version of the
credentials format the container is expecting to find and writes the token file in that format.
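
To make the two options concrete, here is a minimal sketch in plain Java, not actual Hadoop code. Everything named in it is an assumption made for illustration: the class, the new file name container_tokens.v1, the CredentialsFormat enum, and the pre-serialized byte arrays standing in for real credentials. Only the original file name container_tokens comes from the stack trace in the issue description.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class TokenFileSketch {
    // Original file name, as seen in the stack trace of this issue.
    static final String LEGACY_TOKEN_FILE = "container_tokens";
    // Hypothetical name for the new-format file; the thread does not name one.
    static final String NEW_TOKEN_FILE = "container_tokens.v1";

    // Hypothetical marker for which credentials format a container expects.
    enum CredentialsFormat { LEGACY_2X, NEW_3X }

    // Option 1, nodemanager side: write both files into the container directory.
    static void writeBothFormats(Path containerDir, byte[] legacyBytes, byte[] newBytes)
            throws IOException {
        Files.write(containerDir.resolve(LEGACY_TOKEN_FILE), legacyBytes);
        Files.write(containerDir.resolve(NEW_TOKEN_FILE), newBytes);
    }

    // Option 1, 3.x reader side: prefer the new file, fall back to the old one.
    // A 2.x reader would keep opening LEGACY_TOKEN_FILE directly, unchanged.
    static Path chooseTokenFile(Path containerDir) {
        Path newer = containerDir.resolve(NEW_TOKEN_FILE);
        return Files.exists(newer) ? newer : containerDir.resolve(LEGACY_TOKEN_FILE);
    }

    // Option 2, nodemanager side: write a single file, in whichever format the
    // container launch context asked for.
    static void writeRequestedFormat(Path containerDir, CredentialsFormat requested,
                                     byte[] legacyBytes, byte[] newBytes) throws IOException {
        byte[] payload = (requested == CredentialsFormat.NEW_3X) ? newBytes : legacyBytes;
        Files.write(containerDir.resolve(LEGACY_TOKEN_FILE), payload);
    }
}
```

The trade-off this is meant to show: option 1 needs no new fields in the submission or launch context but writes every credential twice, while option 2 writes once but requires the expected format to be carried from the application submission context through to the nodemanager.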


> 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
> -----------------------------------------------------------------------------------
>
>                 Key: HADOOP-15059
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15059
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>            Reporter: Junping Du
>            Priority: Blocker
>
> I tried to deploy a 3.0 cluster with the 2.9 MR tarball. The MR job failed with the following error:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created MRAppMaster for application appattempt_1511295641738_0003_000001
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
> 	at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
> 	at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
> 	at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
> 	at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_000001/container_tokens
> 	at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
> 	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
> 	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
> 	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
> 	at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
> 	... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
> 	at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
> 	at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
> 	... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status 1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to an incompatible change in the token format between 2.9 and 3.0. As we claim
> "rolling upgrade" is supported in Hadoop 3, we should fix this before we ship 3.0; otherwise all
> running MR applications will get stuck during/after the upgrade.
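
The innermost exception in the trace above ("Unknown version 1 in token storage.") is what a strict version check produces when an older reader meets a newer file. The following is a sketch of that failure mode only, not the Hadoop Credentials implementation. It assumes a layout of a short magic header followed by one version byte; the class name, the "HDTS" magic value, and the idea that the old reader accepts only version 0 are assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class TokenVersionSketch {
    // Assumed magic header for the token file (illustrative).
    static final byte[] MAGIC = "HDTS".getBytes(StandardCharsets.UTF_8);
    // The only version this hypothetical old reader understands.
    static final byte OLD_READER_VERSION = 0;

    // Reads the header and rejects any version the reader does not know.
    static byte readVersionStrict(byte[] fileBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes));
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Bad header found in token storage.");
        }
        byte version = in.readByte();
        if (version != OLD_READER_VERSION) {
            // An old reader cannot skip or downgrade a newer format; it can only fail.
            throw new IOException("Unknown version " + version + " in token storage.");
        }
        return version;
    }
}
```

Under these assumptions, a file beginning with the magic bytes and version byte 1 fails in the old reader before any token is parsed, which matches the point in the container's life at which the MRAppMaster dies in the log above.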



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

