hadoop-common-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-15059) 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
Date Wed, 29 Nov 2017 17:59:00 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Lowe updated HADOOP-15059:
    Attachment: HADOOP-15059.004.patch

Thanks for joining the conversation, Allen, and for pointing out the motivations behind the
protobuf change.  Do you know of existing use cases that are relying on the new format?

I completely agree the new format is a great path forward for extensibility and portability,
but unfortunately it breaks a number of existing use cases.

bq. Let's be clear: this is only a problem if one has a bundled hadoop-common.jar.

It's also important to point out that this is a rather common occurrence.  Besides the typical
habit of users running their *-with-dependencies.jar on the cluster, anyone leveraging the
framework-on-HDFS approach will be bitten by this as soon as the nodemanager upgrades.  

Having frameworks deploy via HDFS rather than picking them up from the nodemanager's jars
has proven to be a very useful way to better isolate apps during cluster rolling upgrades
and support multiple versions of the framework on the cluster simultaneously.

bq. Is the end result of this JIRA going to be that all file formats are locked forever, regardless
of where they come from?

I don't think so.  As discussed above, we should be able to remove support for the Writable
format when Hadoop no longer supports 2.x apps.  Yes, that's likely quite a long time, but
it does not have to be forever.

bq. Hadoop releases have broken rolling upgrade (and non-rolling upgrades, for that matter)
in the middle of the 2.x stream before by removing things such as container execution types.

We've completed rolling upgrades across all of our clusters for every minor release of 2.x
since rolling upgrades were first supported in 2.6, so we must not have hit this landmine.
 Was this the removal of the dedicated Docker container executor in favor of the unified Linux
executor that does everything?

I'm attaching a patch that implements the "bridge release(s)" approach where the code supports
reading the new format but will write the old format by default.  Code can still request the
new format explicitly if necessary.  The main drawback is that we don't get to easily leverage
the benefits of the new format, since it's not the default.  However, I'm hoping native
services and other things that need the new protobuf format can leverage dtutil to translate
the credentials format for easier consumption.
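For context, both the failure in the description and the bridge behavior hinge on the version byte that Credentials writes after its magic header. Here is a minimal standalone sketch of that dispatch; the "HDTS" magic and the 0/1 version numbers mirror Credentials.java, but the class and method names below are hypothetical illustrations, not Hadoop's actual API:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical sketch of the version dispatch in the token storage reader.
// A bridge release reads both formats but writes the old one by default.
public class TokenStorageSketch {
    static final byte[] MAGIC = {'H', 'D', 'T', 'S'};
    static final byte WRITABLE_VERSION = 0;  // legacy 2.x Writable format
    static final byte PROTOBUF_VERSION = 1;  // new 3.0 protobuf format

    static String describe(DataInputStream in) throws IOException {
        byte[] magic = new byte[MAGIC.length];
        in.readFully(magic);
        if (!Arrays.equals(magic, MAGIC)) {
            throw new IOException("Bad header found in token storage.");
        }
        byte version = in.readByte();
        switch (version) {
            case WRITABLE_VERSION: return "writable";
            case PROTOBUF_VERSION: return "protobuf";
            default:
                // A 2.x reader has no PROTOBUF_VERSION case, so it lands
                // here -- the "Unknown version 1" error in the log below.
                throw new IOException("Unknown version " + version
                    + " in token storage.");
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] newFormat = {'H', 'D', 'T', 'S', 1};
        System.out.println(describe(new DataInputStream(
            new ByteArrayInputStream(newFormat))));
    }
}
```

A 2.9 reader only knows version 0, so handing it a version-1 container_tokens file produces exactly the "Unknown version 1 in token storage" failure quoted in the description; writing version 0 by default keeps those readers working through a rolling upgrade.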

> 3.0 deployment cannot work with old version MR tar ball which break rolling upgrade
> -----------------------------------------------------------------------------------
>                 Key: HADOOP-15059
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15059
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: security
>            Reporter: Junping Du
>            Assignee: Jason Lowe
>            Priority: Blocker
>         Attachments: HADOOP-15059.001.patch, HADOOP-15059.002.patch, HADOOP-15059.003.patch,
> I tried to deploy a 3.0 cluster with the 2.9 MR tar ball. The MR job failed because of the following:
> {noformat}
> 2017-11-21 12:42:50,911 INFO [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Created
MRAppMaster for application appattempt_1511295641738_0003_000001
> 2017-11-21 12:42:51,070 WARN [main] org.apache.hadoop.util.NativeCodeLoader: Unable to
load native-hadoop library for your platform... using builtin-java classes where applicable
> 2017-11-21 12:42:51,118 FATAL [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster:
Error starting MRAppMaster
> java.lang.RuntimeException: Unable to determine current user
> 	at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:254)
> 	at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:220)
> 	at org.apache.hadoop.conf.Configuration$Resource.<init>(Configuration.java:212)
> 	at org.apache.hadoop.conf.Configuration.addResource(Configuration.java:888)
> 	at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1638)
> Caused by: java.io.IOException: Exception reading /tmp/nm-local-dir/usercache/jdu/appcache/application_1511295641738_0003/container_e03_1511295641738_0003_01_000001/container_tokens
> 	at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:208)
> 	at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:907)
> 	at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:820)
> 	at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:689)
> 	at org.apache.hadoop.conf.Configuration$Resource.getRestrictParserDefault(Configuration.java:252)
> 	... 4 more
> Caused by: java.io.IOException: Unknown version 1 in token storage.
> 	at org.apache.hadoop.security.Credentials.readTokenStorageStream(Credentials.java:226)
> 	at org.apache.hadoop.security.Credentials.readTokenStorageFile(Credentials.java:205)
> 	... 8 more
> 2017-11-21 12:42:51,122 INFO [main] org.apache.hadoop.util.ExitUtil: Exiting with status
1: java.lang.RuntimeException: Unable to determine current user
> {noformat}
> I think it is due to a token incompatibility change between 2.9 and 3.0. As we claim "rolling
upgrade" is supported in Hadoop 3, we should fix this before we ship 3.0; otherwise all running
MR applications will get stuck during/after the upgrade.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org
