ambari-issues mailing list archives

From "Dmitry Lysnichenko (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AMBARI-20447) YARN service check failed during HDP 2.4-2.6 rolling upgrade with YARN HA enabled
Date Tue, 14 Mar 2017 17:02:41 GMT
Dmitry Lysnichenko created AMBARI-20447:
-------------------------------------------

             Summary: YARN service check failed during HDP 2.4-2.6 rolling upgrade with YARN HA enabled
                 Key: AMBARI-20447
                 URL: https://issues.apache.org/jira/browse/AMBARI-20447
             Project: Ambari
          Issue Type: Bug
            Reporter: Dmitry Lysnichenko
            Assignee: Dmitry Lysnichenko
            Priority: Blocker
         Attachments: AMBARI-20447.patch



The YARN service check fails during a rolling upgrade from HDP-2.4 to HDP-2.6 with YARN HA turned on:
# After the "core master restart" step, the YARN client uses the new (HDP-2.6) config and fails with "Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found". Forcing the YARN client to use the old (HDP-2.4) config until the client binary is updated works around this.
# After the "core slave restart" step, using the old YARN client config with the old YARN client binary does not help: the NM/RM classpath points to HDP-2.6. The application job gets scheduled but then fails with this log:

{code}
17/03/06 16:39:27 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl failed in state STARTED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2240)
	at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:160)
	at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
	at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
	at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.serviceStart(AMRMClientImpl.java:186)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl.serviceStart(AMRMClientAsyncImpl.java:96)
	at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.run(ApplicationMaster.java:559)
	at org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster.main(ApplicationMaster.java:299)
Caused by: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2208)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2232)
	... 9 more
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
	at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2114)
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2206)
	... 10 more
17/03/06 16:39:27 INFO service.AbstractService: Service org.apache.hadoop.yarn.client.api.async.AMRMClientAsync failed in state STARTED; cause: java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider not found
	at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2240)
	at org.apache.hadoop.yarn.client.RMProxy.createRMFailoverProxyProvider(RMProxy.java:160)
	at org.apache.hadoop.yarn.client.RMProxy.createRMProxy(RMProxy.java:93)
	at org.apache.hadoop.yarn.client.ClientRMProxy.createRMProxy(ClientRMProxy.java:72)
	at
{code}
# After the YARN client is updated to the new binary, the service check works fine.
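The failures in steps 1 and 3 above come down to one question: does the configured failover proxy provider class exist in the client binary that loads it? The sketch below models that check. It assumes the provider is set through the `yarn.client.failover-proxy-provider` property in yarn-site.xml, and it uses the fact stated in this issue that `RequestHedgingRMFailoverProxyProvider` first appears in 2.5. The helper names and the version table are hypothetical illustrations, not Ambari or Hadoop code.

```python
import xml.etree.ElementTree as ET

# Assumption: the RM failover proxy provider is configured via this key.
PROVIDER_KEY = "yarn.client.failover-proxy-provider"

# Hypothetical table: first HDP (major, minor) whose client binary ships the class.
# Per this issue, RequestHedgingRMFailoverProxyProvider was added in 2.5.
CLASS_INTRODUCED_IN = {
    "org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider": (2, 5),
}


def read_property(yarn_site_xml: str, name: str):
    """Return the <value> text of the named <property>, or None if absent."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def provider_loadable(yarn_site_xml: str, client_version: tuple) -> bool:
    """True if the configured provider class exists in the given client binary."""
    provider = read_property(yarn_site_xml, PROVIDER_KEY)
    if provider is None:
        return True  # no override: the client uses its built-in default provider
    introduced = CLASS_INTRODUCED_IN.get(provider.strip())
    # Classes missing from the table are assumed to exist in every version.
    return introduced is None or client_version >= introduced


NEW_CONFIG = """<configuration>
  <property>
    <name>yarn.client.failover-proxy-provider</name>
    <value>org.apache.hadoop.yarn.client.RequestHedgingRMFailoverProxyProvider</value>
  </property>
</configuration>"""

# Step 1: HDP-2.6 config with the HDP-2.4 client binary (ClassNotFoundException).
print(provider_loadable(NEW_CONFIG, (2, 4)))  # False
# Step 3: HDP-2.6 config with the HDP-2.6 client binary (works).
print(provider_loadable(NEW_CONFIG, (2, 6)))  # True
```

Step 2 is the case this client-side check cannot catch: the client host still has the old config, but, as the issue describes, the ApplicationMaster's container classpath points at HDP-2.6.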
----

Bottom line: this is a known problem with DistributedShell, which was never fixed to stop relying on the cluster's configuration. As a result, client configuration changes like this one can break DistributedShell apps across upgrades.
Unfortunately, nothing we do now can fix this upgrade path for DistributedShell; a proper fix would have required changes in the older release itself.

We have to do two things:
# Disable the DistributedShell-based service check when upgrading from 2.4 to 2.6. RequestHedgingRMFailoverProxyProvider was added in 2.5, so upgrading from 2.5 to 2.6 is fine.
# Also fix yarn-site.xml, starting with 2.6, with the following change to avoid this problem in the future.
The change replaces $HADOOP_CONF_DIR, which is inherited from the NodeManager, with /etc/hadoop/conf/, which is always tied to the client version.
{code}
<property>
  <name>yarn.application.classpath</name>
  <value>/etc/hadoop/conf/,/usr/hdp/current/hadoop-client/*,/usr/hdp/current/hadoop-client/lib/*,/usr/hdp/current/hadoop-hdfs-client/*,/usr/hdp/current/hadoop-hdfs-client/lib/*,/usr/hdp/current/hadoop-yarn-client/*,/usr/hdp/current/hadoop-yarn-client/lib/*</value>
</property>
{code}
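The first remediation step reduces to a version comparison between the upgrade's source and target stacks: skip the check only when the upgrade crosses 2.5, the release that introduced RequestHedgingRMFailoverProxyProvider. A minimal sketch, assuming stack versions arrive as dotted strings such as `2.4` or `2.6.0.0`; `should_skip_dshell_check` and `parse_version` are hypothetical names, not the functions in the attached patch.

```python
# Per this issue, RequestHedgingRMFailoverProxyProvider first ships in HDP 2.5.
HEDGING_PROVIDER_SINCE = (2, 5)


def parse_version(version: str) -> tuple:
    """Turn '2.4' or '2.6.0.0' into a comparable (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def should_skip_dshell_check(source: str, target: str) -> bool:
    """Skip the DistributedShell service check only when the upgrade crosses
    the version that introduced the hedging provider (e.g. 2.4 -> 2.6)."""
    src, dst = parse_version(source), parse_version(target)
    return src < HEDGING_PROVIDER_SINCE <= dst


print(should_skip_dshell_check("2.4", "2.6"))  # True
print(should_skip_dshell_check("2.5", "2.6"))  # False
```

Comparing tuples keeps the rule to a single chained comparison, and it leaves 2.5-to-2.6 upgrades (which the issue says are fine) running the normal check.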
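The second step works because of where a `$HADOOP_CONF_DIR` entry in `yarn.application.classpath` gets its value: from the environment the NodeManager provides to the container, so an upgraded NodeManager can point an old client's ApplicationMaster at the new config. The sketch below models only that substitution using Python's `string.Template`. The NodeManager environment, its config path, and the old classpath value are hypothetical (the issue gives only the new value, trimmed here to two entries); the real expansion happens on the NodeManager side when it launches the container, not in Python.

```python
from string import Template

# Hypothetical pre-2.6 value that starts with $HADOOP_CONF_DIR (trimmed).
OLD_CLASSPATH = "$HADOOP_CONF_DIR,/usr/hdp/current/hadoop-client/*"
# First two entries of the value proposed in this issue.
NEW_CLASSPATH = "/etc/hadoop/conf/,/usr/hdp/current/hadoop-client/*"

# Hypothetical environment of an already-upgraded NodeManager.
NM_ENV = {"HADOOP_CONF_DIR": "/usr/hdp/2.6/hadoop/conf"}


def container_classpath(value: str, nm_env: dict) -> list:
    """Substitute $VARS from the NodeManager's environment, then split entries."""
    expanded = Template(value).safe_substitute(nm_env)
    return [entry for entry in expanded.split(",") if entry]


print(container_classpath(OLD_CLASSPATH, NM_ENV)[0])  # /usr/hdp/2.6/hadoop/conf
print(container_classpath(NEW_CLASSPATH, NM_ENV)[0])  # /etc/hadoop/conf/
```

With the literal `/etc/hadoop/conf/` entry, the config directory on the classpath no longer depends on which version the NodeManager runs.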






--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
