ambari-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-13120) Restart Of NodeManager During Rolling Upgrade Runs Command As the Wrong User
Date Thu, 17 Sep 2015 04:10:45 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-13120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14791550#comment-14791550 ]

Hadoop QA commented on AMBARI-13120:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12756396/AMBARI-13120.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 1 new or modified test files.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 core tests{color}.  The test build failed in ambari-server 

Test results: https://builds.apache.org/job/Ambari-trunk-test-patch/3798//testReport/
Console output: https://builds.apache.org/job/Ambari-trunk-test-patch/3798//console

This message is automatically generated.

> Restart Of NodeManager During Rolling Upgrade Runs Command As the Wrong User
> ----------------------------------------------------------------------------
>
>                 Key: AMBARI-13120
>                 URL: https://issues.apache.org/jira/browse/AMBARI-13120
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.1.0
>            Reporter: Jonathan Hurley
>            Assignee: Jonathan Hurley
>            Priority: Critical
>             Fix For: 2.1.2
>
>         Attachments: AMBARI-13120.patch
>
>
> During core slaves step one of nodemanagers failed to restart:
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 153, in <module>
>     Nodemanager().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
>     method(env)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 476, in restart
>     self.post_rolling_restart(env)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 84, in post_rolling_restart
>     nodemanager_upgrade.post_upgrade_check()
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager_upgrade.py", line 41, in post_upgrade_check
>     _check_nodemanager_startup()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/decorator.py", line 54, in wrapper
>     return function(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager_upgrade.py", line 74, in _check_nodemanager_startup
>     raise Fail('Unable to determine if the NodeManager has started after upgrade (result code {0})'.format(str(return_code)))
> resource_management.core.exceptions.Fail: Unable to determine if the NodeManager has started after upgrade (result code 1)
> {code}
> It looks like an expired Kerberos ticket caused this:
> {code}
> 15/09/15 15:28:39 INFO impl.TimelineClientImpl: Timeline service address: http://os-r7-hpjtks-rudtodalsec-5.novalocal:8188/ws/v1/timeline/
> 15/09/15 15:28:40 INFO client.RMProxy: Connecting to ResourceManager at os-r7-hpjtks-rudtodalsec-15.novalocal/172.22.112.51:8050
> 15/09/15 15:28:40 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> Exception in thread "main" java.io.IOException: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "os-r7-hpjtks-rudtodalsec-19/172.22.112.64"; destination host is: "os-r7-hpjtks-rudtodalsec-15.novalocal":8050;
> 	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:773)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1431)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1358)
> 	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> 	at com.sun.proxy.$Proxy17.getClusterNodes(Unknown Source)
> 	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:266)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:606)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> 	at com.sun.proxy.$Proxy18.getClusterNodes(Unknown Source)
> 	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:520)
> 	at org.apache.hadoop.yarn.client.cli.NodeCLI.listClusterNodes(NodeCLI.java:153)
> 	at org.apache.hadoop.yarn.client.cli.NodeCLI.run(NodeCLI.java:122)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
> 	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
> 	at org.apache.hadoop.yarn.client.cli.NodeCLI.main(NodeCLI.java:62)
> Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 	at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:685)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:648)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:735)
> 	at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
> 	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:1397)
> 	... 17 more
> Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> 	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
> 	at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
> 	at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:558)
> 	at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:373)
> 	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:727)
> 	at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:723)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:415)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> 	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:722)
> 	... 20 more
> Caused by: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
> 	at sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
> 	at sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:121)
> 	at sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
> 	at sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:223)
> 	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
> 	at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
> 	at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
> 	... 29 more
> {code}
> {code}
> [root@os-r7-hpjtks-rudtodalsec-19 ~]# su - yarn -c 'kdestroy'
> [root@os-r7-hpjtks-rudtodalsec-19 ~]# su - yarn -c 'yarn node -list -states=RUNNING'
> 15/09/16 21:08:21 INFO impl.TimelineClientImpl: Timeline service address: http://os-r7-hpjtks-rudtodalsec-5.novalocal:8188/ws/v1/timeline/
> 15/09/16 21:08:21 INFO client.RMProxy: Connecting to ResourceManager at os-r7-hpjtks-rudtodalsec-15.novalocal/172.22.112.51:8050
> 15/09/16 21:08:21 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> ...
> [root@os-r7-hpjtks-rudtodalsec-19 ~]# su - yarn -c 'kinit -kt /etc/security/keytabs/nm.service.keytab nm/os-r7-hpjtks-rudtodalsec-19.novalocal@EXAMPLE.COM'
> [root@os-r7-hpjtks-rudtodalsec-19 ~]# su - yarn -c 'yarn node -list -states=RUNNING'
> 15/09/16 21:08:59 INFO impl.TimelineClientImpl: Timeline service address: http://os-r7-hpjtks-rudtodalsec-5.novalocal:8188/ws/v1/timeline/
> 15/09/16 21:08:59 INFO client.RMProxy: Connecting to ResourceManager at os-r7-hpjtks-rudtodalsec-15.novalocal/172.22.112.51:8050
> Total Nodes:20
>          Node-Id	     Node-State	Node-Http-Address	Number-of-Running-Containers
> os-r7-hpjtks-rudtodalsec-6.novalocal:25454	        RUNNING	os-r7-hpjtks-rudtodalsec-6.novalocal:8042	                           0
> os-r7-hpjtks-rudtodalsec-16.novalocal:25454	        RUNNING	os-r7-hpjtks-rudtodalsec-16.novalocal:8042	                           0
> os-r7-hpjtks-rudtodalsec-13.novalocal:25454	        RUNNING	os-r7-hpjtks-rudtodalsec-13.novalocal:8042	                           0
> ...
> {code}
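
The manual reproduction above suggests the shape of a post-upgrade check: obtain a ticket from the NodeManager keytab first, then list the RUNNING nodes and look for the local host. The following is only a minimal sketch of that idea, not the attached patch (whose contents are not shown in this message). The names build_kinit_command, running_nodes, check_nodemanager_started and run_command are hypothetical, and the sketch assumes the commands are already executed as the intended service user.

```python
import subprocess

# Text that appears in the client output when no ticket is available.
GSS_FAILURE_MARKER = "Failed to find any Kerberos tgt"


def build_kinit_command(keytab, principal):
    """Return the argv for obtaining a ticket from a keytab (kinit -kt)."""
    return ["kinit", "-kt", keytab, principal]


def running_nodes(node_list_output):
    """Extract host names reported as RUNNING by `yarn node -list`."""
    hosts = set()
    for line in node_list_output.splitlines():
        fields = line.split()
        # Data rows look like: <host>:<port> RUNNING <host>:<http-port> <containers>
        if len(fields) >= 2 and fields[1] == "RUNNING" and ":" in fields[0]:
            hosts.add(fields[0].rsplit(":", 1)[0])
    return hosts


def run_command(argv):
    """Hypothetical default runner: returns (return_code, combined output)."""
    proc = subprocess.run(argv, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, universal_newlines=True)
    return proc.returncode, proc.stdout


def check_nodemanager_started(hostname, keytab, principal, run=run_command):
    """Authenticate, list RUNNING nodes, and report whether hostname is among them."""
    code, _ = run(build_kinit_command(keytab, principal))
    if code != 0:
        raise RuntimeError("kinit failed (result code {0})".format(code))
    code, output = run(["yarn", "node", "-list", "-states=RUNNING"])
    if code != 0 or GSS_FAILURE_MARKER in output:
        raise RuntimeError(
            "Unable to list nodes (result code {0})".format(code))
    return hostname in running_nodes(output)
```

Passing the runner as a parameter keeps the parsing and ordering logic testable without a Kerberized cluster. A real check would also have to decide which user runs these commands, which is the subject of this issue.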



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
