hadoop-mapreduce-user mailing list archives

From Rainer Toebbicke <...@pclella.cern.ch>
Subject Client mapred tries to renew a token with renewer specified as nobody
Date Mon, 02 Dec 2013 16:25:53 GMT
Hello,

I am trying to understand why my long-running MapReduce jobs stop after approximately 24 hours
on a secure cluster.

This is on Cloudera CDH 4.3.0, hence Hadoop 2.0.0, using MRv1 (not YARN), with authentication
set to "kerberos". Trying with a short-lived Kerberos ticket (1 h), I see that the ticket gets
renewed regularly. Still, the job crashes after 24 hours because the delegation token expires.
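
To see which renewer and expiry actually end up in a token, I use a small test client along
these lines (a rough sketch against the Hadoop 2.0 client API as I understand it; the renewer
string passed in below is only an example):

import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.util.Date;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.security.token.delegation.DelegationTokenIdentifier;
import org.apache.hadoop.security.token.Token;

public class PrintTokenRenewer {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();   // picks up core-site.xml / hdfs-site.xml
    FileSystem fs = FileSystem.get(conf);

    // The renewer below is only an example; an MRv1 job client is supposed to put the
    // JobTracker principal here.
    Token<?> token = fs.getDelegationToken("mapred/xxx.cern.ch@CERN.CH");

    // Decode the identifier to see which owner/renewer/maxDate the NameNode recorded.
    DelegationTokenIdentifier id = new DelegationTokenIdentifier();
    id.readFields(new DataInputStream(new ByteArrayInputStream(token.getIdentifier())));
    System.out.println("owner=" + id.getOwner()
        + " renewer=" + id.getRenewer()
        + " realUser=" + id.getRealUser()
        + " maxDate=" + new Date(id.getMaxDate()));
  }
}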

On a test cluster with increased logging and a shortened dfs.namenode.delegation.token.renew-interval
(for quicker debugging), I see that an immediate renewal of the delegation token fails, and that
once the expiry period has passed the NameNode log gets flooded (details further below).
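
For reference, this is the kind of NameNode-side setting I mean (hdfs-site.xml, value in
milliseconds; the value below is only an example for the test cluster):

<property>
  <!-- default is 86400000 ms (24 hours), if I read hdfs-default.xml right;
       shortened here to 10 minutes for quicker debugging -->
  <name>dfs.namenode.delegation.token.renew-interval</name>
  <value>600000</value>
</property>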

Detail:


2013-12-02 15:57:08,461 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful
for tobbicke@CERN.CH (auth:TOKEN)
2013-12-02 15:57:08,462 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
Authorization successful for tobbicke@CERN.CH (auth:TOKEN) for protocol=interface org.apache.hadoop.hdfs.protocol.ClientProtocol
2013-12-02 15:57:08,500 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful
for mapred/xxx.cern.ch@CERN.CH (auth:SIMPLE)
2013-12-02 15:57:08,540 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager:
Authorization successful for mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) for protocol=interface
org.apache.hadoop.hdfs.protocol.ClientProtocol
2013-12-02 15:57:08,541 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Token renewal requested for identifier: HDFS_DELEGATION_TOKEN token 12 for tobbicke

2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException
as:mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) cause:org.apache.hadoop.security.AccessControlException:
Client mapred tries to renew a token with renewer specified as nobody
2013-12-02 15:57:08,541 INFO org.apache.hadoop.ipc.Server: IPC Server handler 9 on 9000, call
org.apache.hadoop.hdfs.protocol.ClientProtocol.renewDelegationToken from 188.184.xxx.xxx:42031:
error: org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token
with renewer specified as nobody
org.apache.hadoop.security.AccessControlException: Client mapred tries to renew a token with
renewer specified as nobody
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecretManager.java:274)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.renewDelegationToken(FSNamesystem.java:5319)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.renewDelegationToken(NameNodeRpcServer.java:377)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:814)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:45024)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)



Is this as unhealthy as it looks? If the first (immediate) renewal fails, I assume the later
ones will share the same fate. Would that explain the 24-hour lifetime on the "real" cluster,
and what could be the reason? How does "nobody" come into play here?
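
For what it is worth, my reading of the renewer check on the NameNode side
(AbstractDelegationTokenSecretManager.renewToken()) is roughly the following; this is a
paraphrase to illustrate the logic, not a copy of the actual source:

public class RenewerCheckSketch {
  // The caller's Kerberos short name ("mapred" in my case) must match the renewer that
  // was recorded in the token identifier when the token was issued ("nobody" here).
  static void checkRenewer(String callerShortName, String renewerInTokenId)
      throws org.apache.hadoop.security.AccessControlException {
    if (!callerShortName.equals(renewerInTokenId)) {
      throw new org.apache.hadoop.security.AccessControlException(
          "Client " + callerShortName
              + " tries to renew a token with renewer specified as " + renewerInTokenId);
    }
  }

  public static void main(String[] args) throws Exception {
    checkRenewer("mapred", "nobody");   // reproduces the message I see in the NameNode log
  }
}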

In any case, whether linked to this or not, after dfs.namenode.delegation.token.renew-interval
milliseconds the following is logged a zillion times:

2013-12-02 16:58:09,718 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for
188.184.xxx.xxx:44979:null (DIGEST-MD5: IO error acquiring password)
2013-12-02 16:58:09,719 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9000: readAndProcess
threw exception javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password
[Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN
token 12 for tobbicke) is expired] from client 188.184.xxx.xxx. Count of bytes read: 0
javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken:
token (HDFS_DELEGATION_TOKEN token 12 for tobbicke) is expired]
        at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:577)
        at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)
        at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1210)
        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1405)
        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:719)
        at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:518)
        at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:493)
Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN
token 12 for tobbicke) is expired
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:227)
        at org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTokenSecretManager.java:46)
        at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.getPassword(SaslRpcServer.java:194)
        at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHandler.handle(SaslRpcServer.java:220)
        at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:568)
        ... 6 more


Any ideas?

Rainer