Return-Path: X-Original-To: apmail-hadoop-user-archive@minotaur.apache.org Delivered-To: apmail-hadoop-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 1D10410DAA for ; Tue, 3 Dec 2013 19:25:30 +0000 (UTC) Received: (qmail 38643 invoked by uid 500); 3 Dec 2013 19:25:18 -0000 Delivered-To: apmail-hadoop-user-archive@hadoop.apache.org Received: (qmail 38524 invoked by uid 500); 3 Dec 2013 19:25:18 -0000 Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hadoop.apache.org Delivered-To: mailing list user@hadoop.apache.org Received: (qmail 38517 invoked by uid 99); 3 Dec 2013 19:25:18 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 03 Dec 2013 19:25:18 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of rtejac@gmail.com designates 209.85.128.53 as permitted sender) Received: from [209.85.128.53] (HELO mail-qe0-f53.google.com) (209.85.128.53) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 03 Dec 2013 19:25:11 +0000 Received: by mail-qe0-f53.google.com with SMTP id nc12so13478841qeb.40 for ; Tue, 03 Dec 2013 11:24:51 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:in-reply-to:references:from:to:cc :subject:content-type; bh=zOc9qLb/5ixVP37aphTDrpXy/KOzVfDXqfu6Up4qkgY=; b=ytg+LIVcDrVdtkOWn/wWgr7HbMP8iUkpeyvy1eYYqnrYejOaJ6eXppdtdtMfrpaGlg zhcIQvtF+YOcEds8quGHfluvKVxp1qRTOtcI9pd21c2mTFXe+V3ELmrU9eAVmLwzZniq MFFM6/+iDVgIR5kxcshBlWBmkjac2MVUj6LNllDJlpNN+3orcFgqKZcNCtv4x+41frRy 1uyFs8kbZIu51NbKqjl4H//YPAhAhv2MV/tD/ru0mEpIyVZGqpbOpkhsLXYWYJ6ibo/E DojTkornS3xS88AoUYD8NlDKd5sx/S1iFHUXp+Qz4/MIxUGyDP0SAh583/vdfmvLNUWc qQnQ== X-Received: by 10.49.81.178 with SMTP id b18mr32412023qey.0.1386098690800; Tue, 03 Dec 2013 11:24:50 -0800 (PST) Received: from [127.0.0.1] (ec2-54-235-159-148.compute-1.amazonaws.com. [54.235.159.148]) by mx.google.com with ESMTPSA id fc16sm9613296qeb.3.2013.12.03.11.24.49 for (version=TLSv1 cipher=ECDHE-RSA-RC4-SHA bits=128/128); Tue, 03 Dec 2013 11:24:50 -0800 (PST) MIME-Version: 1.0 X-Mailer: Nodemailer (0.5.0; +http://www.nodemailer.com/) Date: Tue, 03 Dec 2013 11:24:50 -0800 (PST) Message-Id: <1386098689573.654965cc@Nodemailer> In-Reply-To: References: X-Orchestra-Oid: 5EDA2D4F-C66A-4DAC-8CBD-91BCF2620850 X-Orchestra-Sig: 0e7ee45becb4a06ab2e67594568ce58ab14b15ed X-Orchestra-Thrid: T34E92EF0-75A6-48FC-BDA9-72E38438286D_1453330885055499803 X-Orchestra-Thrid-Sig: c81f92720fad9677c3620b49aaf974d7c505366e X-Orchestra-Account: bc066f6e15f1e9cb69e410a96fe91e4cc03d9006 From: "Raviteja Chirala" To: user@hadoop.apache.org Cc: user@hadoop.apache.org Subject: Re: Client mapred tries to renew a token with renewer specified as nobody Content-Type: multipart/alternative; boundary="----Nodemailer-0.5.0-?=_1-1386098690261" X-Virus-Checked: Checked by ClamAV on apache.org ------Nodemailer-0.5.0-?=_1-1386098690261 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Alternatively you can schedule a cron job to do kinit every 20 hours or so.= Just to renew token before it expires.=C2=A0 =E2=80=94 Sent from Mailbox for iPad On Mon, Dec 2, 2013 at 9:12 AM, Rainer Toebbicke wrote: > Hello, > I am trying to understand why my long-running mapreduce jobs stop after = 24 hours (approx) on a secure cluster. > This is on Cloudera CDH 4.3.0, hence hadoop 2.0.0, using mrv1 (not yarn),= authentication specified as =22kerberos=22. Trying with a short-lived = Kerberos ticket (1h) I see that it gets renewed regularly. Still, the job = crashes after 24 hours because the delegation token expires. > On a test cluster with increased logging and shortened dfs.namenode.= delegation.token.renew-interval (for quicker debugging) I see that an = immediate renew of the delegation token fails, and then after the expiry = period the Namenode log starts getting clobbered. > Detail: > 2013-12-02 15:57:08,461 INFO SecurityLogger.org.apache.hadoop.ipc.Server:= Auth successful for tobbicke@CERN.CH (auth:TOKEN) > 2013-12-02 15:57:08,462 INFO SecurityLogger.org.apache.hadoop.security.= authorize.ServiceAuthorizationManager: Authorization successful for = tobbicke@CERN.CH (auth:TOKEN) for protocol=3Dinterface org.apache.hadoop.= hdfs.protocol.ClientProtocol > 2013-12-02 15:57:08,500 INFO SecurityLogger.org.apache.hadoop.ipc.Server:= Auth successful for mapred/xxx.cern.ch@CERN.CH (auth:SIMPLE) > 2013-12-02 15:57:08,540 INFO SecurityLogger.org.apache.hadoop.security.= authorize.ServiceAuthorizationManager: Authorization successful for = mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) for protocol=3Dinterface org.= apache.hadoop.hdfs.protocol.ClientProtocol > 2013-12-02 15:57:08,541 INFO org.apache.hadoop.security.token.delegation.= AbstractDelegationTokenSecretManager: Token renewal requested for = identifier: HDFS=5FDELEGATION=5FTOKEN token 12 for tobbicke > 2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.= UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN= .CH (auth:KERBEROS) cause:org.apache.hadoop.security.= AccessControlException: Client mapred tries to renew a token with renewer = specified as nobody > 2013-12-02 15:57:08,541 INFO org.apache.hadoop.ipc.Server: IPC Server = handler 9 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.= renewDelegationToken from 188.184.xxx.xxx:42031: error: org.apache.hadoop.= security.AccessControlException: Client mapred tries to renew a token with = renewer specified as nobody > org.apache.hadoop.security.AccessControlException: Client mapred tries to= renew a token with renewer specified as nobody > at org.apache.hadoop.security.token.delegation.= AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecr= etManager.java:274) > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.= renewDelegationToken(FSNamesystem.java:5319) > at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.= renewDelegationToken(NameNodeRpcServer.java:377) > at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServer= SideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTrans= latorPB.java:814) > at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolPr= otos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProt= os.java:45024) > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvo= ker.call(ProtobufRpcEngine.java:453) > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1701) > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1697) > at java.security.AccessController.doPrivileged(Native Method) > at javax.security.auth.Subject.doAs(Subject.java:396) > at org.apache.hadoop.security.UserGroupInformation.= doAs(UserGroupInformation.java:1408) > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695) > Is this as unhealthy as it looks=3F If the first (immediate) renewal = fails I assume others will share the same fate. Would that explain the = 24-hour-lifetime on the =22real=22 cluster and what could be the reason=3F = How does =22nobody=22 come into the game here=3F > In any case, linked to this or not, after dfs.namenode.delegation.token.= renew-interval ms the following is logged a zillion times: > 2013-12-02 16:58:09,718 WARN SecurityLogger.org.apache.hadoop.ipc.Server:= Auth failed for 188.184.xxx.xxx:44979:null (DIGEST-MD5: IO error acquiring= password) > 2013-12-02 16:58:09,719 INFO org.apache.hadoop.ipc.Server: IPC Server = listener on 9000: readAndProcess threw exception javax.security.sasl.= SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.= apache.hadoop.security.token.SecretManager$InvalidToken: token = (HDFS=5FDELEGATION=5FTOKEN token 12 for tobbicke) is expired] from client = 188.184.xxx.xxx. Count of bytes read: 0 > javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring = password [Caused by org.apache.hadoop.security.token.= SecretManager$InvalidToken: token (HDFS=5FDELEGATION=5FTOKEN token 12 for = tobbicke) is expired] > at com.sun.security.sasl.digest.DigestMD5Server.= validateClientResponse(DigestMD5Server.java:577) > at com.sun.security.sasl.digest.DigestMD5Server.= evaluateResponse(DigestMD5Server.java:226) > at org.apache.hadoop.ipc.Server$Connection.= saslReadAndProcess(Server.java:1210) > at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.= java:1405) > at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:719) > at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.= java:518) > at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.= java:493) > Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: = token (HDFS=5FDELEGATION=5FTOKEN token 12 for tobbicke) is expired > at org.apache.hadoop.security.token.delegation.= AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTok= enSecretManager.java:227) > at org.apache.hadoop.security.token.delegation.= AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTok= enSecretManager.java:46) > at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHan= dler.getPassword(SaslRpcServer.java:194) > at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackHan= dler.handle(SaslRpcServer.java:220) > at com.sun.security.sasl.digest.DigestMD5Server.= validateClientResponse(DigestMD5Server.java:568) > ... 6 more > Any ideas=3F > Rainer ------Nodemailer-0.5.0-?=_1-1386098690261 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: quoted-printable
Alternatively you can schedule a cron job to do kinit every 20 hours = or so. Just to renew token before it expires.=C2=A0
=E2=80=94
Sent from Mailbox for iPad


On Mon, Dec 2, 2013 at 9:12 AM,= Rainer Toebbicke <rtb@pclella.cern.ch> = wrote:

Hello,

I am trying to understand why my long-running mapreduce jobs stop = after 24 hours (approx) on a secure cluster.

This is on Cloudera CDH 4.3.0, hence hadoop 2.0.0, using mrv1 (not = yarn), authentication specified as =22kerberos=22. Trying with a = short-lived Kerberos ticket (1h) I see that it gets renewed regularly. = Still, the job crashes after 24 hours because the delegation token expires.=

On a test cluster with increased logging and shortened dfs.namenode= .delegation.token.renew-interval (for quicker debugging) I see that an = immediate renew of the delegation token fails, and then after the expiry = period the Namenode log starts getting clobbered.

Detail:


2013-12-02 15:57:08,461 INFO SecurityLogger.org.apache.hadoop.= ipc.Server: Auth successful for tobbicke@CERN.CH (auth:TOKEN)
2013-12-02 15:57:08,462 INFO SecurityLogger.org.apache.hadoop.security.= authorize.ServiceAuthorizationManager: Authorization successful for = tobbicke@CERN.CH (auth:TOKEN) for protocol=3Dinterface org.apache.hadoop.= hdfs.protocol.ClientProtocol
2013-12-02 15:57:08,500 INFO SecurityLogger.org.apache.hadoop.ipc.= Server: Auth successful for mapred/xxx.cern.ch@CERN.CH (auth:SIMPLE)
2013-12-02 15:57:08,540 INFO SecurityLogger.org.apache.hadoop.security.= authorize.ServiceAuthorizationManager: Authorization successful for = mapred/xxx.cern.ch@CERN.CH (auth:KERBEROS) for protocol=3Dinterface org.= apache.hadoop.hdfs.protocol.ClientProtocol
2013-12-02 15:57:08,541 INFO org.apache.hadoop.security.token.= delegation.AbstractDelegationTokenSecretManager: Token renewal requested = for identifier: HDFS=5FDELEGATION=5FTOKEN token 12 for tobbicke

2013-12-02 15:57:08,541 ERROR org.apache.hadoop.security.= UserGroupInformation: PriviledgedActionException as:mapred/xxx.cern.ch@CERN= .CH (auth:KERBEROS) cause:org.apache.hadoop.security.= AccessControlException: Client mapred tries to renew a token with renewer = specified as nobody
2013-12-02 15:57:08,541 INFO org.apache.hadoop.ipc.Server: IPC Server = handler 9 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.= renewDelegationToken from 188.184.xxx.xxx:42031: error: org.apache.hadoop.= security.AccessControlException: Client mapred tries to renew a token with = renewer specified as nobody
org.apache.hadoop.security.AccessControlException: Client mapred tries = to renew a token with renewer specified as nobody
at org.apache.hadoop.security.token.delegation.= AbstractDelegationTokenSecretManager.renewToken(AbstractDelegationTokenSecr= etManager.java:274)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.= renewDelegationToken(FSNamesystem.java:5319)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.= renewDelegationToken(NameNodeRpcServer.java:377)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServ= erSideTranslatorPB.renewDelegationToken(ClientNamenodeProtocolServerSideTra= nslatorPB.java:814)
at org.apache.hadoop.hdfs.protocol.proto.= ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.= callBlockingMethod(ClientNamenodeProtocolProtos.java:45024)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcIn= voker.call(ProtobufRpcEngine.java:453)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.= java:1701)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.= java:1697)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.= doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1695)



Is this as unhealthy as it looks=3F If the first = (immediate) renewal fails I assume others will share the same fate. Would = that explain the 24-hour-lifetime on the =22real=22 cluster and what could = be the reason=3F How does =22nobody=22 come into the game here=3F

In any case, linked to this or not, after dfs.namenode.delegation.= token.renew-interval ms the following is logged a zillion times:

2013-12-02 16:58:09,718 WARN SecurityLogger.org.apache.hadoop.ipc.= Server: Auth failed for 188.184.xxx.xxx:44979:null (DIGEST-MD5: IO error = acquiring password)
2013-12-02 16:58:09,719 INFO org.apache.hadoop.ipc.Server: IPC Server = listener on 9000: readAndProcess threw exception javax.security.sasl.= SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.= apache.hadoop.security.token.SecretManager$InvalidToken: token = (HDFS=5FDELEGATION=5FTOKEN token 12 for tobbicke) is expired] from client = 188.184.xxx.xxx. Count of bytes read: 0
javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring = password [Caused by org.apache.hadoop.security.token.= SecretManager$InvalidToken: token (HDFS=5FDELEGATION=5FTOKEN token 12 for = tobbicke) is expired]
at com.sun.security.sasl.digest.DigestMD5Server.= validateClientResponse(DigestMD5Server.java:577)
at com.sun.security.sasl.digest.DigestMD5Server.= evaluateResponse(DigestMD5Server.java:226)
at org.apache.hadoop.ipc.Server$Connection.= saslReadAndProcess(Server.java:1210)
at org.apache.hadoop.ipc.Server$Connection.= readAndProcess(Server.java:1405)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.= java:719)
at org.apache.hadoop.ipc.Server$Listener$Reader.= doRunLoop(Server.java:518)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.= java:493)
Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken:= token (HDFS=5FDELEGATION=5FTOKEN token 12 for tobbicke) is expired
at org.apache.hadoop.security.token.delegation.= AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTok= enSecretManager.java:227)
at org.apache.hadoop.security.token.delegation.= AbstractDelegationTokenSecretManager.retrievePassword(AbstractDelegationTok= enSecretManager.java:46)
at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackH= andler.getPassword(SaslRpcServer.java:194)
at org.apache.hadoop.security.SaslRpcServer$SaslDigestCallbackH= andler.handle(SaslRpcServer.java:220)
at com.sun.security.sasl.digest.DigestMD5Server.= validateClientResponse(DigestMD5Server.java:568)
... 6 more


Any ideas=3F

Rainer


------Nodemailer-0.5.0-?=_1-1386098690261--