Date: Tue, 13 Mar 2012 06:38:34 +0000 (UTC)
From: "Mingjie Lai (Created) (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID: <431020844.7146.1331620714847.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (HDFS-3083) HA+security: failed to run a mapred job from yarn after a manual failover

HA+security: failed to run a mapred job from yarn after a manual failover
------------------------------------------------------------------------- Key: HDFS-3083 URL: https://issues.apache.org/jira/browse/HDFS-3083 Project: Hadoop HDFS Issue Type: Bug Components: ha, security Affects Versions: 0.24.0, 0.23.3 Reporter: Mingjie Lai Priority: Critical Fix For: 0.24.0, 0.23.3 Steps to reproduce: - turned on ha and security - run a mapred job, and wait to finish - failover to another namenode - run the mapred job again, it fails.=20 Checking the job delegation token, it's still indicate the original active = namenode. It causes nm failed to obtain a dt for the new nn. (?)=20 {code} $ hdfs dfs -cat hdfs://ns1:8020/tmp/hadoop-yarn/staging/yarn/.staging/job_1= 331619043691_0001/appTokens HDTS ha-hdfs:ns1@(yarn/nn1.hadoop.local@HADOOP.LOCALDOMAINyarn=EF=BF=BD6 =EF=BF=BDL=EF=BF=BD=EF=BF=BD6.=EF=BF=BD=D0=9BF=7Fs=EF=BF=BD=EF=BF=BDr=EF=BF= =BD%=EF=BF=BDB=EF=BF=BD'=EF=BF=BD=EF=BF=BD{pR=EF=BF=BDHDFS_DELEGATION_TOKEN ha-hdfs:ns {code} Exceptions: {code} 12/03/13 06:19:44 INFO mapred.ResourceMgrDelegate: Submitted application ap= plication_1331619043691_0002 to ResourceManager at nn1.hadoop.local/10.177.= 23.38:7090 12/03/13 06:19:45 INFO mapreduce.Job: The url to track the job: http://nn1.= hadoop.local:7050/proxy/application_1331619043691_0002/ 12/03/13 06:19:45 INFO mapreduce.Job: Running job: job_1331619043691_0002 12/03/13 06:19:47 INFO mapreduce.Job: Job job_1331619043691_0002 running in= uber mode : false 12/03/13 06:19:47 INFO mapreduce.Job: map 0% reduce 0% 12/03/13 06:19:47 INFO mapreduce.Job: Job job_1331619043691_0002 failed wit= h state FAILED due to: Application application_1331619043691_0002 failed 1 = times due to AM Container for appattempt_1331619043691_0002_000001 exited w= ith exitCode: -1000 due to: RemoteTrace:=20 org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DE= LEGATION_TOKEN token 40 for yarn) can't be found in cache =09at org.apache.hadoop.ipc.Client.call(Client.java:1159) =09at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEng= ine.java:188) =09at $Proxy28.getFileInfo(Unknown Source) =09at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.= getFileInfo(ClientNamenodeProtocolTranslatorPB.java:622) =09at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) =09at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.= java:39) =09at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces= sorImpl.java:25) =09at java.lang.reflect.Method.invoke(Method.java:597) =09at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryI= nvocationHandler.java:164) =09at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocat= ionHandler.java:83) =09at $Proxy29.getFileInfo(Unknown Source) =09at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1260) =09at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(Distribute= dFileSystem.java:718) =09at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:88) =09at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:49) =09at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:157) =09at org.apache.hadoop.yarn.util.FSDownload$1.run(FSDownload.java:155) =09at java.security.AccessController.doPrivileged(Native Method) =09at javax.security.auth.Subject.doAs(Subject.java:396) =09at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma= tion.java:1177) =09at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:153) =09at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:49) =09at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) =09at java.util.concurrent.FutureTask.run(FutureTask.java:138) =09at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:44= 1) =09at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) =09at java.util.concurrent.FutureTask.run(FutureTask.java:138) =09at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExec= utor.java:886) =09at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor= .java:908) =09at java.lang.Thread.run(Thread.java:662) at LocalTrace:=20 =09org.apache.hadoop.yarn.exceptions.impl.pb.YarnRemoteExceptionPBImpl: tok= en (HDFS_DELEGATION_TOKEN token 40 for yarn) can't be found in cache =09at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb= .LocalResourceStatusPBImpl.convertFromProtoFormat(LocalResourceStatusPBImpl= .java:217) =09at org.apache.hadoop.yarn.server.nodemanager.api.protocolrecords.impl.pb= .LocalResourceStatusPBImpl.getException(LocalResourceStatusPBImpl.java:147) =09at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.= ResourceLocalizationService$LocalizerRunner.update(ResourceLocalizationServ= ice.java:827) =09at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.= ResourceLocalizationService$LocalizerTracker.processHeartbeat(ResourceLocal= izationService.java:497) =09at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.= ResourceLocalizationService.heartbeat(ResourceLocalizationService.java:222) =09at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.service.Localiz= ationProtocolPBServiceImpl.heartbeat(LocalizationProtocolPBServiceImpl.java= :46) =09at org.apache.hadoop.yarn.proto.LocalizationProtocol$LocalizationProtoco= lService$2.callBlockingMethod(LocalizationProtocol.java:57) =09at org.apache.hadoop.yarn.ipc.ProtoOverHadoopRpcEngine$Server.call(Proto= OverHadoopRpcEngine.java:355) =09at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1660) =09at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1656) =09at java.security.AccessController.doPrivileged(Native Method) =09at javax.security.auth.Subject.doAs(Subject.java:396) =09at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma= tion.java:1177) =09at 
org.apache.hadoop.ipc.Server$Handler.run(Server.java:1654) .Failing this attempt.. Failing the application. 12/03/13 06:19:47 INFO mapreduce.Job: Counters: 0 Job ended: Tue Mar 13 06:19:47 UTC 2012 The job took 3 seconds. {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrato= rs: https://issues.apache.org/jira/secure/ContactAdministrators!default.jsp= a For more information on JIRA, see: http://www.atlassian.com/software/jira
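
The raw appTokens dump in the report is hard to read because the file is binary. The sketch below is one possible way to list the alias, kind, and service of each token in such a file once it has been copied locally. It is not part of Hadoop: the helper names (read_vint, read_text, read_blob, parse_token_storage) are hypothetical, and the layout it assumes (an "HDTS" magic, a version byte of 0, a vint token count, then alias, identifier, password, kind, and service per token) is an assumption about the Writable-based token storage format of this era, not something the report states.

```python
import io
import struct


def _read_exact(stream, n):
    """Read exactly n bytes or raise EOFError (e.g. on a truncated file)."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("unexpected end of token storage data")
    return data


def read_vint(stream):
    """Decode one variable-length integer in Hadoop's vint/vlong encoding."""
    first = struct.unpack("b", _read_exact(stream, 1))[0]  # signed byte
    if first >= -112:
        return first  # values in [-112, 127] are stored in a single byte
    negative = first < -120
    size = -(first + 120) if negative else -(first + 112)
    value = int.from_bytes(_read_exact(stream, size), "big")
    return ~value if negative else value


def read_text(stream):
    """Read a length-prefixed UTF-8 string (assumed Text layout)."""
    return _read_exact(stream, read_vint(stream)).decode("utf-8")


def read_blob(stream):
    """Read a length-prefixed byte array."""
    return _read_exact(stream, read_vint(stream))


def parse_token_storage(data):
    """Return a summary dict per token found in the given bytes.

    Assumed layout: b"HDTS", version byte 0, vint count, then for each
    token: alias, identifier bytes, password bytes, kind, service.
    """
    stream = io.BytesIO(data)
    if _read_exact(stream, 4) != b"HDTS":
        raise ValueError("missing HDTS magic")
    version = _read_exact(stream, 1)[0]
    if version != 0:
        raise ValueError("unsupported token storage version: %d" % version)
    tokens = []
    for _ in range(read_vint(stream)):
        alias = read_text(stream)
        identifier = read_blob(stream)  # left opaque in this sketch
        password = read_blob(stream)    # never printed
        kind = read_text(stream)
        service = read_text(stream)
        tokens.append({
            "alias": alias,
            "kind": kind,
            "service": service,
            "identifier_len": len(identifier),
            "password_len": len(password),
        })
    return tokens
```

To try it, copy the file out of HDFS first (for example with hdfs dfs -get), read it in binary mode, and pass the bytes to parse_token_storage. The identifier is deliberately left undecoded, so the sketch only shows which service name each token is bound to; it does not by itself confirm or rule out the cause suggested in the report.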