Date: Thu, 21 Feb 2013 23:24:12 +0000 (UTC)
From: "Sandy Ryza (JIRA)"
To: yarn-issues@hadoop.apache.org
Reply-To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-413) With log aggregation on, nodemanager dies on startup if it can't connect to HDFS

    [ https://issues.apache.org/jira/browse/YARN-413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13583660#comment-13583660 ]

Sandy Ryza commented on YARN-413:
---------------------------------

2013-02-21 13:27:24,307 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.NodeManager
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.start(NodeManager.java:199)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:322)
    at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:359)
Caused by: org.apache.hadoop.yarn.YarnException: Failed to Start org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:78)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.start(ContainerManagerImpl.java:248)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    ... 3 more
Caused by: org.apache.hadoop.yarn.YarnException: Failed to create remoteLogDir [/tmp/logs]
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:207)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.start(LogAggregationService.java:132)
    at org.apache.hadoop.yarn.service.CompositeService.start(CompositeService.java:68)
    ... 5 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot create directory /tmp/logs. Name node is in safe mode.
The reported blocks 7 has reached the threshold 0.9990 of total blocks 7. Safe mode will be turned off automatically in 25 seconds.
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:3067)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:3045)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3024)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:667)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:468)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:40995)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:482)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1018)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1778)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1774)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1488)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1772)
    at org.apache.hadoop.ipc.Client.call(Client.java:1237)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
    at $Proxy9.mkdirs(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:163)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:82)
    at $Proxy9.mkdirs(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:450)
    at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:2115)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:2086)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:540)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.verifyAndCreateRemoteLogDir(LogAggregationService.java:204)
    ... 7 more
2013-02-21 13:27:24,308 INFO org.apache.hadoop.ipc.Server: Stopping server on 47223
2013-02-21 13:27:24,308 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
2013-02-21 13:27:24,309 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040

> With log aggregation on, nodemanager dies on startup if it can't connect to HDFS
> --------------------------------------------------------------------------------
>
>                 Key: YARN-413
>                 URL: https://issues.apache.org/jira/browse/YARN-413
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.3-alpha
>            Reporter: Sandy Ryza
>
> If log aggregation is on, when the nodemanager starts up, it tries to create the remote log directory. If this fails, it kills itself. It doesn't seem like turning log aggregation on should ever cause the nodemanager to die.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
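
The issue description argues that failing to create the remote log directory should not take the nodemanager down. One way to picture that behaviour is the plain-Java sketch below. It is not the NodeManager code and not the fix adopted for YARN-413: `RemoteFs`, `tryCreateRemoteLogDir`, and the retry and backoff parameters are hypothetical names introduced only for illustration, and no Hadoop classes are used.

```java
import java.io.IOException;

public class TolerantLogAggregationStart {

    /** Hypothetical stand-in for the remote file system; not a Hadoop interface. */
    interface RemoteFs {
        void mkdirs(String path) throws IOException;
    }

    /**
     * Tries to create the remote log directory, retrying a limited number of times.
     * Returns true once mkdirs succeeds and false if every attempt failed or the
     * thread was interrupted. It never rethrows the I/O failure, so the caller's
     * startup is not aborted.
     */
    static boolean tryCreateRemoteLogDir(RemoteFs fs, String dir,
                                         int maxAttempts, long backoffMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                fs.mkdirs(dir);
                return true;
            } catch (IOException e) {
                // Log and carry on instead of failing the whole service.
                System.err.println("WARN: could not create remote log dir " + dir
                        + " (attempt " + attempt + " of " + maxAttempts + "): "
                        + e.getMessage());
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated file system that rejects the first two calls, loosely
        // modelled on a NameNode that is still leaving safe mode.
        int[] calls = {0};
        RemoteFs flaky = path -> {
            if (++calls[0] < 3) {
                throw new IOException("Name node is in safe mode");
            }
        };
        boolean ready = tryCreateRemoteLogDir(flaky, "/tmp/logs", 5, 10L);
        System.out.println("remote log dir ready: " + ready);
    }
}
```

In this sketch a caller that gets `false` back would keep running with aggregation degraded or deferred. Whether to retry later, fall back to local logs, or only warn is a design choice the issue leaves open.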