Date: Fri, 22 Jun 2012 19:21:42 +0000 (UTC)
From: "Robert Joseph Evans (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-4300) OOM in AM can turn it into a zombie.

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399530#comment-13399530 ]

Robert Joseph Evans commented on MAPREDUCE-4300:
------------------------------------------------

@Vinod,

I think having HeapDumpOnOutOfMemoryError is a great option to have on, but I don't think that having the AM do the monitoring for that really makes sense. Like you said, trying to tell the RM that we got an OOM could cause another one. But it is worse than that, because the JVM can become rather unstable after an OOM. I think the ClassNotFoundError I saw is because my UncaughtExceptionHandler caught an OOM, and then when it tried to log it another OOM was thrown.

I like the idea. I have the UncaughtExceptionHandler working fairly well, and I think it is a good solution for shutting down the AM if we run into some unexpected errors. But I think it would be great to let the NM know about more than just logs for aggregation. It would be good if it could know about other potential files that should be aggregated after a process exits, perhaps only if the process exits with an error. That way we could throw in heap dumps and other things and debug them later to know what happened.

> OOM in AM can turn it into a zombie.
> ------------------------------------
>
>                 Key: MAPREDUCE-4300
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4300
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 0.23.3
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: StackDump.txt
>
>
> It looks like 4 threads in the AM died with OOM but not the one pinging the RM.
> stderr for this AM
> {noformat}
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated.
> Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> May 30, 2012 4:49:55 AM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
> WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
> May 30, 2012 4:49:55 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.8 06/24/2011 12:17 PM'
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
> May 30, 2012 4:49:56 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
> May 30, 2012 4:49:56 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
> Exception in thread "ResponseProcessor for block BP-1114822160--1322528669066:blk_-6528896407411719649_34227308" java.lang.OutOfMemoryError: Java heap space
>         at com.google.protobuf.CodedInputStream.<init>(CodedInputStream.java:538)
>         at com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:55)
>         at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:201)
>         at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:738)
>         at org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos$PipelineAckProto.parseFrom(DataTransferProtos.java:7287)
>         at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95)
>         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:656)
> Exception in thread "DefaultSpeculator background processing" java.lang.OutOfMemoryError: Java heap space
>         at java.util.HashMap.resize(HashMap.java:462)
>         at java.util.HashMap.addEntry(HashMap.java:755)
>         at java.util.HashMap.put(HashMap.java:385)
>         at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.getTasks(JobImpl.java:632)
>         at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleASpeculation(DefaultSpeculator.java:465)
>         at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleAMapSpeculation(DefaultSpeculator.java:433)
>         at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.computeSpeculations(DefaultSpeculator.java:509)
>         at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.access$100(DefaultSpeculator.java:56)
>         at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator$1.run(DefaultSpeculator.java:176)
>         at java.lang.Thread.run(Thread.java:619)
> Exception in thread "Timer for 'MRAppMaster' metrics system" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "Socket Reader #4 for port 50500" java.lang.OutOfMemoryError: Java heap space
> {noformat}
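
The issue description above reports that four AM threads died with OOM while the thread pinging the RM kept going. The self-contained Java sketch below is not Hadoop code: the worker thread name is borrowed from the log purely for illustration, and the "heartbeat" loop is a hypothetical stand-in for the AM-to-RM call. It shows why such a zombie is possible: an uncaught Error terminates only the thread that threw it, and the JVM stays up as long as some other non-daemon thread is still running.

```java
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Illustrative sketch (not Hadoop code): an uncaught Error kills only the thread
 * that threw it, so a separate "heartbeat" thread keeps running afterwards.
 */
public class ZombieDemo {

    /** Returns how many heartbeats were sent after the worker thread had died. */
    public static int run() throws InterruptedException {
        AtomicInteger heartbeatsAfterWorkerDied = new AtomicInteger();

        // Simulates an OOM without actually exhausting the heap.
        Thread worker = new Thread(() -> {
            throw new OutOfMemoryError("simulated: Java heap space");
        }, "DefaultSpeculator background processing");

        // Hypothetical stand-in for the thread that pings the RM.
        Thread heartbeat = new Thread(() -> {
            try {
                worker.join(); // wait until the worker has terminated
                for (int i = 0; i < 3; i++) {
                    if (!worker.isAlive()) {
                        heartbeatsAfterWorkerDied.incrementAndGet();
                    }
                    Thread.sleep(10);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "RM heartbeat (hypothetical)");

        worker.start();    // start the worker first so join() really waits for it
        heartbeat.start();
        heartbeat.join();
        return heartbeatsAfterWorkerDied.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("heartbeats after worker died: " + run());
    }
}
```

Running `main` prints `heartbeats after worker died: 3`, after the default handler has written the simulated `OutOfMemoryError` for the worker thread to stderr. The worker is gone, yet nothing else in the process reacts to it.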
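
The comment describes using an UncaughtExceptionHandler to shut the AM down, and notes that logging from inside the handler after an OOM can itself trigger another OOM. Below is a minimal sketch of one way to write such a handler, assuming the only goal is to make the process exit so the failure becomes visible outside the JVM. The class name, the exit codes, and the choice to skip logging for any `Error` are hypothetical; this is not the handler referred to in the comment.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical sketch: on any Error (OOM included) halt the JVM immediately,
 * without logging, because logging may allocate and fail again.
 */
public class HaltOnErrorHandler implements Thread.UncaughtExceptionHandler {
    // Exit codes chosen for illustration only.
    static final int OOM_EXIT_CODE = 2;
    static final int ERROR_EXIT_CODE = 1;

    private final IntConsumer halter;

    public HaltOnErrorHandler() {
        // Runtime.halt does not run shutdown hooks, which might allocate or block.
        this(status -> Runtime.getRuntime().halt(status));
    }

    // Injectable halter so the policy can be exercised without killing the JVM.
    HaltOnErrorHandler(IntConsumer halter) {
        this.halter = halter;
    }

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        if (e instanceof OutOfMemoryError) {
            // No message building here: string concatenation allocates.
            halter.accept(OOM_EXIT_CODE);
        } else if (e instanceof Error) {
            halter.accept(ERROR_EXIT_CODE);
        } else {
            // Ordinary exceptions: report and let only this thread die.
            System.err.println("Uncaught exception in thread " + t.getName() + ": " + e);
        }
    }

    /** Installs the handler JVM-wide, e.g. as the first statement of main(). */
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new HaltOnErrorHandler());
    }
}
```

`Runtime.halt` is used instead of `System.exit` because it skips shutdown hooks, which could allocate memory or block in a JVM that is already unstable; the trade-off is that nothing gets a chance to clean up.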
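
The HeapDumpOnOutOfMemoryError option mentioned at the start of the comment is normally passed on the JVM command line as `-XX:+HeapDumpOnOutOfMemoryError`, optionally with `-XX:HeapDumpPath=<dir>`. As a sketch, assuming a HotSpot-based JVM, a process can check whether the flag is active through the `com.sun.management.HotSpotDiagnosticMXBean` platform bean; the class name below is hypothetical.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Hypothetical helper: reports whether this HotSpot JVM writes a heap dump on OOM. */
public class HeapDumpCheck {

    public static boolean heapDumpOnOomEnabled() {
        HotSpotDiagnosticMXBean bean =
            ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        VMOption option = bean.getVMOption("HeapDumpOnOutOfMemoryError");
        return Boolean.parseBoolean(option.getValue()); // value is "true" or "false"
    }

    public static void main(String[] args) {
        // Try: java -XX:+HeapDumpOnOutOfMemoryError HeapDumpCheck
        System.out.println("HeapDumpOnOutOfMemoryError enabled: " + heapDumpOnOomEnabled());
    }
}
```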
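
The last suggestion in the comment is for the NM to aggregate more than just logs after a container exits, perhaps only when it exits with an error, so that heap dumps can be examined later. The sketch below is a hypothetical selection policy, not an existing NM API: the class name, the glob patterns, and the "non-zero exit code only" rule are assumptions used to show how such a filter might pick extra files out of a container's working directory.

```java
import java.io.IOException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch: besides the normal logs, select extra files matching
 * configured globs (e.g. heap dumps), but only if the container failed.
 */
public class ExtraAggregationSelector {
    private final List<PathMatcher> matchers = new ArrayList<>();

    public ExtraAggregationSelector(List<String> globs) {
        for (String glob : globs) {
            matchers.add(FileSystems.getDefault().getPathMatcher("glob:" + glob));
        }
    }

    /** Returns the extra files under containerDir to aggregate for this exit code. */
    public List<Path> select(Path containerDir, int exitCode) throws IOException {
        if (exitCode == 0) {
            return List.of(); // clean exit: aggregate only the usual logs
        }
        try (Stream<Path> files = Files.walk(containerDir)) {
            return files
                .filter(Files::isRegularFile)
                .filter(p -> matches(containerDir.relativize(p)))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // A glob '*' does not cross directories, so also try the bare file name.
    private boolean matches(Path relative) {
        for (PathMatcher m : matchers) {
            if (m.matches(relative) || m.matches(relative.getFileName())) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `new ExtraAggregationSelector(List.of("*.hprof")).select(containerDir, exitCode)` returns no extra files for exit code 0 and every `.hprof` file under the directory otherwise, including dumps written to subdirectories.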