hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4300) OOM in AM can turn it into a zombie.
Date Fri, 22 Jun 2012 19:21:42 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399530#comment-13399530 ]

Robert Joseph Evans commented on MAPREDUCE-4300:
------------------------------------------------

@Vinod,

I think having HeapDumpOnOutOfMemoryError is a great option to have on, but I don't think
that having the AM do the monitoring for that really makes sense.  Like you said, trying to
tell the RM that we got an OOM could cause another one.  But it is worse than that because
the JVM can become rather unstable after an OOM.  I think the ClassNotFoundError I saw is
because my UncaughtExceptionHandler caught an OOM and then, when it tried to log it, another
OOM was thrown.

I like the idea. I have the UncaughtExceptionHandler working fairly well and I think it is
a good solution for shutting down the AM if we run into some unexpected errors.  But I think
it would be great to let the NM know about more than just logs for aggregation.  It would
be good if it could know about other potential files that should be aggregated after a process
exits.  Perhaps only if the process exits with an error.  That way we could throw in heap
dumps and other things and debug them later to know what happened.
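
As a rough illustration of the handler idea discussed above (a minimal sketch, not the patch attached to this issue): the handler below halts the JVM immediately on an OutOfMemoryError without trying to log, since logging allocates memory and can trigger a second OOM. The class name OomAwareHandler, the Action enum, and the exit status values are all hypothetical; only Thread.setDefaultUncaughtExceptionHandler, Runtime.halt, and System.exit are standard JDK calls.

```java
import java.lang.Thread.UncaughtExceptionHandler;

// Hypothetical sketch: names and status codes are illustrative, not Hadoop APIs.
class OomAwareHandler implements UncaughtExceptionHandler {

    /** What the handler should do for a given throwable. */
    enum Action { HALT, EXIT, LOG_ONLY }

    static final int HALT_STATUS = 1; // assumed exit code for OOM
    static final int EXIT_STATUS = 2; // assumed exit code for other Errors

    /** Pure decision step, kept separate so it can be tested without killing the JVM. */
    static Action decide(Throwable t) {
        if (t instanceof OutOfMemoryError) {
            return Action.HALT;
        }
        if (t instanceof Error) {
            return Action.EXIT;
        }
        return Action.LOG_ONLY;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable t) {
        switch (decide(t)) {
            case HALT:
                // No logging here: building a message allocates and may throw another OOM.
                // halt() also skips shutdown hooks, which may not run safely in this state.
                Runtime.getRuntime().halt(HALT_STATUS);
                break;
            case EXIT:
                try {
                    System.err.println("Fatal error in thread " + thread.getName() + ": " + t);
                } finally {
                    // Exit even if the logging above fails.
                    System.exit(EXIT_STATUS);
                }
                break;
            default:
                System.err.println("Uncaught exception in thread " + thread.getName() + ": " + t);
        }
    }

    /** Installs the handler for every thread that has no handler of its own. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(new OomAwareHandler());
    }
}
```

Calling OomAwareHandler.install() early in the process would cover threads like the ones in the stderr output below, so that an OOM in any of them ends the process instead of leaving it half alive. With -XX:+HeapDumpOnOutOfMemoryError set, the JVM writes the heap dump itself when the error is first thrown, so the handler does not need to do anything for the dump to exist.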
                
> OOM in AM can turn it into a zombie.
> ------------------------------------
>
>                 Key: MAPREDUCE-4300
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4300
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 0.23.3
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: StackDump.txt
>
>
> It looks like 4 threads in the AM died with OOM but not the one pinging the RM.
> stderr for this AM
> {noformat}
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> May 30, 2012 4:49:55 AM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
> WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
> May 30, 2012 4:49:55 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.8 06/24/2011 12:17 PM'
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
> May 30, 2012 4:49:56 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
> May 30, 2012 4:49:56 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
> Exception in thread "ResponseProcessor for block BP-1114822160-<IP>-1322528669066:blk_-6528896407411719649_34227308" java.lang.OutOfMemoryError: Java heap space
> 	at com.google.protobuf.CodedInputStream.<init>(CodedInputStream.java:538)
> 	at com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:55)
> 	at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:201)
> 	at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:738)
> 	at org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos$PipelineAckProto.parseFrom(DataTransferProtos.java:7287)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:656)
> Exception in thread "DefaultSpeculator background processing" java.lang.OutOfMemoryError: Java heap space
> 	at java.util.HashMap.resize(HashMap.java:462)
> 	at java.util.HashMap.addEntry(HashMap.java:755)
> 	at java.util.HashMap.put(HashMap.java:385)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.getTasks(JobImpl.java:632)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleASpeculation(DefaultSpeculator.java:465)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleAMapSpeculation(DefaultSpeculator.java:433)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.computeSpeculations(DefaultSpeculator.java:509)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.access$100(DefaultSpeculator.java:56)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator$1.run(DefaultSpeculator.java:176)
> 	at java.lang.Thread.run(Thread.java:619)
> Exception in thread "Timer for 'MRAppMaster' metrics system" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "Socket Reader #4 for port 50500" java.lang.OutOfMemoryError: Java heap space
> {noformat}


        
