hadoop-mapreduce-issues mailing list archives

From "Ravi Prakash (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4300) OOM in AM can turn it into a zombie.
Date Fri, 22 Jun 2012 18:26:42 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13399496#comment-13399496 ]

Ravi Prakash commented on MAPREDUCE-4300:
-----------------------------------------

bq. For your first comment I think we could try and do fault injection. I have a patch now
that is installing an UncaughtExceptionHandler, and I am testing it simply by setting
the heap size small and seeing what happens.
Sweet! This is great. Thanks! :)
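For anyone following along, here is a minimal sketch of the general idea behind such a handler, assuming the goal is "any thread dying with an Error takes the whole JVM down". This is not the patch mentioned above; the class name `HaltOnErrorHandler`, the exit status, and the injectable halting action are all hypothetical choices made for illustration. Only `Thread.setDefaultUncaughtExceptionHandler` and `Runtime.halt` are standard JDK calls.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical sketch (not the MAPREDUCE-4300 patch): a process-wide handler
 * that halts the JVM when any thread dies with an Error such as
 * OutOfMemoryError, so a half-dead AM cannot keep heartbeating as a zombie.
 */
final class HaltOnErrorHandler implements Thread.UncaughtExceptionHandler {
    // Arbitrary non-zero exit status chosen for this sketch.
    static final int HALT_STATUS = 1;

    // Injected so the halting action can be swapped out in tests.
    private final IntConsumer halter;

    HaltOnErrorHandler(IntConsumer halter) {
        this.halter = halter;
    }

    /** Errors leave the JVM in an unknown state; plain exceptions kill only one thread. */
    static boolean isFatal(Throwable t) {
        return t instanceof Error;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable t) {
        boolean fatal = isFatal(t);
        try {
            // Logging allocates and can itself fail under memory pressure.
            System.err.println("Thread \"" + thread.getName() + "\" died: " + t);
        } catch (Throwable ignored) {
            // Never let a failed log line prevent the halt below.
        }
        if (fatal) {
            halter.accept(HALT_STATUS);
        }
    }

    /** Installs the handler using Runtime.halt, which does not run shutdown hooks. */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler(
                new HaltOnErrorHandler(status -> Runtime.getRuntime().halt(status)));
    }
}
```

Calling `HaltOnErrorHandler.install()` early in `main` covers every thread that does not set its own per-thread handler (those bypass the default one). `Runtime.halt` is used instead of `System.exit` because shutdown hooks may themselves need heap that is no longer available; whether that trade-off is right for the AM is a judgment call.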

bq. For the second one I don't think the point is to try and prevent an OOM from happening.
Pig tries to do this with a swapping type thing and the code is very brittle, and they still
will get an occasional OOM. OOMs and other errors are going to happen. I think the point is
to make sure that we don't get into a deadlock/zombie-like state when they do.
Hmm... I agree that NOT getting stuck in a zombie state is absolutely imperative. However,
if we fail the daemon, isn't Hadoop just going to retry it? That would mean retrying a
(possibly) big job 3-4 times before finally failing it. I see a need for a "health
check thread" in almost all daemons. Memory headroom, disk health, connection health:
all of these could be tied in. Admittedly, such a framework is probably out of scope for this
JIRA, but I'm throwing it out there in case we want to design towards that goal.
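To make the "health check thread" idea a bit more concrete, here is a rough sketch of one possible shape, assuming named boolean probes polled on a timer. Everything here (`HealthChecker`, the probe names, thresholds, and the callback) is hypothetical and not an existing Hadoop API; what a daemon should do when a probe fails (fail fast, report unhealthy to the RM, etc.) is left to the callback.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;
import java.util.function.Consumer;

/** Hypothetical sketch: named probes polled periodically by one daemon thread. */
final class HealthChecker {
    private final List<String> names = new ArrayList<>();
    private final List<BooleanSupplier> probes = new ArrayList<>();

    HealthChecker addProbe(String name, BooleanSupplier probe) {
        names.add(name);
        probes.add(probe);
        return this;
    }

    /** Returns the name of the first failing probe, or null if all pass. */
    String firstFailure() {
        for (int i = 0; i < probes.size(); i++) {
            boolean ok;
            try {
                ok = probes.get(i).getAsBoolean();
            } catch (RuntimeException e) {
                ok = false; // a probe that throws counts as unhealthy
            }
            if (!ok) {
                return names.get(i);
            }
        }
        return null;
    }

    /** Memory headroom: at least this fraction of the max heap is still unused. */
    static BooleanSupplier heapHeadroomAtLeast(double fraction) {
        return () -> {
            Runtime rt = Runtime.getRuntime();
            long used = rt.totalMemory() - rt.freeMemory();
            return (rt.maxMemory() - used) >= fraction * rt.maxMemory();
        };
    }

    /** Disk health: usable space on the given directory's partition. */
    static BooleanSupplier usableDiskAtLeast(File dir, long bytes) {
        return () -> dir.getUsableSpace() >= bytes;
    }

    /** Liveness of a critical thread (e.g. one that must not die silently). */
    static BooleanSupplier threadAlive(Thread t) {
        return t::isAlive;
    }

    /** Polls every periodSeconds and reports the first failing probe, if any. */
    ScheduledExecutorService start(long periodSeconds, Consumer<String> onUnhealthy) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "health-checker");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> {
            try {
                String failed = firstFailure();
                if (failed != null) {
                    onUnhealthy.accept(failed);
                }
            } catch (RuntimeException e) {
                // An escaping exception would cancel all future runs of this task.
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }
}
```

Connection health would be one more probe of the same shape. Keeping each check behind a `BooleanSupplier` means new checks can be added without touching the polling loop.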
                
> OOM in AM can turn it into a zombie.
> ------------------------------------
>
>                 Key: MAPREDUCE-4300
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4300
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: applicationmaster
>    Affects Versions: 0.23.3
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: StackDump.txt
>
>
> It looks like 4 threads in the AM died with OOM but not the one pinging the RM.
> stderr for this AM
> {noformat}
> WARNING: org.apache.hadoop.metrics.jvm.EventCounter is deprecated. Please use org.apache.hadoop.log.metrics.EventCounter in all the log4j.properties files.
> May 30, 2012 4:49:55 AM com.google.inject.servlet.InternalServletModule$BackwardsCompatibleServletContextProvider get
> WARNING: You are attempting to use a deprecated API (specifically, attempting to @Inject ServletContext inside an eagerly created singleton. While we allow this for backwards compatibility, be warned that this MAY have unexpected behavior if you have more than one injector (with ServletModule) running in the same JVM. Please consult the Guice documentation at http://code.google.com/p/google-guice/wiki/Servlets for more information.
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver as a provider class
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.yarn.webapp.GenericExceptionHandler as a provider class
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory register
> INFO: Registering org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices as a root resource class
> May 30, 2012 4:49:55 AM com.sun.jersey.server.impl.application.WebApplicationImpl _initiate
> INFO: Initiating Jersey application, version 'Jersey: 1.8 06/24/2011 12:17 PM'
> May 30, 2012 4:49:55 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.JAXBContextResolver to GuiceManagedComponentProvider with the scope "Singleton"
> May 30, 2012 4:49:56 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.yarn.webapp.GenericExceptionHandler to GuiceManagedComponentProvider with the scope "Singleton"
> May 30, 2012 4:49:56 AM com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory getComponentProvider
> INFO: Binding org.apache.hadoop.mapreduce.v2.app.webapp.AMWebServices to GuiceManagedComponentProvider with the scope "PerRequest"
> Exception in thread "ResponseProcessor for block BP-1114822160-<IP>-1322528669066:blk_-6528896407411719649_34227308" java.lang.OutOfMemoryError: Java heap space
> 	at com.google.protobuf.CodedInputStream.<init>(CodedInputStream.java:538)
> 	at com.google.protobuf.CodedInputStream.newInstance(CodedInputStream.java:55)
> 	at com.google.protobuf.AbstractMessageLite$Builder.mergeFrom(AbstractMessageLite.java:201)
> 	at com.google.protobuf.AbstractMessage$Builder.mergeFrom(AbstractMessage.java:738)
> 	at org.apache.hadoop.hdfs.protocol.proto.DataTransferProtos$PipelineAckProto.parseFrom(DataTransferProtos.java:7287)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:656)
> Exception in thread "DefaultSpeculator background processing" java.lang.OutOfMemoryError: Java heap space
> 	at java.util.HashMap.resize(HashMap.java:462)
> 	at java.util.HashMap.addEntry(HashMap.java:755)
> 	at java.util.HashMap.put(HashMap.java:385)
> 	at org.apache.hadoop.mapreduce.v2.app.job.impl.JobImpl.getTasks(JobImpl.java:632)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleASpeculation(DefaultSpeculator.java:465)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.maybeScheduleAMapSpeculation(DefaultSpeculator.java:433)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.computeSpeculations(DefaultSpeculator.java:509)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator.access$100(DefaultSpeculator.java:56)
> 	at org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator$1.run(DefaultSpeculator.java:176)
> 	at java.lang.Thread.run(Thread.java:619)
> Exception in thread "Timer for 'MRAppMaster' metrics system" java.lang.OutOfMemoryError: Java heap space
> Exception in thread "Socket Reader #4 for port 50500" java.lang.OutOfMemoryError: Java heap space
> {noformat}
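The failure mode in the description can be reproduced in miniature: in Java, an uncaught Error kills only the thread that threw it, so a separate heartbeat thread keeps running and the process still looks alive from the outside. The sketch below assumes this simplified setup; the OOM is thrown explicitly rather than caused by a real allocation failure, and `ZombieDemo`, the "worker" thread, and the heartbeat counter (standing in for pinging the RM) are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical reproduction of the zombie pattern: one thread dies, the heartbeat lives on. */
final class ZombieDemo {
    /** Returns {workerDead, heartbeatStillRunning}, observed after the worker has died. */
    static boolean[] run() {
        AtomicInteger heartbeats = new AtomicInteger();

        // Stands in for e.g. the speculator thread; throws instead of really exhausting the heap.
        Thread worker = new Thread(() -> {
            throw new OutOfMemoryError("simulated");
        }, "worker");

        // Stands in for the thread pinging the RM: it keeps going regardless of the worker.
        Thread heartbeat = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                heartbeats.incrementAndGet();
                try {
                    Thread.sleep(10);
                } catch (InterruptedException e) {
                    return; // demo finished
                }
            }
        }, "heartbeat");
        heartbeat.setDaemon(true);

        try {
            heartbeat.start();
            worker.start();
            worker.join(); // with no handler installed, the JVM prints the Error and the thread ends
            int before = heartbeats.get();
            Thread.sleep(100);
            boolean workerDead = !worker.isAlive();
            boolean stillBeating = heartbeat.isAlive() && heartbeats.get() > before;
            return new boolean[] {workerDead, stillBeating};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            heartbeat.interrupt();
        }
    }
}
```

Calling `ZombieDemo.run()` prints `Exception in thread "worker" java.lang.OutOfMemoryError: simulated` followed by a stack trace, much like the stderr above, and should return `{true, true}`: the worker is gone while the heartbeat is still ticking.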

--
This message is automatically generated by JIRA.

        
