flink-issues mailing list archives

From "Robert Metzger (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4137) JobManager web frontend does not shut down on OOM exception on JM
Date Tue, 05 Jul 2016 17:30:11 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15362824#comment-15362824 ]

Robert Metzger commented on FLINK-4137:
---------------------------------------

Thank you for looking into the problem.
I will try to reproduce the issue tomorrow and check whether the configuration setting fixes it.

> JobManager web frontend does not shut down on OOM exception on JM
> -----------------------------------------------------------------
>
>                 Key: FLINK-4137
>                 URL: https://issues.apache.org/jira/browse/FLINK-4137
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination, JobManager, Webfrontend
>            Reporter: Robert Metzger
>            Assignee: Till Rohrmann
>            Priority: Critical
>
> The following exception occurred on the JobManager:
> {code}
> 2016-06-30 14:45:06,642 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Completed checkpoint 379 (in 7017 ms)
> 2016-06-30 14:45:06,642 INFO  org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 380 @ 1467297906642
> 2016-06-30 14:45:17,902 ERROR akka.actor.ActorSystemImpl - Uncaught fatal error from thread [flink-akka.remote.default-remote-dispatcher-6] shutting down ActorSystem [flink]
> java.lang.OutOfMemoryError: Java heap space
> 	at com.google.protobuf.ByteString.copyFrom(ByteString.java:192)
> 	at com.google.protobuf.CodedInputStream.readBytes(CodedInputStream.java:324)
> 	at akka.remote.WireFormats$SerializedMessage.<init>(WireFormats.java:3030)
> 	at akka.remote.WireFormats$SerializedMessage.<init>(WireFormats.java:2980)
> 	at akka.remote.WireFormats$SerializedMessage$1.parsePartialFrom(WireFormats.java:3073)
> 	at akka.remote.WireFormats$SerializedMessage$1.parsePartialFrom(WireFormats.java:3068)
> 	at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
> 	at akka.remote.WireFormats$RemoteEnvelope.<init>(WireFormats.java:993)
> 	at akka.remote.WireFormats$RemoteEnvelope.<init>(WireFormats.java:927)
> 	at akka.remote.WireFormats$RemoteEnvelope$1.parsePartialFrom(WireFormats.java:1049)
> 	at akka.remote.WireFormats$RemoteEnvelope$1.parsePartialFrom(WireFormats.java:1044)
> 	at com.google.protobuf.CodedInputStream.readMessage(CodedInputStream.java:309)
> 	at akka.remote.WireFormats$AckAndEnvelopeContainer.<init>(WireFormats.java:241)
> 	at akka.remote.WireFormats$AckAndEnvelopeContainer.<init>(WireFormats.java:175)
> 	at akka.remote.WireFormats$AckAndEnvelopeContainer$1.parsePartialFrom(WireFormats.java:279)
> 	at akka.remote.WireFormats$AckAndEnvelopeContainer$1.parsePartialFrom(WireFormats.java:274)
> 	at com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:141)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:176)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:188)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:193)
> 	at com.google.protobuf.AbstractParser.parseFrom(AbstractParser.java:49)
> 	at akka.remote.WireFormats$AckAndEnvelopeContainer.parseFrom(WireFormats.java:409)
> 	at akka.remote.transport.AkkaPduProtobufCodec$.decodeMessage(AkkaPduCodec.scala:181)
> 	at akka.remote.EndpointReader.akka$remote$EndpointReader$$tryDecodeMessageAndAck(Endpoint.scala:995)
> 	at akka.remote.EndpointReader$$anonfun$receive$2.applyOrElse(Endpoint.scala:928)
> 	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
> 	at akka.remote.EndpointActor.aroundReceive(Endpoint.scala:415)
> 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
> 	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
> 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
> 	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
> 	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
> 2016-06-30 14:45:18,502 INFO  org.apache.flink.yarn.YarnJobManager - Stopping JobManager akka.tcp://flink@172.31.23.121:45569/user/jobmanager.
> 2016-06-30 14:45:18,533 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - Source: Custom File Source (1/1) (5f2a1062c796ec6098a0a88227b9eab4) switched from RUNNING to CANCELING
> {code}
> The JobManager JVM keeps running (keeping the YARN session alive) because the web monitor is not stopped on such errors.
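
To illustrate the behaviour described in the issue, here is a minimal, self-contained Java sketch. It is not Flink's code and not the actual fix: `FakeWebMonitor` and `FakeActorSystem` are hypothetical stand-ins, and the sketch simply assumes that the web monitor runs on a non-daemon thread and that the actor system can run a callback when it terminates. Under those assumptions, the JVM can only exit once something stops the monitor, so the sketch wires the monitor's `stop()` into the termination callback.

```java
import java.util.concurrent.CountDownLatch;

public class WebMonitorShutdownSketch {

    /** Hypothetical stand-in for the web monitor: a non-daemon thread that runs until stopped. */
    static final class FakeWebMonitor {
        private final CountDownLatch stopSignal = new CountDownLatch(1);
        private final Thread serverThread;

        FakeWebMonitor() {
            serverThread = new Thread(() -> {
                try {
                    stopSignal.await(); // "serve requests" until stop() is called
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "fake-web-monitor");
            serverThread.setDaemon(false); // a live non-daemon thread keeps the JVM running
            serverThread.start();
        }

        void stop() {
            stopSignal.countDown();
        }

        boolean isRunning() {
            return serverThread.isAlive();
        }

        /** Waits up to the given time for the server thread to end; returns true if it ended. */
        boolean awaitStopped(long millis) {
            try {
                serverThread.join(millis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return !serverThread.isAlive();
        }
    }

    /** Hypothetical stand-in for the actor system: runs one callback when it dies. */
    static final class FakeActorSystem {
        private Runnable onTermination = () -> {};

        void registerOnTermination(Runnable callback) {
            onTermination = callback;
        }

        /** Simulates the "uncaught fatal error ... shutting down ActorSystem" path. */
        void failFatally(Throwable cause) {
            onTermination.run();
        }
    }

    public static void main(String[] args) {
        FakeWebMonitor monitor = new FakeWebMonitor();
        FakeActorSystem actorSystem = new FakeActorSystem();

        // Without this line the monitor thread would outlive the actor system
        // and the JVM would never exit, which is the symptom reported above.
        actorSystem.registerOnTermination(monitor::stop);

        actorSystem.failFatally(new OutOfMemoryError("Java heap space (simulated)"));
        monitor.awaitStopped(1000);
        System.out.println("web monitor running: " + monitor.isRunning());
    }
}
```

Removing the `registerOnTermination` call in `main` reproduces the reported symptom in miniature: the simulated actor system is gone, but the process stays alive because the monitor thread is still waiting.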



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
