flink-user mailing list archives

From Nico Kruber <n...@data-artisans.com>
Subject Re: Akka Quarantine & Old YARN Versions
Date Thu, 03 Aug 2017 15:11:48 GMT
Hi Konstantin,
I dug through the pull requests linked from
https://issues.apache.org/jira/browse/FLINK-3347 a bit, only to notice that the
fix-version tag was wrong (should have been 1.2.1, not 1.2.0), but you have
that already.

In there, it was also mentioned that the quarantine monitor is disabled by 
default and can be enabled by setting `taskmanager.exit-on-fatal-akka-error` 
to true. If enabled, it should detect a quarantined task manager and shut it 
down. In that case, YARN should notice it and start a new one, if I'm not 
mistaken.
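
For reference, a minimal sketch of what enabling it could look like, assuming
the option is set in the cluster's `flink-conf.yaml` (the file location and the
comments are my assumptions, only the key itself comes from the pull request
discussion):

```yaml
# flink-conf.yaml (assumed location: the conf/ directory of the Flink distribution)
# Let a TaskManager shut itself down when it detects a fatal Akka error such as
# being quarantined; the quarantine monitor is disabled by default.
taskmanager.exit-on-fatal-akka-error: true
```

The setting would only take effect for TaskManagers started after the change.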

Are you already working with `taskmanager.exit-on-fatal-akka-error` enabled?


Nico

On Thursday, 3 August 2017 10:53:00 CEST Konstantin Knauf wrote:
> Hi everyone,
> 
> we are running Flink 1.2.1 on YARN 2.4 (I know, way too old :().
> Correlated with the last Flink Upgrade from 1.1.3 -> 1.2.1 we are
> experiencing regular TaskManager failures due to
> 
> [Taskmanager Logs]
> 2017-07-10 15:25:26,448 ERROR Remoting - Association to [akka.tcp://flink@<jobmanager>:45303] with UID [-382428140] irrecoverably failed. Quarantining address.
> java.lang.IllegalStateException: Error encountered while processing system message acknowledgement buffer: [1 {0, 1}] ack: ACK[3, {}]
>         at akka.remote.ReliableDeliverySupervisor$$anonfun$receive$1.applyOrElse(Endpoint.scala:289)
>         at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
>         at ...
> 
> As far as I understand https://issues.apache.org/jira/browse/FLINK-3345
> the taskmanager should be restarted in this case. In our case YARN does
> not start a new taskmanager container, but the container is just missing
> indefinitely. Is it known that this does not work on YARN 2.4?
> 
> If it helps, I can also provide the full job and taskmanager logs...
> 
> Cheers & Thanks,
> 
> Konstantin

