flink-issues mailing list archives

From "Bhumika Bayani (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-8624) flink-mesos: The flink rest-api sometimes becomes unresponsive
Date Fri, 09 Feb 2018 11:00:00 GMT
Bhumika Bayani created FLINK-8624:
-------------------------------------

             Summary: flink-mesos: The flink rest-api sometimes becomes unresponsive
                 Key: FLINK-8624
                 URL: https://issues.apache.org/jira/browse/FLINK-8624
             Project: Flink
          Issue Type: Bug
    Affects Versions: 1.3.2
            Reporter: Bhumika Bayani


Sometimes the flink-mesos-scheduler fails or gets killed, and Marathon brings it up again on
another node. We have observed that the REST API of the newly created Flink instance sometimes
becomes unresponsive.

Even when we issue API calls manually with curl, such as

http://<host>:<port>/overview or http://<host>:<port>/config

we do not receive any response. 
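
To tell a hung REST API apart from one that is simply not listening, a probe with an explicit timeout can help. The following is only a minimal sketch: the `/overview` and `/config` paths come from this report, while the function names `probe` and `probe_all`, the status strings, and the 5-second default are assumptions made for illustration.

```python
import socket
import urllib.error
import urllib.request

# Ignore any HTTP proxy set in the environment so the probe talks to the host directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def probe(url: str, timeout: float = 5.0) -> str:
    """Issue one GET and classify the outcome (hypothetical helper).

    Returns "ok:<status>", "http_error:<status>", "timeout", or "unreachable:<reason>".
    """
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()
            return f"ok:{resp.status}"
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return f"http_error:{exc.code}"
    except (socket.timeout, TimeoutError):
        # Connection was accepted but no response arrived in time: the "hung" case.
        return "timeout"
    except urllib.error.URLError as exc:
        if isinstance(exc.reason, (socket.timeout, TimeoutError)):
            return "timeout"
        # For example connection refused or DNS failure.
        return f"unreachable:{exc.reason}"
    except OSError as exc:
        # For example a connection reset while waiting for the response.
        return f"unreachable:{exc}"


def probe_all(base_url: str, paths=("/overview", "/config"), timeout: float = 5.0) -> dict:
    """Probe each path under base_url and return {path: outcome}."""
    return {path: probe(base_url.rstrip("/") + path, timeout) for path in paths}
```

Calling `probe_all("http://<host>:<port>")` with the real host and port would return one outcome per endpoint. A `timeout` result would match the symptom described here, whereas `unreachable` would point to the process not listening at all.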

We submit and execute all our Flink jobs through the REST API only. So when the REST API
becomes unresponsive, we cannot run any Flink jobs and no stream processing happens.

We tried enabling Flink debug logs, but we did not observe anything specific that indicates
why the REST API is failing or unresponsive.

We see the exceptions below in the logs, but they are not specific to the case where the REST API
is hung. We see them in a healthy flink-scheduler too:

 
{code:java}
Timestamp=2018-02-08 05:43:49,175 LogLevel=INFO
        ThreadId=[Checkpoint Timer] Class=o.a.f.r.c.CheckpointCoordinator Msg=Triggering checkpoint 10181 @ 1518068629174
Timestamp=2018-02-08 05:43:49,183 LogLevel=DEBUG
        ThreadId=[nioEventLoopGroup-5-3] Class=o.a.f.r.w.WebRuntimeMonitor Msg=Unhandled exception: {}
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka://flink/user/jobmanager#753807801]] after [10000 ms]
        at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
        at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
        at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
        at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
        at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
        at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:474) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
        at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:425) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
        at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:429) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
        at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:381) ~[flink-dist_2.11-1.4-SNAPSHOT.jar:1.4-SNAPSHOT]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
{code}
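
Because the same exception also shows up on healthy instances, comparing how often it occurs on a healthy and on a hung scheduler may be more telling than its mere presence. The sketch below counts `AskTimeoutException` entries per minute. It assumes the log layout shown in the excerpt above (a `Timestamp=... LogLevel=...` header line, with the exception class at the start of a following line); the function name `ask_timeouts_per_minute` is hypothetical.

```python
import re
from collections import Counter

# Start of a log record, as it appears in the excerpt from this report.
RECORD_START = re.compile(
    r"Timestamp=(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}\s+LogLevel=(?P<level>\w+)"
)
ASK_TIMEOUT = "akka.pattern.AskTimeoutException"


def ask_timeouts_per_minute(log_text: str) -> Counter:
    """Count AskTimeoutException occurrences, keyed by "YYYY-MM-DD HH:MM"."""
    counts = Counter()
    current_minute = None
    for line in log_text.splitlines():
        match = RECORD_START.match(line)
        if match:
            # Remember the minute of the most recent record header.
            current_minute = match.group("ts")[:16]
        elif line.startswith(ASK_TIMEOUT) and current_minute is not None:
            # Attribute the exception to the record that precedes it.
            counts[current_minute] += 1
    return counts
```

Running this over the logs of a healthy and of an unresponsive scheduler for the same time window would show whether the timeout rate actually differs between the two.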
 

While the REST API is unresponsive, the Flink web UI also does not load or show
any information.

Restarting the flink-scheduler sometimes resolves the issue.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
