flink-user mailing list archives

From Till Rohrmann <till.rohrm...@gmail.com>
Subject Re: akka timeout
Date Tue, 22 Aug 2017 08:21:11 GMT
Hi Steven,

Quick correction for Flink 1.2: the MetricFetcher indeed does not pick up
the timeout value from the configuration. Instead it uses a hardcoded
10 s timeout. This was changed only recently, and the fix is already
committed to master, so the next release, 1.4, will properly pick up the
configured timeout.
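
To make the difference concrete, here is a minimal, self-contained sketch.
It is not the actual MetricFetcher code: the class TimeoutSketch and its
methods are hypothetical, a plain Map stands in for the Flink configuration,
and the "10 s" fallback is assumed from the default mentioned in this thread.

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

public class TimeoutSketch {
    // Hypothetical stand-in for the hardcoded 10 s value described above.
    public static final Duration HARDCODED_TIMEOUT = Duration.ofSeconds(10);

    // Parses values such as "60 s" or "10000 ms".
    // Simplification: only these two units, separated from the number by a space.
    public static Duration parse(String value) {
        String[] parts = value.trim().split("\\s+");
        long amount = Long.parseLong(parts[0]);
        String unit = parts.length > 1 ? parts[1] : "ms";
        switch (unit) {
            case "s":
                return Duration.ofSeconds(amount);
            case "ms":
                return Duration.ofMillis(amount);
            default:
                throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
    }

    // Behaviour described for Flink 1.2: the configuration is ignored.
    public static Duration hardcodedTimeout(Map<String, String> config) {
        return HARDCODED_TIMEOUT;
    }

    // Behaviour expected after the fix: the configured value is used,
    // falling back to an assumed default of "10 s" when the key is absent.
    public static Duration configuredTimeout(Map<String, String> config) {
        return parse(config.getOrDefault("akka.ask.timeout", "10 s"));
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("akka.ask.timeout", "60 s");

        System.out.println("hardcoded:  " + hardcodedTimeout(config).toMillis() + " ms");
        System.out.println("configured: " + configuredTimeout(config).toMillis() + " ms");
    }
}
```

With akka.ask.timeout set to 60 s, the first method still yields the 10000 ms
seen in the log, while the second yields the configured value.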

Just out of curiosity, what's the instability issue you're observing?

Cheers,
Till

On Fri, Aug 18, 2017 at 7:07 PM, Steven Wu <stevenz3wu@gmail.com> wrote:

> Till/Chesnay, thanks for the answers. Looks like this is a result/symptom
> of the underlying stability issue that I am trying to track down.
>
> It is Flink 1.2.
>
> On Fri, Aug 18, 2017 at 12:24 AM, Chesnay Schepler <chesnay@apache.org>
> wrote:
>
>> The MetricFetcher always uses the default akka timeout value.
>>
>>
>> On 18.08.2017 09:07, Till Rohrmann wrote:
>>
>> Hi Steven,
>>
>> I thought that the MetricFetcher picks up the right timeout from the
>> configuration. Which version of Flink are you using?
>>
>> The timeout is not a critical problem for the job health.
>>
>> Cheers,
>> Till
>>
>> On Fri, Aug 18, 2017 at 7:22 AM, Steven Wu <stevenz3wu@gmail.com> wrote:
>>
>>>
>>> We have set akka.ask.timeout to 60 s in the yaml file. I also confirmed the
>>> setting in the Flink UI. But I saw an akka timeout of 10 s for the metric
>>> query service. Two questions:
>>> 1) Why doesn't the metric query use the 60 s value configured in the yaml
>>> file? Does it always use the default 10 s value?
>>> 2) Could this cause heartbeat failures between task manager and job
>>> manager? Or is this just a non-critical failure that won't affect job health?
>>>
>>> Thanks,
>>> Steven
>>>
>>> 2017-08-17 23:34:33,421 WARN org.apache.flink.runtime.webmonitor.metrics.MetricFetcher - Fetching metrics failed.
>>> akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@1.2.3.4:39139/user/MetricQueryService_23cd9db754bb7d123d80e6b1c0be21d6]] after [10000 ms]
>>>     at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:334)
>>>     at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
>>>     at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:599)
>>>     at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109)
>>>     at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:597)
>>>     at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:474)
>>>     at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:425)
>>>     at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:429)
>>>     at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:381)
>>>     at java.lang.Thread.run(Thread.java:748)
>>>
>>
>>
>>
>
