flink-user mailing list archives

From Karthik Deivasigamani <karthi...@gmail.com>
Subject Re: Checkpoint was declined (tasks not ready)
Date Mon, 09 Oct 2017 10:10:19 GMT
Hi Stephan,
    Once the job restarts due to an async I/O operator timeout, we notice
that its checkpoints never succeed again. But the job keeps running and
processing data.
~
Karthik


On Mon, Oct 9, 2017 at 3:19 PM, Stephan Ewen <sewen@apache.org> wrote:

> As long as this does not appear all the time, but only once in a while, it
> should not be a problem.
> It simply means that this particular checkpoint could not be triggered,
> because some sources were not ready yet.
>
> It should try another checkpoint and then be okay.
>
>
> On Fri, Oct 6, 2017 at 4:53 PM, Karthik Deivasigamani <karthik.d@gmail.com
> > wrote:
>
>> We are using Flink 1.3.1 in Standalone mode with an HA JobManager setup.
>> ~
>> Karthik
>>
>> On Fri, Oct 6, 2017 at 8:22 PM, Karthik Deivasigamani <
>> karthik.d@gmail.com> wrote:
>>
>>> Hi,
>>>     I'm noticing a weird issue with our flink streaming job. We use
>>> async io operator which makes a HTTP call and in certain cases when the
>>> async task times out, it throws an exception and causing the job to
>>> restart.
>>>
>>> java.lang.Exception: An async function call terminated with an exception. Failing the AsyncWaitOperator.
>>> 	at org.apache.flink.streaming.api.operators.async.Emitter.output(Emitter.java:136)
>>> 	at org.apache.flink.streaming.api.operators.async.Emitter.run(Emitter.java:83)
>>> 	at java.lang.Thread.run(Thread.java:745)
>>> Caused by: java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Async function call has timed out.
>>> 	at org.apache.flink.runtime.concurrent.impl.FlinkFuture.get(FlinkFuture.java:110)
>>>
>>> After the job restarts (we have a fixed-delay restart strategy), we
>>> notice that the checkpoints start failing continuously with this message:
>>> Checkpoint was declined (tasks not ready)
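For reference, a fixed-delay restart strategy like the one mentioned above is typically configured in flink-conf.yaml; the values below are illustrative, not taken from the original message:

```yaml
# Restart the job a bounded number of times, waiting between attempts.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3   # illustrative value
restart-strategy.fixed-delay.delay: 10 s   # illustrative value
```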
>>>
>>> [screenshot of the failing checkpoints omitted]
>>>
>>> But we see the job is running, it's processing data, the accumulators
>>> we have are getting incremented, etc., yet checkpointing fails with the
>>> tasks-not-ready message.
>>>
>>> Wanted to reach out to the community to see if anyone else has
>>> experienced this issue before?
>>> ~
>>> Karthik
>>>
>>
>>
>

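The failure described in the thread, a timed-out async HTTP call whose exception fails the whole job, contrasts with a pattern where the timeout yields a fallback result instead. A minimal, Flink-independent sketch of that pattern using plain JDK CompletableFuture (class and method names here are illustrative, not Flink API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

public class TimeoutFallback {

    // Simulates a slow HTTP enrichment call; "enrich" is an illustrative name.
    static CompletableFuture<String> enrich(String key, long delayMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return "enriched:" + key;
        });
    }

    public static void main(String[] args) throws Exception {
        // The fast call finishes well before the deadline and returns normally.
        String ok = enrich("a", 10)
                .completeOnTimeout("fallback:a", 500, TimeUnit.MILLISECONDS)
                .get();

        // The slow call misses the deadline; instead of throwing a
        // TimeoutException (which in the thread above fails the whole job),
        // the future completes with the fallback value.
        String slow = enrich("b", 5_000)
                .completeOnTimeout("fallback:b", 100, TimeUnit.MILLISECONDS)
                .get();

        System.out.println(ok);   // enriched:a
        System.out.println(slow); // fallback:b
    }
}
```

In Flink itself the analogous design choice is whether the async function completes its result with a fallback before the operator's timeout fires, or lets the timeout exception propagate and restart the job, as happened here.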