flink-user mailing list archives

From Matthias Pohl <matth...@ververica.com>
Subject Re: Triggering Savepoint fails to write data to S3 store
Date Fri, 28 May 2021 15:49:26 GMT
Yes, that would work. But it might still be interesting to understand why
you ran into the timeout. Was the state simply large, so that the savepoint
took longer than expected? Or was there a network issue? That's just for you
to understand the underlying issue better. In any case, I'm glad the
savepoint creation was successful in the end.

Best,
Matthias
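
On why the data can still appear after the CLI reports a timeout: the stack
trace in the quoted message ends in CompletableFuture.timedGet, which suggests
the client stopped waiting, and that need not mean the cluster stopped
working. The following is a minimal Java sketch of that behavior under stated
assumptions. The class name ClientTimeoutSketch, both helper methods, and the
returned path are hypothetical stand-ins for illustration; none of them are
Flink APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ClientTimeoutSketch {

    // Hypothetical stand-in for the cluster-side savepoint: it finishes
    // after workMillis, regardless of how long any caller is willing to wait.
    static CompletableFuture<String> slowSavepoint(long workMillis) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                Thread.sleep(workMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
            // Illustrative path only, not taken from the thread.
            return "s3://example-bucket/savepoints/savepoint-example";
        });
    }

    // Waits at most timeoutMillis for the future, like a client with a
    // fixed timeout. Returns true if the wait timed out.
    static boolean waitTimesOut(CompletableFuture<String> future, long timeoutMillis) {
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            // Only the wait gave up; the future itself is not cancelled.
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CompletableFuture<String> savepoint = slowSavepoint(500);

        // The "client" waits 100 ms for work that needs about 500 ms.
        boolean timedOut = waitTimesOut(savepoint, 100);
        System.out.println("client timed out: " + timedOut);

        // The work was never cancelled, so the result arrives later.
        System.out.println("savepoint eventually at: " + savepoint.join());
    }
}
```

If I recall correctly, recent Flink versions take this client-side wait from
the client.timeout option, but please check the configuration reference for
your version before relying on that.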

On Fri, May 28, 2021 at 2:35 PM Robert Cullen <cinquaterra@gmail.com> wrote:

> Hi Matthias, you are correct. After a few minutes I took another look at
> my savepoint folder and the data was there. Would increasing the timeout
> resolve the problem?
>
> On Fri, May 28, 2021 at 8:21 AM Matthias Pohl <matthias@ververica.com>
> wrote:
>
>> Hi Robert,
>> it would be interesting to see the corresponding taskmanager/jobmanager
>> logs. That would help in finding out why the savepoint creation failed.
>> Just to verify: The savepoint data wasn't written to S3 even after the
>> timeout happened, was it?
>>
>> Best,
>> Matthias
>>
>> On Thu, May 27, 2021 at 7:50 PM Robert Cullen <cinquaterra@gmail.com>
>> wrote:
>>
>>> I triggered a savepoint from a currently running job. Although the
>>> directory structure gets created in the MinIO S3 store, the command
>>> ultimately fails without writing the data.
>>>
>>> root@flink-client:/opt/flink# ./bin/flink list --target kubernetes-session -Dkubernetes.cluster-id=flink-jobmanager -Dkubernetes.namespace=cmdaa
>>> 2021-05-27 17:37:00,409 INFO  org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster flink-jobmanager successfully, JobManager Web Interface: http://flink-jobmanager-rest.cmdaa:8081
>>> Waiting for response...
>>> ------------------ Running/Restarting Jobs -------------------
>>> 27.05.2021 16:50:00 : 72f614340dc1a7416d0613362d1ef83b : Streaming Log Count (RUNNING)
>>> --------------------------------------------------------------
>>> No scheduled jobs.
>>> root@flink-client:/opt/flink# ./bin/flink savepoint 72f614340dc1a7416d0613362d1ef83b --target kubernetes-session -Dkubernetes.cluster-id=flink-jobmanager -Dkubernetes.namespace=cmdaa
>>> 2021-05-27 17:37:58,776 INFO  org.apache.flink.kubernetes.KubernetesClusterDescriptor [] - Retrieve flink cluster flink-jobmanager successfully, JobManager Web Interface: http://flink-jobmanager-rest.cmdaa:8081
>>> Triggering savepoint for job 72f614340dc1a7416d0613362d1ef83b.
>>> Waiting for response...
>>>
>>> ------------------------------------------------------------
>>>  The program finished with the following exception:
>>>
>>> org.apache.flink.util.FlinkException: Triggering a savepoint for the job 72f614340dc1a7416d0613362d1ef83b failed.
>>>         at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:777)
>>>         at org.apache.flink.client.cli.CliFrontend.lambda$savepoint$9(CliFrontend.java:754)
>>>         at org.apache.flink.client.cli.CliFrontend.runClusterAction(CliFrontend.java:1002)
>>>         at org.apache.flink.client.cli.CliFrontend.savepoint(CliFrontend.java:751)
>>>         at org.apache.flink.client.cli.CliFrontend.parseAndRun(CliFrontend.java:1072)
>>>         at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:1132)
>>>         at org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:28)
>>>         at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1132)
>>> Caused by: java.util.concurrent.TimeoutException
>>>         at java.util.concurrent.CompletableFuture.timedGet(CompletableFuture.java:1784)
>>>         at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1928)
>>>         at org.apache.flink.client.cli.CliFrontend.triggerSavepoint(CliFrontend.java:771)
>>>         ... 7 more
>>> root@flink-client:/opt/flink#
>>>
>>> --
>>> Robert Cullen
>>> 240-475-4490
>>>
>>
>
> --
> Robert Cullen
> 240-475-4490
>
