flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Flink Application on YARN failed on losing Job Manager | No recovery | Need help debug the cause from logs
Date Fri, 04 Nov 2016 13:52:46 GMT
No, you don't need to manually trigger a savepoint. With HA, checkpoints
are persisted externally and a pointer to the latest one is stored in
ZooKeeper, so that they can be recovered after a JobManager failure.
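
For reference, a minimal sketch of the conf/flink-conf.yaml entries this
setup relies on. The host names and storage path below are placeholders
(assumptions, not values from this thread), and the key names follow the
`high-availability.*` naming of newer Flink releases; older releases used
different names (for example a `recovery.*` prefix), so please check the
documentation for the version you are running.

```yaml
# Sketch only: all values are placeholders, key names depend on the Flink version.

# Enable ZooKeeper-based high availability (leader election plus
# pointers to the latest completed checkpoints).
high-availability: zookeeper

# ZooKeeper quorum used by the JobManager and the client (hypothetical hosts).
high-availability.zookeeper.quorum: zk-host-1:2181,zk-host-2:2181,zk-host-3:2181

# Durable storage for the JobManager metadata that the ZooKeeper pointers
# refer to (hypothetical path).
high-availability.storageDir: hdfs:///flink/ha/

# Number of times YARN may restart the application master, as quoted
# later in this thread.
yarn.application-attempts: 10
```

Checkpointing itself still has to be enabled in the job, and the state
backend needs to write to storage that survives the failure.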

On Fri, Nov 4, 2016 at 2:27 PM, Josh <jofo90@gmail.com> wrote:
> I have a follow-up question to this - if I'm running a job in 'yarn-cluster'
> mode with HA and then at some point the YARN application fails due to some
> hardware failure (i.e. the YARN application moves to "FINISHED"/"FAILED"
> state), how can I restore the job from the most recent checkpoint?
>
> I can use `flink run -m yarn-cluster -s s3://my-savepoints/id .....` to
> restore from a savepoint, but what if I haven't manually taken a savepoint
> recently?
>
> Thanks,
> Josh
>
> On Fri, Nov 4, 2016 at 10:06 AM, Maximilian Michels <mxm@apache.org> wrote:
>>
>> Hi Anchit,
>>
>> The documentation mentions that you need ZooKeeper in addition to
>> setting the application attempts. ZooKeeper is needed to retrieve the
>> current leader for the client and to filter out old leaders in case
>> multiple exist (old processes could even stay alive in YARN). Moreover, it
>> is needed to persist the state of the application.
>>
>>
>> -Max
>>
>>
>> On Thu, Nov 3, 2016 at 7:43 PM, Anchit Jatana
>> <development.anchit@gmail.com> wrote:
>> > Hi Maximilian,
>> >
>> > Thanks for your response. I'm running the application on a YARN cluster
>> > in 'yarn-cluster' mode, i.e. using the 'flink run -m yarn-cluster ..'
>> > command. Is there anything more that I need to configure apart from
>> > setting the 'yarn.application-attempts: 10' property in
>> > conf/flink-conf.yaml?
>> >
>> > I just want to confirm whether there is anything more that I need to
>> > configure to set up HA in 'yarn-cluster' mode.
>> >
>> > Thank you
>> >
>> > Regards,
>> > Anchit
>> >
>
>
