flink-user mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: HA failing for 1.6.0 job cluster with docker-compose
Date Thu, 20 Sep 2018 08:34:54 GMT
Hi Tzanko,

To make the container entrypoint work properly with HA, we need to make
the JobID fixed, i.e. stable across restarts (see
https://issues.apache.org/jira/browse/FLINK-10291). At the moment, a new
JobID is generated every time the cluster entrypoint container restarts.
Because of that, the system cannot find the existing checkpoints.
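
As a rough illustration of why a fresh JobID on every restart hides the existing checkpoints, here is a minimal Python sketch. It is not Flink code: `CheckpointStore` and `start_entrypoint` are hypothetical names, and a plain dictionary stands in for the HA checkpoint storage.

```python
import uuid


class CheckpointStore:
    """Hypothetical stand-in for HA storage: maps a job ID to its latest checkpoint."""

    def __init__(self):
        self._by_job = {}

    def save(self, job_id, checkpoint):
        self._by_job[job_id] = checkpoint

    def latest(self, job_id):
        # Returns None when no checkpoint is registered under this job ID.
        return self._by_job.get(job_id)


def start_entrypoint(store, fixed_job_id=None):
    """Simulate one start of the entrypoint (hypothetical helper).

    Without a fixed ID, a random job ID is generated on each start,
    so the lookup can never match a checkpoint saved by an earlier run.
    """
    job_id = fixed_job_id if fixed_job_id is not None else uuid.uuid4().hex
    restored = store.latest(job_id)
    state = restored if restored is not None else 0
    # Pretend the job makes progress and writes a new checkpoint.
    store.save(job_id, state + 1)
    return job_id, restored
```

Calling `start_entrypoint(store)` twice looks up two different random IDs, so the second start restores nothing. Passing the same `fixed_job_id` to both calls lets the second start find the checkpoint written by the first.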

Making the JobID fixed is not a big deal, and the change should be
included in the next bugfix release.

Cheers,
Till

On Thu, Sep 20, 2018 at 10:12 AM vino yang <yanghua1127@gmail.com> wrote:

> Hi Tzanko,
>
> Maybe Till is more appropriate to answer this question.
>
> Thanks, vino.
>
> On Wed, Sep 19, 2018 at 5:47 PM Tzanko Matev <tsanko@gmail.com> wrote:
>
>> Dear all,
>>
>> I am currently experimenting with a Flink 1.6.0 job cluster. The goal is
>> to run a streaming job on K8s. Right now I am using docker-compose to
>> experiment with the job cluster.
>>
>> I am trying to set up HA with Zookeeper, but without success so far. I
>> have a docker-compose file which contains the following services:
>> - Zookeeper
>> - Flink job manager
>> - Flink task manager
>>
>> The containers are set up as per the documentation for docker-compose,
>> and I have also added the necessary HA settings to the conf file. However,
>> when I kill the job manager container and start it again, the job being
>> processed does not recover but always starts from scratch, and I get
>> the following error:
>>
>> > ERROR org.apache.flink.runtime.rest.handler.job.JobDetailsHandler  -
>> Could not retrieve the redirect address.
>> >
>> > java.util.concurrent.CompletionException:
>> org.apache.flink.runtime.rpc.exceptions.FencingTokenException: Fencing
>> token not set: Ignoring message
>> LocalFencedMessage(8c4887f5c13f6d907d82a55d97ac428f,
>> LocalRpcInvocation(requestRestAddress(Time))) sent to
>> akka.tcp://flink@blockprocessor-job-cluster:50000/user/dispatcher
>> because the fencing token is null.
>>
>> Am I missing something? Is HA implemented for job clusters at all?
>>
>> Best wishes,
>> Tzanko Matev
>>
>>
