flink-user mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Running JobManager as Deployment instead of Job
Date Mon, 11 Feb 2019 08:28:29 GMT
Hi Vishal,

you can also keep the same cluster id when cancelling a job with a savepoint
and then resuming a new job from that savepoint. Terminating the job should
clean up all of its state in ZK.

Cheers,
Till
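
A minimal sketch of the cancel-and-resume workflow described above. It only builds the command lines, in Python, and assumes the Flink 1.7 CLI form `flink cancel -s <targetDirectory> <jobID>` plus the argument layout of the `job-cluster-job.yaml` template from the Flink Kubernetes example (`job-cluster`, `--job-classname`, `-D` properties, `--fromSavepoint`, `--allowNonRestoredState`). Check both against your Flink version. The class name, savepoint paths and cluster id are hypothetical.

```python
from typing import List, Optional


def cancel_with_savepoint_cmd(job_id: str, savepoint_dir: str) -> List[str]:
    """CLI call that takes a savepoint into `savepoint_dir`, then cancels the job."""
    return ["flink", "cancel", "-s", savepoint_dir, job_id]


def job_cluster_args(
    job_classname: str,
    cluster_id: str,
    savepoint_path: Optional[str] = None,
    allow_non_restored_state: bool = False,
) -> List[str]:
    """Container args for the relaunched job cluster (assumed template layout).

    Reusing the same `cluster_id` keeps the HA namespace in ZooKeeper (and any
    ingress naming built on it) stable; `savepoint_path` makes the new job
    start from that savepoint.
    """
    args = [
        "job-cluster",
        "--job-classname", job_classname,
        f"-Dhigh-availability.cluster-id={cluster_id}",
    ]
    if savepoint_path is not None:
        args += ["--fromSavepoint", savepoint_path]
        if allow_non_restored_state:
            # Skip savepoint state that no longer maps to an operator in the new job.
            args.append("--allowNonRestoredState")
    return args


# Usage (hypothetical values). The all-zero job id is the fixed id mentioned
# later in this thread; treat it as an assumption and confirm it with `flink list`.
cancel_cmd = cancel_with_savepoint_cmd("0" * 32, "s3://my-bucket/savepoints")
resume_args = job_cluster_args(
    "com.example.MyJob",
    cluster_id="orders-pipeline",
    savepoint_path="s3://my-bucket/savepoints/savepoint-abc123",
    allow_non_restored_state=True,
)
print(cancel_cmd)
print(resume_args)
```

`flink cancel -s` reports the path of the savepoint it actually wrote; that concrete path, not the target directory, is what `--fromSavepoint` expects.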

On Fri, Feb 8, 2019 at 11:26 PM Vishal Santoshi <vishal.santoshi@gmail.com>
wrote:

> In one case, however, we do want to retain the same cluster id (think
> ingress on k8s and thus SLAs with external touch points), but it is
> essentially a new job (we added an incompatible change, although at the
> interface level it retains the same contract). The only way seems to be to
> remove the chroot/subcontext from ZK and relaunch, essentially deleting any
> vestiges of the previous incarnation. And that is fine if that is indeed
> the process.
>
>
> On Fri, Feb 8, 2019 at 7:58 AM Till Rohrmann <trohrmann@apache.org> wrote:
>
>> If you keep the same cluster id, the upgraded job should pick up
>> checkpoints from the completed checkpoint store. However, I would recommend
>> taking a savepoint and resuming from it, because then you can also
>> specify, for example, that non-restored state is allowed.
>>
>> Cheers,
>> Till
>>
>> On Fri, Feb 8, 2019 at 11:20 AM Vishal Santoshi <
>> vishal.santoshi@gmail.com> wrote:
>>
>>> Is the rationale for using a job ID of 000000* roughly the same, i.e. a
>>> Flink job cluster runs a single job and thus a single job ID suffices? I am
>>> mostly wondering about the case where we make compatible changes to a
>>> job and want to resume (given we are in HA mode and thus have a
>>> chroot/subcontext on ZK for the job cluster): it would make no sense to
>>> give it a brand new job ID, would it?
>>>
>>> On Thu, Feb 7, 2019 at 4:42 AM Till Rohrmann <trohrmann@apache.org>
>>> wrote:
>>>
>>>> Hi Sergey,
>>>>
>>>> the reason we use a K8s Job instead of a Deployment is that a
>>>> Flink job cluster should terminate after it has successfully executed
>>>> its Flink job. This is unlike a session cluster, which should run
>>>> forever and for which a K8s Deployment is better suited.
>>>>
>>>> If a K8s Deployment works better for your use case, then I would
>>>> suggest changing `job-cluster-job.yaml` accordingly.
>>>>
>>>> Cheers,
>>>> Till
>>>>
>>>> On Tue, Feb 5, 2019 at 4:12 PM Sergey Belikov <belikov.sergey@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> my team is currently experimenting with Flink running in Kubernetes
>>>>> (job cluster setup). We found that with the JobManager deployed as a
>>>>> "Job", we can't simply update certain values in the Job's YAML, e.g.
>>>>> spec.template.spec.containers.image (
>>>>> https://github.com/kubernetes/kubernetes/issues/48388#issuecomment-319493817).
>>>>> This causes problems in our CI/CD pipelines, so we are thinking
>>>>> about using a "Deployment" instead of a "Job".
>>>>>
>>>>> With that said, I'm wondering what the motivation was behind using a
>>>>> "Job" resource to deploy the JobManager. And are there any pitfalls
>>>>> in using a Deployment rather than a Job for the JobManager?
>>>>>
>>>>> Thank you in advance.
>>>>> --
>>>>> Best regards,
>>>>> Sergey Belikov
>>>>>
>>>>
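
A minimal sketch of the change Till suggests for `job-cluster-job.yaml`, assuming the manifest has already been loaded into a Python dict (for example with a YAML library). It turns the `batch/v1` Job into an `apps/v1` Deployment, adds the label selector a Deployment requires, and sets the pod `restartPolicy` to `Always`, the only value a Deployment accepts. The `app` label value and the image name are hypothetical.

```python
import copy
from typing import Any, Dict

Manifest = Dict[str, Any]


def job_to_deployment(job_manifest: Manifest, app_label: str) -> Manifest:
    """Convert a Kubernetes Job manifest dict into a single-replica Deployment."""
    if job_manifest.get("kind") != "Job":
        raise ValueError("expected a Kubernetes Job manifest")
    # Deep-copy so the original Job manifest is left untouched. Only the pod
    # template is carried over; Job-only fields such as backoffLimit are dropped.
    template = copy.deepcopy(job_manifest["spec"]["template"])
    # A Deployment needs a selector that matches the pod template's labels.
    labels = template.setdefault("metadata", {}).setdefault("labels", {})
    labels.setdefault("app", app_label)
    # Deployment pods must use restartPolicy Always (Jobs use OnFailure/Never).
    template["spec"]["restartPolicy"] = "Always"
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": copy.deepcopy(job_manifest.get("metadata", {})),
        "spec": {
            "replicas": 1,
            "selector": {"matchLabels": {"app": labels["app"]}},
            "template": template,
        },
    }


# Usage with a trimmed-down, hypothetical job-cluster manifest.
job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "flink-job-cluster"},
    "spec": {
        "template": {
            "metadata": {"labels": {"app": "flink", "component": "job-cluster"}},
            "spec": {
                "restartPolicy": "OnFailure",
                "containers": [{"name": "flink-job-cluster", "image": "my-flink-job:v2"}],
            },
        }
    },
}
deployment = job_to_deployment(job, app_label="flink")
print(deployment["kind"], deployment["spec"]["selector"])
```

The trade-off from Till's answer still applies: because a Deployment always restarts its pod, the JobManager pod will be restarted after the job finishes instead of the cluster terminating. In exchange, changing the container image of a Deployment triggers a rolling update, whereas the pod template of an existing Job cannot be updated.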
