flink-user mailing list archives

From Ron Crocker <rcroc...@newrelic.com>
Subject Re: Flink rolling upgrade support
Date Thu, 22 Dec 2016 16:21:13 GMT
Hi Stephan -

I agree that the savepoint-shutdown-restart model is nominally the same as the rolling restart
with one notable exception - a lack of atomicity. There is a gap between invoking the savepoint
command and the shutdown command. My job’s operations aren’t idempotent:
replaying events ends up double-counting. With the current model (or at least as far as I
can tell from the documentation you linked) I will double-process any events that are handled
after the savepoint is taken but before the shutdown takes effect.

One thing that could alleviate this is an atomic shutdown-with-savepoint (or savepoint-with-shutdown,
I’m not so picky about which way it is, I only want it to be atomic). With this, I can be
assured that the savepoint matches the actual last-processed state. 

My understanding of the processing within Flink is that this could be modeled by injecting a
“savepoint” event followed immediately by a “shutdown” event into the event stream, but my
understanding is a bit cartoonish so I’m sure it’s more involved.
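
To make the gap concrete, here is a toy simulation (plain Python, not Flink code). The function name `run`, the offsets, and the "external total" standing in for a non-idempotent side effect are all made up for illustration. It only assumes that a restart resumes reading from the offset recorded in the savepoint.

```python
def run(events, savepoint_at, shutdown_at):
    """Toy model: process events, snapshot at one offset, stop at a later
    offset, then restart from the snapshot and replay the rest.

    external_total stands in for a non-idempotent side effect, such as
    incrementing a counter in an external store (an assumption for this sketch).
    """
    external_total = 0

    # First run: the job keeps processing until the shutdown takes effect.
    for offset in range(shutdown_at):
        external_total += events[offset]

    # Restart: resume from the offset captured in the savepoint.
    for offset in range(savepoint_at, len(events)):
        external_total += events[offset]

    return external_total


events = [1] * 10

# Savepoint at offset 6, shutdown at offset 8: offsets 6 and 7 are applied twice.
with_gap = run(events, savepoint_at=6, shutdown_at=8)

# Atomic savepoint-with-shutdown: both happen at the same offset, nothing repeats.
atomic = run(events, savepoint_at=8, shutdown_at=8)

print(with_gap - sum(events))  # number of double-counted events with the gap
print(atomic - sum(events))    # number of double-counted events when atomic
```

The over-count equals the number of events processed between the savepoint and the shutdown, which is why shrinking that window to zero is what I’m after.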

Ron Crocker
Principal Engineer & Architect
( ( •)) New Relic
M: +1 630 363 8835

> On Dec 20, 2016, at 12:40 PM, Stephan Ewen <sewen@apache.org> wrote:
> Hi Andrew!
> Would be great to know if what Aljoscha described works for you. Ideally, this costs
> no more than a failure/recovery cycle, which one typically also gets with rolling upgrades.
> Best,
> Stephan
