flink-dev mailing list archives

From Gyula Fóra <gyula.f...@gmail.com>
Subject Re: Force enabling checkpoints for iterative streaming jobs
Date Wed, 10 Jun 2015 08:19:05 GMT
I disagree. Not having checkpointed operators inside the iteration still
breaks the guarantees.

It is not about the states; it is about the loop itself.

On Wed, Jun 10, 2015 at 10:12 AM Aljoscha Krettek <aljoscha@apache.org>
wrote:

> This is the answer I gave on the PR (we should have one place for
> discussing this, though):
>
> I would be against merging this in the current form. What I propose is
> to analyse the topology to verify that there are no checkpointed
> operators inside iterations. Operators before and after iterations can
> be checkpointed and we can safely allow the user to enable
> checkpointing.
>
> If we have the code to analyse which operators are inside iterations
> we could also disallow windows inside iterations. I think windows
> inside iterations don't make sense since elements in different
> "iterations" would end up in the same window. Maybe I'm wrong here
> though, then please correct me.
>
> On Wed, Jun 10, 2015 at 10:08 AM, Márton Balassi
> <balassi.marton@gmail.com> wrote:
> > I agree that for the sake of the above-mentioned use cases it is
> > reasonable to add this to the release with the right documentation. For
> > machine learning, potentially losing one round of feedback data should
> > not matter.
> >
> > Let us not block prominent users until the next release on this.
> >
> > On Wed, Jun 10, 2015 at 8:09 AM, Gyula Fóra <gyula.fora@gmail.com>
> > wrote:
> >
> >> As for people currently suffering from it:
> >>
> >> An application King is developing requires iterations, and they need
> >> checkpoints. Practically all SAMOA programs would need this.
> >>
> >> It is very likely that the state interfaces will be changed after the
> >> release, so this is not something that we can just add later. I don't
> >> see a reason why we should not add it, as it is clearly documented. In
> >> this case, not having guarantees at all means people will never use it
> >> in any production system. Having limited guarantees means that it will
> >> depend on the application.
> >>
> >> On Wed, Jun 10, 2015 at 12:53 AM, Ufuk Celebi <uce@apache.org> wrote:
> >>
> >> > Hey Gyula,
> >> >
> >> > I understand your reasoning, but I don't think it's worth rushing
> >> > this into the release.
> >> >
> >> > As you've said, we cannot give precise guarantees. But this is
> >> > arguably one of the key requirements for any fault tolerance
> >> > mechanism. Therefore I disagree that this is better than not having
> >> > anything at all. I think it will already go a long way to have the
> >> > non-iterative case working reliably.
> >> >
> >> > And as far as I know there are no users really suffering from this at
> >> > the moment (in the sense that someone has complained on the mailing
> >> > list).
> >> >
> >> > Hence, I vote to postpone this.
> >> >
> >> > – Ufuk
> >> >
> >> > On 10 Jun 2015, at 00:19, Gyula Fóra <gyfora@apache.org> wrote:
> >> >
> >> > > Hey all,
> >> > >
> >> > > It is currently impossible to enable state checkpointing for
> >> > > iterative jobs, because an exception is thrown when creating the
> >> > > job graph. This behaviour is motivated by the lack of precise
> >> > > guarantees that we can give with the current fault-tolerance
> >> > > implementations for cyclic graphs.
> >> > >
> >> > > This PR <https://github.com/apache/flink/pull/812> adds an optional
> >> > > flag to force checkpoints even in case of iterations. The algorithm
> >> > > will take checkpoints periodically as before, but records in transit
> >> > > inside the loop will be lost.
> >> > >
> >> > > However, even this guarantee is enough for most applications (machine
> >> > > learning, for instance) and certainly much better than not having
> >> > > anything at all.
> >> > >
> >> > >
> >> > > I suggest we add this to the 0.9 release, as currently many
> >> > > applications suffer from this limitation (SAMOA, ML pipelines, graph
> >> > > streaming, etc.).
> >> > >
> >> > >
> >> > > Cheers,
> >> > >
> >> > > Gyula
> >> >
> >> >
> >>
>
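
For illustration only, the topology analysis proposed in the quoted PR comment (verify that no checkpointed operator sits inside an iteration) could look roughly like the sketch below. This is not Flink code and does not use any Flink API: the class IterationCheck, its method names, and the representation of the topology as a map from an operator name to its downstream operator names are all assumptions made for the example. An operator is treated as "inside an iteration" if it can reach itself again by following at least one edge.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Flink: operators are plain strings and the
// topology is an adjacency map from an operator to its downstream operators.
final class IterationCheck {

    // Returns the operators that lie on a cycle, i.e. that can reach
    // themselves again by following at least one edge.
    static Set<String> operatorsInsideIterations(Map<String, List<String>> edges) {
        Set<String> inside = new TreeSet<>();
        for (String start : edges.keySet()) {
            if (reaches(edges, start, start)) {
                inside.add(start);
            }
        }
        return inside;
    }

    // Worklist traversal that starts at the successors of `from` and
    // reports whether `target` is ever visited.
    private static boolean reaches(Map<String, List<String>> edges,
                                   String from, String target) {
        Deque<String> work = new ArrayDeque<>(edges.getOrDefault(from, List.of()));
        Set<String> seen = new HashSet<>();
        while (!work.isEmpty()) {
            String op = work.pop();
            if (op.equals(target)) {
                return true;
            }
            if (seen.add(op)) {
                work.addAll(edges.getOrDefault(op, List.of()));
            }
        }
        return false;
    }

    // Checkpointing is allowed only if no checkpointed operator is inside
    // an iteration; operators before and after the loop are fine.
    static boolean checkpointingAllowed(Map<String, List<String>> edges,
                                        Set<String> checkpointed) {
        Set<String> offending = operatorsInsideIterations(edges);
        offending.retainAll(checkpointed);
        return offending.isEmpty();
    }

    public static void main(String[] args) {
        // source -> head -> map -> sink, with a feedback edge map -> head.
        Map<String, List<String>> edges = Map.of(
            "source", List.of("head"),
            "head", List.of("map"),
            "map", List.of("head", "sink"),
            "sink", List.of());
        System.out.println(operatorsInsideIterations(edges));
        System.out.println(checkpointingAllowed(edges, Set.of("source", "sink")));
        System.out.println(checkpointingAllowed(edges, Set.of("map")));
    }
}
```

In the example topology, only "head" and "map" are on the feedback loop, so checkpointed operators at "source" and "sink" pass the check while a checkpointed "map" does not. Note that this sketch only covers the proposal about operator state; it says nothing about records in transit on the feedback edge, which is the objection raised at the top of this message.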
