apex-dev mailing list archives

From Timothy Farkas <...@datatorrent.com>
Subject Re: operator recovery window
Date Tue, 15 Dec 2015 21:23:06 GMT
Yes, 25 would be purged, but the operator would never be restored to that
window.

On Tue, Dec 15, 2015 at 12:37 PM, Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> Tim,
>
> You mean 25 is purged too?
>
> Regards,
> Ashwin.
>
> On Tue, Dec 15, 2015 at 12:01 PM, Timothy Farkas <tim@datatorrent.com>
> wrote:
>
> > Hi Ashwin,
> >
> > In your example, if A fails the recovery windows would be
> >
> > D - 15
> > C - 15
> > B - 15
> > A - 15
> >
> > If C fails the recovery windows would be
> >
> > D - 15
> > C - 15
> > B - 25
> > A - 30
> >
> > If every operator just reached window 30 and checkpointed, the committed
> > window would be 25, and all the checkpoints before window 30 would be
> > purged, but the checkpoint for window 30 would not be purged.
> >
> > Thanks,
> > Tim
> >
> > On Tue, Dec 15, 2015 at 11:41 AM, Ashwin Chandra Putta <
> > ashwinchandrap@gmail.com> wrote:
> >
> > > Tim,
> > >
> > > Thanks, that is pretty much in line with what I was thinking. A little
> > > different thought though in terms of picking the checkpoint based on
> > > downstream operators. For A, is it not going to be "the checkpoint with
> > > the largest window id that is less than or equal to the checkpoint with
> > > the largest common window id (instead of largest window id) among all
> > > the operators downstream of A"?
> > >
> > > For example,
> > >
> > > Say A -> B -> C -> D is the DAG, the checkpoint window count is 5,
> > > and the largest checkpoints are as follows.
> > >
> > > A - 30
> > > B - 25
> > > C - 20
> > > D - 15
> > >
> > > Does A recover at 25 (checkpoint with largest window id) or 15
> > > (checkpoint with largest common window id)?
> > >
> > > Also, regarding recovering at the committed window id: is it not
> > > possible in the following scenario? All operators have checkpointed at
> > > 30 and got the committed window callback, and then an operator fails
> > > before any operator checkpoints further. In that case, the recovery
> > > window is 30, right?
> > >
> > > Regards,
> > > Ashwin.
> > >
> > > On Mon, Dec 14, 2015 at 11:58 PM, Timothy Farkas <tim@datatorrent.com>
> > > wrote:
> > >
> > > > Hi Ashwin,
> > > >
> > > > The recovery checkpoint for operator A is computed by taking the
> > > > checkpoint with the largest window id that is less than or equal to
> > > > the checkpoint with the largest window id among all the operators
> > > > downstream of A. The output operators in a DAG will always recover to
> > > > their most recent checkpoint. The input operator of the DAG may
> > > > recover to the earliest checkpoint. Operators between the input and
> > > > output operators could recover to a window in between.
> > > >
> > > > I don't think you can ever recover to a committed window; the
> > > > earliest I think you can recover to is the window after the committed
> > > > window (I may be wrong on this).
> > > >
> > > > On Mon, Dec 14, 2015 at 11:05 PM, Ashwin Chandra Putta <
> > > > ashwinchandrap@gmail.com> wrote:
> > > >
> > > > > In the Apex architecture there is a concept of checkpointing and a
> > > > > concept of committed, when all operators have crossed a common
> > > > > checkpoint.
> > > > >
> > > > > So, in which scenarios does a given operator recover at the last
> > > > > checkpoint window vs. the last committed window vs. some other
> > > > > checkpoint window in between?
> > > > > --
> > > > >
> > > > > Regards,
> > > > > Ashwin.
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > >
> > > Regards,
> > > Ashwin.
> > >
> >
>
>
>
> --
>
> Regards,
> Ashwin.
>
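
The recovery rule discussed in this thread can be sketched in Python. This is
a minimal sketch, not Apex's implementation. It assumes a linear DAG, and it
assumes that the failed operator and everything downstream of it are restored
while upstream operators stay at their latest checkpoint (an inference from
the two examples in the thread). The function name `recovery_windows` and its
parameters are hypothetical.

```python
from bisect import bisect_right


def recovery_windows(dag_order, checkpoints, failed):
    """Return a recovery window per operator after `failed` goes down.

    dag_order   -- operators of a linear DAG, input operator first
    checkpoints -- operator name -> sorted list of checkpointed window ids
    failed      -- name of the operator that failed

    Assumption (inferred from the thread, not from Apex source code):
    only the failed operator and its downstream operators are restored.
    """
    start = dag_order.index(failed)
    result = {}

    # Upstream operators are not restored; report their latest checkpoint.
    for op in dag_order[:start]:
        result[op] = checkpoints[op][-1]

    # Walk from the output operator back to the failed operator. Each
    # operator takes its largest checkpoint that does not exceed the
    # recovery window already chosen for the operator downstream of it.
    limit = None
    for op in reversed(dag_order[start:]):
        windows = checkpoints[op]
        if limit is None:
            chosen = windows[-1]  # output operator: most recent checkpoint
        else:
            idx = bisect_right(windows, limit)
            if idx == 0:
                raise ValueError(f"{op} has no checkpoint at or before {limit}")
            chosen = windows[idx - 1]
        result[op] = chosen
        limit = chosen

    return result
```

With the checkpoints from the example (A at 30, B at 25, C at 20, D at 15, one
checkpoint every 5 windows), failing A gives 15 for every operator, and
failing C gives D - 15, C - 15, B - 25, A - 30.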
