apex-dev mailing list archives

From Timothy Farkas <...@datatorrent.com>
Subject Re: operator recovery window
Date Tue, 15 Dec 2015 20:01:02 GMT
Hi Ashwin,

In your example, if A fails the recovery windows would be

D - 15
C - 15
B - 15
A - 15

If C fails the recovery windows would be

D - 15
C - 15
B - 25
A - 30
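
As a rough sketch of the two cases above (not the actual Apex code), assume
a linear chain, that the failed operator and everything downstream of it are
redeployed, and that each redeployed operator picks its largest checkpoint
that is less than or equal to the recovery window of the operator right
after it. Operators upstream of the failure are shown at their latest
checkpoint. The function name `recovery_windows` and the checkpoint lists
are made up for illustration; the lists assume every multiple of 5 up to
the latest checkpoint is still available.

```python
def recovery_windows(chain, checkpoints, failed):
    """Simplified model of recovery windows for a linear chain.

    chain:       operator names ordered upstream to downstream
    checkpoints: operator name -> list of checkpointed window ids
    failed:      name of the operator that failed
    """
    idx = chain.index(failed)
    result = {}

    # Assumption: operators upstream of the failure are not redeployed,
    # so they are reported at their latest checkpoint.
    for op in chain[:idx]:
        result[op] = max(checkpoints[op])

    # Walk from the last operator back to the failed one. Each operator
    # takes its largest checkpoint that does not exceed the recovery
    # window chosen by the operator right after it.
    limit = None
    for op in reversed(chain[idx:]):
        candidates = [w for w in checkpoints[op] if limit is None or w <= limit]
        result[op] = max(candidates)  # assumes at least one checkpoint qualifies
        limit = result[op]

    return result


# Ashwin's example: checkpoint every 5 windows, latest A=30, B=25, C=20, D=15.
chain = ["A", "B", "C", "D"]
checkpoints = {
    "A": list(range(5, 31, 5)),
    "B": list(range(5, 26, 5)),
    "C": list(range(5, 21, 5)),
    "D": list(range(5, 16, 5)),
}

print(recovery_windows(chain, checkpoints, "A"))
print(recovery_windows(chain, checkpoints, "C"))
```

With these inputs the first call gives 15 for every operator, and the second
gives A - 30, B - 25, C - 15, D - 15, matching the lists above.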

If every operator just reached window 30 and checkpointed, the committed
window would be 25, and all the checkpoints before window 30 would be
purged, but the checkpoint for window 30 would not be purged.

Thanks,
Tim

On Tue, Dec 15, 2015 at 11:41 AM, Ashwin Chandra Putta <
ashwinchandrap@gmail.com> wrote:

> Tim,
>
> Thanks, that is pretty much in line with what I was thinking. A little
> different thought though in terms of picking the checkpoint based on
> downstream operators. For A, is it not going to be "the checkpoint with the
> largest window id that is less than or equal to the checkpoint with the
> largest common window id (instead of largest window id) among all the
> operators down stream to A"
>
> For example,
>
> If A -> B -> C -> D is the dag. And say, the checkpoint window count is 5
> and the largest checkpoints are as follows.
>
> A - 30
> B - 25
> C - 20
> D - 15
>
> Does A recover at 25 (checkpoint with largest window id) or 15 (checkpoint
> with largest common window id)?
>
> Also, regarding recovering at the committed window id: is it not possible
> in the following scenario, where all operators have checkpointed at 30 and
> got the committed window callback, and then an operator fails before any
> operator checkpoints further? In that case, the recovery window is 30,
> right?
>
> Regards,
> Ashwin.
>
> On Mon, Dec 14, 2015 at 11:58 PM, Timothy Farkas <tim@datatorrent.com>
> wrote:
>
> > Hi Ashwin,
> >
> > The recovery checkpoint for operator A is computed by taking the
> > checkpoint with the largest window id that is less than or equal to the
> > checkpoint with the largest window id among all the operators downstream
> > of A. The output operators in a dag will always recover to their most
> > recent checkpoint. The input operator of the dag may recover to the
> > earliest checkpoint. Operators between the input and output operators
> > could recover to a window in between.
> >
> > I don't think you can ever recover to a committed window. The earliest I
> > think you can recover to is the window after the committed window (I may
> > be wrong on this).
> >
> > On Mon, Dec 14, 2015 at 11:05 PM, Ashwin Chandra Putta <
> > ashwinchandrap@gmail.com> wrote:
> >
> > > In the Apex architecture there is a concept of checkpointing and a
> > > concept of committed, when all operators have crossed a common
> > > checkpoint.
> > >
> > > So, in which scenarios does a given operator recover at the last
> > > checkpoint window vs. the last committed window vs. some other
> > > checkpoint window in between?
> > > --
> > >
> > > Regards,
> > > Ashwin.
> > >
> >
>
>
>
> --
>
> Regards,
> Ashwin.
>
