apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chandni Singh <chan...@datatorrent.com>
Subject Re: Why is Async checkpointing made default?
Date Mon, 23 Nov 2015 07:04:01 GMT
Agreed. Thomas's solution fixes the backward incompatibility. I think we
really need to fix this.

On Sun, Nov 22, 2015 at 10:23 PM, Timothy Farkas <tim@datatorrent.com>
wrote:

> Gaurav,
>
> I think if the state copy fails then STRAM should roll back the operator to
> a checkpoint that is further back than the last checkpoint. If you are
> saying that you want to preserve the semantic that checkpointed is only
> called after a checkpoint is completed, I would argue that that guarantee
> is already pointless in the current implementation since it is always
> possible for an operator to be rolled back to a checkpoint before it's last
> completed checkpoint. So, it is already currently possible for some
> database or file operation performed after a completed checkpoint to be
> redone after a failure. Because of this I think Thomas's solution makes the
> most sense. Thomas's solution would also address Chandni's original point
> that the semantics for the checkpointed call back have been violated. There
> are operators in our libraries that have depended on the beginWindow(x),
> endWindow(x), and checkpointed(x) call sequence, which is now broken. We
> should probably fix that.
>
> Tim
>
> On Sun, Nov 22, 2015 at 10:02 PM, Gaurav Gupta <gaurav@datatorrent.com>
> wrote:
>
> > Thomas,
> >
> > This was done to preserve checkpointing semantics that is to tell the
> > operator that its state is preserved. Say if database is updated or files
> > are moved in checkpointed call but the state copy fails, how to address
> > such scenarios?
> >
> > Thanks
> > - Gaurav
> >
> > > On Nov 22, 2015, at 9:44 PM, Thomas Weise <thomas@datatorrent.com>
> > wrote:
> > >
> > > Alternatively I would ask why the checkpointed callback needs to wait
> > until
> > > the data was copied to HDFS instead upon completion of the state
> > > serialization.
> > >
> > > Thomas
> > >
> > >
> > > On Sun, Nov 22, 2015 at 9:41 PM, Chandni Singh <
> chandni@datatorrent.com>
> > > wrote:
> > >
> > >> Gaurav,
> > >>
> > >> My question is about why Async was made the default when it changed
> the
> > >> semantics of operator callbacks. Your response doesn't answer that.
> > >>
> > >> In a way we broke backward compatibility.
> > >>
> > >> Chandni
> > >>
> > >> On Sun, Nov 22, 2015 at 9:22 PM, Gaurav Gupta <gaurav@datatorrent.com
> >
> > >> wrote:
> > >>
> > >>> The idea behind Async checkpointing is to unblock operator while the
> > >> state
> > >>> is getting transferred to HDFS.
> > >>> Just to clarify that this beginWindow (x) -> endWindow(x) ->
> > checkpointed
> > >>> (x-1 ) should be an ideal sequence, but if the HDFS is slow or for
> some
> > >>> other reason transferring the state to HDFS is slow this sequence may
> > not
> > >>> hold true.
> > >>>
> > >>> Can your use case be addressed by
> > >>> https://malhar.atlassian.net/browse/APEX-78 <
> > >>> https://malhar.atlassian.net/browse/APEX-78>?
> > >>>
> > >>> Thanks
> > >>> - Gaurav
> > >>>
> > >>>> On Nov 22, 2015, at 3:56 PM, Chandni Singh <chandni@datatorrent.com
> >
> > >>> wrote:
> > >>>>
> > >>>> With Async checkpointing the checkpoint callback in CheckpointPoint
> > >>>> listener is called for a previous window, that is,
> > >>>> beginWindow (x) -> endWindow(x) -> checkpointed (x-1 )
> > >>>>
> > >>>> This feature was newly introduced. With synchronous checkpointing,
> the
> > >>>> behavior was always
> > >>>> beginWindow(x) -> endWindow(x) -> checkpointed (x)
> > >>>>
> > >>>> A lot of operators were written before asynchronous checkpointing
> was
> > >>>> introduced and few of them can rely on the sequencing guaranteed
by
> > >>>> synchronous checkpointing.
> > >>>>
> > >>>> So why was Async Checkpointed made default?
> > >>>>
> > >>>> With how Async checkpoint is today, the complexity to handle
> transient
> > >>>> state in checkpointed callback falls on every operator. For eg,
lets
> > >> say
> > >>>> earlier I had a transient map which I cleared every time the
> > >> checkpointed
> > >>>> was called, with async checkpointing this simple task will be a
lot
> > >> more
> > >>>> complicated.
> > >>>>
> > >>>> I think Async checkpointing broke the semantics of operator
> callbacks
> > >> and
> > >>>> should NOT be the default.
> > >>>
> > >>>
> > >>
> >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message