apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Timothy Farkas <...@datatorrent.com>
Subject Re: Why is Async checkpointing made default?
Date Mon, 23 Nov 2015 06:23:40 GMT
Gaurav,

I think if the state copy fails then STRAM should roll back the operator to
a checkpoint that is further back than the last checkpoint. If you are
saying that you want to preserve the semantic that checkpointed is only
called after a checkpoint is completed, I would argue that that guarantee
is already pointless in the current implementation since it is always
possible for an operator to be rolled back to a checkpoint before it's last
completed checkpoint. So, it is already currently possible for some
database or file operation performed after a completed checkpoint to be
redone after a failure. Because of this I think Thomas's solution makes the
most sense. Thomas's solution would also address Chandni's original point
that the semantics for the checkpointed call back have been violated. There
are operators in our libraries that have depended on the beginWindow(x),
endWindow(x), and checkpointed(x) call sequence, which is now broken. We
should probably fix that.

Tim

On Sun, Nov 22, 2015 at 10:02 PM, Gaurav Gupta <gaurav@datatorrent.com>
wrote:

> Thomas,
>
> This was done to preserve checkpointing semantics that is to tell the
> operator that its state is preserved. Say if database is updated or files
> are moved in checkpointed call but the state copy fails, how to address
> such scenarios?
>
> Thanks
> - Gaurav
>
> > On Nov 22, 2015, at 9:44 PM, Thomas Weise <thomas@datatorrent.com>
> wrote:
> >
> > Alternatively I would ask why the checkpointed callback needs to wait
> until
> > the data was copied to HDFS instead upon completion of the state
> > serialization.
> >
> > Thomas
> >
> >
> > On Sun, Nov 22, 2015 at 9:41 PM, Chandni Singh <chandni@datatorrent.com>
> > wrote:
> >
> >> Gaurav,
> >>
> >> My question is about why Async was made the default when it changed the
> >> semantics of operator callbacks. Your response doesn't answer that.
> >>
> >> In a way we broke backward compatibility.
> >>
> >> Chandni
> >>
> >> On Sun, Nov 22, 2015 at 9:22 PM, Gaurav Gupta <gaurav@datatorrent.com>
> >> wrote:
> >>
> >>> The idea behind Async checkpointing is to unblock operator while the
> >> state
> >>> is getting transferred to HDFS.
> >>> Just to clarify that this beginWindow (x) -> endWindow(x) ->
> checkpointed
> >>> (x-1 ) should be an ideal sequence, but if the HDFS is slow or for some
> >>> other reason transferring the state to HDFS is slow this sequence may
> not
> >>> hold true.
> >>>
> >>> Can your use case be addressed by
> >>> https://malhar.atlassian.net/browse/APEX-78 <
> >>> https://malhar.atlassian.net/browse/APEX-78>?
> >>>
> >>> Thanks
> >>> - Gaurav
> >>>
> >>>> On Nov 22, 2015, at 3:56 PM, Chandni Singh <chandni@datatorrent.com>
> >>> wrote:
> >>>>
> >>>> With Async checkpointing the checkpoint callback in CheckpointPoint
> >>>> listener is called for a previous window, that is,
> >>>> beginWindow (x) -> endWindow(x) -> checkpointed (x-1 )
> >>>>
> >>>> This feature was newly introduced. With synchronous checkpointing, the
> >>>> behavior was always
> >>>> beginWindow(x) -> endWindow(x) -> checkpointed (x)
> >>>>
> >>>> A lot of operators were written before asynchronous checkpointing was
> >>>> introduced and few of them can rely on the sequencing guaranteed by
> >>>> synchronous checkpointing.
> >>>>
> >>>> So why was Async Checkpointed made default?
> >>>>
> >>>> With how Async checkpoint is today, the complexity to handle transient
> >>>> state in checkpointed callback falls on every operator. For eg, lets
> >> say
> >>>> earlier I had a transient map which I cleared every time the
> >> checkpointed
> >>>> was called, with async checkpointing this simple task will be a lot
> >> more
> >>>> complicated.
> >>>>
> >>>> I think Async checkpointing broke the semantics of operator callbacks
> >> and
> >>>> should NOT be the default.
> >>>
> >>>
> >>
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message