apex-dev mailing list archives

From Atri Sharma <atri.j...@gmail.com>
Subject Re: Window Transactions
Date Sun, 30 Aug 2015 16:05:12 GMT
Thanks Thomas and everyone.

I agree with what Thomas explained, and I have a few points to add to
it.

I now understand that the ability to process or commit a transaction
already exists. I am trying to understand the following use cases:

1) A window is defined. The operator needs to process either the entire
window or none of it. Can we have this functionality now?

2) Checkpoints within a window. If we fail, we can send the last seen
checkpoint to the source (if the source can handle it) and ask for the data
after that point.
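
A minimal sketch of use case 1, combined with the windowId technique from Thomas's message quoted below. It is plain Python with sqlite3 standing in for the external system. The class `WindowedCounter`, the `progress` table, and the method names are hypothetical illustrations, not the Apex or Malhar API. The transaction starts with the first write in a window, commits together with the window id at the end of the window, and a replayed window that was already committed is skipped.

```python
import sqlite3


class WindowedCounter:
    """Toy output operator: one database transaction per streaming window.

    Hypothetical stand-in for illustration only; not the Apex operator API.
    """

    def __init__(self, conn):
        # conn must be in autocommit mode (isolation_level=None) so that the
        # explicit BEGIN/COMMIT/ROLLBACK statements below control transactions.
        self.conn = conn
        self.in_txn = False
        self.skip = False
        self.window_id = None
        conn.execute("CREATE TABLE IF NOT EXISTS sometable "
                     "(id INTEGER PRIMARY KEY, count INTEGER)")
        conn.execute("CREATE TABLE IF NOT EXISTS progress "
                     "(id INTEGER PRIMARY KEY, window_id INTEGER)")
        conn.execute("INSERT OR IGNORE INTO sometable VALUES (1, 0)")
        conn.execute("INSERT OR IGNORE INTO progress VALUES (1, -1)")

    def begin_window(self, window_id):
        # Skip the whole window if the external system already committed it.
        last = self.conn.execute(
            "SELECT window_id FROM progress WHERE id = 1").fetchone()[0]
        self.window_id = window_id
        self.skip = window_id <= last

    def process(self, _tuple):
        if self.skip:
            return
        if not self.in_txn:
            # The transaction commences with the first write in the window.
            self.conn.execute("BEGIN")
            self.in_txn = True
        self.conn.execute("UPDATE sometable SET count = count + 1 WHERE id = 1")

    def end_window(self):
        if self.skip:
            return
        if not self.in_txn:
            self.conn.execute("BEGIN")
        # The window id is committed atomically with the window's writes.
        self.conn.execute("UPDATE progress SET window_id = ? WHERE id = 1",
                          (self.window_id,))
        self.conn.execute("COMMIT")
        self.in_txn = False

    def fail(self):
        # Simulated failure mid-window: nothing from the window survives.
        if self.in_txn:
            self.conn.execute("ROLLBACK")
            self.in_txn = False


def run_window(op, window_id, tuples):
    op.begin_window(window_id)
    for t in tuples:
        op.process(t)
    op.end_window()


def replay_demo():
    conn = sqlite3.connect(":memory:", isolation_level=None)
    op = WindowedCounter(conn)
    run_window(op, 1, ["a", "b"])   # window 1 commits
    op.begin_window(2)
    op.process("c")
    op.fail()                       # window 2 fails midway and is rolled back
    run_window(op, 1, ["a", "b"])   # replay: window 1 is skipped
    run_window(op, 2, ["c", "d"])   # replay: window 2 is applied once
    count = conn.execute("SELECT count FROM sometable WHERE id = 1").fetchone()[0]
    last = conn.execute("SELECT window_id FROM progress WHERE id = 1").fetchone()[0]
    return count, last
```

In this sketch the replay after the failure does not double count, because window 1 is recognized as already committed. It assumes the upstream replays the same tuples in the same windows, which is the requirement Thomas mentions.
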
On 29 Aug 2015 04:38, "Thomas Weise" <thomas@datatorrent.com> wrote:

> Atri,
>
> The concept of a "transactional window" is needed for some applications that
> interact with external systems. A number of Malhar operators support it
> today. For example, a JDBC operator might perform all operations within a
> transaction that commences with the first write in a window and is committed
> in endWindow. The engine provides the callbacks; the operator implements
> the transaction based on the capabilities of the external system. Note that
> this does not imply batching; it merely speaks to transaction demarcation.
>
> But this is just part of the work needed to make the operator
> "transactional". Windows can be reprocessed based on the processing
> semantics. When a container goes down, the operator will reset to the
> recovery checkpoint and reprocess the windows from the checkpoint till the
> point where the failure occurred. Unless the processing done by the
> operator is idempotent, this would lead to incorrect results. For example,
> if the operation was "UPDATE sometable SET count = count + 1", we would
> double count.
>
> One technique to deal with this is to maintain the windowId as part of the
> state that gets committed to the external system. Now we can skip the
> processing if we find that the window was already processed. Of course,
> this requires that the upstream operators also deliver the tuples in an
> idempotent manner on a window replay.
>
> Thomas
>
> On Fri, Aug 28, 2015 at 2:14 PM, Chetan Narsude <chetan@datatorrent.com>
> wrote:
>
> > Atri,
> >
> >   The BEGIN_WINDOW and END_WINDOW control events demarcate the
> > transaction. We do not hold the first event after BEGIN_WINDOW hostage
> > until the END_WINDOW is received. This allows us to provide almost zero
> > latency at the per-tuple level. This is one of the differentiating
> > paradigms for Apex.
> >
> >   If we did it otherwise, the platform would degrade to micro-batch
> > processing mode. More details here:
> >
> > https://www.datatorrent.com/real-time-event-stream-processing-what-are-your-choices/
> >
> >  Let me know if this answers your question or if I misunderstood it.
> >
> > --
> > Chetan
> >
> >
> >
> > On Fri, Aug 28, 2015 at 1:37 PM, Atri Sharma <atri@apache.org> wrote:
> >
> > > Team,
> > >
> > > Does it make sense to have all-or-nothing transactional functionality
> > > for windows? With the future functionality of dynamic operators, I feel
> > > it makes sense to allow either the data from an entire window to be
> > > processed or none of the data to be sent.
> > >
> > > I am not sure if window batching in its current form is a logical
> > > implementation of this feature.
> > >
> > > Thoughts?
> > >
> > > Regards,
> > >
> > > Atri
> > >
> >
>
