apex-dev mailing list archives

From Thomas Weise <tho...@datatorrent.com>
Subject Re: Window Transactions
Date Fri, 28 Aug 2015 23:08:36 GMT

The concept of a "transactional window" is needed for some applications
that interact with external systems. A number of Malhar operators support
it today. For example, a JDBC operator might perform all operations within
a transaction that begins with the first write in a window and is committed
in endWindow. The engine provides the callbacks; the operator implements
the transaction based on the capabilities of the external system. Note that
this does not imply batching; it only concerns transaction demarcation.
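
A minimal sketch of that demarcation, not taken from any Malhar operator.
FakeStore is a hypothetical in-memory stand-in for the external system, and
WindowTxnWriter and process() are made-up names; only the beginWindow and
endWindow callback names follow the description above. Each tuple is handed
to the store as it arrives; only the commit waits for endWindow.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an external transactional system (e.g. a JDBC connection).
class FakeStore {
    final List<String> committed = new ArrayList<>();
    private List<String> pending = null; // non-null while a transaction is open

    void begin() { pending = new ArrayList<>(); }
    boolean inTransaction() { return pending != null; }
    void write(String row) { pending.add(row); }
    void commit() { committed.addAll(pending); pending = null; }
    void rollback() { pending = null; }
}

// Hypothetical operator; not the real Apex operator interface.
class WindowTxnWriter {
    private final FakeStore store;

    WindowTxnWriter(FakeStore store) { this.store = store; }

    void beginWindow(long windowId) {
        // Nothing to do here: the transaction is opened lazily on the first write.
    }

    void process(String tuple) {
        if (!store.inTransaction()) {
            store.begin(); // first write in the window opens the transaction
        }
        store.write(tuple); // passed to the store right away, not buffered by the operator
    }

    void endWindow() {
        if (store.inTransaction()) {
            store.commit(); // the window boundary is the commit point
        }
    }
}
```

A window with no tuples never opens a transaction, so endWindow has nothing
to commit in that case.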

But this is just part of the work needed to make the operator
"transactional". Windows can be reprocessed based on the processing
semantics. When a container goes down, the operator will reset to the
recovery checkpoint and reprocess the windows from the checkpoint till the
point where the failure occurred. Unless the processing done by the
operator is idempotent, this would lead to incorrect results. For example,
if the operation was "UPDATE sometable SET count = count + 1", we would
double count.

One technique to deal with this is to maintain the windowId as part of the
state that gets committed to the external system. Now we can skip the
processing if we find that the window was already processed. Of course,
this requires that the upstream operators also deliver the tuples in an
idempotent manner on a window replay.
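
A rough sketch of the windowId technique, under several assumptions:
CounterStore is a hypothetical in-memory stand-in for the external table,
its commit() models a single transaction that updates the count and the
stored windowId together, and window ids are assumed to increase
monotonically. For brevity the operator sums the increments and applies
them in endWindow; a real operator could equally issue each write inside
the open transaction.

```java
// Hypothetical stand-in for an external table: a counter plus the last committed window id.
class CounterStore {
    long count = 0;
    long lastCommittedWindowId = -1;

    // Applies the increment and records the window id in one step,
    // standing in for a single transaction on the external system.
    void commit(long windowId, long increment) {
        count += increment;
        lastCommittedWindowId = windowId;
    }
}

// Hypothetical operator; not the real Apex operator interface.
class IdempotentCounter {
    private final CounterStore store;
    private long currentWindowId;
    private long pendingIncrement;
    private boolean skip;

    IdempotentCounter(CounterStore store) { this.store = store; }

    void beginWindow(long windowId) {
        currentWindowId = windowId;
        pendingIncrement = 0;
        // A window at or below the stored id was already committed before the failure.
        skip = windowId <= store.lastCommittedWindowId;
    }

    void process(Object tuple) {
        if (!skip) {
            pendingIncrement++; // models "count = count + 1" per tuple
        }
    }

    void endWindow() {
        if (!skip) {
            store.commit(currentWindowId, pendingIncrement);
        }
    }
}
```

Replaying windows 1 and 2 after a reset leaves the count unchanged, because
both ids are at or below lastCommittedWindowId; window 3 is applied
normally. This only holds if the replayed windows carry the same tuples as
the original run.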


On Fri, Aug 28, 2015 at 2:14 PM, Chetan Narsude <chetan@datatorrent.com>
wrote:

> Atri,
>   The BEGIN_WINDOW and END_WINDOW control events demarcate the
> transaction. We do not hold the first event after BEGIN_WINDOW hostage
> until the END_WINDOW is received. This allows us to provide almost zero
> latency at the per-tuple level. This is one of the differentiating
> paradigms for Apex.
>   If we do it otherwise - the platform degrades to micro-batch processing
> mode. More details about it here:
> https://www.datatorrent.com/real-time-event-stream-processing-what-are-your-choices/
>   Let me know if this answers your question or if I misunderstood it.
> --
> Chetan
> On Fri, Aug 28, 2015 at 1:37 PM, Atri Sharma <atri@apache.org> wrote:
> > Team,
> >
> > Does it make sense to have an all-or-nothing transactional system for
> > windows? With future functionality for dynamic operators, I feel it
> > makes sense to allow either the data from an entire window to be
> > processed or none of the data to be sent.
> >
> > I am not sure if window batching in its current form is a logical
> > implementation of this feature.
> >
> > Thoughts?
> >
> > Regards,
> >
> > Atri
> >
