apex-dev mailing list archives

From David Yan <da...@datatorrent.com>
Subject Re: APEXMALHAR-2354 Heuristic Watermark in windowed operator
Date Tue, 29 Nov 2016 22:08:41 GMT
Chinmay:

My response is inline:

On Mon, Nov 28, 2016 at 9:47 PM, Chinmay Kolhatkar <chinmay@apache.org>
wrote:

> David,
>
> Your suggestion makes sense. It will also help with idempotency.
> I'll also change the name of the interface to WatermarkGenerator.
>
> I had 2 more questions here:
> 1. AbstractWindowedOperator currently has fixedWatermark. Should that be
> moved out of AbstractWindowedOperator and provided as an
> implementation of WatermarkGenerator?
>


Yes, this makes sense.


>
> 2. Should this interface have a setter method that provides the
> DataStorage and retractionStorage objects to the implementation? This can
> be useful for any advanced implementation of watermark generation.
>
>
The storage objects can be passed as part of a context object when
constructing the WatermarkGenerator object. In the future, we can add other
stuff to the context.
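
A rough sketch of how the pieces discussed in this thread could fit together, for illustration only. It is not the Malhar code: WindowedTuple and Watermark below are simplified stand-ins for Tuple.WindowedTuple and ControlTuple.Watermark, the storages are plain maps, and WatermarkContext and MaxTimestampLagGenerator are hypothetical names. The heuristic itself (largest timestamp seen minus a fixed lag) is only an assumed example of an implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Simplified stand-in for Tuple.WindowedTuple<T>; only the timestamp is used. */
final class WindowedTuple<T> {
    final long timestamp;
    final T value;

    WindowedTuple(long timestamp, T value) {
        this.timestamp = timestamp;
        this.value = value;
    }
}

/** Simplified stand-in for ControlTuple.Watermark. */
final class Watermark {
    final long timestamp;

    Watermark(long timestamp) {
        this.timestamp = timestamp;
    }
}

/** Hypothetical context object; more fields could be added later. */
final class WatermarkContext {
    final Map<String, Object> dataStorage;        // stand-in for the data storage
    final Map<String, Object> retractionStorage;  // stand-in for the retraction storage

    WatermarkContext(Map<String, Object> dataStorage, Map<String, Object> retractionStorage) {
        this.dataStorage = dataStorage;
        this.retractionStorage = retractionStorage;
    }
}

/** The two-method shape proposed earlier in the thread. */
interface WatermarkGenerator<T> {
    /** Called for each input tuple; only updates internal state. */
    void processTupleForWatermark(WindowedTuple<T> tuple);

    /** Called at endWindow; returns null if no watermark is available. */
    Watermark getWatermarkTuple();
}

/** Hypothetical heuristic: watermark = largest timestamp seen minus a fixed lag. */
final class MaxTimestampLagGenerator<T> implements WatermarkGenerator<T> {
    private final WatermarkContext context;  // kept for more advanced heuristics
    private final long lagMillis;
    private long maxTimestamp = Long.MIN_VALUE;
    private long lastEmitted = Long.MIN_VALUE;

    MaxTimestampLagGenerator(WatermarkContext context, long lagMillis) {
        this.context = context;
        this.lagMillis = lagMillis;
    }

    @Override
    public void processTupleForWatermark(WindowedTuple<T> tuple) {
        maxTimestamp = Math.max(maxTimestamp, tuple.timestamp);
    }

    @Override
    public Watermark getWatermarkTuple() {
        if (maxTimestamp == Long.MIN_VALUE) {
            return null;  // no tuples seen yet
        }
        long candidate = maxTimestamp - lagMillis;
        if (candidate <= lastEmitted) {
            return null;  // watermark has not advanced since the last call
        }
        lastEmitted = candidate;
        return new Watermark(candidate);
    }
}
```

With this shape, the operator would call processTupleForWatermark for every tuple and getWatermarkTuple once per endWindow, so at most one watermark is produced per streaming window.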





> -Chinmay.
>
>
>
> On Tue, Nov 29, 2016 at 2:19 AM, David Yan <david@datatorrent.com> wrote:
>
> > +1 for the feature.
> >
> > But having multiple watermark tuples within one streaming window is
> > not useful, because WindowedOperator currently processes watermarks only
> > at endWindow. How about:
> >
> > void processTupleForWatermark(Tuple.WindowedTuple<InputT> tuple);
> > // to be called for each input tuple; updates the state of the
> > // WatermarkGenerator implementation.
> >
> > ControlTuple.Watermark getWatermarkTuple();
> > // to be called at endWindow; returns null if no watermark is available
> >
> > This is actually somewhat related to the separate debate on this list
> > about control tuples being delivered only at streaming window boundaries.
> >
> > David
> >
> > On Thu, Nov 24, 2016 at 2:52 AM, Chinmay Kolhatkar <chinmay@apache.org>
> > wrote:
> >
> > > Dear Community,
> > >
> > > I'm working on adding support for heuristic watermarks in the Windowed
> > > Operator.
> > > A heuristic watermark gives users of WindowedOperator a way to logically
> > > determine whether the watermark condition is met by inspecting the
> > > tuples received.
> > > This can replace, or work alongside, the control tuple received on the
> > > control port.
> > >
> > > Here is the approach I'm considering:
> > >
> > > 1. A new interface, let's say "HeuristicWatermark", will be added, which
> > > extends Component<Context.OperatorContext>.
> > > It extends Component so that it can follow a lifecycle.
> > >
> > > 2. This interface contains a single method, something like this:
> > >
> > > ControlTuple.Watermark processTupleForWatermark(
> > >     Tuple.WindowedTuple<InputT> input);
> > >
> > > 3. An object of this type can optionally be set on
> > > AbstractWindowedOperator as a plugin that identifies whether the
> > > watermark condition has been reached.
> > >
> > > 4. If heuristicWatermark is set, processTupleForWatermark will be called
> > > for every received tuple. The method returns the Watermark object if the
> > > watermark condition is met, or null otherwise.
> > >
> > > 5. If the return value of this method is non-null, the processWatermark
> > > method will be called, which sets the nextWatermark value. The rest of
> > > the watermark processing can then continue to happen in endWindow.
> > >
> > >
> > > Please share your opinion on the above approach.
> > >
> > > Thanks,
> > > Chinmay.
> > >
> >
>
