apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Amol Kekre <a...@datatorrent.com>
Subject Re: [DISCUSSION] Custom Control Tuples
Date Sat, 25 Jun 2016 16:07:43 GMT
David,
We should avoid control tuple within the window by simply restricting it
through API. This can be done by calling something like "sendControlTuple"
between windows, notably in input operators.

Thks
Amol


On Sat, Jun 25, 2016 at 7:32 AM, Munagala Ramanath <ram@datatorrent.com>
wrote:

> What would the API look like for option 1 ? Another operator callback
> called controlTuple() or does the operator code have to check each
> incoming tuple to see if it was data or control ?
>
> Ram
>
> On Fri, Jun 24, 2016 at 11:42 PM, David Yan <david@datatorrent.com> wrote:
>
> > It looks like option 1 is preferred by the community. But let me
> elaborate
> > why I brought up the option of piggy backing BEGIN and END_WINDOW
> >
> > Option 2 implicitly enforces that the operations related to the custom
> > control tuple be done at the streaming window boundary.
> >
> > For most operations, it makes sense to have that enforcement. Option 1
> > opens the door to the possibility of sending and handling control tuples
> > within a window, thus imposing a challenge of ensuring idempotency. In
> > fact, allowing that would make idempotency extremely difficult to
> achieve.
> >
> > David
> >
> > On Fri, Jun 24, 2016 at 4:38 PM, Vlad Rozov <v.rozov@datatorrent.com>
> > wrote:
> >
> > > +1 for option 1.
> > >
> > > Thank you,
> > >
> > > Vlad
> > >
> > >
> > > On 6/24/16 14:35, Bright Chen wrote:
> > >
> > >> +1
> > >> It also can help to Shutdown the application gracefully.
> > >> Bright
> > >>
> > >> On Jun 24, 2016, at 1:35 PM, Siyuan Hua <siyuan@datatorrent.com>
> wrote:
> > >>>
> > >>> +1
> > >>>
> > >>> I think it's good to have custom control tuple and I prefer the 1
> > option.
> > >>>
> > >>> Also I think we should think about couple different callbacks, that
> > could
> > >>> be operator level(triggered when an operator receives an control
> tuple)
> > >>> or
> > >>> dag level(triggered when control tuple flow over the whole dag)
> > >>>
> > >>> Regards,
> > >>> Siyuan
> > >>>
> > >>>
> > >>>
> > >>>
> > >>> On Fri, Jun 24, 2016 at 12:42 PM, David Yan <david@datatorrent.com>
> > >>> wrote:
> > >>>
> > >>> My initial thinking is that the custom control tuples, just like the
> > >>>> existing control tuples, will only be generated from the input
> > operators
> > >>>> and will be propagated downstream to all operators in the DAG.
So
> the
> > >>>> NxM
> > >>>> partitioning scenario works just like how other control tuples
work,
> > >>>> i.e.
> > >>>> the callback will not be called unless all ports have received
the
> > >>>> control
> > >>>> tuple for a particular window. This creates a little bit of
> > complication
> > >>>> with multiple input operators though.
> > >>>>
> > >>>> David
> > >>>>
> > >>>>
> > >>>> On Fri, Jun 24, 2016 at 12:03 PM, Tushar Gosavi <
> > tushar@datatorrent.com
> > >>>> >
> > >>>> wrote:
> > >>>>
> > >>>> +1 for the feature
> > >>>>>
> > >>>>> I am in favor of option 1, but we may need an helper method
to
> avoid
> > >>>>> compiler error on typed port, as calling port.emit(controlTuple)
> will
> > >>>>> be an error if type of control tuple and port does not match.
or
> new
> > >>>>> method in outputPort object , emitControlTuple(ControlTuple).
> > >>>>>
> > >>>>> Can you give example of piggy backing tuple with current
> BEGIN_WINDOW
> > >>>>> and END_WINDOW control tuples?
> > >>>>>
> > >>>>> In case of NxM partitioning, each downstream operator will
receive
> N
> > >>>>> control tuples. will it call user handler N times for each
> downstream
> > >>>>> operator or just once.
> > >>>>>
> > >>>>> Regards,
> > >>>>> - Tushar.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Jun 24, 2016 at 11:52 PM, David Yan <david@datatorrent.com
> >
> > >>>>>
> > >>>> wrote:
> > >>>>
> > >>>>> Hi all,
> > >>>>>>
> > >>>>>> I would like to propose a new feature to the Apex core
engine --
> the
> > >>>>>> support of custom control tuples. Currently, we have control
> tuples
> > >>>>>>
> > >>>>> such
> > >>>>
> > >>>>> as
> > >>>>>
> > >>>>>> BEGIN_WINDOW, END_WINDOW, CHECKPOINT, and so on, but we
don't have
> > the
> > >>>>>> support for applications to insert their own control tuples.
The
> way
> > >>>>>> currently to get around this is to use data tuples and
have a
> > separate
> > >>>>>>
> > >>>>> port
> > >>>>>
> > >>>>>> for such tuples that sends tuples to all partitions of
the
> > downstream
> > >>>>>> operators, which is not exactly developer friendly.
> > >>>>>>
> > >>>>>> We have already seen a number of use cases that can use
this
> > feature:
> > >>>>>>
> > >>>>>> 1) Batch support: We need to tell all operators of the
physical
> DAG
> > >>>>>>
> > >>>>> when
> > >>>>
> > >>>>> a
> > >>>>>
> > >>>>>> batch starts and ends, so the operators can do whatever
that is
> > needed
> > >>>>>>
> > >>>>> upon
> > >>>>>
> > >>>>>> the start or the end of a batch.
> > >>>>>>
> > >>>>>> 2) Watermark: To support the concepts of event time windowing,
the
> > >>>>>> watermark control tuple is needed to tell which windows
should be
> > >>>>>> considered late.
> > >>>>>>
> > >>>>>> 3) Changing operator properties: We do have the support
of
> changing
> > >>>>>> operator properties on the fly, but with a custom control
tuple,
> the
> > >>>>>> command to change operator properties can be window aligned
for
> all
> > >>>>>> partitions and also across the DAG.
> > >>>>>>
> > >>>>>> 4) Recording tuples: Like changing operator properties,
we do have
> > >>>>>> this
> > >>>>>> support now but only at the individual physical operator
level,
> and
> > >>>>>>
> > >>>>> without
> > >>>>>
> > >>>>>> control of which window to record tuples for. With a custom
> control
> > >>>>>>
> > >>>>> tuple,
> > >>>>>
> > >>>>>> because a control tuple must belong to a window, all operators
in
> > the
> > >>>>>>
> > >>>>> DAG
> > >>>>
> > >>>>> can start (and stop) recording for the same windows.
> > >>>>>>
> > >>>>>> I can think of two options to achieve this:
> > >>>>>>
> > >>>>>> 1) new custom control tuple type that takes user's serializable
> > >>>>>> object.
> > >>>>>>
> > >>>>>> 2) piggy back the current BEGIN_WINDOW and END_WINDOW control
> > tuples.
> > >>>>>>
> > >>>>>> Please provide your feedback. Thank you.
> > >>>>>>
> > >>>>>> David
> > >>>>>>
> > >>>>>
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message