apex-dev mailing list archives

From Amol Kekre <a...@datatorrent.com>
Subject Re: [DISCUSSION] Custom Control Tuples
Date Sat, 25 Jun 2016 16:43:08 GMT
I did not say that. "Notably" does not mean "exclusively".

Thks
Amol


On Sat, Jun 25, 2016 at 9:29 AM, Sandesh Hegde <sandesh@datatorrent.com>
wrote:

> Why restrict the control tuples to input operators?
>
> On Sat, Jun 25, 2016 at 9:07 AM Amol Kekre <amol@datatorrent.com> wrote:
>
> > David,
> > We should avoid control tuples within the window by simply restricting
> > them through the API. This can be done by calling something like
> > "sendControlTuple" between windows, notably in input operators.
> >
> > Thks
> > Amol
> >
> >
> > On Sat, Jun 25, 2016 at 7:32 AM, Munagala Ramanath <ram@datatorrent.com>
> > wrote:
> >
> > > What would the API look like for option 1? Another operator callback
> > > called controlTuple(), or does the operator code have to check each
> > > incoming tuple to see whether it is data or control?
> > >
> > > Ram
> > >
> > > On Fri, Jun 24, 2016 at 11:42 PM, David Yan <david@datatorrent.com>
> > > wrote:
> > >
> > > > It looks like option 1 is preferred by the community. But let me
> > > > elaborate why I brought up the option of piggybacking on BEGIN_WINDOW
> > > > and END_WINDOW.
> > > >
> > > > Option 2 implicitly enforces that the operations related to the custom
> > > > control tuple be done at the streaming window boundary.
> > > >
> > > > For most operations, it makes sense to have that enforcement. Option 1
> > > > opens the door to the possibility of sending and handling control tuples
> > > > within a window, thus imposing a challenge of ensuring idempotency. In
> > > > fact, allowing that would make idempotency extremely difficult to
> > > > achieve.
> > > >
> > > > David
> > > >
> > > > On Fri, Jun 24, 2016 at 4:38 PM, Vlad Rozov <v.rozov@datatorrent.com>
> > > > wrote:
> > > >
> > > > > +1 for option 1.
> > > > >
> > > > > Thank you,
> > > > >
> > > > > Vlad
> > > > >
> > > > >
> > > > > On 6/24/16 14:35, Bright Chen wrote:
> > > > >
> > > > >> +1
> > > > >> It can also help to shut down the application gracefully.
> > > > >> Bright
> > > > >>
> > > > >> On Jun 24, 2016, at 1:35 PM, Siyuan Hua <siyuan@datatorrent.com> wrote:
> > > > >>>
> > > > >>> +1
> > > > >>>
> > > > >>> I think it's good to have custom control tuples, and I prefer option 1.
> > > > >>>
> > > > >>> Also, I think we should think about a couple of different callbacks,
> > > > >>> which could be operator level (triggered when an operator receives a
> > > > >>> control tuple) or DAG level (triggered when a control tuple has flowed
> > > > >>> through the whole DAG).
> > > > >>> Regards,
> > > > >>> Siyuan
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>>
> > > > >>> On Fri, Jun 24, 2016 at 12:42 PM, David Yan <david@datatorrent.com>
> > > > >>> wrote:
> > > > >>>
> > > > >>>> My initial thinking is that the custom control tuples, just like the
> > > > >>>> existing control tuples, will only be generated from the input
> > > > >>>> operators and will be propagated downstream to all operators in the
> > > > >>>> DAG. So the NxM partitioning scenario works just like how other control
> > > > >>>> tuples work, i.e. the callback will not be called unless all ports have
> > > > >>>> received the control tuple for a particular window. This creates a
> > > > >>>> little bit of complication with multiple input operators though.
> > > > >>>>
> > > > >>>> David
> > > > >>>>
> > > > >>>>
> > > > >>>> On Fri, Jun 24, 2016 at 12:03 PM, Tushar Gosavi <tushar@datatorrent.com>
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>>> +1 for the feature
> > > > >>>>>
> > > > >>>>> I am in favor of option 1, but we may need a helper method to avoid a
> > > > >>>>> compiler error on typed ports, as calling port.emit(controlTuple) will
> > > > >>>>> be an error if the types of the control tuple and the port do not
> > > > >>>>> match, or a new method on the outputPort object,
> > > > >>>>> emitControlTuple(ControlTuple).
> > > > >>>>>
> > > > >>>>> Can you give an example of piggybacking a tuple on the current
> > > > >>>>> BEGIN_WINDOW and END_WINDOW control tuples?
> > > > >>>>>
> > > > >>>>> In the case of NxM partitioning, each downstream operator will receive
> > > > >>>>> N control tuples. Will it call the user handler N times for each
> > > > >>>>> downstream operator, or just once?
> > > > >>>>>
> > > > >>>>> Regards,
> > > > >>>>> - Tushar.
> > > > >>>>>
> > > > >>>>>
> > > > >>>>>
> > > > >>>>> On Fri, Jun 24, 2016 at 11:52 PM, David Yan <david@datatorrent.com>
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>>> Hi all,
> > > > >>>>>>
> > > > >>>>>> I would like to propose a new feature to the Apex core engine -- the
> > > > >>>>>> support of custom control tuples. Currently, we have control tuples
> > > > >>>>>> such as BEGIN_WINDOW, END_WINDOW, CHECKPOINT, and so on, but we don't
> > > > >>>>>> have the support for applications to insert their own control tuples.
> > > > >>>>>> The way currently to get around this is to use data tuples and have a
> > > > >>>>>> separate port for such tuples that sends tuples to all partitions of
> > > > >>>>>> the downstream operators, which is not exactly developer friendly.
> > > > >>>>>>
> > > > >>>>>> We have already seen a number of use cases that can use this feature:
> > > > >>>>>>
> > > > >>>>>> 1) Batch support: We need to tell all operators of the physical DAG
> > > > >>>>>> when a batch starts and ends, so the operators can do whatever is
> > > > >>>>>> needed upon the start or the end of a batch.
> > > > >>>>>>
> > > > >>>>>> 2) Watermark: To support the concepts of event time windowing, the
> > > > >>>>>> watermark control tuple is needed to tell which windows should be
> > > > >>>>>> considered late.
> > > > >>>>>>
> > > > >>>>>> 3) Changing operator properties: We do have the support of changing
> > > > >>>>>> operator properties on the fly, but with a custom control tuple, the
> > > > >>>>>> command to change operator properties can be window aligned for all
> > > > >>>>>> partitions and also across the DAG.
> > > > >>>>>>
> > > > >>>>>> 4) Recording tuples: Like changing operator properties, we do have
> > > > >>>>>> this support now but only at the individual physical operator level,
> > > > >>>>>> and without control of which window to record tuples for. With a
> > > > >>>>>> custom control tuple, because a control tuple must belong to a window,
> > > > >>>>>> all operators in the DAG can start (and stop) recording for the same
> > > > >>>>>> windows.
> > > > >>>>>>
> > > > >>>>>> I can think of two options to achieve this:
> > > > >>>>>>
> > > > >>>>>> 1) A new custom control tuple type that takes the user's serializable
> > > > >>>>>> object.
> > > > >>>>>>
> > > > >>>>>> 2) Piggyback on the current BEGIN_WINDOW and END_WINDOW control tuples.
> > > > >>>>>>
> > > > >>>>>> Please provide your feedback. Thank you.
> > > > >>>>>>
> > > > >>>>>> David
> > > > >>>>>>
> > > > >>>>>
> > > > >
> > > >
> > >
> >
>
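
Two ideas from the thread can be sketched in a few lines of Java: rejecting
a "sendControlTuple" call made inside a window, and calling the user
handler only once per window after every upstream partition (the N in NxM)
has delivered the control tuple. This is a rough illustration under stated
assumptions, not the Apex API: ControlTupleSketch, BoundaryOnlySender,
Aligner, and all of their methods are hypothetical names invented here,
and the String payload stands in for a user's serializable object.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: every class and method name here is hypothetical
// and is not part of the Apex API.
final class ControlTupleSketch {

    /** Upstream side: accepts control tuples only between windows. */
    static final class BoundaryOnlySender {
        private boolean inWindow;
        private final List<String> sent = new ArrayList<>();

        void beginWindow() { inWindow = true; }

        void endWindow() { inWindow = false; }

        /** Rejects calls made inside a window. */
        void sendControlTuple(String payload) {
            if (inWindow) {
                throw new IllegalStateException(
                    "control tuples may only be sent between windows");
            }
            sent.add(payload);
        }

        List<String> sent() { return sent; }
    }

    /** Downstream side: handles a window once all N upstream partitions deliver. */
    static final class Aligner {
        private final int upstreamPartitions;
        private final Map<Long, Set<Integer>> arrivals = new HashMap<>();
        private final List<Long> handled = new ArrayList<>();

        Aligner(int upstreamPartitions) {
            this.upstreamPartitions = upstreamPartitions;
        }

        /** Returns true if this arrival completed the set and the handler ran. */
        boolean onControlTuple(long windowId, int upstreamPartition) {
            Set<Integer> seen =
                arrivals.computeIfAbsent(windowId, id -> new HashSet<>());
            seen.add(upstreamPartition);
            if (seen.size() < upstreamPartitions) {
                return false; // still waiting for other partitions
            }
            arrivals.remove(windowId);
            handled.add(windowId); // stand-in for the user callback
            return true;
        }

        List<Long> handled() { return handled; }
    }

    public static void main(String[] args) {
        BoundaryOnlySender sender = new BoundaryOnlySender();
        sender.sendControlTuple("batch-start");    // between windows: accepted
        sender.beginWindow();
        try {
            sender.sendControlTuple("mid-window"); // inside a window: rejected
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        sender.endWindow();

        Aligner aligner = new Aligner(3);          // N = 3 upstream partitions
        System.out.println(aligner.onControlTuple(7L, 0));
        System.out.println(aligner.onControlTuple(7L, 1));
        System.out.println(aligner.onControlTuple(7L, 2));
    }
}
```

The sketch ignores the open question raised above about multiple input
operators, and it does not handle duplicate deliveries after a window has
already been handled.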
