apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Yan <da...@datatorrent.com>
Subject Re: Regarding Iterations and Delay Operator
Date Tue, 09 Feb 2016 10:30:07 GMT
I think what you did is fine and I wouldn't consider a hack. The reason for
the +1 increment is exactly what Vlad said, the window needs to end and the
window cannot end until all input ports have sent the END_WINDOW tuple for
that particular window. And that's why all streaming platforms use the DAG
(A as in acyclic) as the programming paradigm.

Assuming in your use case B is the delay operator and your streaming window
is small, if it helps with your use case, you can also try having the
application window count for A be a multiple number while B remains to be
1, so that from the perspective of A, the feedback tuple from B to A
arrives in the same application window (except the last streaming window of
the application window).

David
On Feb 9, 2016 6:11 PM, "Bhupesh Chawda" <bhupesh@datatorrent.com> wrote:

> Hi David,
>
> Taking an example scenario, let's say we get 1000 tuples in the current
> window, which is processed by operator A. Operator A identifies features
> out of these tuples and passes them on to B. Now, if A decides that after
> say, 100 tuples it wants to compute the statistics on the features of 100
> tuples, it will ask operator B to compute this and send it back. This
> feedback is needed so that A can apply this statistic on the following
> tuples, i.e., tuples 101 onwards. Now B may compute this and send it back.
> But these are stuck at the delay operator and do not arrive at A until all
> the 1000 tuples have been processed. Meanwhile, operator A times out on the
> response, and proceeds with some default stats. Due to this behavior, the
> feedback which was expected after 100 tuples arrives after 1000 tuples.
> This means the statistic that was to be applied on tuples 101 onwards, is
> applied on tuples 1001 onwards. This makes the algorithm learn / converge
> at a very slow rate, which otherwise would have been done quickly.
>
> Just an update: I tried reducing the size of the streaming window to 100
> ms. This hack is currently working for the current use case, but I am not
> sure if this will work for all scenarios.
>
> Thanks.
> -Bhupesh
>
> On Tue, Feb 9, 2016 at 3:24 PM, David Yan <david@datatorrent.com> wrote:
>
> > Hi Bhupesh,
> >
> > In the use case you described, can you explain the reason why the
> feedback
> > tuples from B to A need to be in the same window?
> >
> > David
> > On Feb 5, 2016 3:05 PM, "Bhupesh Chawda" <bhupesh@datatorrent.com>
> wrote:
> >
> > > Hi Vlad / Tim,
> > >
> > > This use case is part of the integration with Apache SAMOA. I am trying
> > to
> > > run an algorithm which is written in SAMOA, on the Apex platform. This
> > > algorithm already requires two operators in the formation I described.
> > This
> > > is the reason I cannot control how the operators are arranged in the
> DAG.
> > >
> > > To answer Vlad's question, here are some more details on the use case.
> > > Operator A processes tuples coming in from the source, processes them
> and
> > > passes them on to B. After few tuples, it may request B to compute some
> > > stats on the tuples and get it back (through the loop back stream), so
> > that
> > > these stats could be used while processing the next few tuples coming
> > from
> > > the source.
> > > In case of Apex, I am suspecting that these stats that B needs to send
> > to A
> > > through the loop back stream, are arriving too late (because they are
> > > waiting for the end of the current window), and hence A times out on
> the
> > > response from B and proceeds using some default values. This results in
> > > incorrect computation at the end of the DAG.
> > >
> > > Also, please note that relaxing the fault tolerance feature is not a
> > > requirement. I mentioned this because I assumed (perhaps incorrectly)
> > that
> > > this was the reason for incrementing windows in the delay operator.
> > > However, as Vlad pointed out, there should be some point where an
> > operator
> > > could close the window.
> > >
> > > Any suggestions on how I can achieve the above mentioned use case?
> > >
> > > Thanks.
> > > -Bhupesh
> > >
> > > On Fri, Feb 5, 2016 at 7:43 AM, Timothy Farkas <tim@datatorrent.com>
> > > wrote:
> > >
> > > > Yes Partitioning A or B would be a good use case for iteration. Does
> > > > iteration currently support partitioning though? I thought it didn't,
> > > but I
> > > > may be behind the times since I've been exiled from the office :).
> > > >
> > > > On Thu, Feb 4, 2016 at 5:58 PM, Vlad Rozov <v.rozov@datatorrent.com>
> > > > wrote:
> > > >
> > > > > What if B should be partitioned?
> > > > >
> > > > >
> > > > >
> > > > > On 2/4/16 17:43, Timothy Farkas wrote:
> > > > >
> > > > >> My question is why use iteration at all in such a case? You could
> > just
> > > > >> encapsulate A and B in a single single operator (call it OP)
as
> > > > >> components,
> > > > >> and take the tuples output from B and put them to A. OP would
also
> > > > contain
> > > > >> the logic to decide when to stop looping each tuple emitted by
B
> > back
> > > to
> > > > >> A.
> > > > >>
> > > > >> Thanks,
> > > > >> Tim
> > > > >>
> > > > >> On Thu, Feb 4, 2016 at 12:58 PM, Vlad Rozov <
> > v.rozov@datatorrent.com>
> > > > >> wrote:
> > > > >>
> > > > >> IMO, it will be good to provide a little bit more details
> regarding
> > > the
> > > > >>> use case, namely what drives the requirement and why is it
OK to
> > > relax
> > > > >>> the
> > > > >>> fault tolerance feature. Another question is when will it
be OK
> to
> > > > close
> > > > >>> the current window for the operator A? A can't close it as
there
> > may
> > > be
> > > > >>> more tuples coming from the input stream connected to the
Delay
> > > > operator
> > > > >>> and Delay operator can't close it because A will not send
> > END_WINDOW
> > > > >>> waiting for END_WINDOW on the input port connected to the
Delay
> > > > operator.
> > > > >>>
> > > > >>> Vlad
> > > > >>>
> > > > >>>
> > > > >>> On 2/4/16 01:04, Bhupesh Chawda wrote:
> > > > >>>
> > > > >>> Exactly. That is the requirement. Then the feedback can be
> utilized
> > > for
> > > > >>>> tuples in the same window rather than the tuples in the
next
> > window.
> > > > >>>>
> > > > >>>> -Bhupesh
> > > > >>>>
> > > > >>>> On Thu, Feb 4, 2016 at 2:32 PM, Sandeep Deshmukh <
> > > > >>>> sandeep@datatorrent.com
> > > > >>>> wrote:
> > > > >>>>
> > > > >>>> Bhupesh: Do you mean to say that you would like to use
Delay
> > > Operator
> > > > >>>> with
> > > > >>>>
> > > > >>>>> NO delay? Essentially you need feed back in real-time
and not
> > > delayed
> > > > >>>>> by
> > > > >>>>> a
> > > > >>>>> window.
> > > > >>>>>
> > > > >>>>> Regards,
> > > > >>>>> Sandeep
> > > > >>>>>
> > > > >>>>> On Thu, Feb 4, 2016 at 10:59 AM, Bhupesh Chawda <
> > > > >>>>> bhupesh@datatorrent.com
> > > > >>>>> wrote:
> > > > >>>>>
> > > > >>>>> Hi,
> > > > >>>>>
> > > > >>>>>> I am working on a dag which has a loop. As per
my
> understanding,
> > > > >>>>>> tuples
> > > > >>>>>> which are flowing on the loop back stream, will
arrive at the
> > > > upstream
> > > > >>>>>> operator in at least the next window.
> > > > >>>>>>
> > > > >>>>>> Here is an example:
> > > > >>>>>>
> > > > >>>>>> Source -> A -> B -> Delay -> A
> > > > >>>>>>
> > > > >>>>>> In the example above, tuples in window id X which
arrive at B,
> > > will
> > > > be
> > > > >>>>>>
> > > > >>>>>> sent
> > > > >>>>>
> > > > >>>>> to A again in window id (X + n), where n >= 1.
> > > > >>>>>> I understand this requirement is for the tuples
to be
> recovered
> > in
> > > > >>>>>> case
> > > > >>>>>>
> > > > >>>>>> of
> > > > >>>>>
> > > > >>>>> a failure of operator B. However, is there a way
I can allow
> the
> > > > tuples
> > > > >>>>>>
> > > > >>>>>> to
> > > > >>>>>
> > > > >>>>> loop back in the same window, by relaxing the fault
tolerance
> > > > feature.
> > > > >>>>>> In
> > > > >>>>>> other words, I need tuples to immediately loop
back and not
> wait
> > > for
> > > > >>>>>> the
> > > > >>>>>> next window to arrive at operator A. I am okay
if these tuples
> > are
> > > > not
> > > > >>>>>> recovered in case of a failure.
> > > > >>>>>>
> > > > >>>>>> Thanks.
> > > > >>>>>> -Bhupesh
> > > > >>>>>>
> > > > >>>>>>
> > > > >>>>>>
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message