apex-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Yan <da...@datatorrent.com>
Subject Re: Regarding Iterations and Delay Operator
Date Tue, 09 Feb 2016 09:54:56 GMT
Hi Bhupesh,

In the use case you described, can you explain the reason why the feedback
tuples from B to A need to be in the same window?

David
On Feb 5, 2016 3:05 PM, "Bhupesh Chawda" <bhupesh@datatorrent.com> wrote:

> Hi Vlad / Tim,
>
> This use case is part of the integration with Apache SAMOA. I am trying to
> run an algorithm which is written in SAMOA, on the Apex platform. This
> algorithm already requires two operators in the formation I described. This
> is the reason I cannot control how the operators are arranged in the DAG.
>
> To answer Vlad's question, here are some more details on the use case.
> Operator A processes tuples coming in from the source, processes them and
> passes them on to B. After few tuples, it may request B to compute some
> stats on the tuples and get it back (through the loop back stream), so that
> these stats could be used while processing the next few tuples coming from
> the source.
> In case of Apex, I am suspecting that these stats that B needs to send to A
> through the loop back stream, are arriving too late (because they are
> waiting for the end of the current window), and hence A times out on the
> response from B and proceeds using some default values. This results in
> incorrect computation at the end of the DAG.
>
> Also, please note that relaxing the fault tolerance feature is not a
> requirement. I mentioned this because I assumed (perhaps incorrectly) that
> this was the reason for incrementing windows in the delay operator.
> However, as Vlad pointed out, there should be some point where an operator
> could close the window.
>
> Any suggestions on how I can achieve the above mentioned use case?
>
> Thanks.
> -Bhupesh
>
> On Fri, Feb 5, 2016 at 7:43 AM, Timothy Farkas <tim@datatorrent.com>
> wrote:
>
> > Yes Partitioning A or B would be a good use case for iteration. Does
> > iteration currently support partitioning though? I thought it didn't,
> but I
> > may be behind the times since I've been exiled from the office :).
> >
> > On Thu, Feb 4, 2016 at 5:58 PM, Vlad Rozov <v.rozov@datatorrent.com>
> > wrote:
> >
> > > What if B should be partitioned?
> > >
> > >
> > >
> > > On 2/4/16 17:43, Timothy Farkas wrote:
> > >
> > >> My question is why use iteration at all in such a case? You could just
> > >> encapsulate A and B in a single single operator (call it OP) as
> > >> components,
> > >> and take the tuples output from B and put them to A. OP would also
> > contain
> > >> the logic to decide when to stop looping each tuple emitted by B back
> to
> > >> A.
> > >>
> > >> Thanks,
> > >> Tim
> > >>
> > >> On Thu, Feb 4, 2016 at 12:58 PM, Vlad Rozov <v.rozov@datatorrent.com>
> > >> wrote:
> > >>
> > >> IMO, it will be good to provide a little bit more details regarding
> the
> > >>> use case, namely what drives the requirement and why is it OK to
> relax
> > >>> the
> > >>> fault tolerance feature. Another question is when will it be OK to
> > close
> > >>> the current window for the operator A? A can't close it as there may
> be
> > >>> more tuples coming from the input stream connected to the Delay
> > operator
> > >>> and Delay operator can't close it because A will not send END_WINDOW
> > >>> waiting for END_WINDOW on the input port connected to the Delay
> > operator.
> > >>>
> > >>> Vlad
> > >>>
> > >>>
> > >>> On 2/4/16 01:04, Bhupesh Chawda wrote:
> > >>>
> > >>> Exactly. That is the requirement. Then the feedback can be utilized
> for
> > >>>> tuples in the same window rather than the tuples in the next window.
> > >>>>
> > >>>> -Bhupesh
> > >>>>
> > >>>> On Thu, Feb 4, 2016 at 2:32 PM, Sandeep Deshmukh <
> > >>>> sandeep@datatorrent.com
> > >>>> wrote:
> > >>>>
> > >>>> Bhupesh: Do you mean to say that you would like to use Delay
> Operator
> > >>>> with
> > >>>>
> > >>>>> NO delay? Essentially you need feed back in real-time and not
> delayed
> > >>>>> by
> > >>>>> a
> > >>>>> window.
> > >>>>>
> > >>>>> Regards,
> > >>>>> Sandeep
> > >>>>>
> > >>>>> On Thu, Feb 4, 2016 at 10:59 AM, Bhupesh Chawda <
> > >>>>> bhupesh@datatorrent.com
> > >>>>> wrote:
> > >>>>>
> > >>>>> Hi,
> > >>>>>
> > >>>>>> I am working on a dag which has a loop. As per my understanding,
> > >>>>>> tuples
> > >>>>>> which are flowing on the loop back stream, will arrive
at the
> > upstream
> > >>>>>> operator in at least the next window.
> > >>>>>>
> > >>>>>> Here is an example:
> > >>>>>>
> > >>>>>> Source -> A -> B -> Delay -> A
> > >>>>>>
> > >>>>>> In the example above, tuples in window id X which arrive
at B,
> will
> > be
> > >>>>>>
> > >>>>>> sent
> > >>>>>
> > >>>>> to A again in window id (X + n), where n >= 1.
> > >>>>>> I understand this requirement is for the tuples to be recovered
in
> > >>>>>> case
> > >>>>>>
> > >>>>>> of
> > >>>>>
> > >>>>> a failure of operator B. However, is there a way I can allow
the
> > tuples
> > >>>>>>
> > >>>>>> to
> > >>>>>
> > >>>>> loop back in the same window, by relaxing the fault tolerance
> > feature.
> > >>>>>> In
> > >>>>>> other words, I need tuples to immediately loop back and
not wait
> for
> > >>>>>> the
> > >>>>>> next window to arrive at operator A. I am okay if these
tuples are
> > not
> > >>>>>> recovered in case of a failure.
> > >>>>>>
> > >>>>>> Thanks.
> > >>>>>> -Bhupesh
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message