flink-dev mailing list archives

From Gyula Fóra <gyf...@apache.org>
Subject Re: [DISCUSS] Re-add record copy to chained operator calls
Date Wed, 20 May 2015 12:06:52 GMT
"Copy before putting it into a window buffer and any other group buffer."

Exactly my point. Any stateful operator should be able to implement
something like this without having to worry about copying the object (and
at this point the user would need to know whether it comes from the network
to avoid unnecessary copies), so I don't agree with leaving the copy off.
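
To illustrate the hazard, here is a minimal, self-contained sketch. It
does not use Flink classes: `Event`, `bufferByReference` and
`bufferWithCopy` are hypothetical names made up for this example. It
assumes the upstream operator reuses one mutable object and the
downstream "window buffer" is a plain list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical demo; none of these names come from Flink. */
class WindowBufferAliasing {

    /** A small mutable record standing in for a user type. */
    static final class Event {
        int value;

        Event(int value) {
            this.value = value;
        }

        Event copy() {
            return new Event(value);
        }
    }

    /** Stores the incoming reference directly, as a naive buffer would. */
    static List<Integer> bufferByReference(int n) {
        List<Event> buffer = new ArrayList<>();
        Event reused = new Event(0);      // upstream reuses a single object
        for (int i = 1; i <= n; i++) {
            reused.value = i;             // upstream mutates it in place
            buffer.add(reused);           // downstream keeps the reference
        }
        return values(buffer);
    }

    /** Copies each element before storing it in the buffer. */
    static List<Integer> bufferWithCopy(int n) {
        List<Event> buffer = new ArrayList<>();
        Event reused = new Event(0);
        for (int i = 1; i <= n; i++) {
            reused.value = i;
            buffer.add(reused.copy());    // defensive copy before buffering
        }
        return values(buffer);
    }

    private static List<Integer> values(List<Event> buffer) {
        List<Integer> out = new ArrayList<>();
        for (Event e : buffer) {
            out.add(e.value);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bufferByReference(3)); // prints [3, 3, 3]
        System.out.println(bufferWithCopy(3));    // prints [1, 2, 3]
    }
}
```

Without the copy, every buffered entry aliases the same object, so the
buffer only ever reflects the last value written upstream.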

The user can of course declare the operator mutable if he wants to
(because he is worried about performance), but I still think the default
behaviour should be immutable.
We cannot force users not to hold object references, and doing so is quite
an unnatural way of programming in a language like Java.
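
A rough sketch of what "immutable by default, mutable on request" could
look like at the point where records are handed to a chained operator.
This is not the actual OperatorCollector code: `ChainedOutput`, the
`copier` function and the `mutableInput` flag are assumptions invented
for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.UnaryOperator;

/** Hypothetical sketch; not Flink's collector implementation. */
class CopyingChainSketch {

    /** Forwards records downstream, copying unless the user opted out. */
    static final class ChainedOutput<T> {
        private final Consumer<T> downstream;
        private final UnaryOperator<T> copier;
        private final boolean mutableInput;   // assumed per-operator setting

        ChainedOutput(Consumer<T> downstream, UnaryOperator<T> copier,
                      boolean mutableInput) {
            this.downstream = downstream;
            this.copier = copier;
            this.mutableInput = mutableInput;
        }

        void collect(T record) {
            // Default path copies; the opt-out path passes the reference.
            downstream.accept(mutableInput ? record : copier.apply(record));
        }
    }

    /** Emits "a" then "b" through one reused StringBuilder. */
    static List<String> run(boolean mutableInput) {
        List<StringBuilder> held = new ArrayList<>();
        ChainedOutput<StringBuilder> out = new ChainedOutput<StringBuilder>(
                held::add, sb -> new StringBuilder(sb), mutableInput);

        StringBuilder reused = new StringBuilder();
        for (String s : new String[] {"a", "b"}) {
            reused.setLength(0);
            reused.append(s);
            out.collect(reused);
        }

        List<String> result = new ArrayList<>();
        for (StringBuilder sb : held) {
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // prints [a, b]
        System.out.println(run(true));  // prints [b, b]
    }
}
```

With the default (copying) path the downstream operator can hold on to
what it receives; with the opt-out it sees the upstream object itself and
has to copy on its own before storing it.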


On Wed, May 20, 2015 at 1:39 PM, Stephan Ewen <sewen@apache.org> wrote:

> I am curious why the copying is actually needed.
>
> In the batch API, we chain and do not copy and it is rather predictable.
>
> The cornerstones of that design are the following rules:
>
>  1) Objects read from the network or any buffer are always new objects.
> That comes naturally when they are deserialized as part of that (all
> buffers store serialized data).
>
>  2) After a function returns a record (or gives one to the collector), it
> is given to the chain of chained operators, but after it is through the
> chain, no one else holds a reference to that object.
>      For that, it is crucial that objects are not stored by reference, but
> either stored serialized, or a copy is stored.
>
> This is quite solid in the batch API. How about we follow the same paradigm
> in the streaming API? We would need to adjust the following:
>
> 1) Do not copy between operators (I think this is the case right now)
>
> 2) Copy before putting it into a window buffer and any other group buffer.
>
> On Wed, May 20, 2015 at 1:22 PM, Aljoscha Krettek <aljoscha@apache.org>
> wrote:
>
> > Yes, in fact I anticipated this. There is one central place where we
> > can insert a copy step, in OperatorCollector in OutputHandler.
> >
> > On Wed, May 20, 2015 at 11:17 AM, Paris Carbone <parisc@kth.se> wrote:
> > > I guess it was not intended ^^.
> > >
> > > Chaining should be transparent and not break the correct/expected
> > > behaviour.
> > >
> > >
> > > Paris
> > >
> > > On 20 May 2015, at 11:02, Márton Balassi <mbalassi@apache.org> wrote:
> > >
> > > +1 for copying.
> > > On May 20, 2015 10:50 AM, "Gyula Fóra" <gyfora@apache.org> wrote:
> > >
> > > Hey,
> > >
> > > The latest streaming operator rework removed the copying of the outputs
> > > before passing them to chained operators. This is a major break from the
> > > previous operator semantics, which guaranteed immutability.
> > >
> > > I think this change leads to very nondeterministic program behaviour
> > > from the user's perspective, as only non-chained outputs/inputs will be
> > > immutable.
> > > If we allow this to happen, users will start disabling chaining to get
> > > immutability, which defeats the purpose. (Chaining should not affect
> > > program behaviour, just increase performance.)
> > >
> > > In my opinion the default setting for each operator should be
> > > immutability, and the user could override this manually if he/she wants.
> > >
> > > What do you think?
> > >
> > > Regards,
> > > Gyula
> > >
> > >
> >
>
