flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Rethink the "always copy" policy for streaming topologies
Date Fri, 02 Oct 2015 17:57:14 GMT
Do we know what kind of impact the non-reuse policy has? Maybe the
serialization overhead is subsumed by other effects.

But in general I'm ok with changing the default to not copying. We just
have to document this behavior properly.
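
To get a rough feel for the cost in question, here is a minimal sketch of a
"serialization copy": the value is encoded into bytes and decoded again. It
uses plain java.io serialization as a stand-in and is not Flink's or Kryo's
actual code path. The class name SerializationCopyDemo, the helper
serializationCopy, and the sample record are made up for illustration, and
the timing loop is only a crude indication, not a proper benchmark.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

public class SerializationCopyDemo {

    /** Deep-copies a value by encoding it to bytes and decoding it again. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T serializationCopy(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization copy failed", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical record, standing in for a small JSON-like object.
        HashMap<String, String> record = new HashMap<>();
        record.put("user", "alice");
        record.put("action", "click");

        int rounds = 100_000;
        long start = System.nanoTime();
        for (int i = 0; i < rounds; i++) {
            serializationCopy(record);
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("avg ns per serialization copy: " + elapsed / rounds);
    }
}
```

Handing the same reference to the next operator costs essentially nothing by
comparison, so the number printed here is an upper-bound style hint at what
a per-record copy between chained tasks could add for such types.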
On Oct 2, 2015 6:31 PM, "Maximilian Michels" <mxm@apache.org> wrote:

> +1 Good idea. I think we can save quite some CPU cycles by not copying
> records.
>
> > > That is basically the behavior of the batch API, and there has so far
> > > never been an issue with that (people running into the trap of
> > > overwritten mutable elements).
>
>
> As far as I know, this is only the case for chained operators?
>
> On Fri, Oct 2, 2015 at 6:15 PM, Matthias J. Sax <mjsax@apache.org> wrote:
>
> > +1 for disabling copying by default
> >
> >
> > On 10/02/2015 05:53 PM, Stephan Ewen wrote:
> > > Hi all!
> > >
> > > Now that we are coming to the next release, I wanted to make sure we
> > > finalize the decision on that point, because it would be nice to not
> > > break the behavior of the system afterwards.
> > >
> > > Right now, when tasks are chained together, the system always copies
> > > the elements between different tasks in the same chain.
> > >
> > > I think this policy was established under the assumption that copies do
> > > not cost anything, given our own test examples, which mainly use
> > > immutable types like Strings, boxed primitives, ...
> > >
> > > In practice, a lot of data types are actually quite expensive to copy.
> > >
> > > For example, a rather common data type in the event analysis of
> > > web sources is a JSON object.
> > > Flink treats this as a generic type. Depending on its concrete
> > > implementation, Kryo may have to perform a serialization copy, which
> > > means encoding into bytes (JSON encoding, charset encoding) and decoding
> > > again.
> > >
> > > This has a massive impact on the out-of-the-box performance of the
> > > system. Given that, I was wondering whether we should set the default
> > > policy to "not copying".
> > >
> > > That is basically the behavior of the batch API, and there has so far
> > > never been an issue with that (people running into the trap of
> > > overwritten mutable elements).
> > >
> > > What do you think?
> > >
> > > Stephan
> > >
> >
> >
>
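
For the documentation, the "trap of overwritten mutable elements" mentioned
in the thread could be illustrated roughly as follows. This is a
self-contained sketch in plain Java, not Flink code: the Event type, the
run method, and the way the two chained operators are emulated (an upstream
loop that reuses one mutable object, a downstream consumer that buffers what
it receives) are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class ChainedCopyDemo {

    /** A mutable record type, standing in for something like a JSON object. */
    static final class Event {
        int value;

        Event(int value) {
            this.value = value;
        }

        Event copy() {
            return new Event(value);
        }
    }

    /**
     * Emulates two chained operators. The upstream side reuses a single
     * mutable Event for every emission; the downstream side buffers the
     * elements it receives. Returns the values the downstream side sees
     * once all elements have been emitted.
     */
    static List<Integer> run(boolean copyBetweenOperators) {
        List<Event> buffered = new ArrayList<>();
        Consumer<Event> downstream = buffered::add;

        Event reused = new Event(0);
        for (int i = 1; i <= 3; i++) {
            reused.value = i; // upstream overwrites the same object
            downstream.accept(copyBetweenOperators ? reused.copy() : reused);
        }

        List<Integer> seen = new ArrayList<>();
        for (Event e : buffered) {
            seen.add(e.value);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("copying:     " + run(true));  // [1, 2, 3]
        System.out.println("not copying: " + run(false)); // [3, 3, 3]
    }
}
```

With copying, the downstream buffer holds three independent elements.
Without copying, every buffered entry is the same object, so a user function
that holds on to its input while an upstream function mutates or reuses it
sees only the last value. That is the case the documentation would need to
call out if "not copying" becomes the default.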
