flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )
Date Thu, 03 Sep 2015 12:20:34 GMT
In a set of benchmarks a while back, we found that the chaining mechanism
has some overhead right now, because of its abstraction. The abstraction
creates iterators for each element and makes it hard for the JIT to
specialize on the operators in the chain.

For purely local chains running at full speed, this overhead is observable (it
can decrease throughput from 25 million elements per core to 15-20 million
elements per core). If your job does not reach that throughput, or is I/O
bound, source bound, etc., it does not matter.

If you care about super high performance, collapsing the code into one
function helps.
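
To make "collapsing the code into one function" concrete, here is a minimal sketch in plain Java. It does not use Flink itself: the Collector and FlatMapFunction interfaces below are small stand-ins that only mirror the shape of Flink's flatMap(value, out) contract, and the even-number filter and times-ten transform are hypothetical example logic, not anything from the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class FilterMapSketch {

    /** Stand-in for Flink's Collector (assumption: same collect(record) shape). */
    public interface Collector<O> {
        void collect(O record);
    }

    /** Stand-in for Flink's FlatMapFunction (assumption: same flatMap(value, out) shape). */
    public interface FlatMapFunction<T, O> {
        void flatMap(T value, Collector<O> out);
    }

    // Hypothetical example logic: keep even numbers, then multiply by ten.
    static final Predicate<Integer> KEEP = n -> n % 2 == 0;
    static final Function<Integer, Integer> TRANSFORM = n -> n * 10;

    /** Two separate functions, analogous to filter(...).map(...). */
    public static List<Integer> viaFilterThenMap(List<Integer> input) {
        return input.stream().filter(KEEP).map(TRANSFORM).collect(Collectors.toList());
    }

    /** The same logic collapsed into one function, analogous to a single flatMap(...). */
    static final FlatMapFunction<Integer, Integer> FILTER_AND_MAP = (value, out) -> {
        if (value % 2 == 0) {          // the filter step
            out.collect(value * 10);   // the map step; emits at most one record
        }
    };

    public static List<Integer> viaFlatMap(List<Integer> input) {
        List<Integer> result = new ArrayList<>();
        for (Integer value : input) {
            // result::add plays the role of the downstream collector.
            FILTER_AND_MAP.flatMap(value, result::add);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4);
        System.out.println(viaFilterThenMap(input)); // prints [20, 40]
        System.out.println(viaFlatMap(input));       // prints [20, 40]
    }
}
```

Both variants produce the same records; the collapsed one simply has no operator boundary between the filter and the transform. This sketch says nothing about actual throughput, which depends on the real chaining implementation.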

On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05041@gmail.com> wrote:

> Hi Gyula,
> Thanks for your response. It seems I will use filter and map for now, as that
> makes the intention really clear and is not a big performance hit.
> Thanks again.
> Cheers
> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.fora@gmail.com> wrote:
>> Hey Welly,
>> If you call filter and map one after the other like you mentioned, these
>> operators will be chained and executed as if they were running in the same
>> operator.
>> The only small performance overhead comes from the fact that the output
>> of the filter will be copied before passing it as input to the map to keep
>> immutability guarantees (but no serialization/deserialization will happen).
>> Copying might be practically free depending on your data type, though.
>> If you are using operators that don't rely on the immutability of
>> inputs/outputs (i.e., you don't hold references to those values), then you can
>> disable copying altogether by calling env.getConfig().enableObjectReuse(),
>> in which case they will have exactly the same performance.
>> Cheers,
>> Gyula
>> On Thu, Sep 3, 2015 at 4:33, Welly Tambunan <if05041@gmail.com> wrote:
>>> Hi All,
>>> I would like to filter some items from the event stream. I think there
>>> are two ways of doing this.
>>> We can use the regular pipeline filter(...).map(...), or we can use flatMap
>>> to do both in the same operator.
>>> Is there any performance improvement if we use flatMap, given that it will
>>> be done in one operator instance?
>>> Cheers
>>> --
>>> Welly Tambunan
>>> Triplelands
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
