flink-user mailing list archives

From Welly Tambunan <if05...@gmail.com>
Subject Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )
Date Thu, 03 Sep 2015 22:20:12 GMT
Hi Stephan,

That's good information to know. We will hit that throughput easily. Our
computation graph has a lot of chaining like this right now.
I think it's safe to minimize the chain right now.

Thanks a lot for this Stephan.

Cheers

On Thu, Sep 3, 2015 at 7:20 PM, Stephan Ewen <sewen@apache.org> wrote:

> In a set of benchmarks a while back, we found that the chaining mechanism
> has some overhead right now, because of its abstraction. The abstraction
> creates iterators for each element and makes it hard for the JIT to
> specialize on the operators in the chain.
>
> For purely local chains at full speed, this overhead is observable (it can
> decrease throughput from 25 million elements per core to 15-20 million
> elements per core). If your job does not reach that throughput, or is I/O
> bound, source bound, etc., it does not matter.
>
> If you care about super high performance, collapsing the code into one
> function helps.
>
> On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05041@gmail.com> wrote:
>
>> Hi Gyula,
>>
>> Thanks for your response. It seems I will use filter and map for now, as
>> that really makes the intention clear and is not a big performance hit.
>>
>> Thanks again.
>>
>> Cheers
>>
>> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.fora@gmail.com> wrote:
>>
>>> Hey Welly,
>>>
>>> If you call filter and map one after the other like you mentioned, these
>>> operators will be chained and executed as if they were running in the same
>>> operator.
>>> The only small performance overhead comes from the fact that the output
>>> of the filter will be copied before passing it as input to the map to keep
>>> immutability guarantees (but no serialization/deserialization will happen).
>>> Copying might be practically free depending on your data type, though.
>>>
>>> If you are using operators that don't make use of the immutability of
>>> inputs/outputs (i.e., you don't hold references to those values), then you
>>> can disable copying altogether by calling
>>> env.getConfig().enableObjectReuse(), in which case they will have exactly
>>> the same performance.
>>>
>>> Cheers,
>>> Gyula
>>>
>>> On Thu, Sep 3, 2015 at 4:33, Welly Tambunan <if05041@gmail.com> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I would like to filter some items from the event stream. I think there
>>>> are two ways of doing this.
>>>>
>>>> One is the regular pipeline filter(...).map(...). We can also use
>>>> flatMap to do both in the same operator.
>>>>
>>>> Is there any performance improvement if we use flatMap, given that it
>>>> will be done in one operator instance?
>>>>
>>>>
>>>> Cheers
>>>>
>>>>
>>>> --
>>>> Welly Tambunan
>>>> Triplelands
>>>>
>>>> http://weltam.wordpress.com
>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>
>>>
>>
>
>
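
The two approaches compared in this thread can be sketched as follows. This is only a minimal illustration of the semantics, written with plain java.util.stream rather than Flink's DataStream API, so it says nothing about Flink's chaining overhead or throughput. The class name, the "keep positive values" predicate, and the "event-" transformation are hypothetical and chosen purely for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FilterMapSketch {

    // Two separate steps: first drop unwanted elements, then transform the rest.
    static List<String> viaFilterThenMap(List<Integer> events) {
        return events.stream()
                .filter(e -> e > 0)              // hypothetical predicate
                .map(e -> "event-" + e)          // hypothetical transformation
                .collect(Collectors.toList());
    }

    // One collapsed step: each input yields zero or one output element.
    static List<String> viaFlatMap(List<Integer> events) {
        return events.stream()
                .flatMap(e -> e > 0
                        ? Stream.of("event-" + e)
                        : Stream.<String>empty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> events = Arrays.asList(3, -1, 0, 7);
        System.out.println(viaFilterThenMap(events)); // prints [event-3, event-7]
        System.out.println(viaFlatMap(events));       // prints [event-3, event-7]
    }
}
```

Both methods produce the same result. The filter-then-map form states the intent more directly, while the flatMap form folds the predicate and the transformation into a single function, which is the kind of collapsing suggested above for very high-throughput jobs.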

