flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )
Date Fri, 04 Sep 2015 07:31:44 GMT
We will definitely also try to get the chaining overhead down a bit.

BTW: To reach this kind of throughput, you need sources that can produce
very fast...

On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan <if05041@gmail.com> wrote:

> Hi Stephan,
>
> That's good information to know. We will hit that throughput easily. Our
> computation graph has a lot of chaining like this right now.
> I think it's safe to minimize the chain right now.
>
> Thanks a lot for this Stephan.
>
> Cheers
>
> On Thu, Sep 3, 2015 at 7:20 PM, Stephan Ewen <sewen@apache.org> wrote:
>
>> In a set of benchmarks a while back, we found that the chaining mechanism
>> has some overhead right now, because of its abstraction. The abstraction
>> creates iterators for each element and makes it hard for the JIT to
>> specialize on the operators in the chain.
>>
>> For purely local chains at full speed, this overhead is observable (it can
>> decrease throughput from 25 million elements per core to 15-20 million
>> elements per core). If your job does not reach that throughput, or is I/O
>> bound, source bound, etc., it does not matter.
>>
>> If you care about super high performance, collapsing the code into one
>> function helps.
>>
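To make "collapsing the code into one function" concrete, here is a minimal sketch in plain Java. It is not Flink API code: the class name, the method names, and the even-number/times-ten logic are made up for illustration. It only shows that a filter stage followed by a map stage can be written as one flatMap-style function that emits zero or one result per input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

public class FusedFilterMap {

    // Two separate stages: a predicate first, then a transformation.
    public static List<Integer> filterThenMap(List<Integer> events) {
        List<Integer> kept = new ArrayList<>();
        for (int e : events) {
            if (e % 2 == 0) {        // filter stage (hypothetical predicate)
                kept.add(e);
            }
        }
        List<Integer> out = new ArrayList<>();
        for (int e : kept) {
            out.add(e * 10);         // map stage (hypothetical transformation)
        }
        return out;
    }

    // One fused function in flatMap style: emits zero or one result per event.
    public static void fused(int event, Consumer<Integer> out) {
        if (event % 2 == 0) {
            out.accept(event * 10);
        }
    }

    // Applies the fused function to every event and gathers what it emits.
    public static List<Integer> fusedAll(List<Integer> events) {
        List<Integer> out = new ArrayList<>();
        for (int e : events) {
            fused(e, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> events = Arrays.asList(1, 2, 3, 4);
        System.out.println(filterThenMap(events)); // [20, 40]
        System.out.println(fusedAll(events));      // [20, 40]
    }
}
```

Both variants produce the same output. Per the numbers quoted above, the fused form is only worth the loss in readability if the job is actually running at a rate where the chaining overhead shows up.
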
>> On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05041@gmail.com> wrote:
>>
>>> Hi Gyula,
>>>
>>> Thanks for your response. It seems I will use filter and map for now, as
>>> that really makes the intention clear and is not a big performance hit.
>>>
>>> Thanks again.
>>>
>>> Cheers
>>>
>>> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.fora@gmail.com>
>>> wrote:
>>>
>>>> Hey Welly,
>>>>
>>>> If you call filter and map one after the other like you mentioned,
>>>> these operators will be chained and executed as if they were running in the
>>>> same operator.
>>>> The only small performance overhead comes from the fact that the output
>>>> of the filter will be copied before passing it as input to the map to keep
>>>> immutability guarantees (but no serialization/deserialization will happen).
>>>> Copying might be practically free depending on your data type, though.
>>>>
>>>> If you are using operators that don't make use of the immutability of
>>>> inputs/outputs (i.e., you don't hold references to those values), then you can
>>>> disable copying altogether by calling env.getConfig().enableObjectReuse(),
>>>> in which case they will have exactly the same performance.
>>>>
>>>> Cheers,
>>>> Gyula
>>>>
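The caveat about not holding references can be illustrated with a small plain-Java analogy. This is not Flink code and makes no claim about Flink's internals: the `Event` type, the `collect` method, and the `copyInputs` flag are all hypothetical. It just shows what goes wrong when a downstream stage keeps references to a mutable instance that the upstream stage overwrites, and why a copy between stages avoids it.

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectReuseHazard {

    // Hypothetical mutable event type, for illustration only.
    static final class Event {
        int value;

        Event(int value) {
            this.value = value;
        }

        Event copy() {
            return new Event(value);
        }
    }

    // A downstream stage that holds on to every input it receives.
    // copyInputs = true mimics a defensive copy between stages;
    // copyInputs = false mimics passing the same instance through.
    public static List<Integer> collect(int[] inputs, boolean copyInputs) {
        Event reused = new Event(0);      // one instance, reused by the producer
        List<Event> held = new ArrayList<>();
        for (int v : inputs) {
            reused.value = v;             // producer overwrites the instance in place
            held.add(copyInputs ? reused.copy() : reused);
        }
        List<Integer> seen = new ArrayList<>();
        for (Event e : held) {
            seen.add(e.value);            // read the held values afterwards
        }
        return seen;
    }

    public static void main(String[] args) {
        int[] inputs = {1, 2, 3};
        System.out.println(collect(inputs, true));  // [1, 2, 3]
        System.out.println(collect(inputs, false)); // [3, 3, 3]
    }
}
```

With the copy, the held values stay intact. Without it, every held reference points at the same instance and shows only the last value written, which is why skipping the copy is only safe for functions that do not keep their inputs around.
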
>>>> On Thu, Sep 3, 2015 at 4:33 AM, Welly Tambunan <if05041@gmail.com> wrote:
>>>>
>>>>> Hi All,
>>>>>
>>>>> I would like to filter some items from the event stream. I think there
>>>>> are two ways of doing this.
>>>>>
>>>>> One is the regular pipeline filter(...).map(...). The other is to use
>>>>> flatMap to do both in the same operator.
>>>>>
>>>>> Is there any performance improvement if we use flatMap, since that will
>>>>> be done in one operator instance?
>>>>>
>>>>>
>>>>> Cheers
>>>>>
>>>>>
>>>>> --
>>>>> Welly Tambunan
>>>>> Triplelands
>>>>>
>>>>> http://weltam.wordpress.com
>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>
>>>>
