flink-user mailing list archives

From Welly Tambunan <if05...@gmail.com>
Subject Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )
Date Fri, 04 Sep 2015 07:58:07 GMT
Hi Stephan,

Cheers

On Fri, Sep 4, 2015 at 2:31 PM, Stephan Ewen <sewen@apache.org> wrote:

> We will definitely also try to get the chaining overhead down a bit.
>
> BTW: To reach this kind of throughput, you need sources that can produce
> very fast...
>
> On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan <if05041@gmail.com> wrote:
>
>> Hi Stephan,
>>
>> That's good information to know. We will hit that throughput easily. Our
>> computation graph has a lot of chaining like this right now.
>> I think it's safe to minimize the chain right now.
>>
>> Thanks a lot for this Stephan.
>>
>> Cheers
>>
>> On Thu, Sep 3, 2015 at 7:20 PM, Stephan Ewen <sewen@apache.org> wrote:
>>
>>> In a set of benchmarks a while back, we found that the chaining
>>> mechanism has some overhead right now, because of its abstraction. The
>>> abstraction creates iterators for each element and makes it hard for the
>>> JIT to specialize on the operators in the chain.
>>>
>>> For purely local chains at full speed, this overhead is observable (it can
>>> decrease throughput from 25 million elements per core to 15-20 million
>>> elements per core). If your job does not reach that throughput, or is I/O
>>> bound, source bound, etc., it does not matter.
>>>
>>> If you care about super high performance, collapsing the code into one
>>> function helps.
>>>
>>> On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05041@gmail.com>
>>> wrote:
>>>
>>>> Hi Gyula,
>>>>
>>>> Thanks for your response. It seems I will use filter and map for now, as
>>>> that makes the intention clear and is not a big performance hit.
>>>>
>>>> Thanks again.
>>>>
>>>> Cheers
>>>>
>>>> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.fora@gmail.com>
>>>> wrote:
>>>>
>>>>> Hey Welly,
>>>>>
>>>>> If you call filter and map one after the other like you mentioned,
>>>>> these operators will be chained and executed as if they were running
>>>>> in the same operator.
>>>>> The only small performance overhead comes from the fact that the
>>>>> output of the filter will be copied before passing it as input to the
>>>>> map to keep immutability guarantees (but no serialization/deserialization
>>>>> will happen). Copying might be practically free depending on your data
>>>>> type, though.
>>>>>
>>>>> If you are using operators that don't rely on the immutability of
>>>>> inputs/outputs (i.e., you don't hold references to those values), then
>>>>> you can disable copying altogether by calling
>>>>> env.getConfig().enableObjectReuse(), in which case they will have
>>>>> exactly the same performance.
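
A minimal sketch of why the copy between chained operators matters, written in plain Java rather than against the Flink API. The `Event` type, the `BufferingOperator` class, and the `run` method are hypothetical names made up for illustration, and the model (an upstream step that reuses a single mutable instance) is a simplifying assumption, not a description of Flink internals. It shows only that an operator holding references to its inputs sees wrong values once copying is skipped.

```java
import java.util.ArrayList;
import java.util.List;

public class ObjectReuseSketch {

    // Hypothetical mutable record type passed between two operators.
    static final class Event {
        int value;
        Event(int value) { this.value = value; }
        Event copy() { return new Event(value); }
    }

    // Hypothetical downstream operator that holds references to its inputs.
    static final class BufferingOperator {
        final List<Event> seen = new ArrayList<>();
        void process(Event e) { seen.add(e); }
    }

    // Assumption: the upstream step reuses one mutable instance for every element.
    static List<Integer> run(boolean copyBetweenOperators) {
        BufferingOperator downstream = new BufferingOperator();
        Event reused = new Event(0);
        for (int i = 1; i <= 3; i++) {
            reused.value = i;
            // With copying, downstream gets its own instance; without, it shares ours.
            downstream.process(copyBetweenOperators ? reused.copy() : reused);
        }
        List<Integer> values = new ArrayList<>();
        for (Event e : downstream.seen) {
            values.add(e.value);
        }
        return values;
    }

    public static void main(String[] args) {
        System.out.println(run(true));   // prints [1, 2, 3]
        System.out.println(run(false));  // prints [3, 3, 3]
    }
}
```

An operator that reads each element and does not keep it around behaves identically in both modes, which is the case where skipping the copy is safe.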
>>>>>
>>>>> Cheers,
>>>>> Gyula
>>>>>
>>>>> Welly Tambunan <if05041@gmail.com> wrote (on Thu, Sep 3, 2015, 4:33):
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I would like to filter some items from the event stream. I think there
>>>>>> are two ways of doing this.
>>>>>>
>>>>>> Using the regular pipeline filter(...).map(...). We can also use
>>>>>> flatMap for doing both in the same operator.
>>>>>>
>>>>>> Is there any performance improvement if we use flatMap? That would
>>>>>> be done in one operator instance.
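
A minimal sketch of the two options in the question, written in plain Java so that it runs without a Flink dependency. The first variant uses `java.util.stream` as a stand-in for a filter(...).map(...) pipeline. The second uses a single function that emits zero or one result per input through a `Consumer`, which only loosely mimics a flatMap with a collector. The class name, method names, the even-number predicate, and the "event-" prefix are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

public class FilterMapVsFlatMap {

    // Variant 1: two separate steps, filter then map.
    static List<String> filterThenMap(List<Integer> events) {
        return events.stream()
                .filter(e -> e % 2 == 0)
                .map(e -> "event-" + e)
                .collect(Collectors.toList());
    }

    // Variant 2: one function does both; it emits a result only for kept elements.
    static void filterAndMap(Integer event, Consumer<String> out) {
        if (event % 2 == 0) {
            out.accept("event-" + event);
        }
    }

    // Drives the single function over all inputs, collecting whatever it emits.
    static List<String> flatMapStyle(List<Integer> events) {
        List<String> out = new ArrayList<>();
        for (Integer e : events) {
            filterAndMap(e, out::add);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> events = Arrays.asList(1, 2, 3, 4);
        System.out.println(filterThenMap(events)); // prints [event-2, event-4]
        System.out.println(flatMapStyle(events));  // prints [event-2, event-4]
    }
}
```

Both variants produce the same results. The difference is that the first states the intent as two named steps, while the second folds the predicate and the transformation into one function.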
>>>>>>
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Welly Tambunan
>>>>>> Triplelands
>>>>>>
>>>>>> http://weltam.wordpress.com
>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>>
>>>>>
>>>>
>>>
>>>
>>
>
>


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>
