flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Welly Tambunan <if05...@gmail.com>
Subject Re: Efficiency for Filter then Transform ( filter().map() vs flatMap() )
Date Fri, 04 Sep 2015 08:02:15 GMT
Hi Stephan,

Thanks for your clarification.

Basically we will have lots of sensor that will push this kind of data to
queuing system ( currently we are using RabbitMQ, but will soon move to
Kafka).
We also will use the same pipeline to process the historical data.

I also want to minimize the chaining as in the filter is doing very little
work. By minimizing the pipeline we can minimize db/external source hit and
cached local data efficiently.

Cheers

On Fri, Sep 4, 2015 at 2:58 PM, Welly Tambunan <if05041@gmail.com> wrote:

> Hi Stephan,
>
> Cheers
>
> On Fri, Sep 4, 2015 at 2:31 PM, Stephan Ewen <sewen@apache.org> wrote:
>
>> We will definitely also try to get the chaining overhead down a bit.
>>
>> BTW: To reach this kind of throughput, you need sources that can produce
>> very fast...
>>
>> On Fri, Sep 4, 2015 at 12:20 AM, Welly Tambunan <if05041@gmail.com>
>> wrote:
>>
>>> Hi Stephan,
>>>
>>> That's good information to know. We will hit that throughput easily. Our
>>> computation graph has lot of chaining like this right now.
>>> I think it's safe to minimize the chain right now.
>>>
>>> Thanks a lot for this Stephan.
>>>
>>> Cheers
>>>
>>> On Thu, Sep 3, 2015 at 7:20 PM, Stephan Ewen <sewen@apache.org> wrote:
>>>
>>>> In a set of benchmarks a while back, we found that the chaining
>>>> mechanism has some overhead right now, because of its abstraction. The
>>>> abstraction creates iterators for each element and makes it hard for the
>>>> JIT to specialize on the operators in the chain.
>>>>
>>>> For purely local chains at full speed, this overhead is observable (can
>>>> decrease throughput from 25mio elements/core to 15-20mio elements per
>>>> core). If your job does not reach that throughput, or is I/O bound, source
>>>> bound, etc, it does not matter.
>>>>
>>>> If you care about super high performance, collapsing the code into one
>>>> function helps.
>>>>
>>>> On Thu, Sep 3, 2015 at 5:59 AM, Welly Tambunan <if05041@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Gyula,
>>>>>
>>>>> Thanks for your response. Seems i will use filter and map for now as
>>>>> that one is really make the intention clear, and not a big performance
hit.
>>>>>
>>>>> Thanks again.
>>>>>
>>>>> Cheers
>>>>>
>>>>> On Thu, Sep 3, 2015 at 10:29 AM, Gyula Fóra <gyula.fora@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hey Welly,
>>>>>>
>>>>>> If you call filter and map one after the other like you mentioned,
>>>>>> these operators will be chained and executed as if they were running
in the
>>>>>> same operator.
>>>>>> The only small performance overhead comes from the fact that the
>>>>>> output of the filter will be copied before passing it as input to
the map
>>>>>> to keep immutability guarantees (but no serialization/deserialization
will
>>>>>> happen). Copying might be practically free depending on your data
type,
>>>>>> though.
>>>>>>
>>>>>> If you are using operators that don't make use of the immutability
of
>>>>>> inputs/outputs (i.e you don't hold references to those values) than
you can
>>>>>> disable copying altogether by calling env.getConfig().enableObjectReuse(),
>>>>>> in which case they will have exactly the same performance.
>>>>>>
>>>>>> Cheers,
>>>>>> Gyula
>>>>>>
>>>>>> Welly Tambunan <if05041@gmail.com> ezt írta (időpont: 2015.
szept.
>>>>>> 3., Cs, 4:33):
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I would like to filter some item from the event stream. I think
>>>>>>> there are two ways doing this.
>>>>>>>
>>>>>>> Using the regular pipeline filter(...).map(...). We can also
use
>>>>>>> flatMap for doing both in the same operator.
>>>>>>>
>>>>>>> Any performance improvement if we are using flatMap ? As that
will
>>>>>>> be done in one operator instance.
>>>>>>>
>>>>>>>
>>>>>>> Cheers
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Welly Tambunan
>>>>>>> Triplelands
>>>>>>>
>>>>>>> http://weltam.wordpress.com
>>>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Welly Tambunan
>>>>> Triplelands
>>>>>
>>>>> http://weltam.wordpress.com
>>>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Welly Tambunan
>>> Triplelands
>>>
>>> http://weltam.wordpress.com
>>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>>
>>
>>
>
>
> --
> Welly Tambunan
> Triplelands
>
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
>



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>

Mime
View raw message