flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: filter().project() vs flatMap()
Date Mon, 04 May 2015 12:59:29 GMT
Knowing that a filter emits at most as many tuples as it reads might help with
cardinality estimation for cost-based optimization, for example when deciding
on a join strategy (broadcast vs. repartition, or which input becomes the
build side of a hash join).
However, as Stephan said, there are many cases where it makes no difference,
e.g. if the input cardinality of the filter (or the size of the other join
input) is unknown.

I think the chances are low that it makes a difference.
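
To make the reasoning above concrete, here is a toy sketch in plain Java. It is
not Flink's optimizer: the class, the method names and the broadcast threshold
are all hypothetical and only illustrate why an upper bound on the filter
output helps when the filter input size is known, and does not help when it is
unknown.

```java
import java.util.OptionalLong;

// Toy model only. None of these names exist in Flink; they are assumptions
// made up for this illustration.
class JoinStrategySketch {
    enum Strategy { BROADCAST_LEFT, BROADCAST_RIGHT, REPARTITION }

    // Hypothetical limit below which broadcasting an input is considered cheap.
    static final long BROADCAST_LIMIT = 1_000_000L;

    // A filter never emits more records than it reads, so the input estimate
    // is an upper bound for the output. A flatMap gives no such guarantee.
    static OptionalLong outputUpperBound(OptionalLong inputCardinality, boolean isFilter) {
        return isFilter ? inputCardinality : OptionalLong.empty();
    }

    // Broadcast the smaller input if it is known to be small, else repartition.
    static Strategy choose(OptionalLong left, OptionalLong right) {
        boolean leftSmall = left.isPresent() && left.getAsLong() <= BROADCAST_LIMIT;
        boolean rightSmall = right.isPresent() && right.getAsLong() <= BROADCAST_LIMIT;
        if (leftSmall && (!rightSmall || left.getAsLong() <= right.getAsLong())) {
            return Strategy.BROADCAST_LEFT;
        }
        if (rightSmall) {
            return Strategy.BROADCAST_RIGHT;
        }
        return Strategy.REPARTITION;
    }

    public static void main(String[] args) {
        OptionalLong unknown = OptionalLong.empty();
        OptionalLong known = OptionalLong.of(500L);

        // Filter over a known, small input: the bound allows a broadcast.
        System.out.println(choose(outputUpperBound(known, true), unknown));
        // flatMap over the same input: no bound, so fall back to repartition.
        System.out.println(choose(outputUpperBound(known, false), unknown));
        // Filter over an unknown input: the bound is unknown as well.
        System.out.println(choose(outputUpperBound(unknown, true), unknown));
    }
}
```

In this model only the first case benefits from using a filter, which matches
the point that the extra knowledge rarely changes the plan.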


2015-05-04 14:53 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:

> Thanks Sebastian and Fabian for the feedback, just one last question:
> what changes from the system's point of view when it knows that the number
> of output tuples is <= the number of input tuples?
> Is there any optimization that Flink can apply to the pipeline?
>
> On Mon, May 4, 2015 at 2:49 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>
>> It should not make a difference. I think it's just personal taste.
>>
>> If your filter condition is simple enough, I'd go with Flink's Table API
>> because it does not require you to define a Filter or FlatMapFunction.
>>
>>
>> 2015-05-04 14:43 GMT+02:00 Flavio Pompermaier <pompermaier@okkam.it>:
>>
>>> Hi Flinkers,
>>>
>>> I'd like to know whether it's better to use filter().project() or a
>>> flatMap() to filter tuples and then project them.
>>> Functionally they are equivalent, but maybe I'm overlooking something.
>>>
>>> Thanks in advance,
>>> Flavio
>>>
>>
>>
>
>
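
The functional equivalence asked about in the quoted thread can be illustrated
with a small sketch. Note that it uses java.util.stream as a stand-in, not the
Flink DataSet API, and that the Row type, its fields and the score threshold
are assumptions made up for the example. The filter-then-project variant and
the flatMap variant yield the same result; the flatMap variant simply emits
zero or one projected value per input record.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Plain-Java analogy, not Flink code. Row and sample() are hypothetical.
class FilterProjectVsFlatMap {
    static final class Row {
        final int id;
        final String name;
        final int score;

        Row(int id, String name, int score) {
            this.id = id;
            this.name = name;
            this.score = score;
        }
    }

    static List<Row> sample() {
        return Arrays.asList(new Row(1, "ann", 70), new Row(2, "bob", 30), new Row(3, "cy", 50));
    }

    // Variant 1: drop rows first, then keep only the name field.
    static List<String> filterThenProject(List<Row> rows) {
        return rows.stream()
                .filter(r -> r.score >= 50)
                .map(r -> r.name)
                .collect(Collectors.toList());
    }

    // Variant 2: one function that emits either the projected field or nothing.
    static List<String> viaFlatMap(List<Row> rows) {
        return rows.stream()
                .flatMap(r -> r.score >= 50 ? Stream.of(r.name) : Stream.<String>empty())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(filterThenProject(sample())); // prints [ann, cy]
        System.out.println(viaFlatMap(sample()));        // prints [ann, cy]
    }
}
```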
