flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Optimizations not performed - please confirm
Date Wed, 29 Jun 2016 19:50:02 GMT
Yes, that was my fault. I'm used to my desktop mail client's automatic
reply-all, but my phone just did a simple reply.
Sorry for the confusion,

2016-06-29 19:24 GMT+02:00 Ovidiu-Cristian MARCU <

> Thank you, Aljoscha!
> I received a similar update from Fabian, only now I see the user list was
> not in CC.
> Fabian: *The optimizer hasn’t been touched (except for bugfixes and new
> operators) for quite some time.*
> *These limitations are still present and I don’t expect them to be removed
> anytime soon. IMO, it is more likely that certain optimizations like join
> reordering will be done for Table API / SQL queries by the Calcite
> optimizer and pushed through the Flink Dataset optimizer.*
> I agree, for join reordering optimisations it makes sense to rely on
> Calcite.
> My goal is to understand how current documentation correlates to the
> Flink’s framework status.
> I did an experimental study where I compared Flink and Spark for many
> workloads at very large scale (I’ll share the results soon) and I would
> like to develop a few ideas on top of Flink (from the results Flink is the
> winner in most of the use cases and it is our choice for the platform on
> which to develop and grow).
> My interest is in understanding more about Flink today. I am familiar with
> most of the published papers, and I am also following the documentation.
> I am looking at the DataSet API, runtime and current architecture.
> Best,
> Ovidiu
> On 29 Jun 2016, at 17:27, Aljoscha Krettek <aljoscha@apache.org> wrote:
> Hi,
> I think this document is still up-to-date since not much was done in these
> parts of the code for the 1.0 release and after that.
> Maybe Timo can give some insights into what optimizations are done in the
> Table API/SQL that will be released in an updated version in 1.1.
> Cheers,
> Aljoscha
> +Timo, explicitly adding Timo
> On Tue, 28 Jun 2016 at 21:41 Ovidiu-Cristian MARCU <
> ovidiu-cristian.marcu@inria.fr> wrote:
>> Hi,
>> The optimizer internals described in this document [1] are probably not
>> up-to-date.
>> Can you please confirm if this is still valid:
>> *“The following optimizations are not performed*
>>    - *Join reordering (or operator reordering in general): Joins /
>>    Filters / Reducers are not re-ordered in Flink. This is a high opportunity
>>    optimization, but with high risk in the absence of good estimates about the
>>    data characteristics. Flink is not doing these optimizations at this point.*
>>    - *Index vs. Table Scan selection: In Flink, all data sources are
>>    always scanned. The data source (the input format) may apply clever
>>    mechanisms to not scan all the data, but pre-select and project. Examples
>>    are the RCFile / ORCFile / Parquet input formats."*
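[Editor's note: the risk behind the first bullet, join reordering without good estimates, comes down to intermediate result sizes. A back-of-the-envelope sketch with made-up cardinalities and selectivities (none of these numbers are from the thread; the standard independence estimate |A join B| = |A| * |B| * sel is assumed):]

```java
// Why join order matters: estimate tuples produced by two orders of a
// three-way join A-B-C, using hypothetical cardinalities/selectivities.
public class JoinOrderCost {
    static final long A = 1_000_000, B = 1_000, C = 10;
    // Assumed selectivities of the A-B and B-C join predicates.
    static final double SEL_AB = 0.001, SEL_BC = 0.01;

    // Returns intermediate + final result size for one join order.
    static long cost(boolean abFirst) {
        if (abFirst) {                               // (A join B) join C
            long ab = (long) (A * B * SEL_AB);       // 1,000,000 tuples
            return ab + (long) (ab * C * SEL_BC);
        } else {                                     // A join (B join C)
            long bc = (long) (B * C * SEL_BC);       // 100 tuples
            return bc + (long) (A * bc * SEL_AB);
        }
    }

    public static void main(String[] args) {
        System.out.println("(A x B) x C: " + cost(true));   // 1,100,000
        System.out.println("A x (B x C): " + cost(false));  // 100,100
    }
}
```

Under these assumed statistics, starting with the small B-C join moves roughly 11x fewer tuples, which is exactly the kind of win the optimizer forgoes when estimates are unavailable.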
>> Any update of this page will be very helpful.
>> Thank you.
>> Best,
>> Ovidiu
>> [1] https://cwiki.apache.org/confluence/display/FLINK/Optimizer+Internals
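[Editor's note: the second quoted bullet, input formats pre-selecting and projecting during a full scan, can be sketched without Flink. The following is a hypothetical, minimal illustration of scan-time column projection as columnar formats like ORC/Parquet do it; the class and method names are invented and this is not Flink's InputFormat API:]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch of projection pushdown: every row group is still
// "scanned" (no index selection), but only the requested columns are
// materialized, mimicking a columnar file layout.
public class ProjectingScan {
    static final String[] COLUMNS = {"id", "name", "price"};
    // Column-major storage, one array per column, like a columnar file.
    static final Object[][] COLUMN_DATA = {
        {1, 2, 3},                 // id
        {"a", "b", "c"},           // name
        {9.99, 19.99, 29.99}       // price
    };

    // Read all rows, materializing only the projected columns.
    static List<Object[]> scan(String... projectedColumns) {
        int rows = COLUMN_DATA[0].length;
        List<Object[]> result = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            Object[] row = new Object[projectedColumns.length];
            for (int c = 0; c < projectedColumns.length; c++) {
                int idx = Arrays.asList(COLUMNS).indexOf(projectedColumns[c]);
                row[c] = COLUMN_DATA[idx][r];  // untouched columns never read
            }
            result.add(row);
        }
        return result;
    }

    public static void main(String[] args) {
        for (Object[] row : scan("id", "price")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The point of the sketch: the decision to skip data lives in the source itself, not in the optimizer, which matches the quoted wiki text about RCFile/ORCFile/Parquet input formats.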
