flink-user mailing list archives

From Timo Walther <twal...@apache.org>
Subject Re: Reducing runtime of Flink planner
Date Mon, 07 Jan 2019 16:35:58 GMT
Hi Niklas,

It would be interesting to know which planner causes the long runtime.
Could you use a debugger to find out more? Is it really the Flink
Table API planner, or the underlying DataSet planner one level deeper?
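
If attaching a debugger is inconvenient, a rough alternative is to sample the stack of the thread that builds the job and count which package the innermost known frame belongs to. The following is only a minimal sketch under assumptions: the class PlannerSampler and its labels are hypothetical, and the mapping of package prefixes (org.apache.calcite, org.apache.flink.table, org.apache.flink.optimizer) to "which planner is busy" is a heuristic, not something Flink itself provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Flink: a tiny sampling profiler.
final class PlannerSampler {

    // Classify a stack trace by the innermost frame from a known planner package.
    // The labels are this sketch's own naming (an assumption).
    static String classify(StackTraceElement[] stack) {
        // Element 0 is the most recent call, so we walk from the inside out.
        for (StackTraceElement frame : stack) {
            String cls = frame.getClassName();
            if (cls.startsWith("org.apache.flink.optimizer.")) {
                return "dataset-optimizer";
            }
            if (cls.startsWith("org.apache.calcite.")) {
                return "calcite";
            }
            if (cls.startsWith("org.apache.flink.table.")) {
                return "table-api";
            }
        }
        return "other";
    }

    // Take up to `samples` stack snapshots of `target`, one every `intervalMillis`,
    // and count how often each label was seen. Stops early if the thread ends.
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < samples && target.isAlive(); i++) {
            counts.merge(classify(target.getStackTrace()), 1, Integer::sum);
            Thread.sleep(intervalMillis);
        }
        return counts;
    }
}
```

One way to use it would be to start a daemon thread that calls PlannerSampler.sample(mainThread, ...) just before the program submits the job, and print the counts afterwards. If most samples fall under "calcite" or "table-api", the time is probably spent in the Table API planner; if they fall under "dataset-optimizer", it is more likely the DataSet layer. Pausing the main thread in a debugger a few times gives the same information.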

There was an issue that was recently closed [1] about the DataSet 
optimizer. Could this solve your problem?

I will also loop in Fabian, who might know more.

Regards,
Timo

[1] https://issues.apache.org/jira/browse/FLINK-10566

On 07.01.19 at 14:05, Niklas Teichmann wrote:
> Hi everybody,
>
> I have a question concerning the planner for the Flink Table / Batch API.
> I am currently trying to use a library called Cypher for Apache Flink
> (CAPF, https://github.com/soerenreichardt/cypher-for-apache-flink), a
> project that tries to implement the graph database query language
> Cypher on Apache Flink.
>
> The problem is that the planner seemingly takes a very long time to
> plan and optimize the job created by CAPF. This example job, in JSON
> format,
>
> https://pastebin.com/J84grsjc
>
> takes about 20 minutes to plan and about 5 minutes to run on a 24 GB
> data set. That seems very long for a job of this size.
>
> Do you have any idea why this is the case?
> Is there a way to give the planner hints to reduce the planning time?
>
> Thanks in advance!
> Niklas


