Subject: Re: Reducing runtime of Flink planner
To: user@flink.apache.org
References: <20190107140550.Horde.JgKBr0YZvdzhD29fK7FGK0i@mail.uni-leipzig.de>
From: Timo Walther <twalthr@apache.org>
Cc: Fabian Hueske <fhueske@apache.org>
Message-ID: <225def92-fcb6-718f-fddd-2c2aea13e5b2@apache.org>
Date: Mon, 7 Jan 2019 17:35:58 +0100

Hi Niklas,

It would be interesting to know which planner caused the long runtime.
Could you use a debugger to figure out more details? Is it really the
Flink Table API planner, or the underlying DataSet planner one level deeper?
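
If attaching a debugger is inconvenient, sampling the stack of the thread
that submits the job can give a rough idea of where the planning time goes.
The following Java sketch is only an illustration, not a tested recipe for
CAPF: the class PlannerSampler and its methods are hypothetical, and it
assumes that Table API planning shows up as frames under org.apache.calcite
or org.apache.flink.table, and the DataSet optimizer as frames under
org.apache.flink.optimizer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: attributes stack samples to a planning layer by package prefix. */
final class PlannerSampler {

    /** Labels a stack trace by the innermost frame that belongs to a known planner package. */
    static String classify(StackTraceElement[] stack) {
        // Index 0 is the innermost frame, so the first match is the most specific one.
        for (StackTraceElement frame : stack) {
            String cls = frame.getClassName();
            if (cls.startsWith("org.apache.flink.optimizer.")) {
                return "DataSet optimizer";
            }
            if (cls.startsWith("org.apache.calcite.")
                    || cls.startsWith("org.apache.flink.table.")) {
                return "Table API planner";
            }
        }
        return "other";
    }

    /** Samples the target thread's stack while it is alive and counts the labels seen. */
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis)
            throws InterruptedException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < samples && target.isAlive(); i++) {
            counts.merge(classify(target.getStackTrace()), 1, Integer::sum);
            Thread.sleep(intervalMillis);
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder workload; in practice this thread would build and submit the job.
        Thread job = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException ignored) {
                // Nothing to clean up in this placeholder.
            }
        }, "job-submission");
        job.start();
        System.out.println(sample(job, 10, 10));
        job.join();
    }
}
```

If most samples land in one of the two planner buckets, that is the layer
worth stepping into with the debugger.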

There was an issue that was recently closed [1] about the DataSet 
optimizer. Could this solve your problem?

I will also loop in Fabian, who might know more.

Regards,
Timo

[1] https://issues.apache.org/jira/browse/FLINK-10566

On 07.01.19 at 14:05, Niklas Teichmann wrote:
> Hi everybody,
>
> I have a question concerning the planner for the Flink Table / Batch API.
> At the moment I try to use a library called Cypher for Apache Flink, a 
> project that tries to implement
> the graph database query language Cypher on Apache Flink (CAPF, 
> https://github.com/soerenreichardt/cypher-for-apache-flink).
>
> The problem is that the planner seemingly takes a very long time to 
> plan and optimize the job created by CAPF. This example job in json 
> format
>
> https://pastebin.com/J84grsjc
>
> takes about 20 minutes to plan and about 5 minutes to run on a 24 GB
> data set. That seems very long for a job of this size.
>
> Do you have any idea why this is the case?
> Is there a way to give the planner hints to reduce the planning time?
>
> Thanks in advance!
> Niklas