Subject: Re: Reducing runtime of Flink planner
From: Timo Walther
To: user@flink.apache.org
Cc: Fabien Huekse
Date: Mon, 7 Jan 2019 17:35:58 +0100

Hi Niklas,

it would be interesting to know which planner caused the long runtime. Could you use a debugger to figure out more details? Is it really the Flink Table API planner, or the underlying DataSet planner one level deeper?

There was an issue about the DataSet optimizer that was recently closed [1].
Could this solve your problem? I will also loop in Fabian, who might know more.

Regards,
Timo

[1] https://issues.apache.org/jira/browse/FLINK-10566

On 07.01.19 at 14:05, Niklas Teichmann wrote:
> Hi everybody,
>
> I have a question concerning the planner for the Flink Table / Batch API.
> At the moment I am trying to use a library called Cypher for Apache Flink
> (CAPF, https://github.com/soerenreichardt/cypher-for-apache-flink), a
> project that tries to implement the graph database query language Cypher
> on Apache Flink.
>
> The problem is that the planner seemingly takes a very long time to
> plan and optimize the job created by CAPF. This example job in JSON
> format
>
> https://pastebin.com/J84grsjc
>
> takes about 20 minutes to plan and about 5 minutes to run on a 24 GB
> data set. That seems very long for a job of this size.
>
> Do you have any idea why this is the case?
> Is there a way to give the planner hints to reduce the planning time?
>
> Thanks in advance!
> Niklas
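To narrow down which planner is slow before reaching for a debugger, one option is to time the planning phases separately. The Java sketch below is a hypothetical helper (PlanningTimer and Phase are illustrative names, not Flink APIs) that records the wall-clock duration of named phases using only the JDK. It is not part of Flink or CAPF.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: records wall-clock durations of named phases, in insertion order. */
final class PlanningTimer {

    /** Like java.util.function.Supplier, but allowed to throw checked exceptions
     *  (for example, a method declared with "throws Exception"). */
    @FunctionalInterface
    interface Phase<T> {
        T run() throws Exception;
    }

    private final Map<String, Long> millisByPhase = new LinkedHashMap<>();

    /** Runs the phase, records its elapsed time (even if it fails) and returns its result. */
    <T> T time(String name, Phase<T> phase) {
        long start = System.nanoTime();
        try {
            return phase.run();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            // Wrap checked exceptions so callers need no throws clause.
            throw new RuntimeException("phase '" + name + "' failed", e);
        } finally {
            millisByPhase.put(name, (System.nanoTime() - start) / 1_000_000L);
        }
    }

    /** Read-only view of the recorded durations in milliseconds. */
    Map<String, Long> durationsMillis() {
        return Collections.unmodifiableMap(millisByPhase);
    }

    /** One line per phase, in the order the phases were recorded. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : millisByPhase.entrySet()) {
            sb.append(e.getKey()).append(": ").append(e.getValue()).append(" ms\n");
        }
        return sb.toString();
    }
}
```

Assuming the Flink 1.7-era Java Batch Table API, one could wrap the Table API planning and translation, e.g. timer.time("table planner", () -> tEnv.toDataSet(table, Row.class)), and then, once the sinks are defined, the DataSet optimizer, e.g. timer.time("DataSet optimizer", env::getExecutionPlan), and print timer.report(). Here tEnv, table and env are placeholders for the job's BatchTableEnvironment, result Table and ExecutionEnvironment. This only shows which phase dominates; a debugger or profiler is still needed to see why.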