From: Maximilian Michels
Date: Mon, 22 Jun 2015 10:26:08 +0200
Subject: Re: execute() and collect()/print()/count()
To: dev@flink.apache.org

+1 for cleaning up the documentation
+1 for adding a link to the documentation (should be a permalink)
+1 for printing a warning instead of an exception

On Sun, Jun 21, 2015 at 12:25 AM, Robert Metzger wrote:

> We could also add a link to the documentation to the exception message
> that explains the behavior.
>
> On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park wrote:
>
> > +1 for ignoring the execute() call and printing a warning.
> >
> > But I'm concerned about how users will catch the error in a program
> > without any data sinks.
> >
> > By the way, eager execution is not well documented in the data sinks
> > section [1], although it is described in the program skeleton section.
> > This confuses users. We should clean up the documentation. Many code
> > examples call the execute() method after the print() method [2][3].
> >
> > We should add a description of the count() method to the documentation,
> > too.
> >
> > [1]
> > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks
> > [2]
> > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution
> > [3]
> > http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators
> >
> > Regards,
> > Chiwan Park
> >
> > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels wrote:
> > >
> > > Dear Flink community,
> > >
> > > I have lost count of how many people on the user mailing list and in
> > > Flink trainings have asked why their Flink program throws an exception
> > > when they just want to print a DataSet. The reason is that print() now
> > > executes eagerly, i.e., it triggers execution of the Flink program. A
> > > subsequent call to execute() requires newly defined DataSinks and
> > > throws an exception otherwise.
> > >
> > > We recently introduced a flag in the ExecutionEnvironment that tracks
> > > whether the program has already been executed (explicitly via
> > > execute() or implicitly through collect()/print()/count()). That
> > > enabled us to print a more helpful exception message.
> > > However, users either do not read the exception message or do not
> > > understand it; they still ask this question a lot.
> > >
> > > That's why I propose to ignore calls to execute() entirely if no sinks
> > > are defined. That would get rid of one of the core annoyances for Flink
> > > users. I know this is painful for us developers because we understand
> > > how Flink works internally, but let's step back for a moment: it
> > > wouldn't be so bad if execute() did nothing when no new sinks are
> > > defined.
> > >
> > > What would be the downside of this change? Users might call execute()
> > > and wonder why nothing happens. We would then simply print a warning
> > > that their program didn't define any sinks. That is a big difference
> > > from the previous behavior because users are scared of exceptions. If
> > > they just get a warning, they will double-check their program and
> > > investigate why nothing happens. In most cases, they have actually
> > > defined sinks but simply left a call to execute() in place while
> > > printing a DataSet.
> > >
> > > What are your opinions on this issue? I have opened a JIRA issue for
> > > this as well:
> > > https://issues.apache.org/jira/browse/FLINK-2249
> > >
> > > Best,
> > > Max
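To make the proposal concrete, here is a minimal sketch of the behavior discussed in FLINK-2249, written as a self-contained toy model in plain Java. It is not Flink's ExecutionEnvironment: the class MiniEnvironment, its methods addSink(), print(), execute(), and the warning text are hypothetical names chosen for illustration. The model assumes that print() registers a sink and runs the job immediately, and that execute() with no new sinks since the last run logs a warning instead of throwing.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of the behavior proposed in FLINK-2249.
 * NOT Flink's ExecutionEnvironment; all names here are illustrative.
 */
public class EagerExecutionSketch {

    /** Stand-in for an execution environment that tracks pending sinks. */
    static final class MiniEnvironment {
        private final List<String> pendingSinks = new ArrayList<>();
        private final List<String> warnings = new ArrayList<>();
        private int jobsRun = 0;

        /** Lazily registers a sink; nothing runs until execute() is called. */
        void addSink(String name) {
            pendingSinks.add(name);
        }

        /** Models eager print(): registers a sink and runs the job immediately. */
        void print(String dataSetName) {
            addSink("print(" + dataSetName + ")");
            runPendingSinks();
        }

        /**
         * Proposed behavior: if no new sinks were defined since the last run,
         * record a warning and return instead of throwing an exception.
         */
        void execute() {
            if (pendingSinks.isEmpty()) {
                warnings.add("No new data sinks defined since the last execution; execute() did nothing.");
                return;
            }
            runPendingSinks();
        }

        /** Runs one job and consumes all sinks that were pending. */
        private void runPendingSinks() {
            jobsRun++;
            pendingSinks.clear();
        }

        int jobsRun() { return jobsRun; }

        List<String> warnings() { return warnings; }
    }

    public static void main(String[] args) {
        MiniEnvironment env = new MiniEnvironment();
        env.print("words"); // eager: runs one job right away
        env.execute();      // leftover call: only a warning under the proposal
        System.out.println("jobs run: " + env.jobsRun());
        env.warnings().forEach(w -> System.out.println("WARN " + w));
        // Expected output:
        // jobs run: 1
        // WARN No new data sinks defined since the last execution; execute() did nothing.
    }
}
```

In this model, a program that defines a new sink after print() (here via the hypothetical addSink()) still runs it on the next execute(); only a leftover execute() with nothing pending becomes a no-op with a warning.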