flink-dev mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: execute() and collect()/print()/count()
Date Mon, 22 Jun 2015 08:26:08 GMT
+1 for cleaning up the documentation
+1 for adding a link to the documentation (should be a permalink)
+1 for printing a warning instead of an exception

On Sun, Jun 21, 2015 at 12:25 AM, Robert Metzger <rmetzger@apache.org>
wrote:

> We could also add a link to the documentation into the exception that
> explains the behavior.
>
> On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <chiwanpark@icloud.com>
> wrote:
>
> > +1 for ignoring the execute() call with a warning.
> >
> > But I'm concerned about how users would catch the error in a program
> > without any data sinks.
> >
> > By the way, eager execution is not well documented in the data sinks
> > section, but it is in the program skeleton section. [1] This confuses
> > users. We should clean up the documentation.
> > Many code examples call execute() after print(). [2][3]
> >
> > We should also add a description of the count() method to the documentation.
> >
> > [1] http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks
> > [2] http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution
> > [3] http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators
> >
> > Regards,
> > Chiwan Park
> >
> > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <mxm@apache.org> wrote:
> > >
> > > Dear Flink community,
> > >
> > > I have stopped counting how many people on the user list and during Flink
> > > trainings have asked why their Flink program throws an exception when they
> > > just want to print a DataSet. The reason is that print() now executes
> > > eagerly and thus runs the Flink program. A subsequent call to execute()
> > > requires new DataSinks to have been defined and throws an exception
> > > otherwise.
> > >
> > > We have recently introduced a flag in the ExecutionEnvironment that checks
> > > whether the user executed before (explicitly via execute() or implicitly
> > > through collect()/print()/count()). That enabled us to print a nicer
> > > exception message. However, users either do not read the exception message
> > > or do not understand it. They do ask this question a lot.
> > >
> > > That's why I propose to ignore calls to execute() entirely if no sinks are
> > > defined. That will get rid of one of the core annoyances for Flink users. I
> > > know that this is painful for us programmers because we understand how
> > > Flink works internally, but let's step back for once and see that it
> > > wouldn't be so bad if execute() didn't do anything when there are no new
> > > sinks.
> > >
> > > What would be the downside of this change? Users might call execute() and
> > > wonder why nothing happens. We would then simply print a warning that
> > > their program didn't define any sinks. That is a big difference from the
> > > behavior before because users are scared of exceptions. If they just get a
> > > warning, they will double-check their program and investigate why nothing
> > > happens. In most cases they have actually defined sinks but simply left
> > > in a call to execute() when they were printing a DataSet.
> > >
> > > What are your opinions on this issue? I have opened a JIRA for this as well:
> > > https://issues.apache.org/jira/browse/FLINK-2249
> > >
> > > Best,
> > > Max
> >
>
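
The behavior proposed in the quoted message can be sketched with a toy stand-in. This is not Flink's ExecutionEnvironment: the class name ExecuteSketch, its fields, the string-named sinks, and the warning texts are all hypothetical and only model the idea. An eager call such as print() runs the job and consumes the pending sinks, and a later execute() with no new sinks records a warning instead of throwing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of the proposal; NOT Flink's ExecutionEnvironment API.
final class ExecuteSketch {
    // Sinks defined since the last job run (names only, for illustration).
    private final List<String> pendingSinks = new ArrayList<>();
    // Models the flag mentioned in the thread: has anything executed before?
    private boolean executedBefore = false;
    // Warnings are collected so callers can inspect them.
    final List<String> warnings = new ArrayList<>();
    int jobsRun = 0;

    void addSink(String name) {
        pendingSinks.add(name);
    }

    // Eager operation in the style of print()/collect()/count():
    // it defines its own sink and runs the job immediately.
    void print() {
        addSink("print");
        runJob();
    }

    // Proposed behavior: with no new sinks, warn and return false
    // instead of throwing an exception.
    boolean execute() {
        if (pendingSinks.isEmpty()) {
            String msg = executedBefore
                ? "No new sinks since the last execution; execute() ignored."
                : "No sinks defined; execute() ignored.";
            warnings.add(msg);
            System.err.println("WARN: " + msg);
            return false;
        }
        runJob();
        return true;
    }

    private void runJob() {
        jobsRun++;
        pendingSinks.clear();
        executedBefore = true;
    }

    public static void main(String[] args) {
        ExecuteSketch env = new ExecuteSketch();
        env.print();                  // runs the job eagerly
        boolean ran = env.execute();  // leftover call: warns, does not throw
        System.out.println("ran=" + ran + ", jobsRun=" + env.jobsRun);
    }
}
```

In this model the leftover execute() after print() returns false and leaves a warning behind, while an execute() that follows a newly added sink still runs a job as usual.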
