flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stephan Ewen <se...@apache.org>
Subject Re: execute() and collect()/print()/count()
Date Tue, 23 Jun 2015 09:51:01 GMT
That would help to get around many cases, it still leaves some open, like
forgetting to create a sink after transformations after a collect() call.

Would probably be a good improvement over the status quo, though...

On Mon, Jun 22, 2015 at 9:37 PM, Alexander Alexandrov <
alexander.s.alexandrov@gmail.com> wrote:

> What about adding some state state to the DataBag internals that tracks the
> following conditions
>
> 1. whether the last job execution was triggered by an "enforcer" API method
> like print() / collect();
> 2. whether a DataSource / lazy operator was created after that;
>
> If 1 is true and 2 is false, a WARN can be displayed. Otherwise, we can
> still throw an error.
>
>
> 2015-06-22 18:17 GMT+02:00 Stephan Ewen <sewen@apache.org>:
>
> > We have two situations to trade off here, and fixing one will make the
> > other worse:
> >
> > 1) env.execute() after collect() - see Max's mail
> >
> > 2) env.execute() on empty sinks program. Not throwing an exception makes
> > people wonder why nothing happens (if they write the program to just test
> > whether it runs or if they want to measure time).
> >
> > Both choices make one behave nice and the other not. So far, the idea was
> > that throwing an exception on empty sinks is that the error message will
> > help people figure out what is wrong fast. Debugging why nothing happens
> > can be slow.
> >
> >
> > It is hard to say if we would not introduce another source of confusion
> by
> > fixing one...
> >
> >
> >
> >
> >
> > On Mon, Jun 22, 2015 at 10:26 AM, Maximilian Michels <mxm@apache.org>
> > wrote:
> >
> > > +1 for cleaning up the documentation
> > > +1 for adding a link to the documentation (should be a permalink)
> > > +1 for printing a warning instead of an exception
> > >
> > > On Sun, Jun 21, 2015 at 12:25 AM, Robert Metzger <rmetzger@apache.org>
> > > wrote:
> > >
> > > > We could also add a link to the documentation into the exception that
> > > > explains the behavior.
> > > >
> > > > On Fri, Jun 19, 2015 at 5:52 AM, Chiwan Park <chiwanpark@icloud.com>
> > > > wrote:
> > > >
> > > > > +1 for ignoring execute() call with warning.
> > > > >
> > > > > But I'm concerned for how the user catches the error in program
> > without
> > > > > any data sinks.
> > > > >
> > > > > By the way, eager execution is not well documented in data sinks
> > > section
> > > > > but is in program
> > > > > skeleton section. [1] This makes the user’s confusion. We should
> > clean
> > > up
> > > > > documents.
> > > > > There are many codes calling execute() method after print() method.
> > > > [2][3]
> > > > >
> > > > > We should add a description for count() method to documents too.
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#data-sinks
> > > > > [2]
> > > > >
> > > >
> > >
> >
> http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#parallel-execution
> > > > > [3]
> > > > >
> > > >
> > >
> >
> http://ci.apache.org/projects/flink/flink-docs-master/apis/programming_guide.html#iteration-operators
> > > > >
> > > > > Regards,
> > > > > Chiwan Park
> > > > >
> > > > > > On Jun 19, 2015, at 9:15 PM, Maximilian Michels <mxm@apache.org>
> > > > wrote:
> > > > > >
> > > > > > Dear Flink community,
> > > > > >
> > > > > > I have stopped to count how many people on the user list and
> during
> > > > Flink
> > > > > > trainings have asked why their Flink program throws an Exception
> > when
> > > > > they
> > > > > > just one to print a DataSet. The reason for this is that print()
> > now
> > > > > > executes eagerly, thus, executes the Flink program. Subsequent
> > calls
> > > to
> > > > > > execute() need to define new DataSinks and throw an exception
> > > > otherwise.
> > > > > >
> > > > > > We have recently introduced a flag in the ExecutionEnvironment
> that
> > > > > checks
> > > > > > whether the user executed before (explicitly via execute() or
> > > > implicitly
> > > > > > through collect()/print()/count()). That enabled us to print
a
> > nicer
> > > > > > exception message. However, users either do not read the
> exception
> > > > > message
> > > > > > or do not understand it. They do ask this question a lot.
> > > > > >
> > > > > > That's why I propose to ignore calls to execute() entirely if
no
> > > sinks
> > > > > are
> > > > > > defined. That will get rid of one of the core annoyances for
> Flink
> > > > > users. I
> > > > > > know, that this is painfully for us programmers because we
> > understand
> > > > how
> > > > > > Flink works internally but let's step back once and see that
it
> > > > wouldn't
> > > > > be
> > > > > > so bad if execute didn't do anything in case of no new sinks.
> > > > > >
> > > > > > What would be the downside of this change? Users might call
> > execute()
> > > > and
> > > > > > wonder that nothing happens. We would then simply print a warning
> > > that
> > > > > > their program didn't define any sinks. That is a big difference
> to
> > > the
> > > > > > behavior before because users are scared of exceptions. If they
> > just
> > > > get
> > > > > a
> > > > > > warning they will double-check their program and investigate
why
> > > > nothing
> > > > > > happens. Most of the cases they do actually have defined sinks
> but
> > > > simply
> > > > > > left a call to execute() when they were printing a DataSet.
> > > > > >
> > > > > > What are you opinions on this issue? I have opened a JIRA for
> this
> > as
> > > > > well:
> > > > > > https://issues.apache.org/jira/browse/FLINK-2249
> > > > > >
> > > > > > Best,
> > > > > > Max
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message