flink-dev mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: API behavior with data sinks (lazy) and eager operations
Date Mon, 19 Jan 2015 09:59:11 GMT
I would also execute the sinks immediately. I think it's a corner case
because the sinks are usually the last thing in a plan and all print() or
collect() statements are earlier in the plan.

print() should go to the client command line, yes.

On Mon, Jan 19, 2015 at 1:42 AM, Stephan Ewen <sewen@apache.org> wrote:

> Hi there!
> With the upcoming more interactive extensions to the API (operations that
> go back to the client from a program and need to be eagerly evaluated) we
> need to define how different actions should behave.
> Currently, nothing gets executed until the "env.execute()" call is made.
> That allows producing multiple data sources at the same time, which is a
> good feature.
> For certain operations, like the "count()" and "collect()" functions added
> in https://github.com/apache/flink/pull/210 , we need to trigger execution
> immediately.
> The open question is how this should behave in connection with data sinks
> that have already been defined:
> 1) Should all data sinks defined so far be executed as well?
> 2) Should only that immediate operation be executed, with the data sinks
> left pending until a call to "env.execute()"?
> I am somewhat leaning towards the first option right now, because I think
> that executing them later may force re-execution of larger parts of the
> plan.
> In addition: I think that the "print()" commands should go to the client
> command line. In that sense, they would behave like
> "collect().foreach(print)"
> Greetings,
> Stephan
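
To make the two options concrete, here is a minimal toy model in plain Java.
It is only a sketch and does not use Flink's API: ToyEnv, Policy, addSink and
the log entries are hypothetical names invented for illustration. It records
which sinks run in which job when an eager collect() is called between two
sink definitions.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for an execution environment. NOT Flink's API; all names are hypothetical.
class ToyEnv {
    enum Policy { RUN_PENDING_SINKS, DEFER_SINKS } // option 1 vs. option 2

    private final Policy policy;
    private final List<Runnable> pendingSinks = new ArrayList<>();
    final List<String> log = new ArrayList<>();

    ToyEnv(Policy policy) { this.policy = policy; }

    // Lazy: only registers the sink, nothing runs yet.
    void addSink(String name) {
        pendingSinks.add(() -> log.add("sink:" + name));
    }

    // Eager: the client needs the result now, so a job is triggered immediately.
    List<Integer> collect(List<Integer> data) {
        log.add("job:collect");
        if (policy == Policy.RUN_PENDING_SINKS) {
            flushSinks(); // option 1: sinks defined so far run in the same job
        }
        return new ArrayList<>(data);
    }

    // print() modeled as collect().foreach(print): output lands on the client.
    void print(List<Integer> data) {
        collect(data).forEach(System.out::println);
    }

    // Explicit trigger for whatever sinks are still pending.
    void execute() {
        if (!pendingSinks.isEmpty()) {
            log.add("job:execute");
            flushSinks();
        }
    }

    private void flushSinks() {
        for (Runnable sink : pendingSinks) {
            sink.run();
        }
        pendingSinks.clear();
    }
}

class LazyEagerDemo {
    static List<String> run(ToyEnv.Policy policy) {
        ToyEnv env = new ToyEnv(policy);
        env.addSink("A");              // defined before the eager call
        env.collect(List.of(1, 2, 3)); // eager operation
        env.addSink("B");              // defined after the eager call
        env.execute();
        return env.log;
    }

    public static void main(String[] args) {
        // Option 1: [job:collect, sink:A, job:execute, sink:B]
        System.out.println(run(ToyEnv.Policy.RUN_PENDING_SINKS));
        // Option 2: [job:collect, job:execute, sink:A, sink:B]
        System.out.println(run(ToyEnv.Policy.DEFER_SINKS));
    }
}
```

In this model, option 1 runs sink A together with the collect() job, while
option 2 leaves it pending until execute(). The toy does not model the cost
concern raised above, namely that deferring a sink may force shared parts of
the plan to be computed again.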
