flink-dev mailing list archives

From Maximilian Michels <...@apache.org>
Subject Re: Changed the behavior of "DataSet.print()"
Date Thu, 28 May 2015 14:35:12 GMT
+1 for printOnTaskManager()

On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de>
wrote:

> Thanks for your quick responses!
>
> I also think that renaming the old print method should do the trick. As a
> contribution to your brainstorming for a name, I propose logOnTaskManager()
> ;)
>
> Cheers,
> Sebastian
>
> -----Original Message-----
> From: Fabian Hueske [mailto:fhueske@gmail.com]
> Sent: Thursday, 28 May 2015 14:34
> To: dev@flink.apache.org
> Subject: Re: Changed the behavior of "DataSet.print()"
>
> As I said, the common print prefix might indicate eager execution.
>
> I know that writeToTaskManagerStdOut() is quite bulky, but we should make
> the difference in the behavior very clear, IMO.
>
> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <sewen@apache.org>:
>
> > Actually, there is a method "print(String prefix)" which still goes to
> > the sysout of where the job is executed.
> >
> > Let's give that one the name "printOnTaskManager()" and then we should
> > have it...
> >
> > On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> >
> > > I would avoid calling it printXYZ, since print()'s behavior changed
> > > to eager execution.
> > >
> > > 2015-05-28 14:10 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
> > >
> > > > Okay, you are right, local is actually confusing.
> > > > I'm against introducing "worker" as a term in the API. It's still
> > > > called "TaskManager". Maybe "printOnTaskManager()" ?
> > > >
> > > > On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fhueske@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 for both.
> > > > >
> > > > > printLocal() might not be the best name, because "local" is not
> > > > > well defined and could also be understood as the local machine
> > > > > of the user.
> > > > > How about giving the method a completely different name
> > > > > (writeToWorkerStdOut()?) so that users do not confuse eager and
> > > > > lazy execution?
> > > > >
> > > > >
> > > > > 2015-05-28 13:44 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
> > > > >
> > > > > > Hi Sebastian,
> > > > > >
> > > > > > thank you for the feedback. I agree that both variants have a
> > > > > > right to exist.
> > > > > >
> > > > > > I would vote for adding another method to the DataSet called
> > > > > > "printLocal()" that has the old behavior.
> > > > > >
> > > > > > On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian
> > > > > > <Sebastian.Kruse@hpi.de> wrote:
> > > > > >
> > > > > > > Hi everyone,
> > > > > > >
> > > > > > > I am a bit worried about the recent change to the print()
> > > > > > > method. I can understand the rationale that obtaining the
> > > > > > > stdout from all the TaskManagers is cumbersome (although for
> > > > > > > local debugging the old print() was fine).
> > > > > > > However, a major problem I see with the new print() is that
> > > > > > > you can now have only one print() per plan, as the plan is
> > > > > > > executed directly as soon as print() is invoked. If you regard
> > > > > > > print() as a debugging tool, this is a severe restriction.
> > > > > > > I see use cases for both print() implementations, but I would
> > > > > > > at least provide some kind of backwards compatibility, be it a
> > > > > > > parameter, a legacyPrint() method, or anything else. As I
> > > > > > > assume print() to be very frequently used, a lot of existing
> > > > > > > programs would benefit from this and might otherwise not be
> > > > > > > directly portable to newer Flink versions. What do you think?
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Sebastian
> > > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Robert Metzger [mailto:rmetzger@apache.org]
> > > > > > > Sent: Tuesday, 26 May 2015 11:12
> > > > > > > To: dev@flink.apache.org
> > > > > > > Subject: Re: Changed the behavior of "DataSet.print()"
> > > > > > >
> > > > > > > I've filed a JIRA to update the documentation:
> > > > > > > https://issues.apache.org/jira/browse/FLINK-2092
> > > > > > >
> > > > > > > On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen
> > > > > > > <sewen@apache.org> wrote:
> > > > > > >
> > > > > > > > Hi all!
> > > > > > > >
> > > > > > > > We merged a patch yesterday that changed the API behavior of
> > > > > > > > the "DataSet.print()" function.
> > > > > > > >
> > > > > > > > "print()" now prints to stdout on the client process, rather
> > > > > > > > than the TaskManager process, as before. This is much nicer
> > > > > > > > for debugging and exploring data sets.
> > > > > > > >
> > > > > > > > One implication of this is that print() is now an eager
> > > > > > > > method (like collect() or count()). That means that calling
> > > > > > > > "print()" immediately triggers the execution, and no
> > > > > > > > "env.execute()" is required any more.
> > > > > > > > Greetings,
> > > > > > > > Stephan
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
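
For readers skimming the thread, the eager/lazy distinction under discussion
can be illustrated with a small toy model. This is only a sketch under stated
assumptions: ToyEnv and ToyDataSet are made-up names, not Flink classes, and
the model does not try to reproduce how Flink itself schedules pending sinks.
It only shows that an eager print() produces output immediately on the client
side, while a lazy printOnTaskManager()-style method produces nothing until
execute() is called.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model of the two behaviors discussed in the thread.
// It is NOT Flink's API; ToyEnv and ToyDataSet are made-up names.
final class ToyEnv {
    // Records every line of output together with the process it appeared on.
    final List<String> log = new ArrayList<>();
    // Sinks registered lazily; they only run when execute() is called.
    private final List<Runnable> pendingSinks = new ArrayList<>();

    ToyDataSet fromElements(String... items) {
        return new ToyDataSet(this, Arrays.asList(items));
    }

    void addSink(Runnable sink) {
        pendingSinks.add(sink);
    }

    // Runs all pending sinks once, then forgets them.
    void execute() {
        for (Runnable sink : pendingSinks) {
            sink.run();
        }
        pendingSinks.clear();
    }
}

final class ToyDataSet {
    private final ToyEnv env;
    private final List<String> items;

    ToyDataSet(ToyEnv env, List<String> items) {
        this.env = env;
        this.items = items;
    }

    // Eager: produces output on the "client" right away, no execute() needed.
    void print() {
        for (String item : items) {
            env.log.add("client: " + item);
        }
    }

    // Lazy: only registers a sink; output appears on the "TaskManager"
    // once execute() runs.
    void printOnTaskManager(String prefix) {
        env.addSink(() -> {
            for (String item : items) {
                env.log.add("taskmanager: " + prefix + "> " + item);
            }
        });
    }
}
```

In this toy model, calling printOnTaskManager("dbg") on a data set leaves the
log empty until execute() is invoked, whereas print() appends its lines
immediately. That difference in timing is the reason several participants
wanted the lazy variant to carry a clearly distinct name.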
