flink-dev mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Changed the behavior of "DataSet.print()"
Date Tue, 02 Jun 2015 09:35:39 GMT
I would like to reach consensus on this before the 0.9 release.

So far we have the following ideas:

writeToWorkerStdOut(prefix)
printOnTaskManager(prefix) (+1)
logOnTaskManager(prefix)

I'm against logOnTM because we are not logging the output, we are writing
or printing it.
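
For readers weighing the candidate names, the lazy/eager distinction behind
this thread can be sketched in plain Java. This is a minimal model under
stated assumptions, not Flink code: MiniEnv, MiniDataSet, the "prefix: value"
line format, and the rule that print() also flushes sinks registered before it
are all hypothetical and exist only for illustration. The lists clientOut and
workerOut stand in for the client's and the TaskManager's stdout.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrintSemanticsSketch {

    /** Hypothetical stand-in for an execution environment; NOT Flink's class. */
    static final class MiniEnv {
        final List<Runnable> pendingSinks = new ArrayList<>();
        final List<String> clientOut = new ArrayList<>(); // models client stdout
        final List<String> workerOut = new ArrayList<>(); // models TaskManager stdout
        int executions = 0;

        MiniDataSet fromElements(String... values) {
            return new MiniDataSet(this, Arrays.asList(values));
        }

        /** Runs every sink registered so far, then clears the plan. */
        void execute() {
            executions++;
            List<Runnable> toRun = new ArrayList<>(pendingSinks);
            pendingSinks.clear();
            for (Runnable sink : toRun) {
                sink.run();
            }
        }
    }

    /** Hypothetical stand-in for a data set; NOT Flink's DataSet. */
    static final class MiniDataSet {
        private final MiniEnv env;
        private final List<String> values;

        MiniDataSet(MiniEnv env, List<String> values) {
            this.env = env;
            this.values = values;
        }

        /** Lazy: registers a sink only; nothing is written until execute(). */
        void printOnTaskManager(String prefix) {
            env.pendingSinks.add(() -> {
                for (String v : values) {
                    env.workerOut.add(prefix + ": " + v);
                }
            });
        }

        /** Eager: registers a client-side sink and executes immediately. */
        void print() {
            env.pendingSinks.add(() -> env.clientOut.addAll(values));
            env.execute(); // assumption of this model: pending sinks run too
        }
    }

    public static void main(String[] args) {
        MiniEnv env = new MiniEnv();
        MiniDataSet words = env.fromElements("a", "b");

        words.printOnTaskManager("debug"); // deferred, nothing written yet
        System.out.println("worker lines before execution: " + env.workerOut.size());

        words.print(); // triggers an execution right away
        System.out.println("client lines: " + env.clientOut);
        System.out.println("worker lines: " + env.workerOut);
        System.out.println("executions so far: " + env.executions);
    }
}
```

In this model every call to print() costs one execution, while any number of
printOnTaskManager(prefix) calls share a single execute(). That is the
difference a name without the "print" prefix would be meant to signal.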


*I would vote for deprecating "print(prefix)" and adding
"writeToWorkerStdOut(prefix)"*



On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <chiwanpark@icloud.com> wrote:

> I agree that avoiding a name that starts with “print” is better.
>
> Regards,
> Chiwan Park
>
> > On May 28, 2015, at 11:35 PM, Maximilian Michels <mxm@apache.org> wrote:
> >
> > +1 for printOnTaskManager()
> >
> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de>
> > wrote:
> >
> >> Thanks for your quick responses!
> >>
> >> I also think that renaming the old print method should do the trick. As a
> >> contribution to your brainstorming for a name, I propose logOnTaskManager()
> >> ;)
> >>
> >> Cheers,
> >> Sebastian
> >>
> >> -----Original Message-----
> >> From: Fabian Hueske [mailto:fhueske@gmail.com]
> >> Sent: Thursday, 28 May 2015 14:34
> >> To: dev@flink.apache.org
> >> Subject: Re: Changed the behavior of "DataSet.print()"
> >>
> >> As I said, the common print prefix might indicate eager execution.
> >>
> >> I know that writeToTaskManagerStdOut() is quite bulky, but we should make
> >> the difference in the behavior very clear, IMO.
> >>
> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <sewen@apache.org>:
> >>
> >>> Actually, there is a method "print(String prefix)" which still goes to
> >>> the sysout of where the job is executed.
> >>>
> >>> Let's give that one the name "printOnTaskManager()" and then we should
> >>> have it...
> >>>
> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> >>>
> >>>> I would avoid calling it printXYZ, since print()'s behavior changed
> >>>> to eager execution.
> >>>>
> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
> >>>>
> >>>>> Okay, you are right, local is actually confusing.
> >>>>> I'm against introducing "worker" as a term in the API. It's still
> >>>>> called "TaskManager". Maybe "printOnTaskManager()"?
> >>>>>
> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> >>>>>
> >>>>>> +1 for both.
> >>>>>>
> >>>>>> printLocal() might not be the best name, because "local" is not
> >>>>>> well defined and could also be understood as the local machine
> >>>>>> of the user.
> >>>>>> How about naming the method something completely different
> >>>>>> (writeToWorkerStdOut()?) to make sure users do not confuse
> >>>>>> eager and lazy execution?
> >>>>>>
> >>>>>>
> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
> >>>>>>
> >>>>>>> Hi Sebastian,
> >>>>>>>
> >>>>>>> thank you for the feedback. I agree that both variants have a
> >>>>>>> right to exist.
> >>>>>>>
> >>>>>>> I would vote for adding another method to the DataSet called
> >>>>>>> "printLocal()" that has the old behavior.
> >>>>>>>
> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Hi everyone,
> >>>>>>>>
> >>>>>>>> I am a bit worried about the recent change to the print() method. I
> >>>>>>>> can understand the rationale that obtaining the stdout from all
> >>>>>>>> the taskmanagers is cumbersome (although, for local debugging, the
> >>>>>>>> old print() was fine).
> >>>>>>>> However, a major problem I see with the new print() is that you can
> >>>>>>>> now only have one print() per plan, as the plan is executed directly
> >>>>>>>> as soon as print() is invoked. If you regard print() as a debugging
> >>>>>>>> means, this is a severe restriction.
> >>>>>>>> I see use cases for both print() implementations, but I would at
> >>>>>>>> least provide some kind of backwards compatibility, be it a parameter
> >>>>>>>> or a legacyPrint() method or anything else. As I assume print() to be
> >>>>>>>> very frequently used, a lot of existing programs would benefit from
> >>>>>>>> this and might otherwise not be directly portable to newer Flink
> >>>>>>>> versions. What do you think?
> >>>>>>>>
> >>>>>>>> Cheers,
> >>>>>>>> Sebastian
> >>>>>>>>
> >>>>>>>> -----Original Message-----
> >>>>>>>> From: Robert Metzger [mailto:rmetzger@apache.org]
> >>>>>>>> Sent: Tuesday, 26 May 2015 11:12
> >>>>>>>> To: dev@flink.apache.org
> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
> >>>>>>>>
> >>>>>>>> I've filed a JIRA to update the documentation:
> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
> >>>>>>>>
> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <sewen@apache.org>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hi all!
> >>>>>>>>>
> >>>>>>>>> We merged a patch yesterday that changed the API behavior of the
> >>>>>>>>> "DataSet.print()" function.
> >>>>>>>>>
> >>>>>>>>> "print()" now prints to stdout on the client process, rather than
> >>>>>>>>> the TaskManager process, as before. This is much nicer for
> >>>>>>>>> debugging and exploring data sets.
> >>>>>>>>>
> >>>>>>>>> One implication of this is that print() is now an eager method
> >>>>>>>>> (like collect() or count()). That means that calling "print()"
> >>>>>>>>> immediately triggers the execution, and no "env.execute()" is
> >>>>>>>>> required any more.
> >>>>>>>>>
> >>>>>>>>> Greetings,
> >>>>>>>>> Stephan
