flink-dev mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Changed the behavior of "DataSet.print()"
Date Tue, 02 Jun 2015 10:08:08 GMT
+1 for writeToWorkerStdOut(prefix)
On Jun 2, 2015 11:42, "Aljoscha Krettek" <aljoscha@apache.org> wrote:

> +1 for printOnTaskManager(prefix)
>
> On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <rmetzger@apache.org>
> wrote:
> > I would like to reach consensus on this before the 0.9 release.
> >
> > So far we have the following ideas:
> >
> > writeToWorkerStdOut(prefix)
> > printOnTaskManager(prefix) (+1)
> > logOnTaskManager(prefix)
> >
> > I'm against logOnTM because we are not logging the output, we are writing
> > or printing it.
> >
> >
> > *I would vote for deprecating "print(prefix)" and adding
> > "writeToWorkerStdOut(prefix)"*
> >
> >
> >
> > On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <chiwanpark@icloud.com>
> > wrote:
> >
> >> I agree that avoiding a name that starts with “print” is better.
> >>
> >> Regards,
> >> Chiwan Park
> >>
> >> > On May 28, 2015, at 11:35 PM, Maximilian Michels <mxm@apache.org>
> >> > wrote:
> >> >
> >> > +1 for printOnTaskManager()
> >> >
> >> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de>
> >> > wrote:
> >> >
> >> >> Thanks for your quick responses!
> >> >>
> >> >> I also think that renaming the old print method should do the trick. As a
> >> >> contribution to your brainstorming for a name, I propose
> >> >> logOnTaskManager() ;)
> >> >>
> >> >> Cheers,
> >> >> Sebastian
> >> >>
> >> >> -----Original Message-----
> >> >> From: Fabian Hueske [mailto:fhueske@gmail.com]
> >> >> Sent: Thursday, May 28, 2015 14:34
> >> >> To: dev@flink.apache.org
> >> >> Subject: Re: Changed the behavior of "DataSet.print()"
> >> >>
> >> >> As I said, the common print prefix might indicate eager execution.
> >> >>
> >> >> I know that writeToTaskManagerStdOut() is quite bulky, but we should make
> >> >> the difference in the behavior very clear, IMO.
> >> >>
> >> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <sewen@apache.org>:
> >> >>
> >> >>> Actually, there is a method "print(String prefix)" which still goes to
> >> >>> the sysout of where the job is executed.
> >> >>>
> >> >>> Let's give that one the name "printOnTaskManager()" and then we should
> >> >>> have it...
> >> >>>
> >> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <fhueske@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>>> I would avoid calling it printXYZ, since print()'s behavior changed
> >> >>>> to eager execution.
> >> >>>>
> >> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
> >> >>>>
> >> >>>>> Okay, you are right, local is actually confusing.
> >> >>>>> I'm against introducing "worker" as a term in the API. It's still
> >> >>>>> called "TaskManager". Maybe "printOnTaskManager()"?
> >> >>>>>
> >> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fhueske@gmail.com>
> >> >>>>> wrote:
> >> >>>>>
> >> >>>>>> +1 for both.
> >> >>>>>>
> >> >>>>>> printLocal() might not be the best name, because "local" is not
> >> >>>>>> well defined and could also be understood as the local machine
> >> >>>>>> of the user.
> >> >>>>>> How about giving the method a completely different name
> >> >>>>>> (writeToWorkerStdOut()?) to make sure users do not confuse eager
> >> >>>>>> and lazy execution?
> >> >>>>>>
> >> >>>>>>
> >> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
> >> >>>>>>
> >> >>>>>>> Hi Sebastian,
> >> >>>>>>>
> >> >>>>>>> thank you for the feedback. I agree that both variants have a
> >> >>>>>>> right to exist.
> >> >>>>>>>
> >> >>>>>>> I would vote for adding another method to the DataSet called
> >> >>>>>>> "printLocal()" that has the old behavior.
> >> >>>>>>>
> >> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian
> >> >>>>>>> <Sebastian.Kruse@hpi.de> wrote:
> >> >>>>>>>
> >> >>>>>>>> Hi everyone,
> >> >>>>>>>>
> >> >>>>>>>> I am a bit worried about that recent change of the print()
> >> >>>>>>>> method. I can understand the rationale that obtaining the
> >> >>>>>>>> stdout from all the TaskManagers is cumbersome (although, for
> >> >>>>>>>> local debugging the old print() was fine).
> >> >>>>>>>> However, a major problem I see with the new print() is that
> >> >>>>>>>> now you can only have one print() per plan, as the plan is
> >> >>>>>>>> directly executed as soon as print() is invoked. If you regard
> >> >>>>>>>> print() as a debugging means, this is a severe restriction.
> >> >>>>>>>> I see use cases for both print() implementations, but I would
> >> >>>>>>>> at least provide some kind of backwards compatibility, be it a
> >> >>>>>>>> parameter or a legacyPrint() method or anything else. As I
> >> >>>>>>>> assume print() to be very frequently used, a lot of existing
> >> >>>>>>>> programs would benefit from this and might otherwise not be
> >> >>>>>>>> directly portable to newer Flink versions. What do you think?
> >> >>>>>>>>
> >> >>>>>>>> Cheers,
> >> >>>>>>>> Sebastian
> >> >>>>>>>>
> >> >>>>>>>> -----Original Message-----
> >> >>>>>>>> From: Robert Metzger [mailto:rmetzger@apache.org]
> >> >>>>>>>> Sent: Tuesday, May 26, 2015 11:12
> >> >>>>>>>> To: dev@flink.apache.org
> >> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
> >> >>>>>>>>
> >> >>>>>>>> I've filed a JIRA to update the documentation:
> >> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
> >> >>>>>>>>
> >> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen
> >> >>>>>>>> <sewen@apache.org> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> Hi all!
> >> >>>>>>>>>
> >> >>>>>>>>> We merged a patch yesterday that changed the API behavior of
> >> >>>>>>>>> the "DataSet.print()" function.
> >> >>>>>>>>>
> >> >>>>>>>>> "print()" now prints to stdout on the client process, rather
> >> >>>>>>>>> than the TaskManager process, as before. This is much nicer
> >> >>>>>>>>> for debugging and exploring data sets.
> >> >>>>>>>>>
> >> >>>>>>>>> One implication of this is that print() is now an eager
> >> >>>>>>>>> method (like collect() or count()). That means that calling
> >> >>>>>>>>> "print()" immediately triggers the execution, and no
> >> >>>>>>>>> "env.execute()" is required any more.
> >> >>>>>>>>>
> >> >>>>>>>>> Greetings,
> >> >>>>>>>>> Stephan
> >> >>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>
> >> >>>
> >> >>
> >>
> >>
> >>
> >>
> >>
>
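
To make the eager/lazy distinction discussed in this thread concrete, here is a minimal toy sketch in plain Java. It is not Flink code and uses no Flink classes: the class PrintSemanticsToy, its fields, and the "prefix> value" line format are all assumptions made for illustration. Only the method names print() and printOnTaskManager(prefix) come from the thread. The sketch models the point raised above: sinks registered lazily share one execution, while each eager print() triggers its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: no Flink classes are used, and every name below is
// hypothetical apart from the method names proposed in the thread.
class PrintSemanticsToy {
    // Sinks registered on the plan; they run only when execute() is called.
    private final List<Runnable> deferredSinks = new ArrayList<>();
    // Lines that would appear on the client's stdout.
    final List<String> clientOut = new ArrayList<>();
    // Lines that would appear on a TaskManager's stdout.
    final List<String> workerOut = new ArrayList<>();
    // Number of times the plan has been run.
    int executions = 0;

    // Lazy variant (old behavior): only registers a sink, nothing runs yet.
    void printOnTaskManager(String prefix, List<Integer> data) {
        deferredSinks.add(() -> data.forEach(x -> workerOut.add(prefix + "> " + x)));
    }

    // Eager variant (new behavior): runs immediately, output goes to the client.
    void print(List<Integer> data) {
        executions++;
        data.forEach(x -> clientOut.add(String.valueOf(x)));
    }

    // Runs all registered sinks as a single job.
    void execute() {
        executions++;
        deferredSinks.forEach(Runnable::run);
        deferredSinks.clear();
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2);
        List<Integer> b = Arrays.asList(3);

        PrintSemanticsToy lazy = new PrintSemanticsToy();
        lazy.printOnTaskManager("a", a);
        lazy.printOnTaskManager("b", b);
        lazy.execute(); // both sinks share one execution
        System.out.println("lazy executions: " + lazy.executions);

        PrintSemanticsToy eager = new PrintSemanticsToy();
        eager.print(a); // each call is its own execution
        eager.print(b);
        System.out.println("eager executions: " + eager.executions);
    }
}
```

In this model, the lazy variant produces no output until execute() is called, which is why a name that does not suggest eager printing was preferred by several participants.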
