flink-dev mailing list archives

From Kostas Tzoumas <ktzou...@apache.org>
Subject Re: Changed the behavior of "DataSet.print()"
Date Tue, 02 Jun 2015 11:54:53 GMT
+1 for printOnTaskManager(prefix)

On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann <trohrmann@apache.org> wrote:

> +1 for printOnTaskManager(prefix)
>
> On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>
> > +1 for writeToWorkerStdOut(prefix)
> > On Jun 2, 2015 11:42, "Aljoscha Krettek" <aljoscha@apache.org> wrote:
> >
> > > +1 for printOnTaskManager(prefix)
> > >
> > > On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <rmetzger@apache.org>
> > > wrote:
> > > > I would like to reach consensus on this before the 0.9 release.
> > > >
> > > > So far we have the following ideas:
> > > >
> > > > writeToWorkerStdOut(prefix)
> > > > printOnTaskManager(prefix) (+1)
> > > > logOnTaskManager(prefix)
> > > >
> > > > > I'm against logOnTM because we are not logging the output, we are writing
> > > > > or printing it.
> > > >
> > > >
> > > > *I would vote for deprecating "print(prefix)" and adding
> > > > "writeToWorkerStdOut(prefix)"*
> > > >
> > > >
> > > >
> > > > > On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <chiwanpark@icloud.com>
> > > > > wrote:
> > > >
> > > > >> I agree that avoiding a name that starts with “print” is better.
> > > >>
> > > >> Regards,
> > > >> Chiwan Park
> > > >>
> > > > >> > On May 28, 2015, at 11:35 PM, Maximilian Michels <mxm@apache.org> wrote:
> > > >> >
> > > >> > +1 for printOnTaskManager()
> > > >> >
> > > > >> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de>
> > > > >> > wrote:
> > > >> >
> > > > >> >> Thanks for your quick responses!
> > > > >> >>
> > > > >> >> I also think that renaming the old print method should do the trick. As a
> > > > >> >> contribution to your brainstorming for a name, I propose logOnTaskManager()
> > > > >> >> ;)
> > > >> >>
> > > >> >> Cheers,
> > > >> >> Sebastian
> > > >> >>
> > > >> >> -----Original Message-----
> > > >> >> From: Fabian Hueske [mailto:fhueske@gmail.com]
> > > > >> >> Sent: Thursday, 28 May 2015 14:34
> > > >> >> To: dev@flink.apache.org
> > > >> >> Subject: Re: Changed the behavior of "DataSet.print()"
> > > >> >>
> > > > >> >> As I said, the common print prefix might indicate eager execution.
> > > > >> >>
> > > > >> >> I know that writeToTaskManagerStdOut() is quite bulky, but we should make
> > > > >> >> the difference in the behavior very clear, IMO.
> > > >> >>
> > > >> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <sewen@apache.org>:
> > > >> >>
> > > > >> >>> Actually, there is a method "print(String prefix)" which still goes to
> > > > >> >>> the sysout of where the job is executed.
> > > > >> >>>
> > > > >> >>> Let's give that one the name "printOnTaskManager()" and then we should
> > > > >> >>> have it...
> > > >> >>>
> > > > >> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> > > > >> >>>
> > > > >> >>>> I would avoid calling it printXYZ, since print()'s behavior changed
> > > > >> >>>> to eager execution.
> > > >> >>>>
> > > >> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <rmetzger@apache.org
> >:
> > > >> >>>>
> > > >> >>>>> Okay, you are right, local is actually confusing.
> > > >> >>>>> I'm against introducing "worker" as a term in
the API. Its
> still
> > > >> >>>>> called "TaskManager". Maybe "printOnTaskManager()"
?
> > > >> >>>>>
> > > > >> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fhueske@gmail.com> wrote:
> > > > >> >>>>>
> > > > >> >>>>>> +1 for both.
> > > > >> >>>>>>
> > > > >> >>>>>> printLocal() might not be the best name, because "local" is not
> > > > >> >>>>>> well defined and could also be understood as the local machine
> > > > >> >>>>>> of the user.
> > > > >> >>>>>> How about naming the method completely differently (writeToWorkerStdOut()?)
> > > > >> >>>>>> to make sure users are not confused by eager and lazy execution?
> > > >> >>>>>>
> > > >> >>>>>>
> > > >> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger
<
> rmetzger@apache.org
> > >:
> > > >> >>>>>>
> > > >> >>>>>>> Hi Sebastian,
> > > >> >>>>>>>
> > > >> >>>>>>> thank you for the feedback. I agree that
both variants have
> a
> > > >> >>>>>>> right
> > > >> >>>> to
> > > >> >>>>>>> exist.
> > > >> >>>>>>>
> > > >> >>>>>>> I would vote for adding another method
to the DataSet called
> > > >> >>>>>> "printLocal()"
> > > >> >>>>>>> that has the old behavior.
> > > >> >>>>>>>
> > > > >> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de>
> > > > >> >>>>>>> wrote:
> > > >> >>>>>>>
> > > >> >>>>>>>> Hi everyone,
> > > >> >>>>>>>>
> > > > >> >>>>>>>> I am a bit worried about the recent change to the print() method. I can
> > > > >> >>>>>>>> understand the rationale that obtaining the stdout from all
> > > > >> >>>>>>>> the taskmanagers is cumbersome (although, for local
> > > > >> >>>>>>>> debugging, the old print() was fine).
> > > > >> >>>>>>>> However, a major problem I see with the new print() is that you can
> > > > >> >>>>>>>> now only have one print() per plan, as the plan is directly executed as
> > > > >> >>>>>>>> soon as print() is invoked. If you regard print() as a debugging means,
> > > > >> >>>>>>>> this is a severe restriction.
> > > > >> >>>>>>>> I see use cases for both print() implementations, but I would at least
> > > > >> >>>>>>>> provide some kind of backwards compatibility, be it a parameter or a
> > > > >> >>>>>>>> legacyPrint() method or anything else. As I assume print() to be very
> > > > >> >>>>>>>> frequently used, a lot of existing programs would benefit from this and
> > > > >> >>>>>>>> might otherwise not be directly portable to newer Flink versions. What do
> > > > >> >>>>>>>> you think?
> > > >> >>>>>>>>
> > > >> >>>>>>>> Cheers,
> > > >> >>>>>>>> Sebastian
> > > >> >>>>>>>>
> > > >> >>>>>>>> -----Original Message-----
> > > >> >>>>>>>> From: Robert Metzger [mailto:rmetzger@apache.org]
> > > > >> >>>>>>>> Sent: Tuesday, 26 May 2015 11:12
> > > >> >>>>>>>> To: dev@flink.apache.org
> > > > >> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
> > > >> >>>>>>>>
> > > >> >>>>>>>> I've filed a JIRA to update the documentation:
> > > >> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
> > > >> >>>>>>>>
> > > > >> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <sewen@apache.org> wrote:
> > > >> >>>>>>>>
> > > >> >>>>>>>>> Hi all!
> > > >> >>>>>>>>>
> > > > >> >>>>>>>>> We merged a patch yesterday that changed the API behavior of the
> > > > >> >>>>>>>>> "DataSet.print()" function.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> "print()" now prints to stdout on the client process, rather than the
> > > > >> >>>>>>>>> TaskManager process, as before. This is much nicer for debugging and
> > > > >> >>>>>>>>> exploring data sets.
> > > > >> >>>>>>>>>
> > > > >> >>>>>>>>> One implication of this is that print() is now an eager method (like
> > > > >> >>>>>>>>> collect() or count()). That means that calling "print()" immediately
> > > > >> >>>>>>>>> triggers the execution, and no "env.execute()" is required any more.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Greetings,
> > > >> >>>>>>>>> Stephan
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>
> > > >> >>>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>
> > > >> >>>>>
> > > >> >>>>
> > > >> >>>
> > > >> >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > >
> >
>
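
For readers of the archive, the eager/lazy distinction debated in this thread
can be sketched with a small toy model. This is not Flink code: ToyEnv,
ToyDataSet and their fields are hypothetical names invented for illustration,
and printOnTaskManager(prefix) is used only because it is the name most
replies voted for. The sketch assumes the behavior described above: print()
runs the plan immediately and writes to the client, while the prefix variant
only registers a sink that runs once execute() is called. It does not try to
reproduce how Flink handles other pending sinks when print() fires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only. None of these classes exist in Flink.
class PrintSemanticsDemo {
    public static void main(String[] args) {
        ToyEnv env = new ToyEnv();
        ToyDataSet words = env.fromElements("to", "be", "or", "not");

        // Lazy: only registers a sink, nothing runs yet.
        words.printOnTaskManager("debug");
        System.out.println("executions after lazy call: " + env.executions);

        // Eager: each call runs the plan right away.
        words.print();
        words.print();
        System.out.println("executions after two eager calls: " + env.executions);

        // The lazy sink only produces output once execute() is called.
        env.execute();
        System.out.println("worker output: " + env.workerOut);
    }
}

class ToyEnv {
    final List<Runnable> pendingSinks = new ArrayList<>();
    final List<String> clientOut = new ArrayList<>(); // stands in for client stdout
    final List<String> workerOut = new ArrayList<>(); // stands in for TaskManager stdout
    int executions = 0;

    ToyDataSet fromElements(String... values) {
        return new ToyDataSet(this, Arrays.asList(values));
    }

    // Runs every registered lazy sink once, then forgets them.
    void execute() {
        executions++;
        for (Runnable sink : pendingSinks) {
            sink.run();
        }
        pendingSinks.clear();
    }
}

class ToyDataSet {
    private final ToyEnv env;
    private final List<String> data;

    ToyDataSet(ToyEnv env, List<String> data) {
        this.env = env;
        this.data = data;
    }

    // Eager variant: counts as one execution per call and writes to the client.
    void print() {
        env.executions++;
        env.clientOut.addAll(data);
    }

    // Lazy variant: defers the write until ToyEnv.execute() is called.
    void printOnTaskManager(String prefix) {
        env.pendingSinks.add(() -> {
            for (String record : data) {
                env.workerOut.add(prefix + "> " + record);
            }
        });
    }
}
```

In this model, any number of lazy sinks can be attached to one plan and share
a single execute(), whereas every eager print() costs a separate execution,
which is the restriction Sebastian raises above.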
