flink-dev mailing list archives

From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Changed the behavior of "DataSet.print()"
Date Tue, 02 Jun 2015 15:24:08 GMT
By the way, we should also rename the corresponding Streaming API
method accordingly.

On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels <mxm@apache.org> wrote:
> +1 for printOnTaskManager(prefix)
>
> On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas <ktzoumas@apache.org> wrote:
>
>> +1 for printOnTaskManager(prefix)
>>
>> On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann <trohrmann@apache.org> wrote:
>>
>> > +1 for printOnTaskManager(prefix)
>> >
>> > On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> >
>> > > +1 for writeToWorkerStdOut(prefix)
>> > > On Jun 2, 2015 11:42, "Aljoscha Krettek" <aljoscha@apache.org> wrote:
>> > >
>> > > > +1 for printOnTaskManager(prefix)
>> > > >
>> > > > On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger <rmetzger@apache.org> wrote:
>> > > > > I would like to reach consensus on this before the 0.9 release.
>> > > > >
>> > > > > So far we have the following ideas:
>> > > > >
>> > > > > writeToWorkerStdOut(prefix)
>> > > > > printOnTaskManager(prefix) (+1)
>> > > > > logOnTaskManager(prefix)
>> > > > >
>> > > > > I'm against logOnTM because we are not logging the output; we are
>> > > > > writing or printing it.
>> > > > >
>> > > > >
>> > > > > *I would vote for deprecating "print(prefix)" and adding
>> > > > > "writeToWorkerStdOut(prefix)"*
>> > > > >
>> > > > >
>> > > > >
>> > > > > On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <chiwanpark@icloud.com> wrote:
>> > > > >
>> > > > >> I agree that avoiding a name that starts with “print” is better.
>> > > > >>
>> > > > >> Regards,
>> > > > >> Chiwan Park
>> > > > >>
>> > > > >> > On May 28, 2015, at 11:35 PM, Maximilian Michels <mxm@apache.org> wrote:
>> > > > >> >
>> > > > >> > +1 for printOnTaskManager()
>> > > > >> >
>> > > > >> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de> wrote:
>> > > > >> >
>> > > > >> >> Thanks for your quick responses!
>> > > > >> >>
>> > > > >> >> I also think that renaming the old print method should do the trick.
>> > > > >> >> As a contribution to your brainstorming for a name, I propose
>> > > > >> >> logOnTaskManager() ;)
>> > > > >> >>
>> > > > >> >> Cheers,
>> > > > >> >> Sebastian
>> > > > >> >>
>> > > > >> >> -----Original Message-----
>> > > > >> >> From: Fabian Hueske [mailto:fhueske@gmail.com]
>> > > > >> >> Sent: Thursday, May 28, 2015 14:34
>> > > > >> >> To: dev@flink.apache.org
>> > > > >> >> Subject: Re: Changed the behavior of "DataSet.print()"
>> > > > >> >>
>> > > > >> >> As I said, the common print prefix might indicate eager execution.
>> > > > >> >>
>> > > > >> >> I know that writeToTaskManagerStdOut() is quite bulky, but we should
>> > > > >> >> make the difference in the behavior very clear, IMO.
>> > > > >> >>
>> > > > >> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen <sewen@apache.org>:
>> > > > >> >>
>> > > > >> >>> Actually, there is a method "print(String prefix)" which still goes to
>> > > > >> >>> the stdout of the process where the job is executed.
>> > > > >> >>>
>> > > > >> >>> Let's give that one the name "printOnTaskManager()" and then we should
>> > > > >> >>> have it...
>> > > > >> >>>
>> > > > >> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> > > > >> >>>
>> > > > >> >>>> I would avoid calling it printXYZ, since print()'s behavior changed
>> > > > >> >>>> to eager execution.
>> > > > >> >>>>
>> > > > >> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
>> > > > >> >>>>
>> > > > >> >>>>> Okay, you are right, "local" is actually confusing.
>> > > > >> >>>>> I'm against introducing "worker" as a term in the API. It's still
>> > > > >> >>>>> called "TaskManager". Maybe "printOnTaskManager()"?
>> > > > >> >>>>>
>> > > > >> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> > > > >> >>>>>
>> > > > >> >>>>>> +1 for both.
>> > > > >> >>>>>>
>> > > > >> >>>>>> printLocal() might not be the best name, because "local" is not
>> > > > >> >>>>>> well defined and could also be understood as the local machine of
>> > > > >> >>>>>> the user.
>> > > > >> >>>>>> How about naming the method completely differently
>> > > > >> >>>>>> (writeToWorkerStdOut()?) to make sure users are not confused about
>> > > > >> >>>>>> eager and lazy execution?
>> > > > >> >>>>>>
>> > > > >> >>>>>>
>> > > > >> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
>> > > > >> >>>>>>
>> > > > >> >>>>>>> Hi Sebastian,
>> > > > >> >>>>>>>
>> > > > >> >>>>>>> thank you for the feedback.
I agree that both variants
>> have
>> > a
>> > > > >> >>>>>>> right
>> > > > >> >>>> to
>> > > > >> >>>>>>> exist.
>> > > > >> >>>>>>>
>> > > > >> >>>>>>> I would vote for adding another
method to the DataSet
>> called
>> > > > >> >>>>>> "printLocal()"
>> > > > >> >>>>>>> that has the old behavior.
>> > > > >> >>>>>>>
>> > > > >> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de> wrote:
>> > > > >> >>>>>>>
>> > > > >> >>>>>>>> Hi everyone,
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> I am a bit worried about that recent change of the print() method. I
>> > > > >> >>>>>>>> can understand the rationale that obtaining the stdout from all the
>> > > > >> >>>>>>>> TaskManagers is cumbersome (although, for local debugging, the old
>> > > > >> >>>>>>>> print() was fine).
>> > > > >> >>>>>>>> However, a major problem I see with the new print() is that now you
>> > > > >> >>>>>>>> can only have one print() per plan, as the plan is directly executed
>> > > > >> >>>>>>>> as soon as print() is invoked. If you regard print() as a debugging
>> > > > >> >>>>>>>> tool, this is a severe restriction.
>> > > > >> >>>>>>>> I see use cases for both print() implementations, but I would at
>> > > > >> >>>>>>>> least provide some kind of backwards compatibility, be it a parameter
>> > > > >> >>>>>>>> or a legacyPrint() method or anything else. As I assume print() to be
>> > > > >> >>>>>>>> very frequently used, a lot of existing programs would benefit from
>> > > > >> >>>>>>>> this and might otherwise not be directly portable to newer Flink
>> > > > >> >>>>>>>> versions. What do you think?
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> Cheers,
>> > > > >> >>>>>>>> Sebastian
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> -----Original Message-----
>> > > > >> >>>>>>>> From: Robert Metzger [mailto:rmetzger@apache.org]
>> > > > >> >>>>>>>> Sent: Tuesday, May 26, 2015 11:12
>> > > > >> >>>>>>>> To: dev@flink.apache.org
>> > > > >> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> I've filed a JIRA to update the documentation:
>> > > > >> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen <sewen@apache.org> wrote:
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>>> Hi all!
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> We merged a patch yesterday that changed the API behavior of the
>> > > > >> >>>>>>>>> "DataSet.print()" function.
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> "print()" now prints to stdout on the client process, rather than
>> > > > >> >>>>>>>>> on the TaskManager process as before. This is much nicer for
>> > > > >> >>>>>>>>> debugging and exploring data sets.
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> One implication of this is that print() is now an eager method (like
>> > > > >> >>>>>>>>> collect() or count()). That means that calling "print()" immediately
>> > > > >> >>>>>>>>> triggers the execution, and no "env.execute()" is required any more.
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> Greetings,
>> > > > >> >>>>>>>>> Stephan
>> > > > >> >>>>>>>>>
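
The semantic difference debated in this thread (an eager print() that triggers a job and prints on the client, versus a lazy, prefix-taking variant that prints on the TaskManagers only once env.execute() runs) can be sketched with a small, self-contained toy model. ToyEnv, ToyDataSet and EagerVsLazyPrintDemo are hypothetical names invented for this illustration, not Flink classes; the method names only mirror those discussed above, and the sketch makes no claim about how Flink actually implements print(), collect() or execute().

```java
import java.util.ArrayList;
import java.util.List;

// Toy model for illustration only; this is NOT Flink's API or implementation.
// ToyEnv and ToyDataSet are hypothetical classes whose method names mirror the
// ones discussed in the thread, so eager and lazy printing can be compared.
class ToyEnv {
    final List<Runnable> pendingSinks = new ArrayList<>(); // sinks waiting for the next job
    final List<String> clientStdout = new ArrayList<>();   // stands in for the client's stdout
    final List<String> workerStdout = new ArrayList<>();   // stands in for TaskManager stdout
    int jobsRun = 0;                                       // number of jobs triggered so far

    // Like env.execute() in this toy: runs every pending sink as one job.
    void execute() {
        jobsRun++;
        for (Runnable sink : pendingSinks) {
            sink.run();
        }
        pendingSinks.clear();
    }
}

class ToyDataSet {
    private final ToyEnv env;
    private final List<String> elements;

    ToyDataSet(ToyEnv env, List<String> elements) {
        this.env = env;
        this.elements = elements;
    }

    // Lazy (old behavior): only registers a sink. Nothing is printed until
    // execute() runs, and the output lands on the workers, not the client.
    void printOnTaskManager(String prefix) {
        env.pendingSinks.add(() -> {
            for (String e : elements) {
                env.workerStdout.add(prefix + "> " + e);
            }
        });
    }

    // Eager (new behavior): registers a "collect" sink, immediately runs a
    // job, and then prints the collected elements on the client.
    void print() {
        List<String> collected = new ArrayList<>();
        env.pendingSinks.add(() -> collected.addAll(elements));
        env.execute();
        env.clientStdout.addAll(collected);
    }
}

public class EagerVsLazyPrintDemo {
    public static void main(String[] args) {
        ToyEnv env = new ToyEnv();
        ToyDataSet words = new ToyDataSet(env, List.of("a", "b"));

        words.print();                   // job 1 runs right away
        words.print();                   // job 2: each eager print() is its own job
        words.printOnTaskManager("dbg"); // only registered, nothing runs yet
        System.out.println("jobs before execute(): " + env.jobsRun);
        env.execute();                   // job 3 runs the lazy sink
        System.out.println("jobs after execute(): " + env.jobsRun);
        System.out.println("client: " + env.clientStdout);
        System.out.println("workers: " + env.workerStdout);
    }
}
```

Running main prints "jobs before execute(): 2" and then "jobs after execute(): 3": each eager print() runs its own job, which is the restriction Sebastian describes, while the lazy sink produces no output until execute() is called. One design choice in this toy is that an eager print() also flushes any lazy sinks registered before it, because it runs all pending sinks as one job; whether a real Flink program behaves the same way should be checked against the Flink documentation.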
