Date: Tue, 2 Jun 2015 17:24:08 +0200
Subject: Re: Changed the behavior of "DataSet.print()"
From: Aljoscha Krettek
To: dev@flink.apache.org

By the way, we also should rename the corresponding Streaming API method accordingly.

On Tue, Jun 2, 2015 at 3:24 PM, Maximilian Michels wrote:
> +1 for printOnTaskManager(prefix)
>
> On Tue, Jun 2, 2015 at 1:54 PM, Kostas Tzoumas wrote:
>
>> +1 for printOnTaskManager(prefix)
>>
>> On Tue, Jun 2, 2015 at 1:35 PM, Till Rohrmann wrote:
>>
>> > +1 for printOnTaskManager(prefix)
>> >
>> > On Tue, Jun 2, 2015 at 12:08 PM, Fabian Hueske wrote:
>> >
>> > > +1 for writeToWorkerStdOut(prefix)
>> > > On Jun 2, 2015 11:42, "Aljoscha Krettek" wrote:
>> > >
>> > > > +1 for printOnTaskManager(prefix)
>> > > >
>> > > > On Tue, Jun 2, 2015 at 11:35 AM, Robert Metzger wrote:
>> > > > > I would like to reach consensus on this before the 0.9 release.
>> > > > >
>> > > > > So far we have the following ideas:
>> > > > >
>> > > > > writeToWorkerStdOut(prefix)
>> > > > > printOnTaskManager(prefix) (+1)
>> > > > > logOnTaskManager(prefix)
>> > > > >
>> > > > > I'm against logOnTM because we are not logging the output, we are writing
>> > > > > or printing it.
>> > > > >
>> > > > > *I would vote for deprecating "print(prefix)" and adding
>> > > > > "writeToWorkerStdOut(prefix)"*
>> > > > >
>> > > > > On Thu, May 28, 2015 at 5:00 PM, Chiwan Park <chiwanpark@icloud.com> wrote:
>> > > > >
>> > > > >> I agree that avoiding a name that starts with “print” is better.
>> > > > >>
>> > > > >> Regards,
>> > > > >> Chiwan Park
>> > > > >>
>> > > > >> > On May 28, 2015, at 11:35 PM, Maximilian Michels <mxm@apache.org> wrote:
>> > > > >> >
>> > > > >> > +1 for printOnTaskManager()
>> > > > >> >
>> > > > >> > On Thu, May 28, 2015 at 2:53 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de> wrote:
>> > > > >> >
>> > > > >> >> Thanks for your quick responses!
>> > > > >> >>
>> > > > >> >> I also think that renaming the old print method should do the trick.
>> > > > >> >> As a contribution to your brainstorming for a name, I propose
>> > > > >> >> logOnTaskManager() ;)
>> > > > >> >>
>> > > > >> >> Cheers,
>> > > > >> >> Sebastian
>> > > > >> >>
>> > > > >> >> -----Original Message-----
>> > > > >> >> From: Fabian Hueske [mailto:fhueske@gmail.com]
>> > > > >> >> Sent: Thursday, 28 May 2015 14:34
>> > > > >> >> To: dev@flink.apache.org
>> > > > >> >> Subject: Re: Changed the behavior of "DataSet.print()"
>> > > > >> >>
>> > > > >> >> As I said, the common print prefix might indicate eager execution.
>> > > > >> >>
>> > > > >> >> I know that writeToTaskManagerStdOut() is quite bulky, but we should make
>> > > > >> >> the difference in the behavior very clear, IMO.
>> > > > >> >>
>> > > > >> >> 2015-05-28 14:29 GMT+02:00 Stephan Ewen:
>> > > > >> >>
>> > > > >> >>> Actually, there is a method "print(String prefix)" which still goes to
>> > > > >> >>> the sysout of where the job is executed.
>> > > > >> >>>
>> > > > >> >>> Let's give that one the name "printOnTaskManager()" and then we should
>> > > > >> >>> have it...
>> > > > >> >>>
>> > > > >> >>> On Thu, May 28, 2015 at 2:13 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> > > > >> >>>
>> > > > >> >>>> I would avoid calling it printXYZ, since print()'s behavior changed
>> > > > >> >>>> to eager execution.
>> > > > >> >>>>
>> > > > >> >>>> 2015-05-28 14:10 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
>> > > > >> >>>>
>> > > > >> >>>>> Okay, you are right, local is actually confusing.
>> > > > >> >>>>> I'm against introducing "worker" as a term in the API. It's still
>> > > > >> >>>>> called "TaskManager". Maybe "printOnTaskManager()"?
>> > > > >> >>>>>
>> > > > >> >>>>> On Thu, May 28, 2015 at 2:06 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> > > > >> >>>>>
>> > > > >> >>>>>> +1 for both.
>> > > > >> >>>>>>
>> > > > >> >>>>>> printLocal() might not be the best name, because "local" is not
>> > > > >> >>>>>> well defined and could also be understood as the local machine
>> > > > >> >>>>>> of the user.
>> > > > >> >>>>>> How about naming the method completely differently (writeToWorkerStdOut()?)
>> > > > >> >>>>>> to make sure users are not confused about eager and lazy execution?
>> > > > >> >>>>>>
>> > > > >> >>>>>> 2015-05-28 13:44 GMT+02:00 Robert Metzger <rmetzger@apache.org>:
>> > > > >> >>>>>>
>> > > > >> >>>>>>> Hi Sebastian,
>> > > > >> >>>>>>>
>> > > > >> >>>>>>> thank you for the feedback. I agree that both variants have a
>> > > > >> >>>>>>> right to exist.
>> > > > >> >>>>>>>
>> > > > >> >>>>>>> I would vote for adding another method to the DataSet called
>> > > > >> >>>>>>> "printLocal()" that has the old behavior.
>> > > > >> >>>>>>>
>> > > > >> >>>>>>> On Thu, May 28, 2015 at 1:01 PM, Kruse, Sebastian <Sebastian.Kruse@hpi.de> wrote:
>> > > > >> >>>>>>>
>> > > > >> >>>>>>>> Hi everyone,
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> I am a bit worried about that recent change of the print() method. I can
>> > > > >> >>>>>>>> understand the rationale that obtaining the stdout from all
>> > > > >> >>>>>>>> the taskmanagers is cumbersome (although, for local
>> > > > >> >>>>>>>> debugging the old print() was fine).
>> > > > >> >>>>>>>> However, a major problem I see with the new print() is
>> > > > >> >>>>>>>> that now you can only have one print() per plan, as the plan is directly
>> > > > >> >>>>>>>> executed as soon as print() is invoked. If you regard print() as a debugging
>> > > > >> >>>>>>>> means, this is a severe restriction.
>> > > > >> >>>>>>>> I see use cases for both print() implementations, but I
>> > > > >> >>>>>>>> would at least provide some kind of backwards compatibility, be it a
>> > > > >> >>>>>>>> parameter or a legacyPrint() method or anything else. As I assume print()
>> > > > >> >>>>>>>> to be very frequently used, a lot of existing programs would benefit
>> > > > >> >>>>>>>> from this and might otherwise not be directly portable to newer Flink
>> > > > >> >>>>>>>> versions. What do you think?
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> Cheers,
>> > > > >> >>>>>>>> Sebastian
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> -----Original Message-----
>> > > > >> >>>>>>>> From: Robert Metzger [mailto:rmetzger@apache.org]
>> > > > >> >>>>>>>> Sent: Tuesday, 26 May 2015 11:12
>> > > > >> >>>>>>>> To: dev@flink.apache.org
>> > > > >> >>>>>>>> Subject: Re: Changed the behavior of "DataSet.print()"
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> I've filed a JIRA to update the documentation:
>> > > > >> >>>>>>>> https://issues.apache.org/jira/browse/FLINK-2092
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>> On Fri, May 22, 2015 at 11:08 AM, Stephan Ewen wrote:
>> > > > >> >>>>>>>>
>> > > > >> >>>>>>>>> Hi all!
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> We merged a patch yesterday that changed the API behavior
>> > > > >> >>>>>>>>> of the "DataSet.print()" function.
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> "print()" now prints to stdout on the client process,
>> > > > >> >>>>>>>>> rather than the TaskManager process, as before. This is much nicer for
>> > > > >> >>>>>>>>> debugging and exploring data sets.
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> One implication of this is that print() is now an eager
>> > > > >> >>>>>>>>> method (like collect() or count()). That means that calling "print()"
>> > > > >> >>>>>>>>> immediately triggers the execution, and no "env.execute()" is required
>> > > > >> >>>>>>>>> any more.
>> > > > >> >>>>>>>>>
>> > > > >> >>>>>>>>> Greetings,
>> > > > >> >>>>>>>>> Stephan