beam-user mailing list archives

From Jean-Baptiste Onofré ...@nanthrax.net>
Subject Re: Debugging pipelines
Date Sun, 20 Mar 2016 19:25:34 GMT
I agree with Davor, it's what I tried to express: even if a custom DoFn 
or PTransform is always possible, maybe it makes sense to provide such 
functions directly in the SDK.

Regards
JB

On 03/20/2016 08:13 PM, Davor Bonaci wrote:
> Of course, this functionality can be implemented manually in a DoFn --
> this wouldn't be a primitive. That said, an argument could be made that
> a Beam SDK should provide a composite that logs elements in a
> PCollection. This can further be augmented with proposals like
> "log_every_n_seconds" or "log_every_n_elements", etc. There's no such
> concept in the Java SDK right now, but it can be easily constructed by users.
>
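[Editorial sketch] To make the "log_every_n_elements" idea concrete, here
is a minimal plain-Java sketch of the logic such a composite could wrap.
It deliberately has no Beam dependency: the class name EveryNLogger, its
constructor arguments and the message format are all assumptions made for
illustration, not part of any SDK. In a distributed runner the counter
would be per worker instance, not global.

```java
import java.util.Arrays;
import java.util.function.Consumer;

/** Hypothetical pass-through step that reports only every n-th element it sees. */
final class EveryNLogger<T> {
    private final long n;
    private final Consumer<String> sink; // where log lines go, e.g. System.out::println
    private long seen = 0;

    EveryNLogger(long n, Consumer<String> sink) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
        this.sink = sink;
    }

    /** Returns the element unchanged; logs it when the running count is a multiple of n. */
    T process(T element) {
        seen++;
        if (seen % n == 0) {
            sink.accept("element #" + seen + ": " + element);
        }
        return element;
    }

    public static void main(String[] args) {
        EveryNLogger<String> logger = new EveryNLogger<>(3, System.out::println);
        for (String word : Arrays.asList("a", "b", "c", "d", "e", "f", "g")) {
            logger.process(word);
        }
    }
}
```

The sink is injected so that the same logic can write to a logging
library instead of standard output.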
> Now, there are several counter-arguments as well:
>
>   * An approach where we provide a logging PTransform would be less
>     flexible / powerful than a DoFn + direct usage of a logging library.
>     Regardless, one might want to optimize the common, simple case.
>   * A runner could choose to sample elements of all PCollections to
>     provide this automatically, perhaps without any intervention from
>     the pipeline writer (and/or enable this behavior on-demand, while
>     the pipeline is running).
>
> Regarding runner portability, our thinking is that logging ought to work
> across runners, regardless of the choice of logging library. This is
> important to be able to use other, non-Beam related libraries, that
> neither we nor the pipeline developers can control. Basically, a runner
> can set up logging bridges between major logging frameworks and standard
> streams. Then, the runner gets all logs from Beam SDK, user code, any
> random libraries that users may choose, etc. -- it all works transparently.
>
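[Editorial sketch] One way to picture a "logging bridge" between a
standard stream and a logging framework is the following plain-Java
example, which forwards every complete line written to a PrintStream to
a java.util.logging Logger. The class name StreamToLoggerBridge and the
helper asPrintStream are hypothetical; this is not how any particular
runner does it, only an illustration of the idea.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical bridge: each complete line written to the stream becomes one log record. */
final class StreamToLoggerBridge extends OutputStream {
    private final Logger logger;
    private final Level level;
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    StreamToLoggerBridge(Logger logger, Level level) {
        this.logger = logger;
        this.level = level;
    }

    @Override
    public synchronized void write(int b) {
        if (b == '\n') {
            // End of line: decode the buffered bytes and hand them to the logger.
            String line = new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            buffer.reset();
            logger.log(level, line);
        } else if (b != '\r') {
            buffer.write(b);
        }
    }

    /** Wraps the bridge in an auto-flushing PrintStream, suitable for System.setOut. */
    static PrintStream asPrintStream(Logger logger, Level level) {
        try {
            return new PrintStream(new StreamToLoggerBridge(logger, level), true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("user-code");
        System.setOut(asPrintStream(logger, Level.INFO));
        System.out.println("printed by user code, delivered through the logger");
    }
}
```

Text without a trailing newline stays buffered until the line is
completed, so partial writes are never split across records.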
> On Sun, Mar 20, 2016 at 10:55 AM, Dan Halperin <dhalperi@google.com
> <mailto:dhalperi@google.com>> wrote:
>
>     Sorry for short phone email. Check out this Transform from a pull
>     request.
>
>     https://github.com/elibixby/DataflowJavaSDK/blob/51ee732964b3425c6c4a8677d135c41765d9bcdc/contrib/firebaseio/src/main/java/contrib/LogElements.java
>
>     The key subtleties are around making sure you don't log too
>     often--logging per element is very expensive.
>
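[Editorial sketch] The point about not logging too often can be
illustrated with a time-based throttle, in the spirit of
"log_every_n_seconds". This is a plain-Java sketch with no Beam
dependency and is not taken from the linked LogElements transform; the
class name ThrottledLogger, the injected clock and the message format
are assumptions for illustration.

```java
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical throttle: at most one log line per interval, however many elements arrive. */
final class ThrottledLogger<T> {
    private final long intervalMillis;
    private final LongSupplier clock;    // injectable so the behavior can be tested
    private final Consumer<String> sink; // where log lines go
    private long lastLogMillis;
    private boolean loggedOnce = false;
    private long suppressed = 0;

    ThrottledLogger(long intervalMillis, LongSupplier clock, Consumer<String> sink) {
        if (intervalMillis < 0) {
            throw new IllegalArgumentException("intervalMillis must not be negative");
        }
        this.intervalMillis = intervalMillis;
        this.clock = clock;
        this.sink = sink;
    }

    /** Logs the element if the interval has elapsed; otherwise only counts it. */
    void process(T element) {
        long now = clock.getAsLong();
        if (!loggedOnce || now - lastLogMillis >= intervalMillis) {
            sink.accept(element + " (" + suppressed + " elements skipped since last log)");
            lastLogMillis = now;
            loggedOnce = true;
            suppressed = 0;
        } else {
            suppressed++;
        }
    }

    public static void main(String[] args) {
        ThrottledLogger<Integer> logger =
            new ThrottledLogger<>(1000, System::currentTimeMillis, System.out::println);
        for (int i = 0; i < 1_000_000; i++) {
            logger.process(i);
        }
    }
}
```

Reporting the number of skipped elements keeps some visibility into
throughput without paying the per-element logging cost.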
>     Happy to discuss more if you have questions, and again sorry I'm on
>     a phone so can't provide a more comprehensive response.
>
>     On Sun, Mar 20, 2016 at 10:21 Dan Halperin <dhalperi@google.com
>     <mailto:dhalperi@google.com>> wrote:
>
>         Hi Ismael,
>
>         I think you can just do this with a normal DoFn. Why do you
>         think this needs to be a new primitive?
>         On Sun, Mar 20, 2016 at 10:20 Dan Halperin <dhalperi@google.com
>         <mailto:dhalperi@google.com>> wrote:
>
>             Hi is
>             On Sun, Mar 20, 2016 at 08:18 Jean-Baptiste Onofré
>             <jb@nanthrax.net <mailto:jb@nanthrax.net>> wrote:
>
>                 Hi,
>
>                 thanks for the update.
>
>                 IMHO, I would name Debug transform as Log:
>
>                 .apply(Log.withLevel("DEBUG"))
>                 .apply(Log.withLevel("INFO").withPattern("%d %m ..."))
>                 .apply(Log.withLevel("WARN").withMessage("Foo").withStream("System.out"))
>
>                 It would be more flexible and related to the actual behavior.
>
>                 I would mimic a bit the Camel log component for instance.
>
>                 If you don't mind, I will do it with you.
>
>                 Thanks
>                 Regards
>                 JB
>
>                 On 03/20/2016 12:07 PM, Ismaël Mejía wrote:
>                  > Hi,
>                  >
>                  > The code of the transform is here in a playground for
>                 Beam experiments I
>                  > created (it is a bit alpha for the moment, and it
>                 does not have comments):
>                  >
>                  >
>                 https://github.com/iemejia/beam-playground/blob/master/src/main/java/org/apache/beam/transforms/Debug.java
>                  >
>                  > Since my initial goal was more of a test scenario in the
>                  > DirectPipelineRunner I haven't considered yet more
>                 advanced logging
>                  > capabilities and the possible issues of distribution
>                 (serialization, in
>                  > particular of dependencies, as well as exceptions,
>                 etc), but of course
>                  > it is something I expect to improve if there is
>                 interest. Do you see
>                  > some immediate things to improve to try it with the
>                 distributed runners
>                  > (I want to do this, as an excuse also to try the
>                 FlinkRunner).
>                  >
>                  > Best,
>                  > -Ismael
>                  >
>                  >
>                  > On Sun, Mar 20, 2016 at 11:13 AM, Jean-Baptiste
>                 Onofré <jb@nanthrax.net <mailto:jb@nanthrax.net>
>                  > <mailto:jb@nanthrax.net <mailto:jb@nanthrax.net>>> wrote:
>                  >
>                  >     By the way, for the "Integration" DSL, in
>                 addition to an explicit debug
>                  >     transform, it would make sense to have an
>                 implicit "Tracer". It's
>                  >     something that I planned: it would allow us to
>                 have sampling on
>                  >     PCollection if the pipeline tracer is enabled
>                 (like we do in a Camel
>                  >     route with the tracer).
>                  >
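[Editorial sketch] The implicit "Tracer" described above could, under
some assumptions, look like the following plain-Java sketch: a
pass-through observer that records a random sample of elements only
while tracing is enabled. The class name SamplingTracer, the sample
rate parameter and the seed are hypothetical and not part of Beam or
Camel. The sketch is single-threaded for simplicity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical tracer: records a random sample of passing elements while enabled. */
final class SamplingTracer<T> {
    private final double sampleRate; // probability in [0, 1] of recording an element
    private final Random random;
    private final List<T> samples = new ArrayList<>();
    private boolean enabled;

    SamplingTracer(double sampleRate, long seed, boolean enabled) {
        if (sampleRate < 0.0 || sampleRate > 1.0) {
            throw new IllegalArgumentException("sampleRate must be in [0, 1]");
        }
        this.sampleRate = sampleRate;
        this.random = new Random(seed);
        this.enabled = enabled;
    }

    /** Switches tracing on or off, for example on demand while a pipeline runs. */
    void setEnabled(boolean enabled) {
        this.enabled = enabled;
    }

    /** Returns the element unchanged, possibly recording it as a sample. */
    T observe(T element) {
        if (enabled && random.nextDouble() < sampleRate) {
            samples.add(element);
        }
        return element;
    }

    /** Returns a copy of the samples recorded so far. */
    List<T> samples() {
        return new ArrayList<>(samples);
    }

    public static void main(String[] args) {
        SamplingTracer<Integer> tracer = new SamplingTracer<>(0.01, 42L, true);
        for (int i = 0; i < 10_000; i++) {
            tracer.observe(i);
        }
        System.out.println("sampled " + tracer.samples().size() + " of 10000 elements");
    }
}
```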
>                  >     Regards
>                  >     JB
>                  >
>                  >     On 03/20/2016 10:14 AM, Ismaël Mejía wrote:
>                  >
>                  >         Hello,
>                  >
>                  >         I just started playing with Beam and I wanted
>                 to debug what happens
>                  >         between transforms in pipelines. I wrote a
>                 simple 'Debug'
>                  >         transform for
>                  >         this.
>                  >         The idea is to apply a function based on a
>                 predicate to any
>                  >         element in a
>                  >         collection without changing the collection,
>                 or in other words, a
>                  >         transform that
>                  >         does not transform but produces side effects.
>                  >
>                  >         The idea is better illustrated with this
>                 simple example:
>                  >
>                  >               .apply(FlatMapElements.via((String text) ->
>                  >         Arrays.asList(text.split(" ")))
>                  >                 .withOutputType(new
>                 TypeDescriptor<String>() {
>                  >                }))
>                  >               .apply(Debug
>                  >                 .when((String s) -> s.startsWith("A"))
>                  >                 .with((String s) -> {
>                  >                   System.out.println(s);
>                  >                   return null;
>                  >                 }))
>                  >               .apply(Filter.byPredicate((String text)
>                 -> text.length() > 5))
>                  >               .apply(Debug.print());  // sugared
>                 method, same as above
>                  >
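[Editorial sketch] The "transform that does not transform but produces
side effects" can be illustrated without Beam, using java.util.stream
as a stand-in for a pipeline. The class name Tap and the two-argument
when(predicate, action) helper are assumptions for this sketch; they
are not the API of the Debug transform linked in this thread, which
uses a when(...).with(...) chain.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

/** Hypothetical identity step that runs a side effect on elements matching a predicate. */
final class Tap {
    private Tap() {}

    static <T> UnaryOperator<T> when(Predicate<? super T> predicate, Consumer<? super T> action) {
        return element -> {
            if (predicate.test(element)) {
                action.accept(element);
            }
            return element; // the element is always passed on unchanged
        };
    }

    public static void main(String[] args) {
        List<String> words = Arrays.stream("An Apple fell from A tall tree".split(" "))
            // Print only the words starting with "A"; every word continues downstream.
            .map(Tap.<String>when(s -> s.startsWith("A"), System.out::println))
            .filter(s -> s.length() > 3)
            .collect(Collectors.toList());
        System.out.println(words);
    }
}
```

Because the step returns its input unchanged, it can be inserted
between any two stages or removed again without affecting the result.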
>                  >         I think this can be useful (at least for
>                 debugging purposes), is
>                  >         there
>                  >         something
>                  >         like this already in the SDK? If this is not
>                 the case, can you
>                  >         please
>                  >         give me some
>                  >         feedback/ideas to improve my transform.
>                  >
>                  >         Thanks,
>                  >         -Ismael
>                  >
>                  >         ps. You can find the code of the first
>                 version of the transform
>                  >         here:
>                  >
>                 https://github.com/iemejia/beam-playground/blob/master/src/main/java/org/apache/beam/transforms/Debug.java
>                  >
>                  >
>                  >
>                  >     --
>                  >     Jean-Baptiste Onofré
>                  > jbonofre@apache.org <mailto:jbonofre@apache.org>
>                 <mailto:jbonofre@apache.org <mailto:jbonofre@apache.org>>
>                  > http://blog.nanthrax.net
>                  >     Talend - http://www.talend.com
>                  >
>                  >
>
>                 --
>                 Jean-Baptiste Onofré
>                 jbonofre@apache.org <mailto:jbonofre@apache.org>
>                 http://blog.nanthrax.net
>                 Talend - http://www.talend.com
>
>

-- 
Jean-Baptiste Onofré
jbonofre@apache.org
http://blog.nanthrax.net
Talend - http://www.talend.com
