beam-user mailing list archives

From Sourabh Bajaj <sourabhba...@google.com>
Subject Re: [Python] Replace ParDo's DoFn after both are constructed
Date Fri, 21 Jul 2017 21:10:53 GMT
Hi,

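Is it possible to create something like this?

class ErrorSieve(PTransform):
    def __init__(self, dofn):
        super(ErrorSieve, self).__init__()
        self.dofn = dofn

    def expand(self, pcoll):
        # modifiedDoFn: self.dofn wrapped with your extra logic
        return pcoll | ParDo(modifiedDoFn)

That way your pipeline just looks like p | ErrorSieve(DoFn()), and you don't
expose the ParDo to the user.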

Will this work for your use case?

-Sourabh

On Fri, Jul 21, 2017 at 2:06 PM Dmitry Demeshchuk <dmitry@postmates.com>
wrote:

> Hi list,
>
> I'm trying to make a transformation function (let's call it ErrorSieve)
> that would take a ParDo object as input and modify its underlying DoFn
> object, basically adding extra logic on top of an underlying process()
> method.
>
> Ideally for me, the example usage would be:
>
> ```python
> p | ErrorSieve(beam.ParDo(MyDoFn()))
>
> # or
>
> p | ErrorSieve(beam.FlatMap(lambda x: [x + 1]))
> ```
>
> However, this would require me to butcher the internals of ParDo,
> especially since ParDo's make_fn() method gets called during
> its transformation. My other idea was to make it a plain DoFn:
>
> ```python
> p | beam.ParDo(ErrorSieve(MyDoFn()))
> ```
>
> The only problem with this is that I can't use it with transforms like
> FlatMap, which is a bit unfortunate.
>
> Do you think it's worth investigating how to implement the first approach,
> or should I instead settle for the second approach, using only custom
> DoFns?
>
> Thank you.
>
>
> --
> Best regards,
> Dmitry Demeshchuk.
>
