beam-commits mailing list archives

From "Daniel Halperin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (BEAM-1164) Allow a DoFn to opt in to mutating its input
Date Thu, 30 Mar 2017 17:22:41 GMT

     [ https://issues.apache.org/jira/browse/BEAM-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Halperin updated BEAM-1164:
----------------------------------
    Issue Type: New Feature  (was: Bug)

> Allow a DoFn to opt in to mutating its input
> --------------------------------------------
>
>                 Key: BEAM-1164
>                 URL: https://issues.apache.org/jira/browse/BEAM-1164
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Frances Perry
>            Priority: Minor
>
> Runners generally can't tell whether a DoFn mutates its inputs, and assuming by default
> that it does carries a significant performance cost from unnecessary copying (around
> sibling fusion, etc.). So instead the model forbids mutating inputs, and the Direct Runner
> validates this behavior.
> (See: http://beam.incubator.apache.org/contribute/design-principles/#make-efficient-things-easy-rather-than-make-easy-things-efficient)
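
To illustrate the sibling-fusion hazard described above, here is a minimal sketch in plain Python. It does not use Beam; `run_fused`, `add_flag`, and `count_keys` are hypothetical names invented for this example, and `run_fused` is only a stand-in for a runner that hands one element to several sibling steps.

```python
import copy


def add_flag(record):
    # Mutates its input in place, which is what the model forbids.
    record["flagged"] = True
    return record


def count_keys(record):
    # Read-only sibling that consumes the same input element.
    return len(record)


def run_fused(element, siblings, copy_inputs):
    # Hypothetical stand-in for a runner executing fused sibling steps.
    # With copy_inputs=False every sibling sees the same object.
    outputs = []
    for fn in siblings:
        arg = copy.deepcopy(element) if copy_inputs else element
        outputs.append(fn(arg))
    return outputs


shared = run_fused({"id": 1}, [add_flag, count_keys], copy_inputs=False)
copied = run_fused({"id": 1}, [add_flag, count_keys], copy_inputs=True)

print(shared[1])  # 2: count_keys observed the mutation made by add_flag
print(copied[1])  # 1: the defensive copy isolated the siblings
```

Without the copy, the result of `count_keys` depends on whether `add_flag` ran first. With the copy, every element is duplicated for every sibling even when no step mutates anything, which is the cost the issue refers to.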

> However, if users are processing a small number of large records by making incremental
> changes (for example, genomics use cases), the cost of the immutability requirement can be
> very large. As a workaround, users sometimes do suboptimal things (fusing ParDos by hand) or
> undefined things when they believe the immutability requirement is unnecessarily strict
> (adding no-op coders in places where they hope the runner won't be materializing things,
> mutating inputs anyway when they don't expect sibling fusion to happen, etc.).
> We should consider adding a signal (MutatingDoFn?) that users explicitly opt in to,
> declaring that their code may mutate inputs. The runner can then use this signal either to
> prevent optimizations that would break when inputs are mutated or to insert additional
> copies as needed so that optimizations preserve semantics.
> See this related user@ discussion:
> https://lists.apache.org/thread.html/f39689f54147117f3fc54c498eff1a20fa73f1be5b5cad5b6f816fd3@%3Cuser.beam.apache.org%3E
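
One way to picture the proposed opt-in is the plain-Python sketch below. It is not Beam code and not the design adopted for this issue: the `mutates_input` decorator is a hypothetical stand-in for the suggested `MutatingDoFn` signal, and `run_siblings` is an assumed, simplified runner that takes the "insert additional copies as needed" route.

```python
import copy


def mutates_input(fn):
    # Hypothetical opt-in marker, standing in for the proposed MutatingDoFn.
    fn.mutates_input = True
    return fn


@mutates_input
def trim_reads(record):
    # Declared mutator: edits the large record in place.
    record["reads"].pop()
    return record


def total_reads(record):
    # Undeclared, read-only step.
    return len(record["reads"])


def run_siblings(element, siblings):
    # Copy only for steps that opted in, and only when the element is
    # shared with at least one other sibling.
    outputs, copies = [], 0
    for fn in siblings:
        if getattr(fn, "mutates_input", False) and len(siblings) > 1:
            arg = copy.deepcopy(element)
            copies += 1
        else:
            arg = element
        outputs.append(fn(arg))
    return outputs, copies


element = {"reads": ["a", "b", "c"]}
outputs, copies = run_siblings(element, [trim_reads, total_reads])
solo_outputs, solo_copies = run_siblings({"reads": ["a", "b"]}, [trim_reads])
```

In this sketch the shared element costs one copy, so `total_reads` still sees all three reads, while the single-consumer case mutates in place with no copy at all. A real runner could instead decline to fuse siblings of a declared mutator, which is the other option the issue mentions.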



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
