beam-commits mailing list archives

From "Daniel Halperin (JIRA)" <j...@apache.org>
Subject [jira] [Created] (BEAM-65) SplittableDoFn
Date Thu, 25 Feb 2016 16:26:18 GMT
Daniel Halperin created BEAM-65:
-----------------------------------

             Summary: SplittableDoFn
                 Key: BEAM-65
                 URL: https://issues.apache.org/jira/browse/BEAM-65
             Project: Beam
          Issue Type: New Feature
          Components: beam-model
            Reporter: Daniel Halperin
            Assignee: Eugene Kirpichov
            Priority: Minor


SplittableDoFn is a proposed enhancement to the Beam model that adds support for "dynamically splittable work".

Among other things, it would allow a unified implementation of bounded and unbounded sources
with dynamic work rebalancing, and it would make it possible to express multiple scalable steps
(e.g., glob expansion -> file sizing & parsing -> splitting files into independently
processable blocks) via composition rather than inheritance.
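
To make "dynamic work rebalancing" concrete, here is a minimal sketch in plain Python. It is not the Beam API: OffsetRange, RangeWorker, and try_split are hypothetical names chosen for illustration, and the halving policy is just one possible choice. The idea shown is that a worker partway through a range of work can hand off the part it has not started yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OffsetRange:
    """Hypothetical half-open range [start, end) of work, e.g. byte offsets."""
    start: int
    end: int


class RangeWorker:
    """Processes a range one position at a time and can give away unstarted work."""

    def __init__(self, work: OffsetRange):
        self.work = work
        self.next_pos = work.start  # first position not yet processed

    def process_one(self) -> bool:
        """Process a single position; return False once the range is exhausted."""
        if self.next_pos >= self.work.end:
            return False
        self.next_pos += 1
        return True

    def try_split(self) -> Optional[OffsetRange]:
        """Hand off the second half of the unprocessed remainder, if any."""
        remaining = self.work.end - self.next_pos
        if remaining < 2:
            return None  # too little left to be worth splitting
        split_at = self.next_pos + remaining // 2
        residual = OffsetRange(split_at, self.work.end)
        # Shrink our own range so the two pieces never overlap.
        self.work = OffsetRange(self.work.start, split_at)
        return residual


# A worker processes 20 of 100 positions, then is asked to give up some work.
worker = RangeWorker(OffsetRange(0, 100))
for _ in range(20):
    worker.process_one()
residual = worker.try_split()  # could be scheduled on another, idle worker
```

In this sketch the original worker keeps [0, 60) and the residual [60, 100) can be given to another worker, so no position is processed twice or dropped.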

This would make it much easier to implement many types of sources and to modify and reuse
existing sources. It would also improve the scalability of the Beam model by moving steps such
as splitting a source out of the control plane (where it happens today: glob ->
List<FileBasedSource> sent over service APIs) and into the data plane (PCollection<Glob> ->
PCollection<FileName> -> ...).
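
The data-plane chain can be pictured as small steps composed together, each consuming the previous step's output. The sketch below uses plain Python generators as stand-ins for PCollections. It is not the Beam API: the in-memory FILES table, the function names, and the block size are all assumptions made for illustration.

```python
import fnmatch
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical in-memory "file system": file name -> size in bytes.
FILES: Dict[str, int] = {
    "logs/a.txt": 250,
    "logs/b.txt": 100,
    "data/c.csv": 80,
}


def expand_globs(globs: Iterable[str]) -> Iterator[str]:
    """Step 1: glob -> file names."""
    for pattern in globs:
        for name in sorted(FILES):
            if fnmatch.fnmatchcase(name, pattern):
                yield name


def size_files(names: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Step 2: file name -> (file name, size)."""
    for name in names:
        yield name, FILES[name]


def split_into_blocks(
    sized: Iterable[Tuple[str, int]], block_size: int
) -> Iterator[Tuple[str, int, int]]:
    """Step 3: (file name, size) -> independently processable [start, end) blocks."""
    for name, size in sized:
        for start in range(0, size, block_size):
            yield name, start, min(start + block_size, size)


# Compose the steps; each one is reusable and replaceable on its own.
blocks = list(
    split_into_blocks(size_files(expand_globs(["logs/*.txt"])), block_size=100)
)
```

Because every step is an ordinary element-wise transformation, each can in principle be distributed like any other data processing step, rather than producing one large list that must be shipped through a control API.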



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
