beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-65) SplittableDoFn
Date Fri, 07 Apr 2017 19:12:41 GMT

    [ https://issues.apache.org/jira/browse/BEAM-65?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961303#comment-15961303 ]

ASF GitHub Bot commented on BEAM-65:
------------------------------------

GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/2462

    [BEAM-65] Adds HasDefaultTracker for RestrictionTracker inference

    Allows a restriction type to implement HasDefaultTracker. In that case, the splittable
DoFn itself does not need to implement NewTracker, only ProcessElement and GetInitialRestriction.
    
    R: @tgroh
    
    (this is less urgent to review than https://github.com/apache/beam/pull/2455 - just nice
to have)
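
    To illustrate the idea described above, here is a minimal, self-contained Java sketch.
It does not use Beam's actual classes: the interfaces, OffsetRange, OffsetRangeTracker, and
the inferTracker helper are all simplified, hypothetical stand-ins, and the real pull request
may differ. It only shows how a restriction that supplies its own default tracker could let
a framework create the tracker without a NewTracker method on the DoFn.

```java
/** Hypothetical, simplified stand-ins for illustration; not Beam's actual classes. */
final class HasDefaultTrackerSketch {

  /** Stand-in for a tracker that guards work done within a restriction. */
  interface RestrictionTracker<R> {
    R currentRestriction();
  }

  /** A restriction type that knows how to build its own default tracker. */
  interface HasDefaultTracker<R, T extends RestrictionTracker<R>> {
    T newTracker();
  }

  /** Half-open offset range [from, to) that supplies its own tracker. */
  static final class OffsetRange
      implements HasDefaultTracker<OffsetRange, OffsetRangeTracker> {
    final long from;
    final long to;

    OffsetRange(long from, long to) {
      if (from > to) {
        throw new IllegalArgumentException("from must not exceed to");
      }
      this.from = from;
      this.to = to;
    }

    @Override
    public OffsetRangeTracker newTracker() {
      return new OffsetRangeTracker(this);
    }
  }

  /** Tracks which offsets of an OffsetRange have been claimed, in increasing order. */
  static final class OffsetRangeTracker implements RestrictionTracker<OffsetRange> {
    private final OffsetRange range;
    private long nextAllowed; // smallest offset that may still be claimed

    OffsetRangeTracker(OffsetRange range) {
      this.range = range;
      this.nextAllowed = range.from;
    }

    @Override
    public OffsetRange currentRestriction() {
      return range;
    }

    /** Returns true if the offset is inside the range, false once past its end. */
    boolean tryClaim(long offset) {
      if (offset < nextAllowed) {
        throw new IllegalArgumentException("offsets must be claimed in increasing order");
      }
      if (offset >= range.to) {
        return false;
      }
      nextAllowed = offset + 1;
      return true;
    }
  }

  /**
   * What a framework might do when the DoFn has no NewTracker method (an assumption):
   * ask the restriction itself for a tracker if it implements HasDefaultTracker.
   */
  static <R> RestrictionTracker<R> inferTracker(R restriction) {
    if (restriction instanceof HasDefaultTracker) {
      @SuppressWarnings("unchecked")
      HasDefaultTracker<R, ? extends RestrictionTracker<R>> withDefault =
          (HasDefaultTracker<R, ? extends RestrictionTracker<R>>) restriction;
      return withDefault.newTracker();
    }
    throw new IllegalStateException(
        "Restriction has no default tracker; the DoFn must provide one explicitly");
  }

  public static void main(String[] args) {
    RestrictionTracker<OffsetRange> tracker = inferTracker(new OffsetRange(0, 3));
    System.out.println(tracker.getClass().getSimpleName());
  }
}
```

    In this sketch, a restriction type that does not implement HasDefaultTracker makes
inferTracker fail, which corresponds to the case where the DoFn still has to supply the
tracker itself.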

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam auto-tracker

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/2462.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2462
    
----
commit aa2f643a6a03ca1c1ac12873738219ed130edea5
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-04-07T19:09:47Z

    [BEAM-65] Adds HasDefaultTracker for RestrictionTracker inference
    
    Allows a restriction type to implement HasDefaultTracker.
    In that case, the splittable DoFn itself does not need to
    implement NewTracker, only ProcessElement and GetInitialRestriction.

----


> SplittableDoFn
> --------------
>
>                 Key: BEAM-65
>                 URL: https://issues.apache.org/jira/browse/BEAM-65
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Daniel Halperin
>            Assignee: Eugene Kirpichov
>            Priority: Minor
>
> SplittableDoFn is a proposed enhancement for "dynamically splittable work" to the Beam
> model.
> Among other things, it would allow a unified implementation of bounded/unbounded sources
> with dynamic work rebalancing and the ability to express multiple scalable steps (e.g., global
> expansion -> file sizing & parsing -> splitting files into independently-processable
> blocks) via composition rather than inheritance.
> This would make it much easier to implement many types of sources, and to modify and reuse
> existing sources. It would also improve scalability of the Beam model by moving things like
> splitting a source from the control plane (where it is today -- glob -> List<FileBasedSource>
> sent over service APIs) into the data plane (PCollection<Glob> -> PCollection<FileName>
> -> ...).



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
