beam-commits mailing list archives

From jkff <...@git.apache.org>
Subject [GitHub] beam pull request #3890: Introduces Reshuffle.viaRandomKey()
Date Fri, 22 Sep 2017 22:36:28 GMT
GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/3890

    Introduces Reshuffle.viaRandomKey()

    This is a commonly used pattern for breaking fusion: https://cloud.google.com/dataflow/service/dataflow-service-desc#fusion-optimization
    
    viaRandomKey() only abstracts away the current commonly used pattern. It has the same
    caveats as using Reshuffle.of() directly - the semantics are technically not guaranteed by
    the Beam model, but it works in practice, and this is the pattern we keep recommending to
    users.
    
    The naming is deliberately operational rather than semantic, to emphasize that we don't
    have the semantics figured out, and the transform promises only that it expands into exactly
    the sequence "pair with random key, reshuffle, drop key". The goal of this change is just
    to reduce copy-paste.
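
    For readers unfamiliar with the pattern, the three-step sequence "pair with random key, reshuffle, drop key" can be modelled in plain Java as below. This is only an illustrative sketch using JDK collections, not Beam code and not the implementation in this pull request: the class name RandomKeyReshuffleSketch and its static method are hypothetical, and the in-memory grouping merely stands in for the redistribution a runner would perform.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Hypothetical, JDK-only model of the sequence described above.
// It does NOT use the Beam SDK; it only mirrors the shape of the steps.
class RandomKeyReshuffleSketch {

    static <T> List<T> viaRandomKey(List<T> input, Random rng) {
        // Step 1: pair each element with a random key.
        List<Map.Entry<Integer, T>> keyed = new ArrayList<>();
        for (T element : input) {
            keyed.add(new AbstractMap.SimpleEntry<>(rng.nextInt(), element));
        }

        // Step 2: "reshuffle", modelled here as grouping by key.
        // In a real runner this is where elements would be redistributed.
        Map<Integer, List<T>> grouped = new TreeMap<>();
        for (Map.Entry<Integer, T> kv : keyed) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                   .add(kv.getValue());
        }

        // Step 3: drop the key and keep only the values.
        List<T> output = new ArrayList<>();
        for (List<T> values : grouped.values()) {
            output.addAll(values);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "a");
        List<String> output = viaRandomKey(input, new Random());
        // Same elements come out, but in no particular order.
        System.out.println("in=" + input.size() + " out=" + output.size());
    }
}
```

    The point of the model is that the output holds the same elements as the input (duplicates included) while the keys carry no meaning of their own, which is why the naming above is operational rather than semantic.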
    
    See prior discussion at https://lists.apache.org/thread.html/ac34c9ac665a8d9f67b0254015e44c59ea65ecc1360d4014b95d3b2e@%3Cdev.beam.apache.org%3E
    
    This change also converts several existing usages to use it, and adds another one in Match.
    
    R: @bjchambers 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam match-fusion-break

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3890.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3890
    
----
commit 0b4d801e4afb0be1463b196f419e1293265b68c1
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-09-22T22:24:36Z

    Introduces Reshuffle.viaRandomKey()
    
    It's a commonly used pattern for breaking fusion
    https://cloud.google.com/dataflow/service/dataflow-service-desc#fusion-optimization
    
    viaRandomKey() only abstracts away the current commonly used pattern.
    It has the same caveats as using Reshuffle.of() directly - the semantics
    are technically not guaranteed by the Beam model, but it works in
    practice, and this is the pattern we keep recommending to users.
    
    The naming is deliberately operational rather than semantic, to
    emphasize that we don't have the semantics figured out, and the
    transform promises only that it expands into exactly the sequence
    "pair with random key, reshuffle, drop key".
    The goal of this change is just to reduce copy-paste.
    
    See prior discussion at
    https://lists.apache.org/thread.html/ac34c9ac665a8d9f67b0254015e44c59ea65ecc1360d4014b95d3b2e@%3Cdev.beam.apache.org%3E
    
    This change also converts several existing usages to use it, and adds another
    one in Match.

----

