beam-commits mailing list archives

From kennknowles <>
Subject [GitHub] beam pull request #2618: [BEAM-1867] Use step-derived PCollection names in D...
Date Thu, 20 Apr 2017 23:03:09 GMT
GitHub user kennknowles opened a pull request:

    [BEAM-1867] Use step-derived PCollection names in Dataflow

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
     - [ ] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](
    R: @bjchambers 
    This mitigates an issue in Dataflow. I also removed some checked exceptions
    that are never caught and probably never should be.

    I have empirically checked that the element counts and byte sizes are
    restored by this change, and added unit tests to the translator.
    Integration tests TBD.

You can merge this pull request into a Git repository by running:

    $ git pull Dataflow-PCollection-names

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2618

commit 4c0bdd6c002b83c67daedd5e01ee2ad0dd47c233
Author: Kenneth Knowles <>
Date:   2017-04-20T21:32:29Z

    Make crashing errors in Structs unchecked exceptions
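
For illustration only, the kind of change this commit title describes might look like the sketch below. The class `StructsSketch` and its `getString` method are hypothetical stand-ins, not the actual `Structs` code from the pull request. The sketch assumes that a missing or mistyped parameter is a programming error that should crash rather than be caught, so it is reported with an unchecked `IllegalArgumentException`.

```java
import java.util.Map;

/** Hypothetical stand-in for a parameter-map helper; not the real Structs class. */
final class StructsSketch {
  private StructsSketch() {}

  /**
   * Returns the named String parameter. A missing or mistyped value is
   * treated as a bug, so it is signalled with an unchecked exception and
   * callers need no "throws" clause.
   */
  static String getString(Map<String, Object> map, String name) {
    Object value = map.get(name);
    if (value == null) {
      throw new IllegalArgumentException("Missing required parameter: " + name);
    }
    if (!(value instanceof String)) {
      throw new IllegalArgumentException(
          "Parameter " + name + " is not a String: " + value.getClass().getName());
    }
    return (String) value;
  }
}
```

With an unchecked exception, call sites that cannot meaningfully recover no longer have to declare or wrap the failure.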

commit c9ed8f9a69d2b3f17e782f4bd0da9bd4305f2320
Author: Kenneth Knowles <>
Date:   2017-04-20T22:32:51Z

    Derive Dataflow output names from steps, not PCollection names

    Long ago, PCollection names were assigned after transform replacements took
    place, because this happened interleaved with pipeline construction. Now,
    runner-independent graphs are constructed with named PCollections and when
    replacements occur, the names are preserved. This exposed a bug in Dataflow
    whereby the names of steps and the names of PCollections are tightly coupled.
    This change uses the mandatory derived names during translation, shielding
    users from the bug.
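
As a rough sketch of the idea in this commit message, the snippet below derives each output name from the producing step and the output's position, ignoring whatever name the user gave the PCollection. The class `StepDerivedNames`, its methods, and the `".out" + index` naming scheme are all assumptions made for illustration; the actual translator code and naming format are not shown in this message.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; not the Dataflow translator's actual code. */
final class StepDerivedNames {
  private StepDerivedNames() {}

  /**
   * Builds an output name from the producing step's name and the output's
   * position. The ".out" + index scheme is an assumption for illustration.
   */
  static String derivedName(String stepName, int outputIndex) {
    return stepName + ".out" + outputIndex;
  }

  /**
   * Maps each user-assigned PCollection name to a step-derived name, so the
   * name sent to the service depends only on the step, never on the user's
   * choice of PCollection name.
   */
  static Map<String, String> translateOutputs(String stepName, List<String> userNames) {
    Map<String, String> result = new LinkedHashMap<>();
    for (int i = 0; i < userNames.size(); i++) {
      result.put(userNames.get(i), derivedName(stepName, i));
    }
    return result;
  }
}
```

Under this sketch, renaming a PCollection changes only the key of the mapping and never the derived name, which is the decoupling the commit message describes.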


