beam-commits mailing list archives

From kennknowles <>
Subject [GitHub] incubator-beam pull request #756: Replace ParDo with MapElements and FlatMap...
Date Fri, 29 Jul 2016 02:54:32 GMT
GitHub user kennknowles opened a pull request:

    Replace ParDo with MapElements and FlatMapElements where possible

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:
     - [ ] Make sure the PR title is formatted like:
       `[BEAM-<Jira issue #>] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
           Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
           number, if there is one.
     - [ ] If this contribution is large, please file an Apache
           [Individual Contributor License Agreement](
    The commits ended up covering fairly separate topics; they can be reviewed individually
    or together as a medium-sized change.
    1. The first commit replaces `ParDo` with `MapElements` and `FlatMapElements` where it
       is easy to do so.
    2. While debugging, I noticed that `DoFn` used a less powerful form of `TypeDescriptor`,
       so I switched it to the enhanced version, which was a trivial change.
    3. The root cause of the issues with `MapElements` and `FlatMapElements` was that the
       input type descriptor was not used. Making it available involved a moderate refactor.
       In the process I broke some display data tests, which I fixed, and I also enhanced
       the display data for `SimpleFunction`.
    If reviewers insist, I can try to alter this commit history.
    R: @bjchambers AND @swegner 

You can merge this pull request into a Git repository by running:

    $ git pull map-flatmap

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #756
commit b041197382f6a4ea5f6ad93f5e6f32aa1212937f
Author: Kenneth Knowles <>
Date:   2016-07-27T21:23:15Z

    Replace ParDo with simpler transforms where possible
    There are a number of places in the Java SDK where we use
    ParDo.of(DoFn) when MapElements or other higher-level
    composites are applicable and readable. This change
    alters a number of those.
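To illustrate why a map-style transform can stand in for a general per-element function, here is a minimal toy model in plain Java. It is only a sketch: `ToyDoFn`, `parDo`, `mapElements`, and `flatMapElements` are hypothetical names operating on in-memory lists, not the Beam SDK API, and the actual changes in this commit are not reproduced here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

public class ToyTransforms {
    /** Toy stand-in for a general per-element function that may emit any number of outputs. */
    public interface ToyDoFn<I, O> {
        void processElement(I input, Consumer<O> output);
    }

    /** General form: run the per-element function over every input and collect what it emits. */
    public static <I, O> List<O> parDo(List<I> inputs, ToyDoFn<I, O> fn) {
        List<O> outputs = new ArrayList<>();
        for (I input : inputs) {
            fn.processElement(input, outputs::add);
        }
        return outputs;
    }

    /** Exactly one output per input: a thin wrapper over the general form. */
    public static <I, O> List<O> mapElements(List<I> inputs, Function<I, O> fn) {
        return parDo(inputs, (input, output) -> output.accept(fn.apply(input)));
    }

    /** Zero or more outputs per input: also a thin wrapper over the general form. */
    public static <I, O> List<O> flatMapElements(List<I> inputs, Function<I, List<O>> fn) {
        return parDo(inputs, (input, output) -> {
            for (O element : fn.apply(input)) {
                output.accept(element);
            }
        });
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a b", "c");
        // Map each line to its length.
        System.out.println(mapElements(lines, String::length));
        // Split each line into words and flatten the results.
        System.out.println(flatMapElements(lines, line -> Arrays.asList(line.split(" "))));
    }
}
```

In this toy, the caller of `mapElements` supplies only the function that matters, which is the readability argument the commit message makes for the higher-level composites.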

commit 2b28a87cd9b39e145e6bfcd0b04ed63221dad271
Author: Kenneth Knowles <>
Date:   2016-07-29T01:44:39Z

    Make DoFn use instance-based TypeDescriptor

commit 5a95226719831e19f86703ac9838bbb5ec2c2362
Author: Kenneth Knowles <>
Date:   2016-07-29T01:47:04Z

    Use input type in coder inference for MapElements and FlatMapElements
    Previously, the input TypeDescriptor was unknown, so we would fail
    to infer a coder for things like MapElements.of(SimpleFunction<T, T>)
    even if the input PCollection provided a coder for T.
    Now, the input type is plumbed appropriately and the coder is inferred.
    This required internal changes to explicitly support good display data.
    While doing this, I just added display data to SimpleFunction by analogy
    with DoFn.
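The problem described above can be sketched with plain Java reflection. This is a toy model under stated assumptions, not Beam's implementation: `Fn`, `Identity`, `CODERS`, and `inferOutputCoder` are hypothetical names, the coder names are just string labels, and only direct subclasses of `Fn` are handled. It shows that a function of shape `<T, T>` declares only a type variable as its output, so a coder can be found only once the input type is supplied.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.TypeVariable;
import java.util.HashMap;
import java.util.Map;

public class InputTypeInference {
    /** Hypothetical stand-in for a SimpleFunction-like base class (not Beam's API). */
    public abstract static class Fn<I, O> {
        public abstract O apply(I input);
    }

    /** Generic identity function: its declared output type is only the variable T. */
    public static class Identity<T> extends Fn<T, T> {
        @Override
        public T apply(T input) {
            return input;
        }
    }

    /** Hypothetical coder registry; the values are plain labels, not real coders. */
    static final Map<Type, String> CODERS = new HashMap<>();

    static {
        CODERS.put(String.class, "StringUtf8Coder");
        CODERS.put(Integer.class, "VarIntCoder");
    }

    /**
     * Looks up a coder label for fn's output type. Pass inputType == null to
     * model inference that ignores the input type. Returns null when no coder
     * can be inferred.
     */
    public static String inferOutputCoder(Fn<?, ?> fn, Type inputType) {
        // Read the type arguments from the instance's class, e.g. Fn<T, T>.
        ParameterizedType fnType = (ParameterizedType) fn.getClass().getGenericSuperclass();
        Type declaredInput = fnType.getActualTypeArguments()[0];
        Type declaredOutput = fnType.getActualTypeArguments()[1];

        Type resolvedOutput = declaredOutput;
        // If the output is the same type variable as the input, substitute the known input type.
        if (declaredOutput instanceof TypeVariable
                && declaredOutput.equals(declaredInput)
                && inputType != null) {
            resolvedOutput = inputType;
        }
        return CODERS.get(resolvedOutput);
    }

    public static void main(String[] args) {
        Fn<String, String> identity = new Identity<>();
        // Without the input type, the output stays an unresolved type variable.
        System.out.println(inferOutputCoder(identity, null));
        // With the input type plumbed through, the output resolves to String.
        System.out.println(inferOutputCoder(identity, String.class));
    }
}
```

A function with a concrete output type, such as an anonymous `Fn<String, Integer>`, resolves in this toy without any help from the input type; the substitution step matters only for the generic case.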


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at or file a JIRA ticket
with INFRA.
