beam-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jkff <...@git.apache.org>
Subject [GitHub] beam pull request #3634: [BEAM-1820] Source.getDefaultOutputCoder() throws C...
Date Tue, 25 Jul 2017 00:06:56 GMT
GitHub user jkff opened a pull request:

    https://github.com/apache/beam/pull/3634

    [BEAM-1820] Source.getDefaultOutputCoder() throws CannotProvideCoderException

    This is based on https://github.com/apache/beam/pull/3549 by @lgajowy, but it turned out
that a lot more needs to be done to fix it properly, and the necessary changes require knowledge
of sufficiently dark corners of the SDK and runners that asking an external contributor to
do this is unfair, so I did the changes myself.
    
    TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to succeed.
    
    Source.getDefaultOutputCoder is only a hint for inferring the Coder of its elements, and
it should not be required to succeed.
    
    E.g. when a runner is replacing a Read.from(source) transform, and the override needs
to know a coder for elements of the source, if the source doesn't provide a coder, the user
may have set a coder on the returned PCollection explicitly. In this case, the runner should
use that coder.
    
    In other cases, when an API uses a Source and needs its coder, it must let the caller
provide a Coder explicitly (e.g. SourceTestUtils).
    
    I am treating this as a backwards-compatible change because Source.getDefaultOutputCoder()
is a runner-facing rather than user-facing API. From the point of view of a user implementing
a Source, adding a throws to the base class method signature is compatible.
    
    R: @tgroh 
    CC: @lgajowy


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkff/incubator-beam source-coder-fallback

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3634.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3634
    
----
commit 830c5657736ffe49e06e16039f947e7227fa45a7
Author: Eugene Kirpichov <kirpichov@google.com>
Date:   2017-07-24T23:38:27Z

    [BEAM-1820] Source.getDefaultOutputCoder() throws CannotProvideCoderException
    
    This is based on https://github.com/apache/beam/pull/3549 by @lgajowy,
    but it turned out that a lot more needs to be done to fix it properly,
    and the necessary changes require knowledge of sufficiently dark
    corners of the SDK and runners that asking an external contributor
    to do this is unfair, so I did the changes myself.
    
    TL;DR: callers of Source.getDefaultOutputCoder shouldn't require it to succeed.
    
    Source.getDefaultOutputCoder is only a hint for inferring the Coder
    of its elements, and it should not be required to succeed.
    
    E.g. when a runner is replacing a Read.from(source) transform,
    and the override needs to know a coder for elements of the source,
    if the source doesn't provide a coder, the user may have set
    a coder on the returned PCollection explicitly. In this case,
    the runner should use that coder.
    
    In other cases, when an API uses a Source and needs its coder,
    it must let the caller provide a Coder explicitly (e.g.
    SourceTestUtils).

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

Mime
View raw message