beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-53) PubSubIO: reimplement in Java
Date Mon, 28 Mar 2016 22:04:25 GMT


ASF GitHub Bot commented on BEAM-53:

GitHub user mshields822 opened a pull request:

    [BEAM-53] Java-only pub/sub source and sink (streaming only) 

    First step towards supporting pub/sub i/o in any Java runner.
     - No integration tests yet.
     - No unit tests for the source and sink. Propose we support mocking PubsubGrpcClient.
     - Depends on grpc-pubsub-v1 which is about to be renamed.
     - Only supports 'application default' credentials, and ignores any GcsOptions flags.
     - Not yet wired into the 'default' PubsubIO implementations.
     - Watermark tracking is heuristic and may introduce late data.
    But other than that we're ready to go.
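
A minimal sketch of how the proposed mocking could look, assuming the source and sink depend on a small client interface rather than on PubsubGrpcClient directly. The names `Client`, `InMemoryClient`, `publish` and `pull` are hypothetical and are not taken from the pull request; topic and subscription routing is deliberately ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MockClientSketch {
  /** Hypothetical minimal client surface; NOT the real PubsubGrpcClient API. */
  interface Client {
    /** Publishes the payloads and returns how many were accepted. */
    int publish(String topic, List<String> payloads);

    /** Returns up to maxMessages pending payloads, oldest first. */
    List<String> pull(String subscription, int maxMessages);
  }

  /** In-memory fake for unit tests: a single FIFO queue, no real topics or subscriptions. */
  static final class InMemoryClient implements Client {
    private final List<String> pending = new ArrayList<>();

    @Override
    public int publish(String topic, List<String> payloads) {
      pending.addAll(payloads);
      return payloads.size();
    }

    @Override
    public List<String> pull(String subscription, int maxMessages) {
      int n = Math.min(maxMessages, pending.size());
      // Copy the batch before removing it from the queue.
      List<String> batch = new ArrayList<>(pending.subList(0, n));
      pending.subList(0, n).clear();
      return batch;
    }
  }

  public static void main(String[] args) {
    Client client = new InMemoryClient();
    client.publish("hypothetical-topic", Arrays.asList("a", "b", "c"));
    System.out.println(client.pull("hypothetical-subscription", 2));
    System.out.println(client.pull("hypothetical-subscription", 2));
  }
}
```

With a seam like this, unit tests for the source and sink could run against the fake without credentials or network access.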

You can merge this pull request into a Git repository by running:

    $ git pull pubsub

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #85

commit dbc39a605bf09f2d1f1cae05de4caaf176abe6c1
Author: Mark Shields <>
Date:   2016-03-28T20:30:37Z

    Initial import

commit 4fa0b3d98c302c8b710106610b5aca0721436f87
Author: Mark Shields <>
Date:   2016-03-28T20:37:23Z

    Initial import II

commit 4ba8f304821a09af12e4fd8703d64c16b1349256
Author: Mark Shields <>
Date:   2016-03-28T21:58:49Z

    Formatting busywork


> PubSubIO: reimplement in Java
> -----------------------------
>                 Key: BEAM-53
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-core
>            Reporter: Daniel Halperin
>            Assignee: Mark Shields
> PubSubIO is currently only partially implemented in Java: the DirectPipelineRunner uses
> a non-scalable API in a single-threaded manner.
> In contrast, the DataflowPipelineRunner uses an entirely different code path implemented
> in the Google Cloud Dataflow service.
> We need to reimplement PubSubIO in Java in order to support other runners in a scalable
> way.
> Additionally, we can take this opportunity to add new features:
> * getting timestamp from an arbitrary lambda in arbitrary formats rather than from a
>   message attribute in only 2 formats.
> * exposing metadata and attributes in the elements produced by PubSubIO.Read
> * setting metadata and attributes in the messages written by PubSubIO.Write
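
The first feature in the list above can be sketched as follows. This is an illustration only: `Message` and `fromAttribute` are hypothetical names, not part of PubSubIO, and the two attribute formats are assumed here to be milliseconds since the epoch and an RFC 3339 string (the issue does not name them). The idea is that a timestamp extractor becomes an ordinary function from message to instant, so the attribute-based behaviour is just one possible implementation.

```java
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.util.Collections;
import java.util.Map;
import java.util.function.Function;

class TimestampExtractorSketch {
  /** Hypothetical stand-in for a Pub/Sub message: a payload plus string attributes. */
  static final class Message {
    final byte[] payload;
    final Map<String, String> attributes;

    Message(byte[] payload, Map<String, String> attributes) {
      this.payload = payload;
      this.attributes = attributes;
    }
  }

  /** Attribute-based extraction: try epoch millis first, then fall back to RFC 3339. */
  static Function<Message, Instant> fromAttribute(String name) {
    return message -> {
      String value = message.attributes.get(name);
      if (value == null) {
        throw new IllegalArgumentException("Missing timestamp attribute: " + name);
      }
      try {
        return Instant.ofEpochMilli(Long.parseLong(value));
      } catch (NumberFormatException e) {
        return Instant.parse(value); // throws DateTimeParseException if malformed
      }
    };
  }

  public static void main(String[] args) {
    Message message = new Message(
        "1459202665".getBytes(StandardCharsets.UTF_8),
        Collections.singletonMap("ts", "2016-03-28T22:04:25Z"));

    Function<Message, Instant> byAttribute = fromAttribute("ts");
    // Arbitrary lambda: here the payload itself is assumed to hold epoch seconds.
    Function<Message, Instant> byPayload =
        m -> Instant.ofEpochSecond(Long.parseLong(new String(m.payload, StandardCharsets.UTF_8)));

    System.out.println(byAttribute.apply(message));
    System.out.println(byPayload.apply(message));
  }
}
```

Any other format (a field inside a JSON payload, a custom date pattern) would only need a different lambda, with no change to the reader itself.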

This message was sent by Atlassian JIRA
