beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <>
Subject [jira] [Commented] (BEAM-53) PubSubIO: reimplement in Java
Date Tue, 17 May 2016 23:48:12 GMT


ASF GitHub Bot commented on BEAM-53:

GitHub user mshields822 opened a pull request:

    [BEAM-53] Wire PubsubUnbounded{Source,Sink} into PubsubIO

    This also refines the handling of record ids in the sink to be random-but-reused-on-failure, using the same trick as we do for the BigQuery sink.
    Still need to redo the load tests I ran a few weeks back with the actual change.
    Note that, the last time I tested, the DataflowPipelineTranslator did not kick in and replace the new transforms with the correct native transforms. Need to dig deeper.
    R: @dhalperi 
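The PR does not show the record-id code itself, but the "random-but-reused-on-failure" idea can be sketched stand-alone: draw a random id the first time an element is published and cache it, so a retry of the same element reuses the same id and the service can deduplicate. The class and method names below are illustrative, not Beam's actual API.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch (hypothetical names): the first publish of an element draws a
// random id and remembers it; if the bundle fails and the element is
// retried, the cached id is returned unchanged, enabling deduplication
// on the receiving side.
public class RecordIdCache {
    private final Map<String, String> idsByElement = new ConcurrentHashMap<>();

    /** Returns a random id for the element, stable across retries. */
    public String idFor(String elementKey) {
        return idsByElement.computeIfAbsent(
            elementKey, k -> UUID.randomUUID().toString());
    }
}
```

The same shape works whether the cache key is the serialized element or a hash of it; the only requirement is that a retried element maps to the same key.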

You can merge this pull request into a Git repository by running:

    $ git pull pubsub-runner

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #346
commit ce655a6a7480147f7527fa23818e1d546abaa599
Author: Mark Shields <>
Date:   2016-04-12T00:36:27Z

    Make java unbounded pub/sub source the default.

commit aafcf5f9d0286ec5a6ed0d634df8ff0902897cdc
Author: Mark Shields <>
Date:   2016-05-17T23:44:30Z

    Refine record id calculation. Prepare for supporting unit tests with re-using record ids.


> PubSubIO: reimplement in Java
> -----------------------------
>                 Key: BEAM-53
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: runner-core
>            Reporter: Daniel Halperin
>            Assignee: Mark Shields
> PubSubIO is currently only partially implemented in Java: the DirectPipelineRunner uses a non-scalable API in a single-threaded manner.
> In contrast, the DataflowPipelineRunner uses an entirely different code path implemented in the Google Cloud Dataflow service.
> We need to reimplement PubSubIO in Java in order to support other runners in a scalable manner.
> Additionally, we can take this opportunity to add new features:
> * getting the timestamp from an arbitrary lambda in arbitrary formats, rather than from a message attribute in only two formats.
> * exposing metadata and attributes in the elements produced by PubSubIO.Read
> * setting metadata and attributes in the messages written by PubSubIO.Write
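The first bullet above can be illustrated with a small stand-alone sketch: a user-supplied function maps the message's attribute map to an event timestamp, so callers are not limited to a fixed attribute name or a couple of built-in formats. The names here are hypothetical, not Beam's actual API.

```java
import java.time.Instant;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: the caller supplies a lambda that extracts the event
// timestamp from the message attributes, choosing both the attribute name
// and the parse format.
public class TimestampExtraction {
    /** Applies the caller's extractor to the message attributes. */
    static Instant timestampOf(Map<String, String> attributes,
                               Function<Map<String, String>, Instant> extractor) {
        return extractor.apply(attributes);
    }

    public static void main(String[] args) {
        Map<String, String> attrs = Map.of("ts", "2016-05-17T23:44:30Z");
        // The lambda decides how "ts" is located and parsed.
        Instant t = timestampOf(attrs, a -> Instant.parse(a.get("ts")));
        System.out.println(t); // prints 2016-05-17T23:44:30Z
    }
}
```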

This message was sent by Atlassian JIRA
