beam-commits mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2660) Set PubsubIO batch size using builder
Date Sun, 23 Jul 2017 01:43:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16097476#comment-16097476 ]

ASF GitHub Bot commented on BEAM-2660:
--------------------------------------

GitHub user cjmcgraw opened a pull request:

    https://github.com/apache/beam/pull/3619

    [BEAM-2660] Set PubsubIO batch size using builder

    BEAM-2660 asks for the publish batch size to be configurable through the `PubsubIO.Write.Builder`.
    
    This PR adds two values configurable through the `PubsubIO.Write.Builder`:
    - `maxBatchSize` - controls the maximum number of messages in a bulk publish request
    - `maxBatchByteSize` - controls the maximum total size, in bytes, of a bulk publish request
    
    In this PR I have also modified `PubsubIO.Write.PubsubBoundedWriter`. The writer now
    dynamically tracks the total number of bytes of all queued messages. If that number
    exceeds the threshold, the writer publishes the queued messages before adding more.
    
    If a single message exceeds `maxBatchByteSize`, an exception is thrown.
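
    A minimal sketch of the flush-before-add logic described above, for illustration only. `ByteBoundedBatcher` and its methods are hypothetical names, not the code in this PR; payloads are plain `byte[]` arrays, and a `Consumer` stands in for the real Pub/Sub publish call.

    ```java
    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Consumer;

    // Hypothetical count- and byte-bounded batcher (not the PR's PubsubBoundedWriter).
    class ByteBoundedBatcher {
      private final int maxBatchSize;                  // max messages per publish request
      private final long maxBatchByteSize;             // max total payload bytes per request
      private final Consumer<List<byte[]>> publisher;  // stand-in for the actual publish call
      private final List<byte[]> buffer = new ArrayList<>();
      private long bufferedBytes = 0;

      ByteBoundedBatcher(int maxBatchSize, long maxBatchByteSize,
          Consumer<List<byte[]>> publisher) {
        if (maxBatchSize < 1 || maxBatchByteSize < 1) {
          throw new IllegalArgumentException("batch limits must be positive");
        }
        this.maxBatchSize = maxBatchSize;
        this.maxBatchByteSize = maxBatchByteSize;
        this.publisher = publisher;
      }

      void add(byte[] payload) {
        // A message larger than the byte limit can never fit in any batch.
        if (payload.length > maxBatchByteSize) {
          throw new IllegalArgumentException("message of " + payload.length
              + " bytes exceeds maxBatchByteSize " + maxBatchByteSize);
        }
        // Publish what is queued if adding this message would break either limit.
        if (!buffer.isEmpty()
            && (buffer.size() >= maxBatchSize
                || bufferedBytes + payload.length > maxBatchByteSize)) {
          flush();
        }
        buffer.add(payload);
        bufferedBytes += payload.length;
      }

      // Publishes any queued messages as one batch and resets the byte counter.
      void flush() {
        if (buffer.isEmpty()) {
          return;
        }
        publisher.accept(new ArrayList<>(buffer));
        buffer.clear();
        bufferedBytes = 0;
      }
    }
    ```

    Checking the limits before a message is added means no published batch exceeds either `maxBatchSize` or `maxBatchByteSize`; the caller still has to call `flush()` at the end to publish the final partial batch.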
    
    An example use of the new parameters:
    
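    ```java
    // Assumes `messages` is a PCollection<PubsubMessage>; project and topic names are placeholders.
    messages.apply(
        PubsubIO.writeMessages()
            .to("projects/my-project/topics/my-topic")
            .withMaxBatchSize(100)           // at most 100 messages per publish request
            .withMaxBatchByteSize(100000));  // at most 100,000 bytes per publish request
    ```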

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cjmcgraw/beam update-pubsubIO

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/beam/pull/3619.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #3619
    
----
commit 7eff3ea99da3fad85e10ac50c2b2bc6fec89a1fc
Author: Carl McGraw <carlm@accretivetg.com>
Date:   2017-07-22T22:30:40Z

    Added maxPublishBatchSize parameter to PubsubBoundedWriter class.

commit 95f23cd98c2008e0f5712ed68036bfb71caaa144
Author: Carl McGraw <carlm@accretivetg.com>
Date:   2017-07-23T00:30:18Z

    updated BoundedPubsubWriter to dynamically flush if queued messages exceed a pre-defined maximum batch byte size

commit c2abeb926c71bf21bbcc9406986c340d2c9d63e0
Author: Carl McGraw <carlm@accretivetg.com>
Date:   2017-07-23T01:17:03Z

    updated UnboundedPubsubSink to accept new parameters.

----


> Set PubsubIO batch size using builder
> -------------------------------------
>
>                 Key: BEAM-2660
>                 URL: https://issues.apache.org/jira/browse/BEAM-2660
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-gcp
>            Reporter: Carl McGraw
>            Assignee: Stephen Sisk
>              Labels: gcp, java, pubsub, sdk
>
> PubsubIO doesn't allow users to set the publish batch size. Instead, the value is hard-coded in both the BoundedPubsubWriter and the UnboundedPubsubSink.
> Google's Pub/Sub is limited to a maximum request size of 10 MB. My company has run into problems with events that are individually smaller than 1 MB but that, when batched at the default batch sizes of 100 or 2000, cause Pub/Sub to fail to send them.
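
The failure described in the issue is plain arithmetic: a count-only batch limit cannot bound the request size. A rough sketch, assuming "10 MB" means 10,000,000 bytes and ignoring any request encoding or per-message overhead (so real requests are somewhat larger); `BatchLimitCheck` and its methods are hypothetical names.

```java
// Hypothetical helper illustrating why a message-count limit alone can exceed a byte limit.
class BatchLimitCheck {
  // Assumption: "10 MB" is treated as 10,000,000 bytes; encoding and
  // per-message overhead are ignored.
  static final long MAX_REQUEST_BYTES = 10L * 1000 * 1000;

  // Largest possible payload total for `batchSize` messages of at most `maxMessageBytes` each.
  static long worstCaseRequestBytes(int batchSize, long maxMessageBytes) {
    return batchSize * maxMessageBytes;
  }

  // True only if every batch of this size is guaranteed to fit under the request limit.
  static boolean alwaysFits(int batchSize, long maxMessageBytes) {
    return worstCaseRequestBytes(batchSize, maxMessageBytes) <= MAX_REQUEST_BYTES;
  }
}
```

With messages just under 1 MB (999,999 bytes), a batch of 10 stays within the limit under these assumptions, while batches of 100 or 2000 can reach roughly 100 MB and 2 GB, which is why a byte-based limit is needed alongside the count limit.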



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
