beam-commits mailing list archives

From dhalperi <>
Subject [GitHub] incubator-beam pull request: Add BoundedReader APIs for expressing...
Date Thu, 19 May 2016 02:52:55 GMT
GitHub user dhalperi opened a pull request:

    Add BoundedReader APIs for expressing remaining and consumed parallelism

    These are useful for dynamic work rebalancing and autoscaling.
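To illustrate what "consumed" and "remaining" parallelism could mean for a reader over a range of offsets, here is a minimal sketch. It is not the code from this pull request: the class OffsetRangeReaderSketch and the splitOffHalf helper are hypothetical, and only the two getter names are borrowed from the commit message below. It assumes every offset in [start, end) is one independently readable record.

```java
/**
 * Hypothetical sketch of a reader over the offset range [start, end).
 * Not Beam code; the getter names only echo the commit message.
 */
final class OffsetRangeReaderSketch {
    private final long start; // first offset in the range (inclusive)
    private long end;         // end of the range (exclusive); shrinks on split
    private long next;        // next offset that advance() will hand out

    OffsetRangeReaderSketch(long start, long end) {
        if (start < 0 || end < start) {
            throw new IllegalArgumentException("invalid range");
        }
        this.start = start;
        this.end = end;
        this.next = start;
    }

    /** Moves to the next record; returns false once the range is exhausted. */
    boolean advance() {
        if (next >= end) {
            return false;
        }
        next++;
        return true;
    }

    /** Number of records already handed out by advance(). */
    long getParallelismConsumed() {
        return next - start;
    }

    /** Number of records not yet handed out; these could still be given away. */
    long getParallelismRemaining() {
        return end - next;
    }

    /** Gives away the upper half of the unread records; returns how many. */
    long splitOffHalf() {
        long given = getParallelismRemaining() / 2;
        end -= given;
        return given;
    }

    public static void main(String[] args) {
        OffsetRangeReaderSketch reader = new OffsetRangeReaderSketch(0, 10);
        reader.advance();
        reader.advance();
        reader.advance();
        System.out.println("consumed=" + reader.getParallelismConsumed()
            + " remaining=" + reader.getParallelismRemaining());
        System.out.println("given away=" + reader.splitOffHalf()
            + " remaining=" + reader.getParallelismRemaining());
    }
}
```

In this sketch a runner could call getParallelismRemaining() to decide whether a split is worthwhile: once it reports 0 there is nothing left to rebalance, however much wall-clock time the current record still needs.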

You can merge this pull request into a Git repository by running:

    $ git pull limited-parallelism

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #353

commit dfeecdbd6a751f0bac1f398dd1a86040a5c5166e
Author: Dan Halperin <>
Date:   2016-05-04T00:53:48Z

    BoundedReader: add getParallelism{Consumed,Remaining}
    And implement it for common sources

commit 837a42dcce1d14a45ff7c8b7c3b4efdbbf98ef82
Author: Dan Halperin <>
Date:   2016-05-15T20:54:57Z

    OffsetBasedReader: test limited parallelism signals

commit 32894f8e4f7b68b940e71dcc52202bf2216914aa
Author: Dan Halperin <>
Date:   2016-05-16T22:33:11Z

    CompressedSource: add tests of parallelism and progress
    *) empty file
    *) non-empty compressed file
    *) non-empty not-compressed file

commit ca29728dbff52a796279753c5b3efc3659f9ba06
Author: Dan Halperin <>
Date:   2016-05-17T03:04:52Z

    TextIO: implement and test parallelism
    *) empty file
    *) non-empty file

commit b866541f71df59f278e17b2895e7b412cbc7734f
Author: Dan Halperin <>
Date:   2016-05-17T03:48:33Z

    CountingSource: test limited parallelism

commit 1a4a0d99049871fe58f5194c59e0bf646894fae7
Author: Dan Halperin <>
Date:   2016-05-17T05:33:07Z

    AvroSource: rewrite to support remaining parallelism
    *) Make the start of a block match Avro's definition: the first byte after
       the previous sync marker. This enables detecting the last block in the
       file.
    *) This change enables us to unify currentOffset and currentBlockOffset,
       as all records are emitted at the start of the block that contains them.
    *) Simplify block header reading to have fewer object allocations and
       buffers, using a reader and an (allocated only once) CountingInputStream
       to measure the size of that header.
    *) Add tests for consumed and remaining parallelism.
    *) Let BlockBasedSource detect the end of the file in remaining parallelism.
    *) Verify in more places that the correct number of bytes is read from
       the input Avro file.
commit 4c775be82ccf68cdc221242e9cdfd4d0796a13e7
Author: Dan Halperin <>
Date:   2016-05-17T08:47:54Z

    CompressedSource: implement currentOffset based on bytes decompressed
    This is not a very good offset because it is an upper bound, but it is
    likely better than not reporting any progress at all.
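The idea of measuring an offset in decompressed bytes can be sketched with the JDK's gzip streams. This is an illustration only, not the CompressedSource implementation: the class DecompressedOffsetSketch and both of its methods are hypothetical. It also assumes one reading of "upper bound", namely that for compressible data the number of bytes decompressed so far is at least the number of compressed bytes consumed from the file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: track progress through a gzip stream in decompressed bytes. */
final class DecompressedOffsetSketch {

    /** Compresses raw bytes with gzip, entirely in memory. */
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /**
     * Reads at most maxBytes of decompressed data and returns the number of
     * decompressed bytes actually read, which serves as the reported offset.
     */
    static long decompressedOffsetAfter(byte[] compressed, long maxBytes) {
        long offset = 0;
        byte[] buffer = new byte[256];
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            while (offset < maxBytes) {
                int want = (int) Math.min(buffer.length, maxBytes - offset);
                int n = in.read(buffer, 0, want);
                if (n < 0) {
                    break; // end of the compressed stream
                }
                offset += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return offset;
    }
}
```

For highly repetitive input the decompressed offset quickly exceeds the size of the compressed file, so a progress fraction computed against the file size would have to be clamped; that is one way an offset like this can be "not very good" while still moving forward as records are read.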


