beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)" <>
Subject [jira] [Commented] (BEAM-2680) Improve scalability of the Watch transform
Date Thu, 25 Jan 2018 20:09:00 GMT


Eugene Kirpichov commented on BEAM-2680:

Note: as a workaround, a user can normally "shard" the input of Watch (e.g. split a
filepattern into several narrower ones) so that each individual poll result is smaller.
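
A minimal sketch of what such sharding could look like for a filepattern, in plain Java. The class FilepatternSharder and its method shardByPrefix are hypothetical names used only for illustration and are not part of Beam; the sketch also assumes the caller knows a set of file-name prefixes that together cover everything the original pattern would match.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Beam): splits one broad filepattern into narrower ones. */
final class FilepatternSharder {
  private FilepatternSharder() {}

  /**
   * Returns one glob per prefix. For example, dir "/logs" with prefixes ["a", "b"]
   * yields "/logs/a*" and "/logs/b*". Assumption: the prefixes jointly cover every
   * file name that the unsharded pattern "/logs/*" would have matched.
   */
  static List<String> shardByPrefix(String dir, List<String> prefixes) {
    String base = dir.endsWith("/") ? dir : dir + "/";
    List<String> patterns = new ArrayList<>();
    for (String prefix : prefixes) {
      patterns.add(base + prefix + "*");
    }
    return patterns;
  }
}
```

Each returned pattern would then be watched as a separate input element, so a single poll only has to return the files under one prefix.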

> Improve scalability of the Watch transform
> ------------------------------------------
>                 Key: BEAM-2680
>                 URL:
>             Project: Beam
>          Issue Type: Bug
>          Components: sdk-java-core
>            Reporter: Eugene Kirpichov
>            Assignee: Eugene Kirpichov
>            Priority: Major
> [] introduces the Watch transform [].
> The implementation leaves several scalability-related TODOs:
>  1) The state stores hashes and timestamps of outputs that have already been output and
> should be omitted from future polls. We could garbage-collect this state, e.g. dropping elements
> from "completed" and from addNewAsPending() if their timestamp is more than X behind the watermark.
>  2) When a poll returns a huge number of elements, we don't necessarily have to add all
> of them into state.pending - instead we could add only N oldest elements and ignore others,
> relying on future poll rounds to provide them, in order to avoid blowing up the state. Combined
> with garbage collection of GrowthState.completed, this would make the transform scalable to
> very large poll results.
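
The two TODOs above could be sketched roughly as follows in plain Java. This is an illustration under stated assumptions, not the actual Watch implementation: GrowthStateSketch, Output, markCompleted and gcCompleted are hypothetical names, timestamps and the watermark are modelled as plain millisecond values, "X" is the constructor argument maxAgeMillis, "N" is maxNewPerPoll, and hashes are assumed to be unique within one poll result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the transform's per-input state; not Beam code. */
final class GrowthStateSketch {
  /** One poll output, identified by its hash, with a timestamp in millis. */
  static final class Output {
    final String hash;
    final long timestampMillis;

    Output(String hash, long timestampMillis) {
      this.hash = hash;
      this.timestampMillis = timestampMillis;
    }
  }

  final Map<String, Long> completed = new HashMap<>(); // hash -> timestamp, already output
  final Map<String, Long> pending = new HashMap<>();   // hash -> timestamp, not yet output
  private final int maxNewPerPoll;  // "N" in TODO 2
  private final long maxAgeMillis;  // "X" in TODO 1

  GrowthStateSketch(int maxNewPerPoll, long maxAgeMillis) {
    this.maxNewPerPoll = maxNewPerPoll;
    this.maxAgeMillis = maxAgeMillis;
  }

  /** Adds at most maxNewPerPoll of the oldest unseen outputs; the rest wait for a later poll. */
  void addNewAsPending(List<Output> pollResult, long watermarkMillis) {
    long cutoff = watermarkMillis - maxAgeMillis;
    List<Output> unseen = new ArrayList<>();
    for (Output o : pollResult) {
      // Skip outputs older than the cutoff: their "completed" entry may already
      // have been garbage-collected, so they must not be re-emitted.
      if (o.timestampMillis < cutoff) continue;
      if (completed.containsKey(o.hash) || pending.containsKey(o.hash)) continue;
      unseen.add(o);
    }
    unseen.sort(Comparator.comparingLong((Output o) -> o.timestampMillis));
    for (Output o : unseen.subList(0, Math.min(maxNewPerPoll, unseen.size()))) {
      pending.put(o.hash, o.timestampMillis);
    }
  }

  /** Moves an output from pending to completed once it has been emitted. */
  void markCompleted(String hash) {
    Long ts = pending.remove(hash);
    if (ts != null) completed.put(hash, ts);
  }

  /** Drops completed entries whose timestamp is more than maxAgeMillis behind the watermark. */
  void gcCompleted(long watermarkMillis) {
    long cutoff = watermarkMillis - maxAgeMillis;
    completed.values().removeIf(ts -> ts < cutoff);
  }
}
```

Note that the age check in addNewAsPending has to use the same cutoff as gcCompleted. Otherwise an old element that is still returned by a poll after its entry was dropped from completed would look new and be output a second time.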

This message was sent by Atlassian JIRA