beam-commits mailing list archives

From "Eugene Kirpichov (JIRA)"
Subject [jira] [Created] (BEAM-2680) Improve scalability of the Watch transform
Date Wed, 26 Jul 2017 02:24:00 GMT
Eugene Kirpichov created BEAM-2680:

             Summary: Improve scalability of the Watch transform
                 Key: BEAM-2680
             Project: Beam
          Issue Type: Bug
          Components: sdk-java-core
            Reporter: Eugene Kirpichov
            Assignee: Eugene Kirpichov

... introduces the Watch transform.

The implementation leaves several scalability-related TODOs:
1) The state stores hashes and timestamps of outputs that have already been emitted and should
be omitted from future polls. We could garbage-collect this state, e.g. by dropping elements
from GrowthState.completed, and skipping them in addNewAsPending(), if their timestamp is more
than X behind the watermark.
2) When a poll returns a huge number of elements, we don't necessarily have to add all of
them into state.pending - instead we could add only N elements and ignore others, relying
on future poll rounds to provide them, in order to avoid blowing up the state. Combined with
garbage collection of GrowthState.completed, this would make the transform scalable to very
large poll results.
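
The two TODOs above could be sketched roughly as follows. This is a minimal illustration, not the Watch transform's actual code: the class GrowthStateSketch, its fields, and the methods emitNext() and gcCompleted() are hypothetical stand-ins, outputs are represented only by a string hash and a millisecond timestamp, and maxLagMillis and maxPending play the roles of X and N.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for the per-input poll state; not Beam's real GrowthState. */
class GrowthStateSketch {
  private final int maxPending;     // "N": cap on the number of pending outputs
  private final long maxLagMillis;  // "X": allowed lag behind the watermark

  // hash -> timestamp of outputs already emitted
  private final Map<String, Long> completed = new HashMap<>();
  // hash -> timestamp of outputs waiting to be emitted, in insertion order
  private final Map<String, Long> pending = new LinkedHashMap<>();

  GrowthStateSketch(int maxPending, long maxLagMillis) {
    this.maxPending = maxPending;
    this.maxLagMillis = maxLagMillis;
  }

  /**
   * Adds unseen poll results to pending. Results more than maxLagMillis behind
   * the watermark are skipped, and once pending holds maxPending entries the
   * rest are ignored, relying on a later poll round to return them again.
   * Returns the number of entries added.
   */
  int addNewAsPending(Map<String, Long> polled, long watermarkMillis) {
    int added = 0;
    for (Map.Entry<String, Long> e : polled.entrySet()) {
      if (pending.size() >= maxPending) {
        break;
      }
      boolean tooOld = e.getValue() < watermarkMillis - maxLagMillis;
      if (tooOld || completed.containsKey(e.getKey()) || pending.containsKey(e.getKey())) {
        continue;
      }
      pending.put(e.getKey(), e.getValue());
      added++;
    }
    return added;
  }

  /** Moves the oldest pending entry to completed; returns its hash, or null if none. */
  String emitNext() {
    Iterator<Map.Entry<String, Long>> it = pending.entrySet().iterator();
    if (!it.hasNext()) {
      return null;
    }
    Map.Entry<String, Long> e = it.next();
    String hash = e.getKey();
    long timestamp = e.getValue();
    it.remove();
    completed.put(hash, timestamp);
    return hash;
  }

  /** Drops completed entries whose timestamp is more than maxLagMillis behind the watermark. */
  void gcCompleted(long watermarkMillis) {
    completed.values().removeIf(ts -> ts < watermarkMillis - maxLagMillis);
  }

  int pendingSize() { return pending.size(); }

  int completedSize() { return completed.size(); }
}
```

In this sketch the same lag threshold is applied in both addNewAsPending() and gcCompleted(), so an entry that has been garbage-collected from completed is not re-added as new if a later poll returns it again.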

This message was sent by Atlassian JIRA
