beam-commits mailing list archives

From "Kenneth Knowles (JIRA)" <>
Subject [jira] [Commented] (BEAM-1002) Enable caching of side-input dependent computations
Date Thu, 30 Mar 2017 17:45:41 GMT


Kenneth Knowles commented on BEAM-1002:

I also think there is a case for state-like APIs here. Interestingly (though maybe not surprisingly),
partitioning by window matters, while partitioning per key does not. This has a bit of the flavor of
the fact that "keyed state" is only keyed to give some stable granularity for parallelism, while
it is windowed for correctness.

This might be far from optimal in terms of pithiness, but here is the minimal deviation from
the existing state API:

new DoFn<NotAKV, Whatever>() {
  private final StateSpec<ValueState<MySideType>> globalSpec = ...

  /* lazily read the side input when the transient state is gone and write it to the state */
}

We would discard it when the instance goes away (it might be corrupted, so we have to). Naively,
it is mostly just a convenience on top of a {{Map}} in an instance field, is it not?

Differences I can see:
 - Less error prone, of course
 - Could conceivably allow clearing it early, if we spec it out to be that way
 - For keyed merging windows we could insert the needed GBK for correctness
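
For comparison, the naive "Map in an instance field" version might look roughly like the sketch below. This is plain Java, not Beam API: PerWindowCache, its loader function, and loadCount are hypothetical names made up for illustration, and W simply stands in for whatever window type is in play. It memoizes one value per window rather than one per instance, and it supports clearing an entry early.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical helper, not part of Beam: memoizes one derived value per window.
 * W stands in for a window type, V for the (possibly non-serializable) derived object.
 */
final class PerWindowCache<W, V> {
  // Inside a DoFn this field would be transient: it goes away with the
  // instance and is rebuilt lazily instead of being serialized.
  private final Map<W, V> valuesByWindow = new HashMap<>();
  private final Function<W, V> loader;
  private int loads = 0;

  PerWindowCache(Function<W, V> loader) {
    this.loader = loader;
  }

  /** Returns the cached value for the window, computing it on first access. */
  V get(W window) {
    V value = valuesByWindow.get(window);
    if (value == null) {
      value = loader.apply(window);
      valuesByWindow.put(window, value);
      loads++;
    }
    return value;
  }

  /** Drops the entry for a window, for example once the window has expired. */
  void clear(W window) {
    valuesByWindow.remove(window);
  }

  /** Number of times the loader ran; exposed only to make the sketch observable. */
  int loadCount() {
    return loads;
  }
}
```

In a DoFn, the loader would presumably read the side input for the element's window and build the stateful object from it, with the lookup happening per element rather than once in the first processElement. Keying the map by window is what avoids the incorrect memoization described in the issue.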

> Enable caching of side-input dependent computations
> ---------------------------------------------------
>                 Key: BEAM-1002
>                 URL:
>             Project: Beam
>          Issue Type: New Feature
>          Components: beam-model
>            Reporter: Robert Bradshaw
>            Assignee: Kenneth Knowles
> Sometimes the kind of computations one wants to perform in startBundle depend on side
> inputs (and, implicitly, the window). For example, one might want to initialize a (non-serializable)
> stateful object. In particular, this leads to users incorrectly (in the case of triggered
> or non-globally-windowed side inputs) memoizing this computation in the first processElement.
> One option would be to fold this into a customizable ViewFn.

This message was sent by Atlassian JIRA
