beam-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jan Lukavský <>
Subject Re: @RequireTimeSortedInput design draft
Date Thu, 06 Jun 2019 15:03:24 GMT

I have written a PoC implementation of this in [1] and I'd like to 
discuss some implementation details. First of all, I'd appreciate any 
feedback about this. There are some known issues:

  1) need to figure out how to get Coder of input PCollection of 
stateful ParDo inside StatefulDoFnRunner

  2) there are performance considerations, that can be solved probably 
only by Sorted Map State [2]

  3) additional work is needed for allowedLateness to work correctly 
(and there are at least two ways how to solve this), see the design doc [3]

  4) more tests (for batch and validatesRunner) are needed

I have come across a few bugs in DirectRunner, which I tried to solve:

  a) timers seem to be broken in stateful pardo with side inputs

  b) timers need to be sorted by timestamp, otherwise state might be 
cleared before it gets chance to be flushed

Thanks for feedback,





On 5/23/19 4:40 PM, Robert Bradshaw wrote:
> Thanks for writing this up.
> I think the justification for adding this to the model needs to be
> that it is useful (you have this covered, though some examples would
> be nice) and that it's something that can't easily be done by users
> themselves (specifically, though it can be (relatively) cheaply done
> in streaming and batch, it's done in very different ways, and also
> that it's hard to do via composition).
> On Thu, May 23, 2019 at 4:10 PM Jan Lukavský <> wrote:
>> Hi,
>> I have written a very brief draft of how it might be possible to
>> implement @RequireTimeSortedInput discussed in [1]. I see the document
>> [2] a starting point for a discussion. There are several open questions,
>> which I believe can be resolved by this great community. :-)
>> Jan
>> [1]
>> [2]

View raw message