beam-user mailing list archives

From Eric Fang
Subject Ordering in PCollection
Date Thu, 03 Aug 2017 19:41:18 GMT
Hi all,

We have a stream of data that's ordered by timestamp, and our use case
requires us to process each element in order relative to the previous
element. For example, we have a stream of true/false values ingested from
PubSub, and we want to make sure that, for each key, a true is always
followed by a false.
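
To make the per-key requirement concrete, here is a minimal plain-Java sketch
of the check, run over an already-collected batch of elements. It uses no Beam
APIs. The names AlternationCheck, Event and keysViolatingOrder are
hypothetical and exist only for illustration. It also assumes that "a true is
followed by a false" means two consecutive trues for the same key, in
timestamp order, count as a violation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class AlternationCheck {

    /** One keyed, timestamped boolean reading (hypothetical type). */
    public static final class Event {
        public final String key;
        public final long timestampMillis;
        public final boolean value;

        public Event(String key, long timestampMillis, boolean value) {
            this.key = key;
            this.timestampMillis = timestampMillis;
            this.value = value;
        }
    }

    /**
     * Returns the keys for which, once that key's events are sorted by
     * timestamp, some true is immediately followed by another true.
     */
    public static Set<String> keysViolatingOrder(List<Event> events) {
        // Group events by key; arrival order within a key is not trusted.
        Map<String, List<Event>> byKey = new TreeMap<>();
        for (Event e : events) {
            byKey.computeIfAbsent(e.key, k -> new ArrayList<>()).add(e);
        }

        Set<String> violations = new TreeSet<>();
        for (Map.Entry<String, List<Event>> entry : byKey.entrySet()) {
            List<Event> sorted = new ArrayList<>(entry.getValue());
            // Restore timestamp order before comparing neighbours.
            sorted.sort(Comparator.comparingLong((Event e) -> e.timestampMillis));
            for (int i = 0; i + 1 < sorted.size(); i++) {
                if (sorted.get(i).value && sorted.get(i + 1).value) {
                    violations.add(entry.getKey());
                    break;
                }
            }
        }
        return violations;
    }
}
```

The sketch sorts each key's elements by timestamp itself, so it does not
depend on the order in which they arrive.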

I know that PubSub does not guarantee ordering, but within the same Dataflow
job, does ProcessContext.output guarantee that processElement is called in
order of event time or processing time? In my experiments this assumption
seems to hold, but I wonder whether it is an actual guarantee of the system.

In addition, if I key the stream with another key, does the assumption
still hold? If not, is there any way in Beam to ensure that
processElement is called in order of some timestamp?



Eric Fang

Stack Labs  |  10054 Pasadena Ave, Cupertino, CA 95014

