flink-user mailing list archives

From "Matthias J. Sax" <mj...@apache.org>
Subject Re: synchronizing two streams
Date Thu, 12 May 2016 13:00:25 GMT
That is correct. But there is no reason to throttle an input stream.

If you implement an outer join, you will have two in-memory buffers,
one per stream, each holding the records of that stream that fall within
your "time window". Each time you receive a watermark on one stream, you
can remove all "expired" records from the buffer of the other stream.
Furthermore, you need to track whether each record got joined or not.
For all records that did not get joined, emit a "record-null" (or
"null-record") result tuple before removing them.
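
To make the buffering scheme concrete, here is a minimal sketch in plain
Java. It is a standalone model, not a Flink operator: the class name
OuterJoinBuffers, the maxGap parameter, the method names, and the string
output format are all hypothetical. It also assumes that a watermark wm
on one input means no later record on that input has a timestamp <= wm.
In a real TwoInputStreamOperator the four handlers would roughly
correspond to the two element and two watermark processing methods.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Standalone model of the buffering logic; it does NOT use Flink's operator API.
class OuterJoinBuffers {

    // A buffered record plus a flag that tracks whether it ever joined.
    static final class Entry {
        final String value;
        final long timestamp;
        boolean joined;

        Entry(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final long maxGap;                      // hypothetical join window size
    private final List<Entry> left = new ArrayList<>();
    private final List<Entry> right = new ArrayList<>();
    final List<String> output = new ArrayList<>();  // stands in for the operator's output

    OuterJoinBuffers(long maxGap) {
        this.maxGap = maxGap;
    }

    // Record from the left stream: join against the right buffer, then buffer it.
    void onLeft(String value, long ts) {
        Entry e = new Entry(value, ts);
        for (Entry r : right) {
            if (Math.abs(r.timestamp - ts) <= maxGap) {
                output.add(value + "," + r.value);
                e.joined = true;
                r.joined = true;
            }
        }
        left.add(e);
    }

    // Record from the right stream: join against the left buffer, then buffer it.
    void onRight(String value, long ts) {
        Entry e = new Entry(value, ts);
        for (Entry l : left) {
            if (Math.abs(l.timestamp - ts) <= maxGap) {
                output.add(l.value + "," + value);
                e.joined = true;
                l.joined = true;
            }
        }
        right.add(e);
    }

    // A watermark on one input expires records in the buffer of the OTHER input.
    void onLeftWatermark(long wm) {
        expire(right, wm, false);
    }

    void onRightWatermark(long wm) {
        expire(left, wm, true);
    }

    private void expire(List<Entry> buffer, long wm, boolean bufferIsLeft) {
        Iterator<Entry> it = buffer.iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            // No future record on the other input can still fall within maxGap.
            if (e.timestamp + maxGap <= wm) {
                if (!e.joined) {
                    // Never joined: emit the outer-join result before dropping it.
                    output.add(bufferIsLeft ? e.value + ",null" : "null," + e.value);
                }
                it.remove();
            }
        }
    }
}
```

None of the handlers block or sleep. Each call returns immediately, and
the buffers only shrink as watermarks advance, so neither input has to
be throttled.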

No need to block/sleep.

Does this make sense?


On 05/12/2016 02:51 PM, Alexander Gryzlov wrote:
> Hmm, probably I don't really get how Flink's execution model works. As
> far as I understand, the preferred way to throttle down stream
> consumption is to simply have an operator with a conditional
> Thread.sleep() inside. Wouldn't calling sleep() in either
> of TwoInputStreamOperator's processWatermarkN() methods just freeze the
> entire operator, stopping the consumption of both streams (as opposed to
> just one)?
> Alex
> On Thu, May 12, 2016 at 2:31 PM, Matthias J. Sax <mjsax@apache.org
> <mailto:mjsax@apache.org>> wrote:
>     I cannot follow completely. TwoInputStreamOperator defines two methods
>     to process watermarks, one for each stream.
>     So you can sync both streams within the outer join operator you plan to
>     implement.
>     -Matthias
>     On 05/11/2016 05:00 PM, Alexander Gryzlov wrote:
>     > Hello,
>     >
>     > We're implementing a streaming outer join operator based on a
>     > TwoInputStreamOperator with an internal buffer. In our use-case
>     > only the items whose timestamps are within a several-second
>     > interval of each other can join, so we need to synchronize the two
>     > input streams to ensure maximal yield. Our plan is to utilize the
>     > watermark mechanism to implement some sort of a "throttling"
>     > operator, which would take two streams and stop passing through
>     > one of them based on the watermarks in another. However, there
>     > doesn't seem to exist an operator of the shape (A,B)->(A,B) in
>     > Flink, where A and B can be received and emitted independently.
>     > What would be a resource-saving way to implement such (e.g.,
>     > without spawning two more parallel TwoInputStreamOperators)?
>     >
>     > Alex
