flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yan Zhou [FDS Science] ­ <yz...@coupang.com>
Subject Re: DataStream joining without window
Date Wed, 11 Oct 2017 18:13:07 GMT
Thank you for the reply. It's very helpful.

Best
Yan

On Tue, Oct 10, 2017 at 7:57 AM, Aljoscha Krettek <aljoscha@apache.org>
wrote:

> Hi,
>
> Yes, using a TwoInputStreamOperator (or even better, a CoProcessFunction,
> because TwoInputStreamOperator is a low-level interface that might change
> in the future) is the recommended way for implementing a stream-stream
> join, currently.
>
> As you already guessed, you need a policy for cleanup up the state that
> you hold. You can do this using the timer features of CoProcessFunction.
>
> Also, if you keep your buffered elements using the Flink state interfaces
> you can switch the state backend to the RocksDB backend and if you have
> concerns about the state growing too big.
>
> Best,
> Aljoscha
>
> > On 9. Oct 2017, at 23:11, Yan Zhou [FDS Science] ­ <yzhou@coupang.com>
> wrote:
> >
> > It seems like flink only supports DataStream joining within same time
> window. Why is it restricted in this way?
> >
> > I think I can implement a TwoInputStreamOperator to join two DataStreams
> without considering the window.  And inside the operator, create two state
> to cache records of two streams and join the streams within methods
> processElement1/processElement2. Should I go head with this approach? Is
> there any performance consideration here? If the concern is that the cache
> might take a lot of memory, we can introduce some cache policy and reduce
> the size. Or can we use rocksDB state?
> >
> > Please advise.
> >
> > Best
> > Yan
> >
>
>

Mime
View raw message