Yes, using a TwoInputStreamOperator (or even better, a CoProcessFunction, because TwoInputStreamOperator is a low-level interface that might change in the future) is the recommended way for implementing a stream-stream join, currently.
As you already guessed, you need a policy for cleanup up the state that you hold. You can do this using the timer features of CoProcessFunction.
Also, if you keep your buffered elements using the Flink state interfaces you can switch the state backend to the RocksDB backend and if you have concerns about the state growing too big.
> On 9. Oct 2017, at 23:11, Yan Zhou [FDS Science] <email@example.com> wrote:
> It seems like flink only supports DataStream joining within same time window. Why is it restricted in this way?
> I think I can implement a TwoInputStreamOperator to join two DataStreams without considering the window. And inside the operator, create two state to cache records of two streams and join the streams within methods processElement1/
processElement2. Should I go head with this approach? Is there any performance consideration here? If the concern is that the cache might take a lot of memory, we can introduce some cache policy and reduce the size. Or can we use rocksDB state?
> Please advise.