flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Dmitry Golubets <dgolub...@gmail.com>
Subject Re: Three input stream operator and back pressure
Date Tue, 17 Jan 2017 12:05:43 GMT
Hi Stephan,

In one of our components we have to process events in order, due to
business logic requirements.
That is for sure introduces a bottleneck, but other aspects are fine.

I'm not taking about really resorting data, but just about consuming it in
the right order.
I.e. if two streams are already in order, all that has to be done is to
consume one that has the Min element at it's head and backpressure another
one.

What I can do ofc is to create a custom Source for it. But I would prefer
not to mix source dependent logic (e.g. Kafka connection, etc) and merging
logic.

Best regards,
Dmitry

On Tue, Jan 17, 2017 at 10:46 AM, Stephan Ewen <sewen@apache.org> wrote:

> Hi Dmitry!
>
> The streaming runtime makes a conscious decision to not merge streams as
> in an ordered merge.
> The reason is that this is at large scale typically bad for scalability /
> network performance.
> Also, in certain DAGs, it may lead to deadlocks.
>
> Even the two input operator delivers records on a low level in a
> first-come-first-serve order as driven by network events (NIO events).
>
> Flink's operators tolerate out-of-order records to compensate for that.
> Overall, that seemed the more scalable design to us.
> Can your use case follow a similar approach?
>
> Stephan
>
>
>
> On Tue, Jan 17, 2017 at 10:57 AM, Dmitry Golubets <dgolubets@gmail.com>
> wrote:
>
>> Hi Timo,
>>
>> I don't have any key to join on, so I'm not sure Window Join would work
>> for me.
>>
>> Can I implement my own "low level" operator in any way?
>> I would appreciate if you can give me a hint or a link to example of how
>> to do it.
>>
>>
>>
>> Best regards,
>> Dmitry
>>
>> On Tue, Jan 17, 2017 at 9:24 AM, Timo Walther <twalthr@apache.org> wrote:
>>
>>> Hi Dmitry,
>>>
>>> the runtime supports an arbitrary number of inputs, however, the API
>>> does currently not provide a convenient way. You could use the "union"
>>> operator to reduce the number of inputs. Otherwise I think you have to
>>> implement your own operator. That depends on your use case though.
>>>
>>> You can maintain backpressure by using Flink's operator state. But did
>>> you also thought about a Window Join instead?
>>>
>>> I hope that helps.
>>>
>>> Timo
>>>
>>>
>>>
>>>
>>> Am 17/01/17 um 00:20 schrieb Dmitry Golubets:
>>>
>>> Hi,
>>>
>>> there are only *two *interfaces defined at the moment:
>>> *OneInputStreamOperator*
>>> and
>>> *TwoInputStreamOperator.*
>>>
>>> Is there any way to define an operator with arbitrary number of inputs?
>>>
>>> My another concern is how to maintain *backpressure *in the operator?
>>> Let's say I read events from two Kafka sources, both of which are
>>> ordered by time. I want to merge them keeping the global order. But to do
>>> it, I need to stop block one input if another one has no data yet.
>>>
>>> Best regards,
>>> Dmitry
>>>
>>>
>>>
>>
>

Mime
View raw message