flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: Flink Rebalance
Date Fri, 10 Aug 2018 09:57:41 GMT
Hi,

Elias and Paul have good points.
I think the performance degradation is mostly to the lack of function
chaining in the rebalance case.

If all steps are just map functions, they can be chained in the
no-rebalance case.
That means, records are passed via function calls.
If you add rebalancing, records will be passed between map functions via
serialization, network transfer, and deserialization.
This is of course much more expensive than calling a method.

Best, Fabian

2018-08-10 4:25 GMT+02:00 Paul Lam <paullin3280@gmail.com>:

> Hi Antonio,
>
> AFAIK, there are two reasons for this:
>
> 1. Rebalancing itself brings latency because it takes time to redistribute
> the elements.
> 2. Rebalancing also messes up the order in the Kafka topic partitions, and
> often makes a event-time window wait longer to trigger in case you’re using
> event time characteristic.
>
> Best Regards,
> Paul Lam
>
>
>
> 在 2018年8月10日,05:49,antonio saldivar <ansale10@gmail.com> 写道:
>
> Hello
>
> Sending ~450 elements per second ( the values are in milliseconds start to
> end)
> I went from:
> with Rebalance
> *+------------+*
> *| **AVGWINDOW ** |*
> *+------------+*
> *| *32131.0853  * |*
> *+------------+*
>
> to this without rebalance
>
> *+------------+*
> *| **AVGWINDOW ** |*
> *+------------+*
> *| *70.2077   * |*
> *+------------+*
>
> El jue., 9 ago. 2018 a las 17:42, Elias Levy (<fearsome.lucidity@gmail.com
> >) escribió:
>
>> What do you consider a lot of latency?  The rebalance will require
>> serializing / deserializing the data as it gets distributed.  Depending on
>> the complexity of your records and the efficiency of your serializers, that
>> could have a significant impact on your performance.
>>
>> On Thu, Aug 9, 2018 at 2:14 PM antonio saldivar <ansale10@gmail.com>
>> wrote:
>>
>>> Hello
>>>
>>> Does anyone know why when I add "rebalance()" to my .map steps is adding
>>> a lot of latency rather than not having rebalance.
>>>
>>>
>>> I have kafka partitions in my topic 44 and 44 flink task manager
>>>
>>> execution plan looks like this when I add rebalance but it is adding a
>>> lot of latency
>>>
>>> kafka-src -> rebalance -> step1 -> rebalance ->step2->rebalance
->
>>> kafka-sink
>>>
>>> Thank you
>>> regards
>>>
>>>
>

Mime
View raw message