flink-user mailing list archives

From Gianmarco De Francisci Morales <g...@apache.org>
Subject Re: Load balancing
Date Wed, 10 Jun 2015 13:40:06 GMT
We have been working on an adaptive load balancing strategy that would
address exactly the issue you point out.
FLINK-1725 is the starting point for integrating it into Flink.
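The message does not spell out the strategy. Assuming it follows the "power of two choices" idea behind partial key grouping (each key gets two candidate workers and every record goes to whichever of the two is less loaded), a minimal plain-Python sketch of the routing decision might look like this. It is not Flink's API and not necessarily what FLINK-1725 implements; the class name, the MD5-based seeded hashing, and the per-sender load counters are illustrative assumptions.

```python
import hashlib


def seeded_hash(key: str, seed: int, num_partitions: int) -> int:
    """Map key to [0, num_partitions) with one of several independent hashes."""
    digest = hashlib.md5(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


class TwoChoicesPartitioner:
    """Hypothetical partial-key-grouping style partitioner (not a Flink class).

    Each key has two candidate partitions; a record is sent to whichever of
    the two this sender has loaded less so far, so a hot key is split across
    two instances instead of overloading a single one.
    """

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        # Local load estimate: records routed to each partition by this sender.
        self.load = [0] * num_partitions

    def partition(self, key: str) -> int:
        first = seeded_hash(key, 1, self.num_partitions)
        second = seeded_hash(key, 2, self.num_partitions)
        # Ties go to the first choice.
        target = first if self.load[first] <= self.load[second] else second
        self.load[target] += 1
        return target


# Usage: route a skewed stream consisting of one hot key.
p = TwoChoicesPartitioner(num_partitions=8)
targets = [p.partition("hot-key") for _ in range(1000)]
```

Because records of one key can end up on two instances, a downstream aggregation would need a final step that merges the two partial results per key.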

Cheers,

--
Gianmarco

On 9 June 2015 at 20:31, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Sebastian,
>
> I agree: shuffling only specific elements would be a very useful feature,
> but unfortunately it's not supported (yet).
> Would you like to open a JIRA for that?
>
> Cheers, Fabian
>
> 2015-06-09 17:22 GMT+02:00 Kruse, Sebastian <Sebastian.Kruse@hpi.de>:
>
>> Hi folks,
>>
>> I would like to do some load balancing within one of my Flink jobs to
>> achieve good scalability. The rebalance() method does not help in my
>> case, because the runtime is dominated by the processing of a few very
>> large elements in my dataset. Hence, I need to distribute the work for
>> these elements among the nodes in the cluster. To do so, I split those
>> elements into partial tasks and want to send these partial tasks to
>> other nodes using a custom partitioner.
>>
>> My question is the following: I do not actually need to shuffle the
>> complete dataset, only a few elements. Is there a way to tell the
>> partitioner that all other data should stay on its current task manager?
>> Thanks!
>>
>> Cheers,
>>
>> Sebastian
>>
>
>
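Fabian confirms that shuffling only selected elements is not supported. As a rough workaround sketch, assuming each record can be tagged upstream with the index of the parallel instance it came from, the routing decision could keep ordinary records at that index and fan out only the partial tasks of heavy elements. The plain-Python code below only models that decision: `Task`, `split_heavy`, the work estimate `size`, `threshold`, and `chunks` are hypothetical names, and `partition(task, num_partitions)` merely mirrors the shape of Flink's `Partitioner.partition(key, numPartitions)`. Records still pass through the repartitioning step, and whether the instance with the same index runs on the same TaskManager depends on Flink's scheduling, so this is not true selective shuffling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    element_id: str
    chunk: int        # index of this partial task (0 for unsplit elements)
    origin: int       # hypothetical: parallel instance index the record came from
    is_partial: bool  # True if this is one piece of a split heavy element


def split_heavy(element_id: str, size: int, origin: int,
                threshold: int, chunks: int) -> List[Task]:
    """Split an element whose (assumed) work estimate exceeds threshold."""
    if size <= threshold:
        return [Task(element_id, 0, origin, False)]
    return [Task(element_id, i, origin, True) for i in range(chunks)]


def partition(task: Task, num_partitions: int) -> int:
    """Keep ordinary records at their origin index; spread partial tasks."""
    if not task.is_partial:
        return task.origin % num_partitions
    # Offset from the origin so the pieces of one element land on
    # different instances (chunk 0 stays at the origin).
    return (task.origin + task.chunk) % num_partitions


# Usage: one heavy element split into 4 pieces, one light element left alone.
heavy = split_heavy("big", size=1000, origin=2, threshold=100, chunks=4)
light = split_heavy("small", size=10, origin=3, threshold=100, chunks=4)
routes = [partition(t, 4) for t in heavy + light]
```

In a Flink DataSet job, `origin` could be filled in a rich function from `getRuntimeContext().getIndexOfThisSubtask()`, and the same logic placed in a custom `Partitioner` passed to `partitionCustom`; treat this as a direction to explore rather than a tested recipe.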
