flink-user mailing list archives

From: Fabian Hueske <fhue...@gmail.com>
Subject: Re: Load balancing
Date: Tue, 09 Jun 2015 17:31:46 GMT

Hi Sebastian,

I agree, shuffling only specific elements would be a very useful feature,
but unfortunately it's not supported (yet).
Would you like to open a JIRA for that?

Cheers, Fabian

2015-06-09 17:22 GMT+02:00 Kruse, Sebastian <Sebastian.Kruse@hpi.de>:

> Hi folks,
> I would like to do some load balancing within one of my Flink jobs to
> achieve good scalability. The rebalance() method is not applicable in my
> case, as the runtime is dominated by the processing of very few larger
> elements in my dataset. Hence, I need to distribute the processing work for
> these elements among the nodes in the cluster. To do so, I subdivide those
> elements into partial tasks and want to distribute these partial tasks to
> other nodes by employing a custom partitioner.
> Now, my question is the following: I do not actually need to shuffle the
> complete dataset, only a few elements. So is there a way to tell the
> partitioner that the remaining data should stay on the same task manager?
> Thanks!
> Cheers,
> Sebastian
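
For illustration, the routing logic Sebastian describes could be sketched as follows. This is only a sketch under stated assumptions, not something proposed in the thread: it mirrors the shape of Flink's Partitioner<K> interface (int partition(K key, int numPartitions)) without depending on Flink, and the RoutingKey class and its fields are hypothetical. It assumes each element has been tagged beforehand with the index of the subtask that produced it (for example via getRuntimeContext().getIndexOfThisSubtask() in a rich function). Ordinary elements are routed back to that same partition index, while the partial tasks of large elements are spread round-robin. As Fabian notes, every element still passes through the shuffle, and nothing here guarantees that the same partition index lives on the same task manager.

```java
/**
 * Sketch only: same method shape as Flink's Partitioner#partition(key, numPartitions),
 * but written against plain Java. RoutingKey is a hypothetical helper type.
 */
final class SelectivePartitioner {

    /** Hypothetical key attached to every element before partitioning. */
    static final class RoutingKey {
        final int originSubtask;  // index of the subtask that produced the element
        final int partialTaskId;  // >= 0 for a piece of a large element, -1 otherwise

        RoutingKey(int originSubtask, int partialTaskId) {
            this.originSubtask = originSubtask;
            this.partialTaskId = partialTaskId;
        }
    }

    /** Returns the index of the target partition for the given key. */
    static int partition(RoutingKey key, int numPartitions) {
        if (key.partialTaskId < 0) {
            // Ordinary element: keep the partition index it came from.
            return Math.floorMod(key.originSubtask, numPartitions);
        }
        // Partial task: spread round-robin, starting at the neighbour of the origin.
        return Math.floorMod(key.originSubtask + 1 + key.partialTaskId, numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // An ordinary element produced by subtask 2.
        System.out.println(partition(new RoutingKey(2, -1), numPartitions));
        // Three partial tasks of a large element that was also produced by subtask 2.
        for (int i = 0; i < 3; i++) {
            System.out.println(partition(new RoutingKey(2, i), numPartitions));
        }
    }
}
```

In a real job, logic of this kind would sit inside a class implementing Flink's Partitioner and be passed to partitionCustom on the DataSet; whether the small elements then avoid network transfer depends on how the runtime places subtasks, which this sketch does not control.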
