flink-dev mailing list archives

From "Sebastian Kruse (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-2193) Partial shuffling
Date Wed, 10 Jun 2015 07:32:00 GMT
Sebastian Kruse created FLINK-2193:
--------------------------------------

             Summary: Partial shuffling
                 Key: FLINK-2193
                 URL: https://issues.apache.org/jira/browse/FLINK-2193
             Project: Flink
          Issue Type: Improvement
            Reporter: Sebastian Kruse
            Priority: Minor


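In some cases, it would be handy to shuffle only specific elements of a dataset
instead of all elements. This is currently not achievable with a custom partitioner, because
a partitioner cannot tell which partition an element already resides in and thus cannot leave it there.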

Use cases for such a feature are:
* Load balancing: split up elements that require high processing load and distribute the splits
among all task managers.
* Evolutionary algorithms: A well-suited EA model for Map/Reduce-like platforms is the island
model, where each worker maintains and evolves its own population. From time to time, individuals
need to be exchanged between the populations. Shuffling the complete populations is not necessary,
though.

A presumably easy way to support this would be to expose the local partition number
to deployed partitioners, similar to {{RichFunction#getRuntimeContext()#getIndexOfThisSubtask()}}.
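
To illustrate the idea, here is a minimal, self-contained sketch in plain Java that does not use any Flink classes. The interface {{LocalAwarePartitioner}}, the class {{HeavyOnlyPartitioner}}, and the {{shuffle}} helper are hypothetical names made up for this example; they only simulate what a partitioner could do if it were told the local partition number. Elements are modeled as integer processing costs, which is also an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class PartialShuffleSketch {

    /** Hypothetical interface (not a Flink API): a partitioner that is also told the local partition. */
    interface LocalAwarePartitioner<K> {
        int partition(K key, int localPartition, int numPartitions);
    }

    /** Leaves cheap elements where they are and redistributes only the expensive ones. */
    static final class HeavyOnlyPartitioner implements LocalAwarePartitioner<Integer> {
        private final int threshold;

        HeavyOnlyPartitioner(int threshold) {
            this.threshold = threshold;
        }

        @Override
        public int partition(Integer cost, int localPartition, int numPartitions) {
            if (cost < threshold) {
                return localPartition; // no shuffling: the element stays in place
            }
            // Assumed distribution rule for the sketch: spread heavy elements by their value.
            return Math.floorMod(cost, numPartitions);
        }
    }

    /** Simulates the data exchange: routes every element to the partition the partitioner picks. */
    static List<List<Integer>> shuffle(List<List<Integer>> partitions,
                                       LocalAwarePartitioner<Integer> partitioner) {
        int n = partitions.size();
        List<List<Integer>> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(new ArrayList<>());
        }
        for (int local = 0; local < n; local++) {
            for (Integer element : partitions.get(local)) {
                out.get(partitioner.partition(element, local, n)).add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Partition 0 holds two cheap and two expensive elements; partition 2 is idle.
        List<List<Integer>> input = Arrays.asList(
                Arrays.asList(1, 2, 100, 101),
                Arrays.asList(3),
                Collections.<Integer>emptyList());
        System.out.println(shuffle(input, new HeavyOnlyPartitioner(50)));
    }
}
```

In this simulation only the two expensive elements leave partition 0, while the cheap elements stay where they were. The same pattern could serve the island model by returning a different partition only for the individuals selected for migration.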



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
