kafka-dev mailing list archives

From "Jay Kreps (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-687) Rebalance algorithm should consider partitions from all topics
Date Wed, 09 Jan 2013 16:02:13 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548604#comment-13548604 ]

Jay Kreps commented on KAFKA-687:

This is a very good point, and not one I had considered.

It is probably not a trivial change because right now I think the election is done for each
topic independently.

We have in mind, for the next major release after 0.8 (0.9, presumably), to move this co-ordination
to the server, which would be a good time to fix this. We could either do this balancing exactly
or else just randomize the start index (which would be almost as good if you had many topics).
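
As a rough illustration of the "randomize the start index" option, here is a minimal Python
sketch. It is not Kafka code: the function name assign_with_random_start, the list-of-counts
input, and the use of a seeded random.Random are all assumptions made for the example. Each
topic is still split on its own, but the consumer that receives the first partition is
picked at random per topic.

```python
import random


def assign_with_random_start(partitions_per_topic, num_consumers, rng):
    """Return the number of partitions each consumer ends up owning.

    Hypothetical model: every topic is divided independently, but the
    in-order hand-out begins at a random consumer instead of consumer 0.
    """
    load = [0] * num_consumers
    for num_partitions in partitions_per_topic:
        start = rng.randrange(num_consumers)  # random start index for this topic
        base, extra = divmod(num_partitions, num_consumers)
        for pos in range(num_consumers):
            # The first `extra` positions after `start` get one more partition.
            share = base + (1 if pos < extra else 0)
            load[(start + pos) % num_consumers] += share
    return load


# 100 topics with 2 partitions each, 10 consumers (the numbers from the issue).
load = assign_with_random_start([2] * 100, 10, random.Random(687))
print(load)
```

With many topics the per-consumer totals tend to cluster around the exact share (20 here),
but they are not guaranteed to be equal, which is the trade-off against exact balancing.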
> Rebalance algorithm should consider partitions from all topics
> --------------------------------------------------------------
>                 Key: KAFKA-687
>                 URL: https://issues.apache.org/jira/browse/KAFKA-687
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.8.1
>            Reporter: Pablo Barrera
> The current rebalance step, as stated in the original Kafka paper [1], splits the partitions
per topic between all the consumers. So if you have 100 topics with 2 partitions each and
10 consumers, only two consumers will be used. That is, for each topic all partitions will
be listed and shared between the consumers in the consumer group in order (not randomly).
> If the consumer group is reading from several topics at the same time, it makes sense
to split all the partitions from all topics between all the consumers. Following the example,
we will have 200 partitions in total, 20 per consumer, using the 10 consumers.
> The load per topic could be different and the division should consider this. However,
even a random division should be better than the current algorithm while reading from several
topics, and should not harm reading from a few topics with several partitions.
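
The arithmetic in the example above can be checked with a small Python sketch. It is a
simplified model, not the actual Kafka rebalance code: the function names assign_per_topic
and assign_across_topics are hypothetical, topics are reduced to partition counts, and
consumers are identified only by their position in the sorted group.

```python
def assign_per_topic(partitions_per_topic, num_consumers):
    """Split each topic independently, in order, always starting at consumer 0."""
    load = [0] * num_consumers
    for num_partitions in partitions_per_topic:
        base, extra = divmod(num_partitions, num_consumers)
        for c in range(num_consumers):
            # The first `extra` consumers get one partition more than the rest.
            load[c] += base + (1 if c < extra else 0)
    return load


def assign_across_topics(partitions_per_topic, num_consumers):
    """Pool the partitions of all topics and split the pool once."""
    base, extra = divmod(sum(partitions_per_topic), num_consumers)
    return [base + (1 if c < extra else 0) for c in range(num_consumers)]


topics = [2] * 100  # 100 topics with 2 partitions each
print(assign_per_topic(topics, 10))      # [100, 100, 0, 0, 0, 0, 0, 0, 0, 0]
print(assign_across_topics(topics, 10))  # [20, 20, 20, 20, 20, 20, 20, 20, 20, 20]
```

The sketch counts partitions only; it ignores per-topic load, which the description notes
a real division should also take into account.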

