helix-user mailing list archives

From "vlad.gm@gmail.com" <vlad...@gmail.com>
Subject managing a Kafka consumer group using Helix
Date Wed, 03 Dec 2014 18:57:23 GMT
Dear all,

I am sure the following question has come up inside LinkedIn before :)

We would like to manage a Kafka consumer group using Helix, that is, to have
multiple consumer instances and assign topics and their partitions among
the consumers automatically. The consumer group would use a whitelist to
select the topics, which means that the topic/partition list is dynamic
and can change quite frequently. I can see either mapping each topic to a
Helix resource or, alternatively, using a single Helix resource to handle all
topics. We will most likely use a custom rebalancer so that we can balance
traffic by throughput metrics rather than by partition count.
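
To make the throughput-based idea concrete, here is a minimal sketch of the
kind of assignment we have in mind. It is plain Python rather than Helix
code, and the function name, consumer names and per-partition rates are all
hypothetical. It uses a simple greedy heuristic (heaviest partition first,
each to the currently least-loaded consumer), which is only one possible
choice and not necessarily what we would end up using.

```python
import heapq


def assign_by_throughput(partition_rates, consumers):
    """Greedy assignment: heaviest partitions first, each one going to the
    consumer with the lowest total rate so far (hypothetical helper)."""
    # Min-heap of (current load, consumer name); ties break on the name.
    heap = [(0.0, c) for c in sorted(consumers)]
    heapq.heapify(heap)
    assignment = {c: [] for c in consumers}
    # Sort by descending rate, then by name so the result is deterministic.
    ordered = sorted(partition_rates.items(), key=lambda kv: (-kv[1], kv[0]))
    for partition, rate in ordered:
        load, consumer = heapq.heappop(heap)
        assignment[consumer].append(partition)
        heapq.heappush(heap, (load + rate, consumer))
    return assignment


# Made-up rates (messages/s), keyed by "topic-partition".
rates = {
    "clicks-0": 900.0,
    "clicks-1": 850.0,
    "logs-0": 100.0,
    "logs-1": 50.0,
    "logs-2": 50.0,
}
result = assign_by_throughput(rates, ["consumer-a", "consumer-b"])
```

Note that the two consumers end up with different partition counts (three
versus two) but similar total rates, which is the behaviour we are after.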

Here are a few questions:
1) If we use one resource per topic, would we later be able to rebalance
multiple resources jointly? The current rebalancer callback seems to handle
a single resource at a time. Would we have to manage the multiple resources
ourselves in the background and use the callback only to report what we
decided for the resource being asked about?
2) If we put all topics and their partitions in a single resource, we are
likely to quickly exceed the amount of data that can be stored in a ZK
node. I remember that buckets can help with that problem. Can the number of
buckets increase dynamically with the number of partitions?
3) How big a problem would it be to have an environment in which the
set of administered partitions changes quite often? I guess that with one
resource per topic this would not be a big issue; however, it might be a
problem with a single resource for all topics.
4) Is there a limit on the number of resources that can be stored in a
single cluster, because of the amount of data that can be stored in a
single ZK node?

