Subject: Re: managing a Kafka consumer group using Helix
From: kishore g <g.kishore@gmail.com>
To: user@helix.apache.org
Date: Thu, 4 Dec 2014 14:50:34 -0800

you bet :-)

Kafka had its own consumer group implementation before Helix was implemented. We have implemented a similar consumer group pattern for Databus, so it's definitely possible to extend it to Kafka.

Regarding your questions:

1. Yes, the current rebalancer handles one resource at a time. We have had some discussion about a composite rebalancer that rebalances multiple resources at once. Even without this feature, it's possible to read the entire cluster data from within the rebalancer and make the decision considering other resources.
2. It is probably not a good idea to put all topics in one resource, even with bucketing support.
3. Theoretically, it does not matter whether a group of changes falls within one resource or across resources; it's the overall amount of change that matters.
4. We have gone up to 2,000 resources across a cluster. This does not use one ZNode; I don't think it's a good idea to have one ZNode for all resources.

thanks,
Kishore G

On Wed, Dec 3, 2014 at 10:57 AM, vlad.gm@gmail.com wrote:
> Dear all,
>
> I am sure the following question has appeared inside LinkedIn before :)
>
> We would like to manage a Kafka consumer group using Helix, that is, have
> multiple consumer instances and assign topics and their partitions among
> the consumers automatically. The consumer group would use a whitelist to
> select the topics, which means that the topic/partition list is dynamic
> and can change quite frequently. I can see each topic mapping to a Helix
> resource or, alternatively, a single Helix resource handling all topics.
> We are most likely to use a custom rebalancer so that we can balance
> traffic by throughput metrics rather than by partition count.
>
> Here are a few questions:
> 1) If we use a resource per topic, would we be able to later jointly
> rebalance multiple resources at once? The current rebalancer callback
> seems to handle a single resource. Would we have to manage the multiple
> resources in the background and just use the callback when we are asked
> what to do with that resource?
> 2) If we put all topics and their partitions in a single resource, we are
> likely to quickly exceed the amount of data that can be stored in a ZK
> node. I remember that buckets can help with that problem. Can the number
> of buckets increase dynamically with the number of partitions?
> 3) How big a problem would an environment be in which the group of
> administered partitions changes quite often? I guess that with one
> resource per topic this would not be a big issue; however, it might be a
> problem with a single resource for all topics.
> 4) Is there a limit on the number of resources that can be stored in a
> single cluster, because of the amount of data that can be stored in a
> single ZK node?
>
> Regards,
> Vlad
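On question 1: a joint, throughput-aware assignment inside a custom rebalancer can be sketched as greedy bin-packing over the partitions of every managed topic at once. This is an illustrative, language-neutral sketch of that assignment logic only; the function and parameter names are hypothetical, and it is not the Helix rebalancer API:

```python
import heapq

def assign_by_throughput(partition_rates, consumers):
    """Greedily place each partition (heaviest first) on the consumer
    with the least total assigned throughput so far.

    partition_rates: dict of (topic, partition) -> observed msgs/sec
    consumers: list of consumer instance names
    returns: dict of consumer -> list of (topic, partition)
    """
    # Min-heap keyed by (assigned load, consumer name)
    heap = [(0.0, c) for c in consumers]
    heapq.heapify(heap)
    assignment = {c: [] for c in consumers}
    # Heaviest partitions first improves the greedy balance (LPT scheduling)
    for part, rate in sorted(partition_rates.items(), key=lambda kv: -kv[1]):
        load, consumer = heapq.heappop(heap)
        assignment[consumer].append(part)
        heapq.heappush(heap, (load + rate, consumer))
    return assignment
```

Inside a custom rebalancer callback, the result of such a computation would be written back as the ideal state of the resource(s) being rebalanced; the point is only that nothing prevents the callback from reading cluster-wide metrics before deciding on one resource.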
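On questions 2 and 4: the practical ceiling comes from ZooKeeper's default znode payload limit (jute.maxbuffer, roughly 1 MB), which is why a single resource holding every topic's partition map fills up quickly and why bucketing splits the record across several znodes. A back-of-the-envelope estimate, using an assumed (not measured) per-partition record size:

```python
ZNODE_LIMIT = 1_000_000   # ZooKeeper's default jute.maxbuffer is ~1 MB
BYTES_PER_ENTRY = 200     # rough, assumed size of one partition's map entry

def buckets_needed(num_partitions, bytes_per_entry=BYTES_PER_ENTRY,
                   limit=ZNODE_LIMIT):
    """Number of bucket znodes needed so that each stays under the limit."""
    entries_per_bucket = max(1, limit // bytes_per_entry)
    return -(-num_partitions // entries_per_bucket)   # ceiling division
```

Under these assumed sizes a resource stays in one znode up to about 5,000 partitions and needs more buckets beyond that; the real per-entry cost depends on host names, state names, and record overhead.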