incubator-kafka-users mailing list archives

From Jay Kreps <jay.kr...@gmail.com>
Subject Re: Consumer re-balance behavior
Date Mon, 24 Oct 2011 17:20:50 GMT
Hi Inder,

Yes, this is an important point. As you say, we clump together the
partitions consumed but make no attempt to optimize locality in the case
that the consumer is on the broker machine. We are very interested in
supporting this in the future, though; I think it is a valid use case for
some kinds of problems. The fix would be to make the assignment algorithm
prefer local to non-local partitions.
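
A rough sketch of what "prefer local to non-local" could look like
(illustrative only, not actual Kafka code; brokerHost here is a stand-in for
whatever broker-id-to-hostname lookup the rebalancer would have available):

  // Sketch only, not the real rebalance code. Partition names look like
  // "brokerId-partitionId" (e.g. "1-0"), and brokerHost maps a broker id
  // to the host that broker runs on.
  def preferLocal(consumerHost: String,
                  partitions: Seq[String],
                  brokerHost: Map[String, String],
                  share: Int): Seq[String] = {
    // Split the candidate partitions into local vs. remote for this consumer.
    val (local, remote) = partitions.partition { p =>
      brokerHost.get(p.split("-")(0)) == Some(consumerHost)
    }
    // Hand out local partitions first, then top up from the remote ones.
    (local ++ remote).take(share)
  }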

-Jay

On Mon, Oct 24, 2011 at 8:50 AM, <inder.pall@gmail.com> wrote:

> Jun/Jay,
>
> This clarifies. Sorted partitions grouped together might go to the same
> consumer, but they may not go to the consumer where they are local (which
> would avoid network I/O), if consumers are hosted on the brokers.
>
> For example:
> In this case my partitions were 1-0 and 1-1 hosted on a machine named
> proudarmburn, and 2-0 and 2-1 on trainingarmburn, so we got locality because
> the sort order happened to line up that way. However, if my second machine
> had been named leadergather, then after sorting it would have gotten the
> partitions starting with 1, which are remote to it.
>
> Since we know the brokers hosting the partitions and the consumers that are
> subscribed, adding broker affinity while rebalancing could save quite a bit
> of network I/O.
>
> Of course, I am not sure if it's a recommended configuration to have
> consumers share hardware with brokers. What do you guys think?
>
> Inder
> Sent from BlackBerry® on Airtel
>
> -----Original Message-----
> From: Jay Kreps <jay.kreps@gmail.com>
> Date: Mon, 24 Oct 2011 08:03:41
> To: <kafka-users@incubator.apache.org>
> Reply-To: kafka-users@incubator.apache.org
> Subject: Re: Consumer re-balance behavior
>
> Also, to answer your question, it is intentional. This way each consumer
> connects to and interacts with a fixed number of brokers irrespective of the
> total size of the cluster.
>
> -Jay
>
> On Mon, Oct 24, 2011 at 7:31 AM, Jun Rao <junrao@gmail.com> wrote:
>
> > During rebalance, we simply sort all partitions and consumers by name and
> > give each consumer an even range of partitions. Since partitions on the
> > same broker sort together, they tend to be given out to the same consumer,
> > as in this case.
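> >
> > Roughly, the idea is something like this (a sketch for illustration only,
> > not the actual Kafka code):
> >
> >   // Sort both lists and hand each consumer a contiguous chunk. Partitions
> >   // from the same broker (e.g. "1-0", "1-1") sort next to each other, so
> >   // a chunk tends to stay on one consumer.
> >   def rebalance(partitions: Seq[String],
> >                 consumers: Seq[String]): Map[String, Seq[String]] = {
> >     val parts = partitions.sorted
> >     val cons = consumers.sorted
> >     val share = math.ceil(parts.size.toDouble / cons.size).toInt
> >     cons.zipWithIndex.map { case (c, i) =>
> >       c -> parts.slice(i * share, (i + 1) * share)
> >     }.toMap
> >   }
> >
> >   // e.g. rebalance(Seq("1-0", "1-1", "2-0", "2-1"), Seq("C1", "C2"))
> >   // gives C1 -> Seq("1-0", "1-1") and C2 -> Seq("2-0", "2-1").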
> >
> > Since a partition is the unit of rebalance, you want to have at least as
> > many partitions as consumers. This is the main reason to have more than 1
> > partition per broker.
> >
> > The number of partitions is controlled by 2 config parameters:
> > num.partitions and topic.partition.count.map. The former is the default
> > for all topics and the latter is for specific topics.
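> >
> > For example, in the broker's server.properties (the exact syntax of the
> > map value may differ from this sketch, so check the 0.7 config docs):
> >
> >   # default number of partitions per topic on this broker
> >   num.partitions=2
> >   # per-topic override, e.g. give topic T 4 partitions on this broker
> >   topic.partition.count.map=T:4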
> >
> > Jun
> >
> > On Mon, Oct 24, 2011 at 1:28 AM, Inder Pall <inder.pall@gmail.com> wrote:
> >
> > > All,
> > >
> > > I need some clarity and confirmation on the following behavior.
> > >
> > > Use-Case
> > > ------------
> > > 1. I have a topic T spread across two brokers (B1, B2) running on
> > >    different machines, each having 2 partitions configured for T. That is
> > >    4 partitions in total (1-0, 1-1, 2-0, 2-1).
> > > 2. Consumer C1 is part of group g1 and is consuming from B1 and B2 for T.
> > > 3. Add a new consumer C2 as part of g1.
> > >
> > > This triggers a rebalance across C1 & C2, and eventually C1 gets 1-0 and
> > > 1-1 and C2 gets 2-0 and 2-1.
> > > P.S. - B1 and C1 share the same machine; the same is the case with B2 and C2.
> > >
> > > Behavior
> > > ---------
> > > Both consumers are getting partitions that are hosted on the same boxes.
> > > Is this a coincidence, or an optimization w.r.t. data locality that will
> > > always be applied?
> > >
> > > More questions
> > > -----------------
> > > 1. When would you want to have multiple partitions of the same topic
> > >    hosted on the same broker? Is it so that if you have 2 partitions of T
> > >    on B1 and 10 on B2, then on rebalance C1 & C2 would get 6 each?
> > > 2. As in the above use-case, C1 has the 1-0 & 1-1 partitions of T, and
> > >    adding messages to B1 results in the messages being spread across both
> > >    partitions. Is this behavior round robin, or based on segment file size
> > >    or other parameters?
> > > 3. Is it possible to configure the number of partitions per topic, and if
> > >    so, how?
> > >
> > > -- Inder
> > >
> >
>
>
