kafka-dev mailing list archives

From Json Tu <kafka...@126.com>
Subject Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker
Date Fri, 10 Nov 2017 06:23:02 GMT
Broker 4759750 is the one that was just restarted. More than 500 of its replica partitions
shrink and expand frequently, and the leaders of those partitions are distributed across the other 5 brokers.
The log was pulled from one broker, and I extracted only the entries related to one partition.
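
To quantify how long the restarted broker stays in the ISR between each expand and the following shrink, the kafka.cluster.Partition entries quoted further down in this thread can be parsed with a short script. This is only a minimal sketch in Python, assuming each log entry sits on a single line (the quoted copies are wrapped by the mail client). The names ISR_EVENT, parse_events and seconds_in_isr are illustrative and not part of Kafka.

```python
import re
from datetime import datetime

# Timestamp, topic, partition, broker id and action of an ISR change entry,
# in the format of the kafka.cluster.Partition lines quoted in this thread.
ISR_EVENT = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] INFO Partition "
    r"\[([^,\]]+),(\d+)\] on broker (\d+):\s+(Shrinking|Expanding) ISR"
)

# First four entries from the thread, joined back into single lines.
SAMPLE = """\
[2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
[2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726: Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750 (kafka.cluster.Partition)
[2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726: Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726 (kafka.cluster.Partition)
"""


def parse_events(log_text):
    """Return (timestamp, topic, partition, action) for every ISR change found."""
    events = []
    for match in ISR_EVENT.finditer(log_text):
        ts = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        events.append((ts, match.group(2), int(match.group(3)), match.group(5)))
    return events


def seconds_in_isr(events):
    """Seconds between each 'Expanding' entry and the next 'Shrinking' entry.

    Assumes the events belong to a single partition and are in time order.
    """
    durations = []
    expanded_at = None
    for ts, _topic, _partition, action in events:
        if action == "Expanding":
            expanded_at = ts
        elif expanded_at is not None:
            durations.append((ts - expanded_at).total_seconds())
            expanded_at = None
    return durations


if __name__ == "__main__":
    for seconds in seconds_in_isr(parse_events(SAMPLE)):
        print(f"follower stayed in ISR for {seconds:.3f} s")
```

On the four sample entries this reports about 30.9 s and 16.0 s in the ISR. Comparing such values with the broker's replica.lag.time.max.ms setting may help judge whether the follower is simply failing to keep up after the restart.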

> On Nov 10, 2017, at 12:06 PM, Hu Xi <huxi_2b@hotmail.com> wrote:
> 
> It seems broker `4759750` was the one removed for partition [Yelp,5] in every round of ISR
> shrinking. Did you check whether everything is working properly on this broker?
> 
> 
> From: Json Tu <kafkausr@126.com>
> Sent: November 10, 2017 11:08
> To: users@kafka.apache.org
> Cc: dev@kafka.apache.org; Guozhang Wang
> Subject: Re: Kafka 0.9.0.1 partitions shrink and expand frequently after restart the broker
>  
> I'm sorry for my poor English.
> 
> What I really mean is that my broker machine has 8 cores and 16 GB of memory, but my JVM is
> configured as follows:
> java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35
> -XX:+DisableExplicitGC -Djava.awt.headless=true -Xloggc:/xx/yy/kafkaServer-gc.log -verbose:gc
> -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+UseGCLogFileRotation
> -XX:NumberOfGCLogFiles=32 -XX:GCLogFileSize=10M -XX:+HeapDumpOnOutOfMemoryError
> 
> We have 30+ clusters with this JVM configuration, all deployed on machines with 8 cores
> and 16 GB of memory. Compared to the other clusters, the current cluster has more than 5 times
> as many partitions.
> When we restart brokers in the other clusters, we do not see this behavior.
> 
> Maybe some metrics or logs could help find the root cause of this behavior.
> I look forward to more suggestions.
> 
> 
> > On Nov 9, 2017, at 9:59 PM, John Yost <hokiegeek2@gmail.com> wrote:
> > 
> > I've seen this before and it was due to long GC pauses due in large part to
> > a memory heap > 8 GB.
> > 
> > --John
> > 
> > On Thu, Nov 9, 2017 at 8:17 AM, Json Tu <kafkausr@126.com> wrote:
> > 
> >> Hi,
> >>    We have a Kafka cluster made up of 6 brokers, with 8 CPUs and
> >> 16 GB of memory on each broker's machine. We have about 1600 topics in the
> >> cluster, with about 1700 leader partitions and 1600 replica partitions on each
> >> broker.
> >>    When we restart a normally running broker, we find that 500+
> >> partitions shrink and expand their ISR frequently,
> >> and there are many log entries like the ones below.
> >> 
> >>   [2017-11-09 17:05:51,173] INFO Partition [Yelp,5] on broker 4759726:
> >> Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> >> (kafka.cluster.Partition)
> >> [2017-11-09 17:06:22,047] INFO Partition [Yelp,5] on broker 4759726:
> >> Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> >> (kafka.cluster.Partition)
> >> [2017-11-09 17:06:28,634] INFO Partition [Yelp,5] on broker 4759726:
> >> Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> >> (kafka.cluster.Partition)
> >> [2017-11-09 17:06:44,658] INFO Partition [Yelp,5] on broker 4759726:
> >> Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> >> (kafka.cluster.Partition)
> >> [2017-11-09 17:06:47,611] INFO Partition [Yelp,5] on broker 4759726:
> >> Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> >> (kafka.cluster.Partition)
> >> [2017-11-09 17:07:19,703] INFO Partition [Yelp,5] on broker 4759726:
> >> Shrinking ISR for partition [Yelp,5] from 4759726,4759750 to 4759726
> >> (kafka.cluster.Partition)
> >> [2017-11-09 17:07:26,811] INFO Partition [Yelp,5] on broker 4759726:
> >> Expanding ISR for partition [Yelp,5] from 4759726 to 4759726,4759750
> >> (kafka.cluster.Partition)
> >> …
> >> 
> >> 
> >>    The shrinking and expanding repeats after 30 minutes, which is the default
> >> value of leader.imbalance.check.interval.seconds. At that time
> >> we can find the controller's auto rebalance in the log, which can move the leader of some
> >> partitions to this restarted broker.
> >>    We see no shrinking and expanding while our cluster is running, only when
> >> we restart a broker, so replica.fetch.thread.num is 1 and that seems to be enough.
> >> 
> >>    We can reproduce it at every restart. Can someone give some suggestions?
> >> Thanks in advance.

