From dev-return-94813-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Mon Jun 4 19:01:06 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id CD023180636 for ; Mon, 4 Jun 2018 19:01:05 +0200 (CEST) Received: (qmail 69979 invoked by uid 500); 4 Jun 2018 17:01:04 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 69968 invoked by uid 99); 4 Jun 2018 17:01:04 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 04 Jun 2018 17:01:04 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id E4EAE1806B9 for ; Mon, 4 Jun 2018 17:01:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -109.511 X-Spam-Level: X-Spam-Status: No, score=-109.511 tagged_above=-999 required=6.31 tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id z9TeWKzg69hn for ; Mon, 4 Jun 2018 17:01:02 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 459395F4E7 for ; Mon, 4 Jun 2018 17:01:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 76189E02BB for ; Mon, 4 Jun 2018 17:01:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 2F1CC21097 for ; Mon, 4 Jun 2018 17:01:00 +0000 (UTC) Date: Mon, 4 Jun 2018 17:01:00 +0000 (UTC) From: "Anil (JIRA)" To: dev@kafka.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Reopened] (KAFKA-6281) Kafka JavaAPI Producer failed with NotLeaderForPartitionException MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/KAFKA-6281?page=3Dcom.atlassia= n.jira.plugin.system.issuetabpanels:all-tabpanel ] Anil reopened KAFKA-6281: ------------------------- Thanks [~omkreddy],=C2=A0 We've got this issue again today, twice.=C2=A0 I see that both the times, there is GC running in my server1 , which I'm co= nnecting to. But it seems to be of short duration.=C2=A0 Please suggest, if that could be the cause. Regarding the network connectio= n issue(s), b/w which components there could be issue ? I mean the zoo-keep= er of server1, couldnot connect to zoo-keeper of server2 ? . We haven't fou= nd any connectivity issues, but still want to make sure on the ports and av= ailablitiy. =C2=A0 =C2=A0 {code:java} Exception 1 : -2018-06-04T05:03:35,967 [kafka-producer-network-thread | producer-1] ERROR= -org.apache.kafka.common.errors.NotLeaderForPartitionException: This serve= r is not the leader for that topic-partition. GC 1: 2018-06-04T05:03:24.865-0400: 1465075.225: [GC pause (G1 Evacuation Pause) = (young), 0.0158735 secs] [Parallel Time: 15.3 ms, GC Workers: 2] [GC Worker Start (ms): Min: 1465075225.5, Avg: 1465075225.5, Max: 146507522= 5.5, Diff: 0.0] [Ext Root Scanning (ms): Min: 0.7, Avg: 0.7, Max: 0.7, Diff: 0.1, Sum: 1.4] [Update RS (ms): Min: 2.1, Avg: 2.3, Max: 2.4, Diff: 0.2, Sum: 4.5] [Processed Buffers: Min: 19, Avg: 20.5, Max: 22, Diff: 3, Sum: 41] [Scan RS (ms): Min: 3.6, Avg: 3.7, Max: 3.9, Diff: 0.2, Sum: 7.5] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0= ] [Object Copy (ms): Min: 8.5, Avg: 8.5, Max: 8.6, Diff: 0.1, Sum: 17.0] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [GC Worker Total (ms): Min: 15.2, Avg: 15.2, Max: 15.2, Diff: 0.0, Sum: 30.= 4] [GC Worker End (ms): Min: 1465075240.7, Avg: 1465075240.7, Max: 1465075240.= 7, Diff: 0.0] [Code Root Fixup: 0.1 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.1 ms] [Other: 0.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.1 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 46.0M(46.0M)->0.0B(100.0M) Survivors: 5120.0K->6144.0K Heap: 262.0M(= 1024.0M)->217.0M(1024.0M)] [Times: user=3D0.03 sys=3D0.00, real=3D0.02 secs] --Exception 2 -- -2018-06-04T10:15:15,522 [kafka-producer-network-thread | producer-1] ERROR= -org.apache.kafka.common.errors.NotLeaderForPartitionException: This serve= r is not the leader for that topic-partition. --GC--2 2018-05-18T10:15:11.501-0400: 14981.861: [GC pause (G1 Evacuation Pause) (y= oung), 0.0205628 secs] [Parallel Time: 20.0 ms, GC Workers: 2] [GC Worker Start (ms): Min: 14981861.1, Avg: 14981861.1, Max: 14981861.1, D= iff: 0.0] [Ext Root Scanning (ms): Min: 0.6, Avg: 0.7, Max: 0.7, Diff: 0.0, Sum: 1.3] [Update RS (ms): Min: 1.6, Avg: 1.6, Max: 1.6, Diff: 0.0, Sum: 3.1] [Processed Buffers: Min: 12, Avg: 18.5, Max: 25, Diff: 13, Sum: 37] [Scan RS (ms): Min: 2.6, Avg: 2.6, Max: 2.6, Diff: 0.0, Sum: 5.2] [Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0= ] [Object Copy (ms): Min: 15.1, Avg: 15.1, Max: 15.1, Diff: 0.0, Sum: 30.2] [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2] [GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0] [GC Worker Total (ms): Min: 19.9, Avg: 19.9, Max: 19.9, Diff: 0.0, Sum: 39.= 8] [GC Worker End (ms): Min: 14981881.0, Avg: 14981881.0, Max: 14981881.0, Dif= f: 0.0] [Code Root Fixup: 0.0 ms] [Code Root Purge: 0.0 ms] [Clear CT: 0.0 ms] [Other: 0.5 ms] [Choose CSet: 0.0 ms] [Ref Proc: 0.1 ms] [Ref Enq: 0.0 ms] [Redirty Cards: 0.1 ms] [Humongous Register: 0.0 ms] [Humongous Reclaim: 0.0 ms] [Free CSet: 0.1 ms] [Eden: 63.0M(63.0M)->0.0B(79.0M) Survivors: 8192.0K->6144.0K Heap: 425.0M(1= 024.0M)->364.5M(1024.0M)] [Times: user=3D0.04 sys=3D0.00, real=3D0.02 secs] {code} > Kafka JavaAPI Producer failed with NotLeaderForPartitionException > ----------------------------------------------------------------- > > Key: KAFKA-6281 > URL: https://issues.apache.org/jira/browse/KAFKA-6281 > Project: Kafka > Issue Type: Bug > Reporter: Anil > Priority: Major > Attachments: server1-controller.log, server2-controller.log > > > We are running Kafka (vesion kafka_2.11-0.10.1.0) in a 2 node cluster. We= have 2 producers (Java API) acting on different topics. Each topic has sin= gle partition. The topic where we had this issue, has one consumer running.= This set up has been running fine for 3 months, and we saw this issue. All= the suggested cases/solutions for this issue in other forums don't seem to= apply for my scenario. > Exception at producer; > {code} > -2017-11-25T17:40:33,035 [kafka-producer-network-thread | producer-1] ERR= OR client.producer.BingLogProducerCallback - Encountered exception in sendi= ng message ; > org.apache.kafka.common.errors.NotLeaderForPartitionExceptio= n: This server is not the leader for that topic-partition. > {code} > We haven't enabled retries for the messages, because this is transactiona= l data and we want to maintain the order. > Producer config: > {code} > bootstrap.servers : server1ip:9092 > acks :all > retries : 0 > linger.ms :0 > buffer.memory :10240000 > max.request.size :1024000 > key.serializer : org.apache.kafka.common.serialization.StringSerializer > value.serializer : org.apache.kafka.common.serialization.StringSerializer > {code} > We are connecting to server1 at both producer and consumer. The controlle= r log at server2 indicates there is some shutdown happened at during sameti= me, but I dont understand why this happened. > {color:red}[2017-11-25 17:31:44,776] DEBUG [Controller 2]: topics not in = preferred replica Map() (kafka.controller.KafkaController) [2017-11-25 17:3= 1:44,776] TRACE [Controller 2]: leader imbalance ratio for broker 2 is 0.00= 0000 (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] DEBUG [Co= ntroller 2]: topics not in preferred replica Map() (kafka.controller.KafkaC= ontroller) [2017-11-25 17:31:44,776] TRACE [Controller 2]: leader imbalance= ratio for broker 1 is 0.000000 (kafka.controller.KafkaController) [2017-11= -25 17:34:18,314] INFO [SessionExpirationListener on 2], ZK expired; shut d= own all controller components and try to re-elect (kafka.controller.KafkaCo= ntroller$SessionExpirationListener) [2017-11-25 17:34:18,317] DEBUG [Contro= ller 2]: Controller resigning, broker id 2 (kafka.controller.KafkaControlle= r) [2017-11-25 17:34:18,317] DEBUG [Controller 2]: De-registering IsrChange= NotificationListener (kafka.controller.KafkaController) [2017-11-25 17:34:1= 8,317] INFO [delete-topics-thread-2], Shutting down (kafka.controller.Topic= DeletionManager$DeleteTopicsThread) [2017-11-25 17:34:18,317] INFO [delete-= topics-thread-2], Stopped (kafka.controller.TopicDeletionManager$DeleteTopi= csThread) [2017-11-25 17:34:18,318] INFO [delete-topics-thread-2], Shutdown= completed (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017= -11-25 17:34:18,318] INFO [Partition state machine on Controller 2]: Stoppe= d partition state machine (kafka.controller.PartitionStateMachine) [2017-11= -25 17:34:18,318] INFO [Replica state machine on controller 2]: Stopped rep= lica state machine (kafka.controller.ReplicaStateMachine) [2017-11-25 17:34= :18,318] INFO [Controller-2-to-broker-2-send-thread], Shutting down (kafka.= controller.RequestSendThread) [2017-11-25 17:34:18,318] INFO [Controller-2-= to-broker-2-send-thread], Stopped (kafka.controller.RequestSendThread) [201= 7-11-25 17:34:18,319] INFO [Controller-2-to-broker-2-send-thread], Shutdown= completed (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] I= NFO [Controller-2-to-broker-1-send-thread], Shutting down (kafka.controller= .RequestSendThread) [2017-11-25 17:34:18,319] INFO [Controller-2-to-broker-= 1-send-thread], Stopped (kafka.controller.RequestSendThread) [2017-11-25 17= :34:18,319] INFO [Controller-2-to-broker-1-send-thread], Shutdown completed= (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO [Contr= oller 2]: Broker 2 resigned as the controller (kafka.controller.KafkaContro= ller) [2017-11-25 17:34:18,353] DEBUG [IsrChangeNotificationListener] Fired= !!! (kafka.controller.IsrChangeNotificationListener) [2017-11-25 17:34:18,3= 53] DEBUG [IsrChangeNotificationListener] Fired!!! (kafka.controller.IsrCha= ngeNotificationListener) [2017-11-25 17:34:18,354] INFO [BrokerChangeListen= er on Controller 2]: Broker change listener fired for path /brokers/ids wit= h children 1,2 (kafka.controller.ReplicaStateMachine$BrokerChangeListener) = [2017-11-25 17:34:18,355] DEBUG [DeleteTopicsListener on 2]: Delete topics = listener fired for topics to be deleted (kafka.controller.PartitionStateMac= hine$DeleteTopicsListener) [2017-11-25 17:34:18,362] INFO [AddPartitionsLis= tener on 2]: Partition modification triggered {"version":1,"partitions":{"0= ":[1]}} for path /brokers/topics/ESQ (kafka.controller.PartitionStateMachin= e$PartitionModificationsListener) [2017-11-25 17:34:18,368] INFO [AddPartit= ionsListener on 2]: Partition modification triggered {"version":1,"partitio= ns":{"0":[1]}} for path /brokers/topics/Test1 (kafka.controller.PartitionSt= ateMachine$PartitionModificationsListener) [2017-11-25 17:34:18,369] INFO [= AddPartitionsListener on 2]: Partition modification triggered {"version":1,= "partitions":{"0":[2]}} for path /brokers/topics/ImageQ (kafka.controller.P= artitionStateMachine$PartitionModificationsListener) [2017-11-25 17:34:18,3= 74] INFO [AddPartitionsListener on 2]: Partition modification triggered {"v= ersion":1,"partitions":{"8":[1,2],"4":[1,2],"9":[2,1],"5":[2,1],"6":[1,2],"= 1":[2,1],"0":[1,2],"2":[1,2],"7":[2,1],"3":[2,1]}} for path /brokers/topics= /NMS_NotifyQ (kafka.controller.PartitionStateMachine$PartitionModifications= Listener) [2017-11-25 17:34:18,375] INFO [AddPartitionsListener on 2]: Part= ition modification triggered {"version":1,"partitions":{"0":[1]}} for path = /brokers/topics/TempBinLogReqQ @{color} -- This message was sent by Atlassian JIRA (v7.6.3#76005)