kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anil (JIRA)" <j...@apache.org>
Subject [jira] [Reopened] (KAFKA-6281) Kafka JavaAPI Producer failed with NotLeaderForPartitionException
Date Mon, 04 Jun 2018 17:01:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Anil reopened KAFKA-6281:
-------------------------

Thanks [~omkreddy], 

We've got this issue again today, twice. 

I see that both the times, there is GC running in my server1 , which I'm connecting to. But
it seems to be of short duration. 

Please suggest, if that could be the cause. Regarding the network connection issue(s), b/w
which components there could be issue ? I mean the zoo-keeper of server1, couldnot connect
to zoo-keeper of server2 ? . We haven't found any connectivity issues, but still want to make
sure on the ports and availablitiy.

 

 
{code:java}

Exception 1 :
-2018-06-04T05:03:35,967 [kafka-producer-network-thread | producer-1] ERROR -org.apache.kafka.common.errors.NotLeaderForPartitionException:
This server is not the leader for that topic-partition.

GC 1:
2018-06-04T05:03:24.865-0400: 1465075.225: [GC pause (G1 Evacuation Pause) (young), 0.0158735
secs]
[Parallel Time: 15.3 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 1465075225.5, Avg: 1465075225.5, Max: 1465075225.5, Diff: 0.0]
[Ext Root Scanning (ms): Min: 0.7, Avg: 0.7, Max: 0.7, Diff: 0.1, Sum: 1.4]
[Update RS (ms): Min: 2.1, Avg: 2.3, Max: 2.4, Diff: 0.2, Sum: 4.5]
[Processed Buffers: Min: 19, Avg: 20.5, Max: 22, Diff: 3, Sum: 41]
[Scan RS (ms): Min: 3.6, Avg: 3.7, Max: 3.9, Diff: 0.2, Sum: 7.5]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 8.5, Avg: 8.5, Max: 8.6, Diff: 0.1, Sum: 17.0]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 15.2, Avg: 15.2, Max: 15.2, Diff: 0.0, Sum: 30.4]
[GC Worker End (ms): Min: 1465075240.7, Avg: 1465075240.7, Max: 1465075240.7, Diff: 0.0]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.1 ms]
[Other: 0.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 46.0M(46.0M)->0.0B(100.0M) Survivors: 5120.0K->6144.0K Heap: 262.0M(1024.0M)->217.0M(1024.0M)]
[Times: user=0.03 sys=0.00, real=0.02 secs]




--Exception 2 --
-2018-06-04T10:15:15,522 [kafka-producer-network-thread | producer-1] ERROR -org.apache.kafka.common.errors.NotLeaderForPartitionException:
This server is not the leader for that topic-partition.

--GC--2

2018-05-18T10:15:11.501-0400: 14981.861: [GC pause (G1 Evacuation Pause) (young), 0.0205628
secs]
[Parallel Time: 20.0 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 14981861.1, Avg: 14981861.1, Max: 14981861.1, Diff: 0.0]
[Ext Root Scanning (ms): Min: 0.6, Avg: 0.7, Max: 0.7, Diff: 0.0, Sum: 1.3]
[Update RS (ms): Min: 1.6, Avg: 1.6, Max: 1.6, Diff: 0.0, Sum: 3.1]
[Processed Buffers: Min: 12, Avg: 18.5, Max: 25, Diff: 13, Sum: 37]
[Scan RS (ms): Min: 2.6, Avg: 2.6, Max: 2.6, Diff: 0.0, Sum: 5.2]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Object Copy (ms): Min: 15.1, Avg: 15.1, Max: 15.1, Diff: 0.0, Sum: 30.2]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[Termination Attempts: Min: 1, Avg: 1.0, Max: 1, Diff: 0, Sum: 2]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.0]
[GC Worker Total (ms): Min: 19.9, Avg: 19.9, Max: 19.9, Diff: 0.0, Sum: 39.8]
[GC Worker End (ms): Min: 14981881.0, Avg: 14981881.0, Max: 14981881.0, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 0.0 ms]
[Other: 0.5 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 0.1 ms]
[Ref Enq: 0.0 ms]
[Redirty Cards: 0.1 ms]
[Humongous Register: 0.0 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 0.1 ms]
[Eden: 63.0M(63.0M)->0.0B(79.0M) Survivors: 8192.0K->6144.0K Heap: 425.0M(1024.0M)->364.5M(1024.0M)]
[Times: user=0.04 sys=0.00, real=0.02 secs]


{code}

> Kafka JavaAPI Producer failed with NotLeaderForPartitionException
> -----------------------------------------------------------------
>
>                 Key: KAFKA-6281
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6281
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Anil
>            Priority: Major
>         Attachments: server1-controller.log, server2-controller.log
>
>
> We are running Kafka (vesion kafka_2.11-0.10.1.0) in a 2 node cluster. We have 2 producers
(Java API) acting on different topics. Each topic has single partition. The topic where we
had this issue, has one consumer running. This set up has been running fine for 3 months,
and we saw this issue. All the suggested cases/solutions for this issue in other forums don't
seem to apply for my scenario.
> Exception at producer;
> {code}
> -2017-11-25T17:40:33,035 [kafka-producer-network-thread | producer-1] ERROR client.producer.BingLogProducerCallback
- Encountered exception in sending message ; > org.apache.kafka.common.errors.NotLeaderForPartitionException:
This server is not the leader for that topic-partition.
> {code}
> We haven't enabled retries for the messages, because this is transactional data and we
want to maintain the order.
> Producer config:
> {code}
> bootstrap.servers : server1ip:9092
> acks :all
> retries : 0
> linger.ms :0
> buffer.memory :10240000
> max.request.size :1024000
> key.serializer : org.apache.kafka.common.serialization.StringSerializer
> value.serializer : org.apache.kafka.common.serialization.StringSerializer
> {code}
> We are connecting to server1 at both producer and consumer. The controller log at server2
indicates there is some shutdown happened at during sametime, but I dont understand why this
happened.
> {color:red}[2017-11-25 17:31:44,776] DEBUG [Controller 2]: topics not in preferred replica
Map() (kafka.controller.KafkaController) [2017-11-25 17:31:44,776] TRACE [Controller 2]: leader
imbalance ratio for broker 2 is 0.000000 (kafka.controller.KafkaController) [2017-11-25 17:31:44,776]
DEBUG [Controller 2]: topics not in preferred replica Map() (kafka.controller.KafkaController)
[2017-11-25 17:31:44,776] TRACE [Controller 2]: leader imbalance ratio for broker 1 is 0.000000
(kafka.controller.KafkaController) [2017-11-25 17:34:18,314] INFO [SessionExpirationListener
on 2], ZK expired; shut down all controller components and try to re-elect (kafka.controller.KafkaController$SessionExpirationListener)
[2017-11-25 17:34:18,317] DEBUG [Controller 2]: Controller resigning, broker id 2 (kafka.controller.KafkaController)
[2017-11-25 17:34:18,317] DEBUG [Controller 2]: De-registering IsrChangeNotificationListener
(kafka.controller.KafkaController) [2017-11-25 17:34:18,317] INFO [delete-topics-thread-2],
Shutting down (kafka.controller.TopicDeletionManager$DeleteTopicsThread) [2017-11-25 17:34:18,317]
INFO [delete-topics-thread-2], Stopped (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
[2017-11-25 17:34:18,318] INFO [delete-topics-thread-2], Shutdown completed (kafka.controller.TopicDeletionManager$DeleteTopicsThread)
[2017-11-25 17:34:18,318] INFO [Partition state machine on Controller 2]: Stopped partition
state machine (kafka.controller.PartitionStateMachine) [2017-11-25 17:34:18,318] INFO [Replica
state machine on controller 2]: Stopped replica state machine (kafka.controller.ReplicaStateMachine)
[2017-11-25 17:34:18,318] INFO [Controller-2-to-broker-2-send-thread], Shutting down (kafka.controller.RequestSendThread)
[2017-11-25 17:34:18,318] INFO [Controller-2-to-broker-2-send-thread], Stopped (kafka.controller.RequestSendThread)
[2017-11-25 17:34:18,319] INFO [Controller-2-to-broker-2-send-thread], Shutdown completed
(kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO [Controller-2-to-broker-1-send-thread],
Shutting down (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO [Controller-2-to-broker-1-send-thread],
Stopped (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO [Controller-2-to-broker-1-send-thread],
Shutdown completed (kafka.controller.RequestSendThread) [2017-11-25 17:34:18,319] INFO [Controller
2]: Broker 2 resigned as the controller (kafka.controller.KafkaController) [2017-11-25 17:34:18,353]
DEBUG [IsrChangeNotificationListener] Fired!!! (kafka.controller.IsrChangeNotificationListener)
[2017-11-25 17:34:18,353] DEBUG [IsrChangeNotificationListener] Fired!!! (kafka.controller.IsrChangeNotificationListener)
[2017-11-25 17:34:18,354] INFO [BrokerChangeListener on Controller 2]: Broker change listener
fired for path /brokers/ids with children 1,2 (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
[2017-11-25 17:34:18,355] DEBUG [DeleteTopicsListener on 2]: Delete topics listener fired
for topics to be deleted (kafka.controller.PartitionStateMachine$DeleteTopicsListener) [2017-11-25
17:34:18,362] INFO [AddPartitionsListener on 2]: Partition modification triggered {"version":1,"partitions":{"0":[1]}}
for path /brokers/topics/ESQ (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
[2017-11-25 17:34:18,368] INFO [AddPartitionsListener on 2]: Partition modification triggered
{"version":1,"partitions":{"0":[1]}} for path /brokers/topics/Test1 (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
[2017-11-25 17:34:18,369] INFO [AddPartitionsListener on 2]: Partition modification triggered
{"version":1,"partitions":{"0":[2]}} for path /brokers/topics/ImageQ (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
[2017-11-25 17:34:18,374] INFO [AddPartitionsListener on 2]: Partition modification triggered
{"version":1,"partitions":{"8":[1,2],"4":[1,2],"9":[2,1],"5":[2,1],"6":[1,2],"1":[2,1],"0":[1,2],"2":[1,2],"7":[2,1],"3":[2,1]}}
for path /brokers/topics/NMS_NotifyQ (kafka.controller.PartitionStateMachine$PartitionModificationsListener)
[2017-11-25 17:34:18,375] INFO [AddPartitionsListener on 2]: Partition modification triggered
{"version":1,"partitions":{"0":[1]}} for path /brokers/topics/TempBinLogReqQ @{color}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message