kafka-jira mailing list archives

From "Alexander Ivanichev (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-6631) Kafka Streams - Rebalancing exception in Kafka 1.0.0
Date Sat, 10 Mar 2018 11:32:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16394135#comment-16394135 ]

Alexander Ivanichev commented on KAFKA-6631:
--------------------------------------------

Hi Guozhang,

First of all, thank you for your help. I just tried what you suggested and increased max.message.bytes
to 15 MB; however, the issue still remains!

I no longer see any errors on any of the Kafka brokers; however, when I try to start the Streams
app with 13 workers, I still get this error. I should add that the maximum record size in our input
topic is 6000 bytes.

However, because of scale demands, we have a large number of partitions for that topic: 150.
Actually, I think we started to experience this issue when we increased the partition count
from 100 to 150, but it seems that only our Kafka Streams app was affected by this change.
Other apps that use normal consumer groups work just fine, so it's really strange
and unexpected.

I should add that I tried recreating everything (all topics, plus cleaning all Streams internal
topics), but I'm still experiencing this issue. I believe there is some kind of bug in SyncGroup
that causes this behaviour.

 

> Kafka Streams - Rebalancing exception in Kafka 1.0.0
> ----------------------------------------------------
>
>                 Key: KAFKA-6631
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6631
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 1.0.0
>         Environment: Container Linux by CoreOS 1576.5.0
>            Reporter: Alexander Ivanichev
>            Priority: Critical
>
>  
> In Kafka Streams 1.0.0, we saw a strange rebalance error. Our Streams app performs window-based
aggregations. Sometimes on start, when all stream workers join, the app just crashes.
However, if we enable only one worker, it works fine; sometimes two workers work just fine,
but when a third joins, the app crashes again. It looks like a critical issue with rebalancing.
> {code:java}
> 2018-03-08T18:51:01.226243000Z org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: The server experienced an unexpected error when processing the request
> 2018-03-08T18:51:01.226557000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:566)
> 2018-03-08T18:51:01.226860000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:539)
> 2018-03-08T18:51:01.227328000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:808)
> 2018-03-08T18:51:01.227630000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:788)
> 2018-03-08T18:51:01.228152000Z at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
> 2018-03-08T18:51:01.228449000Z at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
> 2018-03-08T18:51:01.228897000Z at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
> 2018-03-08T18:51:01.229196000Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:506)
> 2018-03-08T18:51:01.229673000Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:353)
> 2018-03-08T18:51:01.229971000Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:268)
> 2018-03-08T18:51:01.230436000Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:214)
> 2018-03-08T18:51:01.230749000Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:174)
> 2018-03-08T18:51:01.231065000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:364)
> 2018-03-08T18:51:01.231584000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
> 2018-03-08T18:51:01.231911000Z at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:295)
> 2018-03-08T18:51:01.232190000Z at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1138)
> 2018-03-08T18:51:01.232643000Z at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1103)
> 2018-03-08T18:51:01.233121000Z at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:851)
> 2018-03-08T18:51:01.233409000Z at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:808)
> 2018-03-08T18:51:01.233720000Z at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
> 2018-03-08T18:51:01.234196000Z at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
> 2018-03-08T18:51:01.234655000Z org.apache.kafka.common.KafkaException: Unexpected error from SyncGroup: The server experienced an unexpected error when processing the request
> 2018-03-08T18:51:01.234972000Z exception in thread, closing process
> 2018-03-08T18:51:01.235500000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:566)
> 2018-03-08T18:51:01.235839000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:539)
> 2018-03-08T18:51:01.236336000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:808)
> 2018-03-08T18:51:01.236603000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:788)
> 2018-03-08T18:51:01.236889000Z at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
> 2018-03-08T18:51:01.237092000Z at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
> 2018-03-08T18:51:01.237531000Z at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
> 2018-03-08T18:51:01.237816000Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:506)
> 2018-03-08T18:51:01.238097000Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:353)
> 2018-03-08T18:51:01.238395000Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:268)
> 2018-03-08T18:51:01.238698000Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:214)
> 2018-03-08T18:51:01.239511000Z exception in thread, closing process
> 2018-03-08T18:51:01.239880000Z exception in thread, closing process
> 2018-03-08T18:51:01.240175000Z at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:174)
> 2018-03-08T18:51:01.240443000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:364)
> 2018-03-08T18:51:01.240764000Z at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:316)
> 2018-03-08T18:51:01.241083000Z at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:295)
> 2018-03-08T18:51:01.241367000Z at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1138)
> 2018-03-08T18:51:01.241789000Z at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1103)
> 2018-03-08T18:51:01.242075000Z at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:851)
> 2018-03-08T18:51:01.242351000Z at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:808)
> 2018-03-08T18:51:01.242641000Z at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
> 2018-03-08T18:51:01.243051000Z at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
> {code}
> On taking a further look at the brokers, I saw another exception:
> {code:java}
> Appending metadata message for group AnomalyKafkaStreams generation 12 failed due to org.apache.kafka.common.errors.RecordTooLargeException, returning UNKNOWN error code to the client (kafka.coordinator.group.GroupMetadataManager)
> {code}
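
To correlate the client-side SyncGroup failure with its broker-side cause, the broker logs can be
scanned for the group metadata message quoted above. The following is a minimal Python sketch and
not part of the original report: the regular expression is modelled only on that single log line
(other Kafka versions may word the message differently), and the function name
find_group_metadata_failures is hypothetical.

```python
import re

# Pattern modelled on the one broker log line quoted in the report.
# Assumption: the wording is stable; other Kafka versions may differ.
PATTERN = re.compile(
    r"Appending metadata message for group (?P<group>\S+) "
    r"generation (?P<generation>\d+) failed due to "
    r"(?P<error>[\w.]+)"
)


def find_group_metadata_failures(lines):
    """Yield (group, generation, exception class) for each matching line."""
    for line in lines:
        match = PATTERN.search(line)
        if match:
            yield (match["group"], int(match["generation"]), match["error"])


# Sample input: the broker log line from the report, joined onto one line.
sample = (
    "Appending metadata message for group AnomalyKafkaStreams generation 12 "
    "failed due to org.apache.kafka.common.errors.RecordTooLargeException, "
    "returning UNKNOWN error code to the client "
    "(kafka.coordinator.group.GroupMetadataManager)"
)

failures = list(find_group_metadata_failures([sample]))
print(failures)
```

In practice the lines would come from a broker log file, for example by passing an open file
object to the function; the sample string here only reproduces the line from the report.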



