kafka-jira mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Matthias J. Sax (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (KAFKA-5397) streams are not recovering from LockException during rebalancing
Date Mon, 11 Dec 2017 18:04:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Matthias J. Sax reassigned KAFKA-5397:
--------------------------------------

    Assignee:     (was: Matthias J. Sax)

> streams are not recovering from LockException during rebalancing
> ----------------------------------------------------------------
>
>                 Key: KAFKA-5397
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5397
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>    Affects Versions: 0.10.2.1, 0.11.0.0
>         Environment: one node setup, confluent kafka broker v3.2.0, kafka-clients 0.11.0.0-SNAPSHOT,
5 threads for kafka-streams
>            Reporter: Jozef Koval
>
> Probably continuation of #KAFKA-5167. Portions of log:
> {code}
> 2017-06-07 01:17:52,435 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-5] StreamTask
                - task [2_0] Failed offset commits {browser-aggregation-KSTREAM-MAP-0000000039-repartition-0=OffsetAndMetadata{offset=4725597,
metadata=''}, browser-aggregation-KSTREAM-MAP-0000000052-repartition-0=OffsetAndMetadata{offset=4968164,
metadata=''}, browser-aggregation-KSTREAM-MAP-0000000026-repartition-0=OffsetAndMetadata{offset=2490506,
metadata=''}, browser-aggregation-KSTREAM-MAP-0000000065-repartition-0=OffsetAndMetadata{offset=7457795,
metadata=''}, browser-aggregation-KSTREAM-MAP-0000000013-repartition-0=OffsetAndMetadata{offset=530888,
metadata=''}} due to Commit cannot be completed since the group has already rebalanced and
assigned the partitions to another member. This means that the time between subsequent calls
to poll() was longer than the configured max.poll.interval.ms, which typically implies that
the poll loop is spending too much time message processing. You can address this either by
increasing the session timeout or by reducing the maximum size of batches returned in poll()
with max.poll.records.
> 2017-06-07 01:17:52,436 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask
                - task [7_0] Failed offset commits {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085,
metadata=''}} due to Commit cannot be completed since the group has already rebalanced and
assigned the partitions to another member. This means that the time between subsequent calls
to poll() was longer than the configured max.poll.interval.ms, which typically implies that
the poll loop is spending too much time message processing. You can address this either by
increasing the session timeout or by reducing the maximum size of batches returned in poll()
with max.poll.records.
> 2017-06-07 01:17:52,488 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
              - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed
to commit StreamTask 7_0 state: 
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since
the group has already rebalanced and assigned the partitions to another member. This means
that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms,
which typically implies that the poll loop is spending too much time message processing. You
can address this either by increasing the session timeout or by reducing the maximum size
of batches returned in poll() with max.poll.records.
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:792)
> at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:738)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:798)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:778)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:488)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:348)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:184)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:605)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1146)
>         at org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:307)
>         at org.apache.kafka.streams.processor.internals.StreamTask.access$000(StreamTask.java:49)
>         at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:268)
>         at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>         at org.apache.kafka.streams.processor.internals.StreamTask.commitImpl(StreamTask.java:259)
>         at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:253)
>         at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:813)
>         at org.apache.kafka.streams.processor.internals.StreamThread.access$2800(StreamThread.java:73)
>         at org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:795)
>         at org.apache.kafka.streams.processor.internals.StreamThread.performOnStreamTasks(StreamThread.java:1442)
>         at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:787)
>         at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:776)
>         at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:565)
>         at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
> 2017-06-07 01:17:52,747 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask
                - task [7_0] Failed offset commits {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085,
metadata=''}} due to Commit cannot be completed since the group has already rebalanced and
assigned the partitions to another member. This means that the time between subsequent calls
to poll() was longer than the configured max.poll.interval.ms, which typically implies that
the poll loop is spending too much time message processing. You can address this either by
increasing the session timeout or by reducing the maximum size of batches returned in poll()
with max.pol
> l.records.
> 2017-06-07 01:17:52,776 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
              - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed
to suspend stream task 7_0 due to: org.apache.kafka.clients.consumer.CommitFailedException:
Commit cannot be completed since the group has already rebalanced and assigned the partitions
to another member. This means that the time between subsequent calls to poll() was longer
than the configured max.poll.interval.ms, which typically implies that the poll loop is spending
too much time message processing. You can address this either by increasing the session timeout
or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,781 WARN  [73e81b0b-5801-40ab-b02d-79afede6cc6-StreamThread-2] StreamTask
                - task [6_3] Failed offset commits {browser-aggregation-Aggregate-Texts-repartition-3=OffsetAndMetadata{offset=13489738,
metadata=''}} due to Commit cannot be completed since the group has already rebalanced and
assigned the partitions to another member. This means that the time between subsequent calls
to poll() was longer than the configured max.poll.interval.ms, which typically implies that
the poll loop is spending too much time message processing. You can address this either by
increasing the session timeout or by reducing the maximum size of batches returned in poll()
with max.poll.records.
> 2017-06-07 01:17:52,781 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
              - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed
to suspend stream task 6_3 due to: org.apache.kafka.clients.consumer.CommitFailedException:
Commit cannot be completed since the group has already rebalanced and assigned the partitions
to another member. This means that the time between subsequent calls to poll() was longer
than the configured max.poll.interval.ms, which typically implies that the poll loop is spending
too much time message processing. You can address this either by increasing the session timeout
or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,782 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] ConsumerCoordinator
       - User provided listener org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener
for group browser-aggregation failed on partition revocation
> org.apache.kafka.streams.errors.StreamsException: stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2]
failed to suspend stream tasks
>         at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:1134)
> at org.apache.kafka.streams.processor.internals.StreamThread.access$1800(StreamThread.java:73)
>         at org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:218)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:422)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:353)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1051)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1016)
>         at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:580)
>         at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>         at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
> //
> 2017-06-07 01:18:15,739 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
              - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could
not create task 6_2. Will retry. org.apache.kafka.streams.errors.LockException: task [6_2]
Failed to lock the state directory for task 6_2
> 2017-06-07 01:18:16,741 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
              - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could
not create task 7_2. Will retry. org.apache.kafka.streams.errors.LockException: task [7_2]
Failed to lock the state directory for task 7_2
> 2017-06-07 01:18:17,745 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
              - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could
not create task 7_3. Will retry. org.apache.kafka.streams.errors.LockException: task [7_3]
Failed to lock the state directory for task 7_3
> 2017-06-07 01:18:17,795 WARN  [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread
              - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Still
retrying to create tasks: [0_0, 1_0, 2_0, 0_2, 3_0, 0_3, 4_0, 3_1, 2_2, 5_0, 4_1, 3_2, 5_1,
6_0, 3_3, 5_2, 4_3, 6_1, 7_1, 5_3, 6_2, 7_2, 7_3]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message