Date: Tue, 19 Sep 2017 17:35:00 +0000 (UTC)
From: "Matthias J. Sax (JIRA)"
To: jira@kafka.apache.org
Subject: [jira] [Assigned] (KAFKA-5397) streams are not recovering from LockException during rebalancing

     [ https://issues.apache.org/jira/browse/KAFKA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax reassigned KAFKA-5397:
--------------------------------------

    Assignee: Matthias J. Sax

> streams are not recovering from LockException during rebalancing
> ----------------------------------------------------------------
>
>                 Key: KAFKA-5397
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5397
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1, 0.11.0.0
>        Environment: one node setup, confluent kafka broker v3.2.0, kafka-clients 0.11.0.0-SNAPSHOT, 5 threads for kafka-streams
>            Reporter: Jozef Koval
>            Assignee: Matthias J. Sax
>
> Probably continuation of #KAFKA-5167.
> Portions of log:
> {code}
> 2017-06-07 01:17:52,435 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-5] StreamTask - task [2_0] Failed offset commits {browser-aggregation-KSTREAM-MAP-0000000039-repartition-0=OffsetAndMetadata{offset=4725597, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000052-repartition-0=OffsetAndMetadata{offset=4968164, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000026-repartition-0=OffsetAndMetadata{offset=2490506, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000065-repartition-0=OffsetAndMetadata{offset=7457795, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000013-repartition-0=OffsetAndMetadata{offset=530888, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,436 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask - task [7_0] Failed offset commits {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,488 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed to commit StreamTask 7_0 state:
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:792)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:738)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:798)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:778)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:488)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:348)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:184)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:605)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1146)
>         at org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:307)
>         at org.apache.kafka.streams.processor.internals.StreamTask.access$000(StreamTask.java:49)
>         at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:268)
>         at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>         at org.apache.kafka.streams.processor.internals.StreamTask.commitImpl(StreamTask.java:259)
>         at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:253)
>         at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:813)
>         at org.apache.kafka.streams.processor.internals.StreamThread.access$2800(StreamThread.java:73)
>         at org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:795)
>         at org.apache.kafka.streams.processor.internals.StreamThread.performOnStreamTasks(StreamThread.java:1442)
>         at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:787)
>         at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:776)
>         at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:565)
>         at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
> 2017-06-07 01:17:52,747 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask - task [7_0] Failed offset commits {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,776 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed to suspend stream task 7_0 due to: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,781 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask - task [6_3] Failed offset commits {browser-aggregation-Aggregate-Texts-repartition-3=OffsetAndMetadata{offset=13489738, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,781 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed to suspend stream task 6_3 due to: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,782 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] ConsumerCoordinator - User provided listener org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener for group browser-aggregation failed on partition revocation
> org.apache.kafka.streams.errors.StreamsException: stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] failed to suspend stream tasks
>         at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:1134)
>         at org.apache.kafka.streams.processor.internals.StreamThread.access$1800(StreamThread.java:73)
>         at org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:218)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:422)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:353)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1051)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1016)
>         at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:580)
>         at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>         at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
> //
> 2017-06-07 01:18:15,739 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could not create task 6_2. Will retry. org.apache.kafka.streams.errors.LockException: task [6_2] Failed to lock the state directory for task 6_2
> 2017-06-07 01:18:16,741 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could not create task 7_2. Will retry. org.apache.kafka.streams.errors.LockException: task [7_2] Failed to lock the state directory for task 7_2
> 2017-06-07 01:18:17,745 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could not create task 7_3. Will retry. org.apache.kafka.streams.errors.LockException: task [7_3] Failed to lock the state directory for task 7_3
> 2017-06-07 01:18:17,795 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Still retrying to create tasks: [0_0, 1_0, 2_0, 0_2, 3_0, 0_3, 4_0, 3_1, 2_2, 5_0, 4_1, 3_2, 5_1, 6_0, 3_3, 5_2, 4_3, 6_1, 7_1, 5_3, 6_2, 7_2, 7_3]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
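[Editorial note, not part of the original thread: the CommitFailedException in the log above names two consumer settings to tune. A minimal sketch of how those overrides could be passed to Kafka Streams, assuming a hypothetical application id and broker address; the specific values are illustrative assumptions, not recommendations from this ticket.]

```java
import java.util.Properties;

public class StreamsTuningSketch {

    // Build a config giving the poll loop more headroom, per the log's advice:
    // raise max.poll.interval.ms and reduce max.poll.records.
    public static Properties tunedConfig() {
        Properties props = new Properties();
        props.put("application.id", "browser-aggregation"); // group name taken from the log
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        // Allow up to 10 minutes between consecutive poll() calls before the
        // coordinator considers the member failed (plain-consumer default is 5 minutes).
        props.put("max.poll.interval.ms", "600000");
        // Return smaller batches from poll() so each loop iteration finishes sooner.
        props.put("max.poll.records", "100");
        return props;
    }

    public static void main(String[] args) {
        Properties props = tunedConfig();
        // In a real application these props would be handed to new KafkaStreams(topology, props);
        // here we just print the two overrides the exception message points at.
        System.out.println("max.poll.interval.ms=" + props.getProperty("max.poll.interval.ms"));
        System.out.println("max.poll.records=" + props.getProperty("max.poll.records"));
    }
}
```

Note this only mitigates the rebalance-triggering commit failures; the LockException retry loop itself is internal to Streams and is what this ticket tracks.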