Date: Tue, 19 Sep 2017 17:35:00 +0000 (UTC)
From: "Matthias J. Sax (JIRA)"
To: jira@kafka.apache.org
Subject: [jira] [Assigned] (KAFKA-5397) streams are not recovering from LockException during rebalancing

     [ https://issues.apache.org/jira/browse/KAFKA-5397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax reassigned KAFKA-5397:
--------------------------------------

    Assignee: Matthias J. Sax

> streams are not recovering from LockException during rebalancing
> ----------------------------------------------------------------
>
>                 Key: KAFKA-5397
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5397
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.2.1, 0.11.0.0
>        Environment: one node setup, confluent kafka broker v3.2.0, kafka-clients 0.11.0.0-SNAPSHOT, 5 threads for kafka-streams
>            Reporter: Jozef Koval
>            Assignee: Matthias J. Sax
>
> Probably continuation of #KAFKA-5167.
> Portions of log:
> {code}
> 2017-06-07 01:17:52,435 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-5] StreamTask - task [2_0] Failed offset commits {browser-aggregation-KSTREAM-MAP-0000000039-repartition-0=OffsetAndMetadata{offset=4725597, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000052-repartition-0=OffsetAndMetadata{offset=4968164, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000026-repartition-0=OffsetAndMetadata{offset=2490506, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000065-repartition-0=OffsetAndMetadata{offset=7457795, metadata=''}, browser-aggregation-KSTREAM-MAP-0000000013-repartition-0=OffsetAndMetadata{offset=530888, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,436 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask - task [7_0] Failed offset commits {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,488 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed to commit StreamTask 7_0 state:
> org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:792)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:738)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:798)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:778)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:204)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:167)
>         at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:127)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:488)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:348)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:262)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:208)
>         at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:184)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:605)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1146)
>         at org.apache.kafka.streams.processor.internals.StreamTask.commitOffsets(StreamTask.java:307)
>         at org.apache.kafka.streams.processor.internals.StreamTask.access$000(StreamTask.java:49)
>         at org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:268)
>         at org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:187)
>         at org.apache.kafka.streams.processor.internals.StreamTask.commitImpl(StreamTask.java:259)
>         at org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:253)
>         at org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:813)
>         at org.apache.kafka.streams.processor.internals.StreamThread.access$2800(StreamThread.java:73)
>         at org.apache.kafka.streams.processor.internals.StreamThread$2.apply(StreamThread.java:795)
>         at org.apache.kafka.streams.processor.internals.StreamThread.performOnStreamTasks(StreamThread.java:1442)
>         at org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:787)
>         at org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:776)
>         at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:565)
>         at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
> 2017-06-07 01:17:52,747 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask - task [7_0] Failed offset commits {browser-aggregation-Aggregate-Counts-repartition-0=OffsetAndMetadata{offset=13275085, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,776 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed to suspend stream task 7_0 due to: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,781 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamTask - task [6_3] Failed offset commits {browser-aggregation-Aggregate-Texts-repartition-3=OffsetAndMetadata{offset=13489738, metadata=''}} due to Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,781 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Failed to suspend stream task 6_3 due to: org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records.
> 2017-06-07 01:17:52,782 ERROR [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] ConsumerCoordinator - User provided listener org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener for group browser-aggregation failed on partition revocation
> org.apache.kafka.streams.errors.StreamsException: stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] failed to suspend stream tasks
>         at org.apache.kafka.streams.processor.internals.StreamThread.suspendTasksAndState(StreamThread.java:1134)
>         at org.apache.kafka.streams.processor.internals.StreamThread.access$1800(StreamThread.java:73)
>         at org.apache.kafka.streams.processor.internals.StreamThread$RebalanceListener.onPartitionsRevoked(StreamThread.java:218)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinPrepare(ConsumerCoordinator.java:422)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:353)
>         at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:310)
>         at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:297)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1051)
>         at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1016)
>         at org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:580)
>         at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:551)
>         at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:525)
> //
> 2017-06-07 01:18:15,739 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could not create task 6_2. Will retry. org.apache.kafka.streams.errors.LockException: task [6_2] Failed to lock the state directory for task 6_2
> 2017-06-07 01:18:16,741 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could not create task 7_2. Will retry. org.apache.kafka.streams.errors.LockException: task [7_2] Failed to lock the state directory for task 7_2
> 2017-06-07 01:18:17,745 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Could not create task 7_3. Will retry. org.apache.kafka.streams.errors.LockException: task [7_3] Failed to lock the state directory for task 7_3
> 2017-06-07 01:18:17,795 WARN [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] StreamThread - stream-thread [73e81b0b-5801-40ab-b02d-079afede6cc6-StreamThread-2] Still retrying to create tasks: [0_0, 1_0, 2_0, 0_2, 3_0, 0_3, 4_0, 3_1, 2_2, 5_0, 4_1, 3_2, 5_1, 6_0, 3_3, 5_2, 4_3, 6_1, 7_1, 5_3, 6_2, 7_2, 7_3]
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
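[Editorial note, not part of the original thread: the CommitFailedException in the log above names two consumer settings to tune. A minimal sketch of how those overrides could be passed to Kafka Streams, assuming a hypothetical application id and broker address; the specific values are illustrative assumptions, not recommendations from this ticket.]

```java
import java.util.Properties;

public class StreamsTuningSketch {

    // Build a config giving the poll loop more headroom, per the log's advice:
    // raise max.poll.interval.ms and reduce max.poll.records.
    public static Properties tunedConfig() {
        Properties props = new Properties();
        props.put("application.id", "browser-aggregation"); // group name taken from the log
        props.put("bootstrap.servers", "localhost:9092");   // assumed broker address
        // Allow up to 10 minutes between consecutive poll() calls before the
        // coordinator considers the member failed (plain-consumer default is 5 minutes).
        props.put("max.poll.interval.ms", "600000");
        // Return smaller batches from poll() so each loop iteration finishes sooner.
        props.put("max.poll.records", "100");
        return props;
    }

    public static void main(String[] args) {
        Properties props = tunedConfig();
        // In a real application these props would be handed to new KafkaStreams(topology, props);
        // here we just print the two overrides the exception message points at.
        System.out.println("max.poll.interval.ms=" + props.getProperty("max.poll.interval.ms"));
        System.out.println("max.poll.records=" + props.getProperty("max.poll.records"));
    }
}
```

Note this only mitigates the rebalance-triggering commit failures; the LockException retry loop itself is internal to Streams and is what this ticket tracks.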