Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 0F89F200B87 for ; Mon, 19 Sep 2016 22:05:23 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id 09B4A160ADF; Mon, 19 Sep 2016 20:05:23 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 4FBD0160ABB for ; Mon, 19 Sep 2016 22:05:22 +0200 (CEST) Received: (qmail 71040 invoked by uid 500); 19 Sep 2016 20:05:20 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 70750 invoked by uid 99); 19 Sep 2016 20:05:20 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 19 Sep 2016 20:05:20 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 96E092C1B82 for ; Mon, 19 Sep 2016 20:05:20 +0000 (UTC) Date: Mon, 19 Sep 2016 20:05:20 +0000 (UTC) From: "Guozhang Wang (JIRA)" To: dev@kafka.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (KAFKA-3752) Provide a way for KStreams to recover from unclean shutdown MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 19 Sep 2016 20:05:23 -0000 [ https://issues.apache.org/jira/browse/KAFKA-3752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504535#comment-15504535 ] Guozhang Wang commented on KAFKA-3752: -------------------------------------- [~theduderog] Could you try to validate if this issue is already fixed? There are a couple of tickets related to this issue that is just merged recently. > Provide a way for KStreams to recover from unclean shutdown > ----------------------------------------------------------- > > Key: KAFKA-3752 > URL: https://issues.apache.org/jira/browse/KAFKA-3752 > Project: Kafka > Issue Type: Sub-task > Components: streams > Affects Versions: 0.10.0.0 > Reporter: Roger Hoover > Labels: architecture > > If a KStream application gets killed with SIGKILL (e.g. by the Linux OOM Killer), it may leave behind lock files and fail to recover. > It would be useful to have an options (say --force) to tell KStreams to proceed even if it finds old LOCK files. > {noformat} > [2016-05-24 17:37:52,886] ERROR Failed to create an active task #0_0 in thread [StreamThread-1]: (org.apache.kafka.streams.processor.internals.StreamThread:583) > org.apache.kafka.streams.errors.ProcessorStateException: Error while creating the state manager > at org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:71) > at org.apache.kafka.streams.processor.internals.StreamTask.(StreamTask.java:86) > at org.apache.kafka.streams.processor.internals.StreamThread.createStreamTask(StreamThread.java:550) > at org.apache.kafka.streams.processor.internals.StreamThread.addStreamTasks(StreamThread.java:577) > at org.apache.kafka.streams.processor.internals.StreamThread.access$000(StreamThread.java:68) > at org.apache.kafka.streams.processor.internals.StreamThread$1.onPartitionsAssigned(StreamThread.java:123) > at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:222) > at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:232) > at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$1.onSuccess(AbstractCoordinator.java:227) > at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at org.apache.kafka.clients.consumer.internals.RequestFuture$2.onSuccess(RequestFuture.java:182) > at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:436) > at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$SyncGroupResponseHandler.handle(AbstractCoordinator.java:422) > at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679) > at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658) > at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167) > at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133) > at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107) > at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426) > at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278) > at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360) > at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224) > at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192) > at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163) > at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243) > at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345) > at org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977) > at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937) > at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295) > at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218) > Caused by: java.io.IOException: Failed to lock the state directory: /data/test/2/kafka-streams/test-2/0_0 > at org.apache.kafka.streams.processor.internals.ProcessorStateManager.(ProcessorStateManager.java:95) > at org.apache.kafka.streams.processor.internals.AbstractTask.(AbstractTask.java:69) > ... 32 more > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)