Date: Fri, 7 Aug 2015 21:02:46 +0000 (UTC)
From: "Ruben Ramalho (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Updated] (SPARK-9476) Kafka stream loses leader after 2h of operation

     [ https://issues.apache.org/jira/browse/SPARK-9476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ruben Ramalho updated SPARK-9476:
---------------------------------
    Affects Version/s:     (was: 1.4.1)
                           1.4.0

> Kafka stream loses leader after 2h of operation
> ------------------------------------------------
>
>                 Key: SPARK-9476
>                 URL: https://issues.apache.org/jira/browse/SPARK-9476
>             Project: Spark
>          Issue Type: Bug
>          Components: Streaming
>    Affects Versions: 1.4.0
>         Environment: Docker, CentOS, Spark standalone, Core i7, 8 GB
>            Reporter: Ruben Ramalho
>
> This seems to happen every 2h, with both the direct stream and the regular stream. I'm doing window operations over a 1h period (if that helps).
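For context, a minimal sketch of the kind of setup described above (Spark 1.4 with the Kafka direct stream and a 1h window). The topic name and broker address are taken from the log below; the batch interval, slide duration, checkpoint path, and object name are illustrative assumptions, not the reporter's actual code.

    // Sketch only: Spark Streaming 1.4 + spark-streaming-kafka (Kafka 0.8 direct stream).
    // Batch interval, slide duration and checkpoint path are assumed for illustration.
    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Minutes, Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object WindowedKafkaStream {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("kafka-window-example")
        val ssc  = new StreamingContext(conf, Seconds(30))    // assumed batch interval
        ssc.checkpoint("/tmp/kafka-window-checkpoint")         // window ops need checkpointing

        val kafkaParams = Map("metadata.broker.list" -> "192.168.3.23:3000")
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("updates"))

        // Window over the last hour, sliding every 5 minutes (assumed slide duration).
        stream.map(_._2)
          .countByWindow(Minutes(60), Minutes(5))
          .print()

        ssc.start()
        ssc.awaitTermination()
      }
    }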
> Here's part of the error message:
>
> 2015-07-30 13:27:23 WARN ClientUtils$:89 - Fetching topic metadata with correlation id 10 for topics [Set(updates)] from broker [id:0,host:192.168.3.23,port:3000] failed
> java.nio.channels.ClosedChannelException
>         at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
>         at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
>         at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
>         at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
>         at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
>         at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
>         at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> 2015-07-30 13:27:23 INFO SyncProducer:68 - Disconnecting from 192.168.3.23:3000
> 2015-07-30 13:27:23 WARN ConsumerFetcherManager$LeaderFinderThread:89 - [spark-group_81563e123e9f-1438259236988-fc3d82bf-leader-finder-thread], Failed to find leader for Set([updates,0])
> kafka.common.KafkaException: fetching topic metadata for topics [Set(oversight-updates)] from broker [ArrayBuffer(id:0,host:192.168.3.23,port:3000)] failed
>         at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:72)
>         at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:93)
>         at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:66)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:60)
> Caused by: java.nio.channels.ClosedChannelException
>         at kafka.network.BlockingChannel.send(BlockingChannel.scala:100)
>         at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:73)
>         at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:72)
>         at kafka.producer.SyncProducer.send(SyncProducer.scala:113)
>         at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:58)
>
> After the crash I tried to communicate with Kafka using a simple Scala consumer and producer and had no problems at all. Spark, though, needs a restart of the Kafka container to resume normal operation. There are no errors in the Kafka log, apart from an improperly closed connection.
> I have been trying to solve this problem for days; I suspect it has something to do with Spark.
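For reference, a standalone check along the lines of the "simple Scala consumer" mentioned above might look like the sketch below: it asks the broker directly for topic metadata (the same request that fails inside the Spark receiver's leader-finder thread) and prints the leader of each partition. The broker address and topic come from the log; the client id, timeouts, buffer size, and object name are illustrative assumptions.

    // Sketch only, against the Kafka 0.8.x Scala client (kafka_2.10).
    // Fetches topic metadata directly from the broker and prints the partition leaders.
    import kafka.api.TopicMetadataRequest
    import kafka.consumer.SimpleConsumer

    object LeaderCheck {
      def main(args: Array[String]): Unit = {
        // Broker and topic from the log; soTimeout/bufferSize/clientId are assumptions.
        val consumer = new SimpleConsumer("192.168.3.23", 3000,
          soTimeout = 10000, bufferSize = 64 * 1024, clientId = "leader-check")
        try {
          val response = consumer.send(new TopicMetadataRequest(Seq("updates"), 0))
          for {
            topicMeta <- response.topicsMetadata
            partMeta  <- topicMeta.partitionsMetadata
          } {
            val leader = partMeta.leader.map(b => s"${b.host}:${b.port}").getOrElse("NO LEADER")
            println(s"topic=${topicMeta.topic} partition=${partMeta.partitionId} leader=$leader")
          }
        } finally {
          consumer.close()
        }
      }
    }

If this check succeeds while the streaming job still cannot find a leader, that points at state held inside the consumer's leader-finder thread rather than at the broker itself, which matches the reporter's observation that only a Kafka container restart gets Spark going again.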