Date: Thu, 24 Jul 2014 21:56:39 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: dev@storm.incubator.apache.org
Subject: [jira] [Commented] (STORM-399) Kafka Spout defaulting to latest offset when current offset is older then 100k

[ 
https://issues.apache.org/jira/browse/STORM-399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073713#comment-14073713 ]

ASF GitHub Bot commented on STORM-399:
--------------------------------------

Github user d2r commented on the pull request:

    https://github.com/apache/incubator-storm/pull/183#issuecomment-50083706

    +1 on this change.

    @d-t-w That seems perfectly reasonable. Do you want to file a JIRA to implement different configurable strategies?

> Kafka Spout defaulting to latest offset when current offset is older then 100k
> ------------------------------------------------------------------------------
>
>                 Key: STORM-399
>                 URL: https://issues.apache.org/jira/browse/STORM-399
>             Project: Apache Storm (Incubating)
>          Issue Type: Bug
>    Affects Versions: 0.9.2-incubating
>            Reporter: Curtis Allen
>            Priority: Minor
>
> Using storm and storm-kafka 0.9.2-incubating
> In the storm kafka spout the default for maxOffsetBehind is 100000
> see https://github.com/apache/incubator-storm/blob/v0.9.2-incubating/external/storm-kafka/src/jvm/storm/kafka/KafkaConfig.java#L38
> This default is too low and causes the kafka spout to start from the latest offset instead of the last committed offset without warning.
> see https://github.com/apache/incubator-storm/blob/v0.9.2-incubating/external/storm-kafka/src/jvm/storm/kafka/PartitionManager.java#L95
> Producing the following log output from the storm worker processes
> 2014-07-09 18:02:15 s.k.PartitionManager [INFO] Read last commit offset from zookeeper: 15266940; old topology_id: ef3f1f89-f64c-4947-b6eb-0c7fb9adb9ea - new topology_id: 5747dba6-c947-4c4f-af4a-4f50a84817bf
> 2014-07-09 18:02:15 s.k.PartitionManager [INFO] Last commit offset from zookeeper: 15266940
> 2014-07-09 18:02:15 s.k.PartitionManager [INFO] Commit offset 22092614 is more than 100000 behind, resetting to startOffsetTime=-2
> 2014-07-09 18:02:15 s.k.PartitionManager [INFO] Starting Kafka prd-use1c-pr-08-kafka-kamq-0004:4 from offset 22092614
> To fix this problem I ended up setting the spout config in my topology like so:
>     spoutConf.maxOffsetBehind = Long.MAX_VALUE;
> Why would the kafka spout, by default, skip to the latest offset if the current offset is more than 100000 behind?
> This seems like a bad default value: the spout literally skipped over months of data without any warning.
> Are the core contributors open to accepting a pull request that would set the default to Long.MAX_VALUE?

--
This message was sent by Atlassian JIRA
(v6.2#6252)
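
For readers of the archive, the behavior described in the report can be modeled with a small standalone sketch. This is not the storm-kafka source: the class OffsetResetSketch and the method resolveStartOffset are hypothetical names, and the logic is only an assumption based on the reporter's description (a committed offset that trails the latest offset by more than maxOffsetBehind is discarded). The quoted log mentions startOffsetTime=-2, but the sketch follows the reporter's account that the spout resumed from the latest offset. The numbers are taken from the quoted log lines.

```java
// Simplified, standalone model of the check described in the report.
// NOT the storm-kafka implementation; all names here are hypothetical.
final class OffsetResetSketch {

    /** Default for maxOffsetBehind as quoted in the report (0.9.2-incubating). */
    static final long REPORTED_DEFAULT_MAX_OFFSET_BEHIND = 100000L;

    /**
     * Returns the offset a consumer would start from under the reported behavior:
     * the committed offset, unless it trails the latest offset by more than
     * maxOffsetBehind, in which case the committed offset is discarded.
     */
    static long resolveStartOffset(long committedOffset, long latestOffset, long maxOffsetBehind) {
        long lag = latestOffset - committedOffset;
        if (lag > maxOffsetBehind) {
            // Everything between committedOffset and latestOffset is skipped.
            return latestOffset;
        }
        return committedOffset;
    }

    public static void main(String[] args) {
        long committed = 15266940L; // "Read last commit offset from zookeeper: 15266940"
        long latest = 22092614L;    // "Starting Kafka ... from offset 22092614"

        System.out.println("lag = " + (latest - committed)); // lag = 6825674

        // With the reported default, the committed offset is dropped.
        System.out.println(resolveStartOffset(committed, latest, REPORTED_DEFAULT_MAX_OFFSET_BEHIND)); // 22092614

        // With the reporter's workaround, the committed offset is kept.
        System.out.println(resolveStartOffset(committed, latest, Long.MAX_VALUE)); // 15266940
    }
}
```

Under these assumptions the lag in the log is 6825674 messages, far above the reported default of 100000, which matches the jump the reporter observed. With Long.MAX_VALUE the comparison can never succeed, so the committed offset is always used.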