storm-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From curtisallen <>
Subject [GitHub] incubator-storm pull request: STORM-399 update KafkaConfig.maxOffs...
Date Wed, 09 Jul 2014 21:46:54 GMT
GitHub user curtisallen opened a pull request:

    STORM-399 update KafkaConfig.maxOffsetBehind default to be Long.MAX_VALUE

    I've recently upgraded to storm and storm-kafka `0.9.2-incubating`, replacing the
spout I was using previously.
    I have a large kafka log that I needed processed. I started my topology with 
    storm.kafka.SpoutConfig spoutConfig = new SpoutConfig....
    spoutConfig.forceFromStart = true;
    I then needed to make some tweaks in my application code and restarted the topology with
spoutConfig.forceFromStart = false. Expecting to pick up where I left off in my kafka log.
Instead the kafka spout started from the latest offset. Upon investigation I found this log
message in my storm worker logs
    2014-07-09 18:02:15 s.k.PartitionManager [INFO] Read last commit offset from zookeeper:
15266940; old topology_id: ef3f1f89-f64c-4947-b6eb-0c7fb9adb9ea - new topology_id: 5747dba6-c947-4c4f-af4a-4f50a84817bf
    2014-07-09 18:02:15 s.k.PartitionManager [INFO] Last commit offset from zookeeper: 15266940
    2014-07-09 18:02:15 s.k.PartitionManager [INFO] Commit offset 22092614 is more than 100000
behind, resetting to startOffsetTime=-2
    2014-07-09 18:02:15 s.k.PartitionManager [INFO] Starting Kafka prd-use1c-pr-08-kafka-kamq-0004:4
from offset 22092614
    Digging in the storm-kafka spout I found this line
    To fix this problem I ended up setting my spout config like so
    spoutConf.maxOffsetBehind = Long.MAX_VALUE; 
    Now finally to my question.
    Why would the kafka spout skip to the latest offset if the current offset is more then
100000 behind by default?
    This seems like a bad default value, the spout literally skipped over months of data without
any warning.   
    This pull request sets the default value to `Long.MAX_VALUE`

You can merge this pull request into a Git repository by running:

    $ git pull STORM-399-kafka-spout-increase-default

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #183
commit 2b1d6cfe9b3d567234246b2e01bfb02b64131302
Author: Curtis Allen <>
Date:   2014-07-09T21:43:14Z

    STORM-399 update KafkaConfig.maxOffsetBehind default to be Long.MAX_VALUE


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at or file a JIRA ticket
with INFRA.

View raw message