storm-dev mailing list archives

From "Curtis Allen (JIRA)" <>
Subject [jira] [Created] (STORM-399) Kafka Spout defaulting to latest offset when current offset is older than 100k
Date Wed, 09 Jul 2014 21:33:07 GMT
Curtis Allen created STORM-399:

             Summary: Kafka Spout defaulting to latest offset when current offset is older than 100k
                 Key: STORM-399
             Project: Apache Storm (Incubating)
          Issue Type: Bug
    Affects Versions: 0.9.2-incubating
            Reporter: Curtis Allen
            Priority: Minor

I am using storm and storm-kafka 0.9.2-incubating.

In the storm kafka spout, the default for maxOffsetBehind is 100000.

This default is too low. When the last committed offset is more than 100000
behind, the kafka spout starts from the latest offset instead of the last
committed offset, without warning.

The storm worker processes produce the following log output when this happens:

2014-07-09 18:02:15 s.k.PartitionManager [INFO] Read last commit offset from zookeeper: 15266940; old topology_id: ef3f1f89-f64c-4947-b6eb-0c7fb9adb9ea - new topology_id:
2014-07-09 18:02:15 s.k.PartitionManager [INFO] Last commit offset from zookeeper: 15266940
2014-07-09 18:02:15 s.k.PartitionManager [INFO] Commit offset 22092614 is more than 100000 behind, resetting to startOffsetTime=-2
2014-07-09 18:02:15 s.k.PartitionManager [INFO] Starting Kafka prd-use1c-pr-08-kafka-kamq-0004:4 from offset 22092614

To work around this problem, I ended up setting the spout config in my topology like so:

spoutConf.maxOffsetBehind = Long.MAX_VALUE;
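
To make the effect of that setting concrete, here is a minimal sketch of the
decision as I understand it from the log output above. It is a simplified
model, not the storm-kafka source: the class OffsetResetSketch and the method
chooseStartOffset are hypothetical names, and I am assuming that 22092614 was
the latest offset for the partition at startup.

```java
final class OffsetResetSketch {

    // Hypothetical helper modelling the behaviour described in this report.
    // It is NOT the actual storm-kafka implementation.
    static long chooseStartOffset(long committedOffset, long latestOffset, long maxOffsetBehind) {
        // How far the committed offset trails the newest message in the partition.
        long lag = latestOffset - committedOffset;
        if (lag > maxOffsetBehind) {
            // Too far behind: the committed offset is discarded and
            // everything between it and the latest offset is skipped.
            return latestOffset;
        }
        // Within the limit: resume where the previous topology stopped.
        return committedOffset;
    }

    public static void main(String[] args) {
        long committed = 15266940L; // last commit offset read from zookeeper (from the log)
        long latest = 22092614L;    // assumed latest offset (from the log)

        // Default maxOffsetBehind of 100000: the lag is 6825674, so the spout jumps ahead.
        System.out.println(chooseStartOffset(committed, latest, 100000L));

        // With Long.MAX_VALUE the lag can never exceed the limit, so the spout resumes.
        System.out.println(chooseStartOffset(committed, latest, Long.MAX_VALUE));
    }
}
```

With the offsets from the log, the lag is 22092614 - 15266940 = 6825674
messages, far above the default limit of 100000, which is why the committed
offset was discarded.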

Why would the kafka spout skip to the latest offset by default if the current
offset is more than 100000 behind?

This seems like a bad default value: the spout literally skipped over
months of data without any warning.

Are the core contributors open to accepting a pull request that would set
the default to Long.MAX_VALUE?
