kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Elias Levy (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-4683) Mismatch between Stream windowed store and broker log retention logic
Date Sat, 21 Jan 2017 01:33:26 GMT
Elias Levy created KAFKA-4683:
---------------------------------

             Summary: Mismatch between Stream windowed store and broker log retention logic
                 Key: KAFKA-4683
                 URL: https://issues.apache.org/jira/browse/KAFKA-4683
             Project: Kafka
          Issue Type: Bug
          Components: log, streams
    Affects Versions: 0.10.1.1
            Reporter: Elias Levy


The RocksDBWindowStore keeps key-value entries for a configurable retention period.  The leading
edge of the time period kept is determined the newest timestamp of an inserted KV.  The trailing
edge is this leading edge minus the requested retention period.

If logging is enabled, changes to the store are written to a change log topic that is configured
with a retention.ms value equal to the store retention period.  The leading edge of the time
period kept by the log is the current time.  The trailing edge is the leading edge minus the
requested retention period.

The difference on how the leading edge is determined can result in unexpected behavior.

If the stream application is processing data older than the retention period and storing it
in a windowed store, the store will have data for the retention period looking back from the
newest timestamp of the processed message.  But the messages written to the state changeling
will almost immediately be deleted by the broker, as they will fall outside of the retention
window as it computes it.  

If the application is stopped and restarted in this state, and if the local state has been
lost of some reason, the application won't be able to recover the sate from the broker, as
it broker has deleted it.


In addition, I've noticed that there is a discrepancy on what timestamp is used between the
store and the change log.  The store will use the timestamp passed as an argument to {{put}},
or if no timestamp is passed, fallback to {{context.timestamp}}.  But {{StoreChangeLogger.logChange}}
does not take a timestamp.  Instead is always uses {{context.timestamp}} to write the change
to the broker.  Thus it is possible that the state store and the change log to use different
timestamps for the same KV.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message