spark-issues mailing list archives

From "Dan Dutrow (JIRA)" <>
Subject [jira] [Created] (SPARK-9947) Separate Metadata and State Checkpoint Data
Date Thu, 13 Aug 2015 20:53:45 GMT
Dan Dutrow created SPARK-9947:

             Summary: Separate Metadata and State Checkpoint Data
                 Key: SPARK-9947
             Project: Spark
          Issue Type: Improvement
          Components: Streaming
            Reporter: Dan Dutrow

This is the proposal. 

The simpler direct API (the one that does not take explicit offsets) can be modified to also
pick up the initial offset from ZK if a consumer group is specified. This is very similar to how
we find the latest or earliest offset in that API, except that instead of the latest/earliest
offset of the topic we look up the offset from the consumer group. The group offsets in ZK are
not used at all for any further processing or restarting, so the exactly-once semantics are
not broken.
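
A minimal sketch of the lookup described above, in plain Python with no Kafka or ZK calls. The
function name resolve_start_offsets, its parameters, and the (topic, partition) key type are
assumptions made for illustration; they are not part of the Spark API.

```python
from typing import Dict, Optional, Tuple

TopicPartition = Tuple[str, int]  # (topic, partition)


def resolve_start_offsets(
    earliest: Dict[TopicPartition, int],
    latest: Dict[TopicPartition, int],
    group_offsets: Optional[Dict[TopicPartition, int]] = None,
    reset: str = "latest",
) -> Dict[TopicPartition, int]:
    """Pick a starting offset per partition (hypothetical helper).

    Prefer the consumer group's stored offset when one exists; otherwise
    fall back to the topic's earliest or latest offset, as chosen by `reset`.
    """
    if reset not in ("earliest", "latest"):
        raise ValueError("reset must be 'earliest' or 'latest'")
    fallback = earliest if reset == "earliest" else latest
    group_offsets = group_offsets or {}
    start = {}
    for tp in latest:
        # Use the group offset only when one was stored for this partition.
        start[tp] = group_offsets.get(tp, fallback[tp])
    return start


if __name__ == "__main__":
    earliest = {("events", 0): 0, ("events", 1): 0}
    latest = {("events", 0): 120, ("events", 1): 95}
    group = {("events", 0): 100}  # partition 1 has no stored group offset
    print(resolve_start_offsets(earliest, latest, group))
```

The group offsets are consulted once, when the starting position is chosen; nothing later in
the sketch reads them again, which mirrors the point about exactly-once semantics.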

The use case where this is useful is a simplified code upgrade. If the user wants to upgrade
the code, they can stop the context gracefully, which ensures that the ZK consumer group offset
is updated with the last offsets processed. The new code, when started (not restarted
from checkpoint), can then pick up the consumer group offset from ZK and continue where the
previous code left off.
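
The upgrade handoff can be simulated roughly as follows. This is a sketch under stated
assumptions: InMemoryOffsetStore stands in for the ZK consumer group offsets, and Job stands in
for a streaming application; both classes and all their methods are hypothetical.

```python
class InMemoryOffsetStore:
    """Stand-in for consumer group offsets kept in ZK (hypothetical)."""

    def __init__(self):
        self._offsets = {}

    def commit(self, group, offsets):
        # Store a copy so later changes in the job do not leak into the store.
        self._offsets[group] = dict(offsets)

    def fetch(self, group):
        return dict(self._offsets.get(group, {}))


class Job:
    """Stand-in for one version of the streaming code (hypothetical)."""

    def __init__(self, store, group, default_start):
        self.store = store
        self.group = group
        # Start from the group's offsets where present, else the defaults.
        self.position = {**default_start, **store.fetch(group)}

    def process(self, partition, count):
        # Pretend to consume `count` messages from `partition`.
        self.position[partition] += count

    def stop_gracefully(self):
        # On graceful stop, record the last offsets processed.
        self.store.commit(self.group, self.position)


if __name__ == "__main__":
    store = InMemoryOffsetStore()
    old_code = Job(store, "my-group", {0: 0, 1: 0})
    old_code.process(0, 5)
    old_code.stop_gracefully()

    # New code starts fresh (no checkpoint) and resumes from the group offsets.
    new_code = Job(store, "my-group", {0: 0, 1: 0})
    print(new_code.position)
```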

Without the ability to pick up consumer group offsets at startup (that is, currently),
the only way to do this is for users to save the offsets somewhere (file, database, etc.)
and manage them themselves. I just want to simplify this process.
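
For comparison, the manual workaround might look something like the sketch below, which keeps
offsets in a JSON file. The helper names save_offsets and load_offsets, the file layout, and
the "topic:partition" key format are assumptions chosen for illustration.

```python
import json
import os
import tempfile


def save_offsets(path, offsets):
    """Write {(topic, partition): offset} to `path` as JSON (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temporary file first, then rename, so a crash mid-write
    # does not leave a truncated offsets file behind.
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({f"{t}:{p}": o for (t, p), o in offsets.items()}, f)
    os.replace(tmp, path)


def load_offsets(path):
    """Read offsets saved by save_offsets; return {} if the file is missing."""
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        raw = json.load(f)
    result = {}
    for key, offset in raw.items():
        # Split on the last colon so topic names containing ':' still parse.
        topic, partition = key.rsplit(":", 1)
        result[(topic, int(partition))] = offset
    return result
```

Every application that wants this behaviour has to carry code of this kind, which is the
bookkeeping the proposal aims to remove.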
