flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From gyfora <...@git.apache.org>
Subject [GitHub] flink pull request: New operator state interfaces
Date Sun, 21 Jun 2015 12:01:00 GMT
Github user gyfora commented on the pull request:

    Thanks for the nice feedback Stephan it makes a lot of sense what you have written :+1:

    Let me quickly react to these points separately:
    ### Current State Interface
    I am okay with keeping the current Checkpointed interfaces, but I wonder if they are equivalent
with the simple non-partitioned state abstractions. 
    Anyways, I want to do some refactor to make partitioned/non-partitioned states more transparent
on the runtime and allow mixing them. This will probably mean that I will add a boolean flag
to the getOperatorState(...) method to get either partitioned or non-partitioned state (partitioned
would throw exception if it is not a keyed stream).
    ### State Backing / Checkpointing & Restoring
    This is a very good idea, I like it :) I think a slight refactor would allow us to specify
any action to be taken during a snapshot, which would either return the handles and store
it right now, or return null after doing the proper interaction with the external store. I
 think it is safe to say that this could be an extension do be added later. 
    I see some immediate issues with the group commit logic to the external store. I wonder
when we would actually do the commit, because the fact that the operator takes a snapshot
doesnt mean that the checkpoint will be completed so we might need to roll-back. This assumes
that we can roll back on the external store as well, and that is a strong assumption. Another
way would be to commit the changes with the checkpoint committer interface (like in the KafkaSource)
but then if we fail before committing we end up with a corrupted snapshot (the system thinks
its complete but in reality the external storage doesnt contain the changes so no way to restore).
    ### API integration
    I agree completely. We need to allow mixing partitioned/nonpartitioned states as I said
in the first point.
    ### Asynchronous state checkpoints
    ### Incremental state checkpoints
    Agreed, and I even think maybe we can support per-key incremental snapshots if that is
absolutely necessary (such as windowbuffers). This could be an interface which incrementally
computes the delta after each update(...) call. This might be tricky though but should be
    ### Shared replicated state
    This would be very cool indeed :)
    I will start doing some refactoring to make it possible to incorporate the suggested changes.

If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.

View raw message