flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5715) Asynchronous snapshotting for HeapKeyedStateBackend
Date Thu, 09 Mar 2017 10:11:38 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902829#comment-15902829

ASF GitHub Bot commented on FLINK-5715:

Github user StephanEwen commented on the issue:

    I think this is all in all very good code!
    One thing I am worried about is the testing time now. The `EventTimeWindowCheckpointingITCase` tests already take very long, and now we have two more.
    What we should probably do is the following:
      - The data volume is very high in that test, and I think that was mainly done to stress RocksDB's async snapshots a bit.
      - The heaviness can be moved to a RocksDB-specific async snapshot test (that does not need to use windows).
      - The base of the EventTimeWindowCheckpointingITCases can then be made much more lightweight.

> Asynchronous snapshotting for HeapKeyedStateBackend
> ---------------------------------------------------
>                 Key: FLINK-5715
>                 URL: https://issues.apache.org/jira/browse/FLINK-5715
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing
>    Affects Versions: 1.3.0
>            Reporter: Stefan Richter
>            Assignee: Stefan Richter
> Blocking snapshots render the HeapKeyedStateBackend practically unusable for many users in production. Their jobs cannot tolerate stopped processing for the time it takes to write gigabytes of data from memory to disk. Asynchronous snapshots would be a solution to this problem. The challenge for the implementation is coming up with a copy-on-write scheme for the in-memory hash maps that form the foundation of this backend. After taking a closer look, this problem is twofold. First, the hash map itself, as a mutable structure, needs CoW semantics that avoid costly locking or blocking where possible. Second, the mutable value objects need CoW as well, e.g. through cloning via serializers.
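The two-part copy-on-write problem described in the issue can be illustrated with a minimal, hypothetical Java sketch. It is not Flink's implementation: `CowHashMap`, `Snapshot`, `getForUpdate`, and the `copier` function (a stand-in for a serializer's deep copy) are names invented for illustration. It assumes a single writer thread and a fixed bucket count, and it omits removal, resizing, and releasing old snapshots. A snapshot only shallow-copies the bucket array; afterwards the writer never modifies a node the snapshot can reach, and it deep-copies a value before handing it out for mutation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical copy-on-write hash map with cheap snapshots (illustration only).
// Assumes one writer thread and a fixed bucket count; removal and resizing are omitted.
public class CowHashMap<K, V> {
    /** Chain node: key, next and version are immutable; value is mutated only if no snapshot can see it. */
    private static final class Node<K, V> {
        final K key;
        V value;
        final Node<K, V> next;
        final int version;
        Node(K key, V value, Node<K, V> next, int version) {
            this.key = key;
            this.value = value;
            this.next = next;
            this.version = version;
        }
    }

    /** Read-only view of the buckets as they were when snapshot() was called. */
    public static final class Snapshot<K, V> {
        private final Node<K, V>[] table;
        private Snapshot(Node<K, V>[] table) { this.table = table; }

        public Map<K, V> toMap() {
            Map<K, V> out = new HashMap<>();
            for (Node<K, V> head : table) {
                for (Node<K, V> n = head; n != null; n = n.next) {
                    out.put(n.key, n.value);
                }
            }
            return out;
        }
    }

    private final Node<K, V>[] table;
    private final UnaryOperator<V> copier; // stand-in for a serializer's deep copy
    private int version = 0;               // nodes older than this may be shared with a snapshot

    @SuppressWarnings("unchecked")
    public CowHashMap(int capacity, UnaryOperator<V> copier) {
        this.table = (Node<K, V>[]) new Node[capacity];
        this.copier = copier;
    }

    private int index(K key) {
        return Math.floorMod(Objects.hashCode(key), table.length);
    }

    /** Shallow-copies the bucket array; no node reachable from it is modified afterwards. */
    public Snapshot<K, V> snapshot() {
        Snapshot<K, V> s = new Snapshot<>(table.clone());
        version++; // every existing node is now potentially shared
        return s;
    }

    public void put(K key, V value) {
        int i = index(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                if (n.version == version) {
                    n.value = value; // node is private to the live map: update in place
                } else {
                    table[i] = copyPath(table[i], key, value);
                }
                return;
            }
        }
        table[i] = new Node<>(key, value, table[i], version); // prepend; existing chain untouched
    }

    /** Returns a value the caller may mutate; a value a snapshot may see is deep-copied first. */
    public V getForUpdate(K key) {
        int i = index(key);
        for (Node<K, V> n = table[i]; n != null; n = n.next) {
            if (Objects.equals(n.key, key)) {
                if (n.version == version) {
                    return n.value;
                }
                V copy = copier.apply(n.value);
                table[i] = copyPath(table[i], key, copy);
                return copy;
            }
        }
        return null;
    }

    /** Rebuilds the chain up to the matching node; predecessor values are copied, the suffix stays shared. */
    private Node<K, V> copyPath(Node<K, V> n, K key, V value) {
        if (Objects.equals(n.key, key)) {
            return new Node<>(key, value, n.next, version);
        }
        return new Node<>(n.key, copier.apply(n.value), copyPath(n.next, key, value), version);
    }
}
```

For example, after `snap = map.snapshot()`, calling `map.getForUpdate("a").append("x")` on a map of `StringBuilder` values leaves `snap.toMap().get("a")` unchanged, because the live map now holds a copied node with a copied value. The version counter is the design choice that keeps copying cheap: nodes created after the latest snapshot are updated in place, so the copy cost is paid only for entries touched after a snapshot. A real backend would additionally need to track which snapshot versions are still being written so it can stop copying once they complete.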

This message was sent by Atlassian JIRA
