flink-issues mailing list archives

From "Yu Li (Jira)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-12692) Support disk spilling in HeapKeyedStateBackend
Date Sat, 11 Apr 2020 08:02:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-12692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17081207#comment-17081207 ]

Yu Li commented on FLINK-12692:

Thanks for the interest, [~aldu29]. We're now testing it and will upload the binary by the
end of April.

> Support disk spilling in HeapKeyedStateBackend
> ----------------------------------------------
>                 Key: FLINK-12692
>                 URL: https://issues.apache.org/jira/browse/FLINK-12692
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / State Backends
>            Reporter: Yu Li
>            Assignee: Yu Li
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.11.0
>          Time Spent: 10m
>  Remaining Estimate: 0h
> {{HeapKeyedStateBackend}} is one of the two {{KeyedStateBackends}} in Flink. Since state
> lives as Java objects on the heap and de/serialization only happens during state snapshot
> and restore, it outperforms {{RocksDBKeyedStateBackend}} when all data can reside in memory.
> However, along with this advantage, {{HeapKeyedStateBackend}} also has its shortcomings.
> The most painful one is the difficulty of estimating the maximum heap size (Xmx) to set,
> and we will suffer from GC impact once the heap memory is not enough to hold all state data.
> There are several (inevitable) causes for such a scenario, including (but not limited to):
> * Memory overhead of Java object representation (tens of times the size of the serialized data).
> * Data flood caused by burst traffic.
> * Data accumulation caused by source malfunction.
> To resolve this problem, we propose a solution to support spilling state data to disk
> before heap memory is exhausted. We will monitor the heap usage and choose the coldest data
> to spill, and reload it when heap memory is regained after data removal or TTL expiration.
> For more details, please refer to the design doc and mailing list discussion.
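
The spill-and-reload idea in the description can be pictured with a small toy in Java. This is only a sketch under stated assumptions, not the design from FLINK-12692 and not Flink's API: the class `SpillingStateSketch`, its thresholds, the injected heap-usage supplier, and the in-memory map standing in for disk are all hypothetical, and "coldest" is approximated here by least-recently-used order.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleSupplier;

/** Toy illustration only; hypothetical names, not Flink's HeapKeyedStateBackend. */
class SpillingStateSketch {
    // Access-ordered map: iteration starts at the least recently used ("coldest") key.
    private final Map<String, String> onHeap = new LinkedHashMap<>(16, 0.75f, true);
    // Stand-in for serialized state on disk (assumption: a real backend would write files).
    private final Map<String, byte[]> spilled = new HashMap<>();
    // Fraction of the heap in use, 0.0 to 1.0. Injected so the sketch stays deterministic;
    // it could be derived from Runtime.getRuntime() in a real process.
    private final DoubleSupplier heapUsage;
    private final double spillThreshold; // spill when usage rises above this
    private final double loadThreshold;  // reload when usage falls below this

    SpillingStateSketch(DoubleSupplier heapUsage, double spillThreshold, double loadThreshold) {
        this.heapUsage = heapUsage;
        this.spillThreshold = spillThreshold;
        this.loadThreshold = loadThreshold;
    }

    void put(String key, String value) {
        spilled.remove(key);      // a fresh write supersedes any spilled copy
        onHeap.put(key, value);
    }

    String get(String key) {
        String value = onHeap.get(key); // also marks the key as recently used
        if (value != null) {
            return value;
        }
        // Assumption: spilled data is read in place and not promoted back on access.
        byte[] bytes = spilled.get(key);
        return bytes == null ? null : new String(bytes, StandardCharsets.UTF_8);
    }

    /** Periodic check: spill up to batchSize coldest entries, or reload up to batchSize. */
    void checkHeap(int batchSize) {
        double usage = heapUsage.getAsDouble();
        if (usage > spillThreshold) {
            Iterator<Map.Entry<String, String>> it = onHeap.entrySet().iterator();
            for (int i = 0; i < batchSize && it.hasNext(); i++) {
                Map.Entry<String, String> e = it.next();
                spilled.put(e.getKey(), e.getValue().getBytes(StandardCharsets.UTF_8));
                it.remove();
            }
        } else if (usage < loadThreshold) {
            Iterator<Map.Entry<String, byte[]>> it = spilled.entrySet().iterator();
            for (int i = 0; i < batchSize && it.hasNext(); i++) {
                Map.Entry<String, byte[]> e = it.next();
                onHeap.put(e.getKey(), new String(e.getValue(), StandardCharsets.UTF_8));
                it.remove();
            }
        }
    }

    boolean isOnHeap(String key) { return onHeap.containsKey(key); }
    int onHeapSize() { return onHeap.size(); }
    int spilledSize() { return spilled.size(); }
}
```

Using two thresholds with a gap between them is a choice made for this sketch so that state does not bounce between heap and disk when usage hovers around a single limit. TTL expiration is not modeled.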

This message was sent by Atlassian Jira
