flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-8297) RocksDBListState stores whole list in single byte[]
Date Sun, 11 Feb 2018 21:03:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-8297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360123#comment-16360123 ]

ASF GitHub Bot commented on FLINK-8297:

Github user je-ik commented on the issue:

    @aljoscha I have (partly) reworked this PR as you suggested. There are still some unresolved
questions, though:
     1) I'm not 100% sure how to cleanly support migration between list state savepoints.
Would you have any pointers on how I should address this?
     2) I haven't tested the new version on an actual Flink job yet; so far it only passes the tests.
    I think more modifications will be needed, so I will test this on real data
once there is agreement on the actual implementation.
    Thanks in advance for any comments!

> RocksDBListState stores whole list in single byte[]
> ---------------------------------------------------
>                 Key: FLINK-8297
>                 URL: https://issues.apache.org/jira/browse/FLINK-8297
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.4.0, 1.3.2
>            Reporter: Jan Lukavský
>            Priority: Major
> RocksDBListState currently keeps the whole list in a single RocksDB key-value pair,
which implies that the list must fit into memory. Larger lists are not supported
and end in an OOME or another error. RocksDBListState could be modified so that individual
list items are stored under separate keys in RocksDB and can then be iterated over. A simple
implementation could reuse the existing RocksDBMapState, with the list index as the key, plus a single
RocksDBValueState keeping track of how many items have already been added to the list. Because
this implementation might be less efficient in some cases, it would be good to make it opt-in
via a construct like
> {{new RocksDBStateBackend().enableLargeListsPerKey()}}
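
The layout proposed in the description (one entry per list item, keyed by its index, plus a separate counter) might look roughly like the sketch below. This is only an illustration under stated assumptions: `IndexedListSketch` is a hypothetical name, a `TreeMap` stands in for the RocksDB-backed map state, a plain `long` field stands in for the value state holding the item count, and none of it is Flink API or the code from the PR.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.TreeMap;

/** Hypothetical sketch: a list stored as one entry per item plus a separate counter. */
class IndexedListSketch {
    // Stand-in for the map state (RocksDB in the issue): key = list index, value = serialized item.
    private final TreeMap<Long, byte[]> entries = new TreeMap<>();
    // Stand-in for the single value state that tracks how many items have been added.
    private long size = 0L;

    /** Appends one item under the next free index; no existing entry is read or rewritten. */
    void add(byte[] serializedItem) {
        entries.put(size, serializedItem);
        size++;
    }

    long size() {
        return size;
    }

    /** Iterates item by item, so the whole list never has to be materialized at once. */
    Iterator<byte[]> iterator() {
        return new Iterator<byte[]>() {
            private long position = 0L;

            @Override
            public boolean hasNext() {
                return position < size;
            }

            @Override
            public byte[] next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return entries.get(position++);
            }
        };
    }

    /** Drops all items and resets the counter. */
    void clear() {
        entries.clear();
        size = 0L;
    }
}
```

In this sketch an append touches only one new entry and the counter, which is the property the issue is after; the cost is one lookup per item when reading, which is presumably why the description suggests making the behaviour opt-in.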
