flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6219) Add a state backend which supports sorting
Date Fri, 31 Mar 2017 08:56:42 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15950557#comment-15950557 ]

Fabian Hueske commented on FLINK-6219:
--------------------------------------

The problem with {{ValueState}} is that every access deserializes the whole state.
The access pattern that we would need for OVER windows in the Table API is as follows:

- efficient insert in sorted order without deserializing the whole state
- iteration over the state in sorted order with deserialization as needed (not the whole state at once)
- efficient delete from the head of the sorted queue without deserializing the whole state

In fact, we would only need to sort on time ({{long}}), so our use case would benefit from
a {{TimeSortedQueue}}, which might be easier and more efficient to implement because it does
not need a custom comparator.
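
To make the access pattern concrete, here is a minimal in-memory sketch of what such a queue could look like. The class name {{TimeSortedQueue}} comes from the comment above, but the methods {{put}}, {{iterator}} and {{pollFirst}} are assumptions chosen for illustration, not an existing Flink API. The sketch only shows the ordering behaviour on a {{long}} timestamp; it does not address the serialization concern, which a real state backend would have to solve by keeping entries individually serialized.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory sketch; not a Flink state primitive. */
final class TimeSortedQueue<T> {
    // One bucket per timestamp; TreeMap keeps the long keys in ascending
    // order, so no custom comparator is needed.
    private final TreeMap<Long, Deque<T>> buckets = new TreeMap<>();

    /** Inserts an element at its timestamp without touching other buckets. */
    void put(long timestamp, T element) {
        buckets.computeIfAbsent(timestamp, k -> new ArrayDeque<>()).addLast(element);
    }

    /** Walks all elements in ascending timestamp order. */
    Iterator<T> iterator() {
        return buckets.values().stream().flatMap(bucket -> bucket.stream()).iterator();
    }

    /** Removes and returns the element with the smallest timestamp, or null if empty. */
    T pollFirst() {
        Map.Entry<Long, Deque<T>> head = buckets.firstEntry();
        if (head == null) {
            return null;
        }
        T element = head.getValue().pollFirst();
        if (head.getValue().isEmpty()) {
            buckets.remove(head.getKey()); // drop the bucket once it is drained
        }
        return element;
    }

    public static void main(String[] args) {
        // Same insertion order as the example in the issue description.
        TimeSortedQueue<String> queue = new TimeSortedQueue<>();
        queue.put(2L, "(2L, 2, Hello)");
        queue.put(1L, "(1L, 1, Hello)");
        queue.put(5L, "(5L, 5, Hello)");
        queue.put(4L, "(4L, 4, Hello)");

        // Elements come back ordered by timestamp: 1, 2, 4, 5.
        String element;
        while ((element = queue.pollFirst()) != null) {
            System.out.println("process(" + element + ")");
        }
    }
}
```

Elements that share a timestamp are kept in insertion order within their bucket; whether that is the desired tie-breaking rule for OVER windows is an open assumption of this sketch.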

> Add a state backend which supports sorting
> ------------------------------------------
>
>                 Key: FLINK-6219
>                 URL: https://issues.apache.org/jira/browse/FLINK-6219
>             Project: Flink
>          Issue Type: New Feature
>          Components: State Backends, Checkpointing, Table API & SQL
>            Reporter: sunjincheng
>
> While implementing the OVER window of [FLIP11|https://cwiki.apache.org/confluence/display/FLINK/FLIP-11%3A+Table+API+Stream+Aggregations],
> we noticed that we need a state backend which supports sorting and allows for efficient insertion,
> in-order traversal, and removal from the head.
> For example, in an event-time OVER window we need to sort by time. Given the following data:
> {code}
> (1L, 1, Hello)
> (2L, 2, Hello)
> (5L, 5, Hello)
> (4L, 4, Hello)
> {code}
> We insert the data in arbitrary order, for example:
> {code}
> put((2L, 2, Hello)), put((1L, 1, Hello)), put((5L, 5, Hello)), put((4L, 4, Hello))
> {code}
> We then process the elements in time order:
> {code}
> process((1L, 1, Hello)), process((2L, 2, Hello)), process((4L, 4, Hello)), process((5L, 5, Hello))
> {code}
> Feedback from anyone is welcome. What do you think? [~xiaogang.shi] [~aljoscha] [~fhueske]



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
