flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Shaoxuan Wang (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-6216) DataStream unbounded groupby aggregate with early firing
Date Wed, 29 Mar 2017 19:22:41 GMT
Shaoxuan Wang created FLINK-6216:

             Summary: DataStream unbounded groupby aggregate with early firing
                 Key: FLINK-6216
                 URL: https://issues.apache.org/jira/browse/FLINK-6216
             Project: Flink
          Issue Type: New Feature
          Components: Table API & SQL
            Reporter: Shaoxuan Wang
            Assignee: Shaoxuan Wang

Groupby aggregate results in a replace table. For infinite groupby aggregate, we need a mechanism
to define when the data should be emitted (early-fired). This task is aimed to implement the
initial version of unbounded groupby aggregate, where we update and emit aggregate value per
each arrived record. In the future, we will implement the mechanism and interface to let user
define the frequency/period of early-firing the unbounded groupby aggregation results.

The limit space of backend state is one of major obstacles for supporting unbounded groupby
aggregate in practical. Due to this reason, we suggest two common (and very useful) use-cases
of this unbounded groupby aggregate:
1. The range of grouping key is limit. In this case, a new arrival record will either insert
to state as new record or replace the existing record in the backend state. The data in the
backend state will not be evicted if the resource is properly provisioned by the user, such
that we can provision the correctness on aggregation results.
2. When the grouping key is unlimited, we will not be able ensure the 100% correctness of
"unbounded groupby aggregate". In this case, we will reply on the TTL mechanism of the RocksDB
backend state to evicted old data such that we can provision the correct results in a certain
time range.

This message was sent by Atlassian JIRA

View raw message