flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6491) Add QueryConfig to specify state retention time for streaming queries
Date Wed, 10 May 2017 15:56:05 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16004893#comment-16004893 ]

ASF GitHub Bot commented on FLINK-6491:

Github user fhueske commented on a diff in the pull request:

    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/table/plan/nodes/datastream/DataStreamGroupAggregate.scala
    @@ -100,9 +104,18 @@ class DataStreamGroupAggregate(
             inputSchema.logicalType, groupings, getRowType, namedAggregates, Nil))
    -  override def translateToPlan(tableEnv: StreamTableEnvironment): DataStream[CRow] =
    +  override def translateToPlan(
    +      tableEnv: StreamTableEnvironment,
    +      qConfig: StreamQueryConfig): DataStream[CRow] = {
    +    if (qConfig.getMinIdleStateRetentionTime < 0 || qConfig.getMaxIdleStateRetentionTime < 0) {
    --- End diff --
    We should also check that this is not a global aggregate and only emit a warning if the aggregate is partitioned.
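
    A minimal sketch of the suggested condition, not the code from the pull request. It assumes that a negative retention time means no retention time was configured (as the `< 0` check in the diff suggests) and that an empty `groupings` array denotes a global aggregate. `RetentionCheck`, `RetentionConfig`, and `shouldWarn` are hypothetical names used for illustration; `RetentionConfig` only stands in for `StreamQueryConfig`.

```scala
object RetentionCheck {

  // Hypothetical stand-in for StreamQueryConfig; times are in milliseconds.
  final case class RetentionConfig(
      minIdleStateRetentionTime: Long,
      maxIdleStateRetentionTime: Long)

  /** Returns true if a "no state retention configured" warning should be emitted. */
  def shouldWarn(groupings: Array[Int], config: RetentionConfig): Boolean = {
    // Assumption: an aggregate is partitioned if it has at least one grouping key.
    val isPartitioned = groupings.nonEmpty
    // Assumption: a negative value means the retention time was not configured.
    val retentionMissing =
      config.minIdleStateRetentionTime < 0 || config.maxIdleStateRetentionTime < 0
    isPartitioned && retentionMissing
  }
}
```

    Presumably the warning matters only for partitioned aggregates because they keep state per key, whereas a global aggregate keeps a single state entry.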

> Add QueryConfig to specify state retention time for streaming queries
> ---------------------------------------------------------------------
>                 Key: FLINK-6491
>                 URL: https://issues.apache.org/jira/browse/FLINK-6491
>             Project: Flink
>          Issue Type: Bug
>          Components: Table API & SQL
>    Affects Versions: 1.3.0
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>            Priority: Critical
> By now we have a couple of streaming operators (group windows, over windows, non-windowed aggregations) that require operator state. Since Flink does not clean up state automatically, we need to add a mechanism to configure a state retention time.
> If configured, a query will retain state for a specified period of state inactivity. If state is not accessed within this period, it will be cleared. I propose to add two parameters for this, a min and a max retention time. The min retention time specifies the earliest time and the max retention time the latest time at which state is cleared. The reasoning for having two parameters is that we can avoid registering many timers if we have more freedom in when to discard state.
> This issue also introduces a QueryConfig object which can be passed to a streaming query when it is emitted to a TableSink or converted to a DataStream (append or retraction).
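
The min/max proposal in the description can be illustrated with a small, Flink-free model. This is a sketch under assumptions, not the implementation from the issue: `RetentionTracker`, `onAccess`, `onTimer`, and `timerCount` are hypothetical names, and timers are only counted rather than registered with a real timer service. The idea is to register a new cleanup timer at `now + max` only when the pending cleanup time is earlier than `now + min`, so frequent accesses do not each create a timer.

```scala
// Hypothetical model of idle-state cleanup with a min and a max retention time (in ms).
final class RetentionTracker(minRetentionMs: Long, maxRetentionMs: Long) {
  require(
    minRetentionMs >= 0 && maxRetentionMs >= minRetentionMs,
    "expected 0 <= min <= max")

  // Time at which the state is currently scheduled to be cleared, if any.
  private var cleanupTime: Option[Long] = None
  private var timersRegistered: Int = 0

  /** Called on every state access; returns true if a new timer was registered. */
  def onAccess(now: Long): Boolean = {
    // A new timer is needed only if the pending one would fire before now + min.
    val needsTimer = cleanupTime.forall(pending => now + minRetentionMs > pending)
    if (needsTimer) {
      cleanupTime = Some(now + maxRetentionMs)
      timersRegistered += 1
    }
    needsTimer
  }

  /** Called when a timer fires; returns true if the state should be cleared. */
  def onTimer(timestamp: Long): Boolean =
    if (cleanupTime.contains(timestamp)) {
      cleanupTime = None
      true
    } else {
      false // stale timer that was superseded by a later registration
    }

  def timerCount: Int = timersRegistered
}
```

In this model, state is cleared no earlier than the min and no later than the max retention time after the last access. The wider the gap between the two, the fewer timers are registered.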

