Mailing-List: contact issues-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@flink.apache.org
Date: Mon, 13 Feb 2017 07:58:41 +0000 (UTC)
From: "sunjincheng (JIRA)" <jira@apache.org>
To: issues@flink.apache.org
Message-ID: <JIRA.13038123.1485438729000.60825.1486972721732@Atlassian.JIRA>
In-Reply-To: <JIRA.13038123.1485438729000@Atlassian.JIRA>
References: <JIRA.13038123.1485438729000@Atlassian.JIRA> <JIRA.13038123.1485438729242@jira-lw-us.apache.org>
Subject: [jira] [Commented] (FLINK-5657) Add processing time OVER RANGE
 BETWEEN UNBOUNDED PRECEDING aggregation to SQL
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
archived-at: Mon, 13 Feb 2017 12:20:50 -0000


    [ https://issues.apache.org/jira/browse/FLINK-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15863303#comment-15863303 ]

sunjincheng commented on FLINK-5657:
------------------------------------

Hi guys, I made a preliminary implementation of this JIRA.
My approach is:
1. Calcite -> Flink
    "LogicalProject with RexOver expression" --(normalize rule)-> "Calcite's LogicalWindow" --(opt rule)-> DataStreamRowWindowAggregate

2. DataStream API:
  a. With partitionBy:
     approach 1: inputDS.map().keyBy().reduce().map()
     approach 2: inputDS.map().keyBy().process()
  b. Without partitionBy:
     inputDS.map().setParallelism(1), where the map function implements CheckpointedFunction.

3. About ORDER BY:
    Elements are aggregated in their natural (arrival) order. procTime() is only used to generate the end time of the window and to ensure that the query passes SQL validation.
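
To make the DataStream part (item 2) more concrete, here is a minimal plain-Java sketch of the aggregation semantics. It is not the actual implementation and does not use the Flink API. The class and field names (UnboundedOverSketch, Row, Out, Acc) are made up for illustration, and the per-key map only stands in for keyed state. It also assumes that every row arrives at a distinct processing-time instant, so each row sees the aggregate over all rows that arrived before it plus itself.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of the aggregation semantics only; it does not use the Flink API.
class UnboundedOverSketch {

    // Hypothetical input row carrying the columns a, b, c of the example query.
    static final class Row {
        final String a;
        final long b;
        final String c;

        Row(String a, long b, String c) {
            this.a = a;
            this.b = b;
            this.c = c;
        }
    }

    // Hypothetical output row: a plus the running aggregates sumB and minB.
    static final class Out {
        final String a;
        final long sumB;
        final long minB;

        Out(String a, long sumB, long minB) {
            this.a = a;
            this.sumB = sumB;
            this.minB = minB;
        }
    }

    // Accumulator kept per partition. In a real job this would live in keyed state
    // (or in checkpointed operator state for the non-partitioned case).
    static final class Acc {
        long sum = 0L;
        long min = Long.MAX_VALUE;
    }

    // Emits one output row per input row, in arrival order.
    static List<Out> aggregate(List<Row> rowsInArrivalOrder, boolean partitioned) {
        Map<String, Acc> state = new HashMap<>();
        List<Out> result = new ArrayList<>();
        for (Row row : rowsInArrivalOrder) {
            // Without PARTITION BY all rows share one accumulator (single-threaded case).
            String key = partitioned ? row.c : "";
            Acc acc = state.computeIfAbsent(key, k -> new Acc());
            acc.sum += row.b;
            acc.min = Math.min(acc.min, row.b);
            result.add(new Out(row.a, acc.sum, acc.min));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("a1", 5L, "x"));
        rows.add(new Row("a2", 3L, "y"));
        rows.add(new Row("a3", 2L, "x"));
        for (Out out : aggregate(rows, true)) {
            System.out.println(out.a + " sumB=" + out.sumB + " minB=" + out.minB);
        }
    }
}
```

Calling aggregate(rows, true) corresponds to case 2a (one accumulator per value of c), while aggregate(rows, false) corresponds to case 2b (one global accumulator, parallelism 1).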

Hi [~fhueske], IMO the “Calcite -> Flink” part should be shared by all rowWindow-related JIRAs. To make it available as soon as possible, I would like to split this JIRA into two subtasks:
  1. rowWindow with partitionBy
  2. rowWindow without partitionBy

Does that make sense to you? I would be very grateful if you could give me some advice.

> Add processing time OVER RANGE BETWEEN UNBOUNDED PRECEDING aggregation to SQL
> -----------------------------------------------------------------------------
>
>                 Key: FLINK-5657
>                 URL: https://issues.apache.org/jira/browse/FLINK-5657
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>
> The goal of this issue is to add support for OVER RANGE aggregations on processing time streams to the SQL interface.
> Queries similar to the following should be supported:
> {code}
> SELECT
>   a,
>   SUM(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS sumB,
>   MIN(b) OVER (PARTITION BY c ORDER BY procTime() RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS minB
> FROM myStream
> {code}
> The following restrictions should initially apply:
> - All OVER clauses in the same SELECT clause must be exactly the same.
> - The PARTITION BY clause is optional (no partitioning results in single threaded execution).
> - The ORDER BY clause may only have procTime() as parameter. procTime() is a parameterless scalar function that just indicates processing time mode.
> - bounded PRECEDING is not supported (see FLINK-5654)
> - FOLLOWING is not supported.
> The restrictions will be resolved in follow up issues. If we find that some of the restrictions are trivial to address, we can add the functionality in this issue as well.
> This issue includes:
> - Design of the DataStream operator to compute OVER ROW aggregates
> - Translation from Calcite's RelNode representation (LogicalProject with RexOver expression).


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)