flink-issues mailing list archives

From "Shaoxuan Wang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-5564) User Defined Aggregates
Date Wed, 25 Jan 2017 06:47:26 GMT

     [ https://issues.apache.org/jira/browse/FLINK-5564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shaoxuan Wang updated FLINK-5564:
---------------------------------
    Description: 
User-defined aggregates would be a great addition to the Table API / SQL.
The current aggregate interface is not well suited to external users. This issue proposes
redesigning the aggregate so that we can expose a better external UDAGG interface to
users. The detailed design proposal can be found here: https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz67yXOypY7Uh5gIOK2r-U/edit

Motivation:
1. The current aggregate interface is not concise for users. One needs to know the
design details of the intermediate Row buffer before implementing an Aggregate. Seven functions
are needed even for a simple Count aggregate.
2. Another limitation of the current aggregate function is that it can only be applied to a
single column. Many scenarios require an aggregate function that takes multiple
columns as input.
3. “Retraction” is not covered by the current Aggregate.
4. A local/global aggregate query plan optimization would be valuable, as it could
improve UDAGG performance in some scenarios.

Proposed Changes:
1. Implement an aggregate DataStream API (Done by [Atlassian|http://atlassian.com])
2. Update all the existing aggregates to use the new aggregate DataStream API
3. Provide a better User-Defined Aggregate interface
4. Add retraction support
5. Add local/global aggregation

  was:
User-defined aggregates would be a great addition to the Table API / SQL.
The current aggregate interface is not well suited for the external users.  This issue proposes
to redesign the aggregate such that we can expose an better external UDAGG interface to the
users. The detailed design proposal can be found here: https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz67yXOypY7Uh5gIOK2r-U/edit

Motivation:
1. The current aggregate interface is not very concise to the users. One needs to know the
design details of the intermediate Row buffer before implements an Aggregate. Seven functions
are needed even for a simple Count aggregate.
2. Another limitation of current aggregate function is that it can only be applied on one
single column. There are many scenarios which require the aggregate function taking multiple
columns as the inputs.
3. “Retraction” is not considered and covered in the current Aggregate.
4. It might be very good to have a local/global aggregate query plan optimization, which is
very promising to optimize UDAGG performance in some scenarios.

Proposed Changes:
1. Implement an aggregate dataStream API
2. Update all the existing aggregates to use the new aggregate dataStream API
3. Provide a better User-Design Aggregate interface
4. Add retraction support
5. Add local/global aggregate


> User Defined Aggregates
> -----------------------
>
>                 Key: FLINK-5564
>                 URL: https://issues.apache.org/jira/browse/FLINK-5564
>             Project: Flink
>          Issue Type: Improvement
>          Components: Table API & SQL
>            Reporter: Shaoxuan Wang
>            Assignee: Shaoxuan Wang
>
> User-defined aggregates would be a great addition to the Table API / SQL.
> The current aggregate interface is not well suited to external users. This issue proposes redesigning the aggregate so that we can expose a better external UDAGG interface to users. The detailed design proposal can be found here: https://docs.google.com/document/d/19JXK8jLIi8IqV9yf7hOs_Oz67yXOypY7Uh5gIOK2r-U/edit
> Motivation:
> 1. The current aggregate interface is not concise for users. One needs to know the design details of the intermediate Row buffer before implementing an Aggregate. Seven functions are needed even for a simple Count aggregate.
> 2. Another limitation of the current aggregate function is that it can only be applied to a single column. Many scenarios require an aggregate function that takes multiple columns as input.
> 3. “Retraction” is not covered by the current Aggregate.
> 4. A local/global aggregate query plan optimization would be valuable, as it could improve UDAGG performance in some scenarios.
> Proposed Changes:
> 1. Implement an aggregate DataStream API (Done by [Atlassian|http://atlassian.com])
> 2. Update all the existing aggregates to use the new aggregate DataStream API
> 3. Provide a better User-Defined Aggregate interface
> 4. Add retraction support
> 5. Add local/global aggregation
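
For illustration only, the ideas behind proposed changes 3 to 5 (a compact user-facing interface, multi-column input, retraction, and local/global merging) might be sketched in plain Java as follows. Everything here is an assumption made for the example: the names TwoInputAggregate, WeightedAvg, WeightedAvgAcc and UdaggSketchDemo are hypothetical, and the method set is not taken from the design document or from Flink's actual API.

```java
// Demo driver: simulates two "local" partial aggregations, a retraction,
// and a final "global" merge. Purely illustrative; no Flink dependency.
class UdaggSketchDemo {
    public static void main(String[] args) {
        WeightedAvg agg = new WeightedAvg();

        // Local aggregation on a first partition.
        WeightedAvgAcc local1 = agg.createAccumulator();
        agg.accumulate(local1, 10L, 1L);
        agg.accumulate(local1, 20L, 3L);

        // Local aggregation on a second partition.
        WeightedAvgAcc local2 = agg.createAccumulator();
        agg.accumulate(local2, 40L, 2L);

        // Retraction: undo a previously accumulated row.
        agg.retract(local1, 10L, 1L);

        // Global aggregation: merge the partial accumulators.
        WeightedAvgAcc global = agg.createAccumulator();
        agg.merge(global, local1);
        agg.merge(global, local2);

        System.out.println(agg.getValue(global)); // prints 28.0
    }
}

// Hypothetical user-facing interface for an aggregate over two input columns.
interface TwoInputAggregate<A, B, ACC, OUT> {
    ACC createAccumulator();                      // fresh intermediate state
    void accumulate(ACC acc, A first, B second);  // add one input row
    void retract(ACC acc, A first, B second);     // remove one earlier row
    void merge(ACC target, ACC other);            // fold a partial result into target
    OUT getValue(ACC acc);                        // final result
}

// Accumulator: the only state the user has to think about.
final class WeightedAvgAcc {
    long weightedSum = 0L;
    long totalWeight = 0L;
}

// Weighted average over (value, weight) column pairs.
final class WeightedAvg implements TwoInputAggregate<Long, Long, WeightedAvgAcc, Double> {
    @Override
    public WeightedAvgAcc createAccumulator() {
        return new WeightedAvgAcc();
    }

    @Override
    public void accumulate(WeightedAvgAcc acc, Long value, Long weight) {
        acc.weightedSum += value * weight;
        acc.totalWeight += weight;
    }

    @Override
    public void retract(WeightedAvgAcc acc, Long value, Long weight) {
        acc.weightedSum -= value * weight;
        acc.totalWeight -= weight;
    }

    @Override
    public void merge(WeightedAvgAcc target, WeightedAvgAcc other) {
        target.weightedSum += other.weightedSum;
        target.totalWeight += other.totalWeight;
    }

    @Override
    public Double getValue(WeightedAvgAcc acc) {
        if (acc.totalWeight == 0L) {
            return null; // no rows (or all rows retracted)
        }
        return (double) acc.weightedSum / acc.totalWeight;
    }
}
```

In this sketch the user writes five small methods against an explicit accumulator type instead of manipulating an intermediate Row buffer, and the merge step is what would let a planner split the work into local and global phases.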



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
