flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3941) Add support for UNION (with duplicate elimination)
Date Wed, 25 May 2016 09:27:12 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3941?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15299736#comment-15299736 ]

ASF GitHub Bot commented on FLINK-3941:
---------------------------------------

Github user fhueske commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2025#discussion_r64541694
  
    --- Diff: flink-libraries/flink-table/src/main/scala/org/apache/flink/api/table/plan/nodes/dataset/DataSetUnion.scala ---
    @@ -69,16 +73,23 @@ class DataSetUnion(
           rows + metadata.getRowCount(child)
         }
     
    -    planner.getCostFactory.makeCost(rowCnt, 0, 0)
    +    planner.getCostFactory.makeCost(
    +      rowCnt,
    +      if (all) 0 else rowCnt,
    +      if (all) 0 else rowCnt)
       }
     
       override def translateToPlan(
           tableEnv: BatchTableEnvironment,
           expectedType: Option[TypeInformation[Any]]): DataSet[Any] = {
     
    -    val leftDataSet = left.asInstanceOf[DataSetRel].translateToPlan(tableEnv)
    -    val rightDataSet = right.asInstanceOf[DataSetRel].translateToPlan(tableEnv)
    -    leftDataSet.union(rightDataSet).asInstanceOf[DataSet[Any]]
    +    val leftDataSet = left.asInstanceOf[DataSetRel].translateToPlan(tableEnv, expectedType)
    +    val rightDataSet = right.asInstanceOf[DataSetRel].translateToPlan(tableEnv, expectedType)
    +    if (all) {
    +      leftDataSet.union(rightDataSet).asInstanceOf[DataSet[Any]]
    +    } else {
    +      leftDataSet.union(rightDataSet).distinct().asInstanceOf[DataSet[Any]]
    --- End diff --
    
    Oh, yes. Completely forgot about that rule... 😊 
    So, we already supported the non-all union for SQL. Only the Table API was missing the `union()` method.
    I think there are two ways to continue:
    - remove the `UnionToDistinctRule` from `FlinkRuleSets`
    - revert the changes to `DataSetUnion` (except for pushing down the `expectedType`) and `DataSetUnionRule`.
    
    I am fine either way.
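
The two options above should yield the same result, because Calcite's `UnionToDistinctRule` rewrites a distinct union into an aggregate on top of a UNION ALL, while the change in the diff applies `distinct()` directly after the union. The following is only a rough sketch of that equivalence on plain Scala collections, not Flink or Calcite code; `Row`, `unionViaGrouping`, and `unionViaDistinct` are hypothetical names introduced for illustration.

```scala
object UnionRewriteSketch {
  // Hypothetical row type, used only for this illustration.
  final case class Row(id: Int, name: String)

  // Option A (assumed shape of the rule-based rewrite): UNION ALL,
  // then group on all fields and keep one representative per group.
  def unionViaGrouping(left: Seq[Row], right: Seq[Row]): Set[Row] =
    (left ++ right).groupBy(identity).keySet

  // Option B (what the diff does): UNION ALL followed directly by distinct.
  def unionViaDistinct(left: Seq[Row], right: Seq[Row]): Set[Row] =
    (left ++ right).distinct.toSet

  def main(args: Array[String]): Unit = {
    val left  = Seq(Row(1, "a"), Row(2, "b"))
    val right = Seq(Row(2, "b"), Row(3, "c"))
    // Both strategies should produce the same set of rows.
    println(unionViaGrouping(left, right) == unionViaDistinct(left, right))
  }
}
```

Results are compared as sets because neither strategy promises a particular row order in a distributed setting.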


> Add support for UNION (with duplicate elimination)
> --------------------------------------------------
>
>                 Key: FLINK-3941
>                 URL: https://issues.apache.org/jira/browse/FLINK-3941
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API
>    Affects Versions: 1.1.0
>            Reporter: Fabian Hueske
>            Assignee: Yijie Shen
>            Priority: Minor
>
> Currently, only UNION ALL is supported by Table API and SQL.
> UNION (with duplicate elimination) can be supported by applying a {{DataSet.distinct()}} after the union on all fields. This issue includes:
> - Extending {{DataSetUnion}}
> - Relaxing {{DataSetUnionRule}} to translate non-all unions.
> - Extending the Table API with a union() method.
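
As a minimal sketch of the semantics described in the issue, the snippet below contrasts UNION ALL with UNION using ordinary Scala collections rather than the Flink `DataSet` API. `Person`, `unionAll`, and `union` are hypothetical names, and the in-memory `distinct` merely stands in for `DataSet.distinct()` applied on all fields.

```scala
object UnionSemanticsSketch {
  // Hypothetical record type; equality covers all fields, mirroring
  // a distinct that is applied on every field of the row.
  final case class Person(name: String, age: Int)

  // UNION ALL: concatenate both inputs and keep duplicates.
  def unionAll[T](left: Seq[T], right: Seq[T]): Seq[T] = left ++ right

  // UNION: concatenate, then drop rows that are equal in all fields.
  def union[T](left: Seq[T], right: Seq[T]): Seq[T] =
    unionAll(left, right).distinct

  def main(args: Array[String]): Unit = {
    val left  = Seq(Person("Ann", 30), Person("Bob", 25))
    val right = Seq(Person("Bob", 25), Person("Bob", 26))
    println(unionAll(left, right).size) // duplicates are kept
    println(union(left, right).size)    // exact duplicates are removed
  }
}
```

Rows that differ in any single field, such as the two `Bob` entries with different ages, survive the distinct step; only rows that match on every field collapse into one.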



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
