flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-5498) Add support for left/right outer joins with non-equality predicates (and 1+ equality predicates)
Date Sat, 25 Mar 2017 16:55:42 GMT

    [ https://issues.apache.org/jira/browse/FLINK-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941789#comment-15941789
] 

Fabian Hueske commented on FLINK-5498:
--------------------------------------

Thanks [~lincoln.86xy], 

we might be able to tweak the plan and even avoid a sort for the GroupReduceFunction if we
execute the join with a sort merge join strategy and sort the outer side on all attributes
(with the key attributes being a prefix).

> Add support for left/right outer joins with non-equality predicates (and 1+ equality
predicates)
> ------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-5498
>                 URL: https://issues.apache.org/jira/browse/FLINK-5498
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>    Affects Versions: 1.3.0
>            Reporter: lincoln.lee
>            Assignee: lincoln.lee
>            Priority: Minor
>
> I found the expected result of a unit test case incorrect compare to that in a RDMBS,

> see flink-libraries/flink-table/src/test/scala/org/apache/flink/table/api/scala/batch/table/JoinITCase.scala
> {code:title=JoinITCase.scala}
> def testRightJoinWithNotOnlyEquiJoin(): Unit = {
>      ...
>      val ds1 = CollectionDataSets.get3TupleDataSet(env).toTable(tEnv, 'a, 'b, 'c)
>      val ds2 = CollectionDataSets.get5TupleDataSet(env).toTable(tEnv, 'd, 'e, 'f, 'g,
'h)
>      val joinT = ds1.rightOuterJoin(ds2, 'a === 'd && 'b < 'h).select('c,
'g)
>  
>      val expected = "Hello world,BCD\n"
>      val results = joinT.toDataSet[Row].collect()
>      TestBaseUtils.compareResultAsText(results.asJava, expected)
> }
> {code}
> Then I took some time to learn about the ‘outer join’ in relational databases, the
right result of above case should be(tested in SQL Server and MySQL, the results are same):
> {code}
> > select c, g from tuple3 right outer join tuple5 on a=f and b<h;
> c                                g                               
> -------------------------------- --------------------------------
> NULL                             Hallo                           
> NULL                             Hallo Welt                      
> NULL                             Hallo Welt wie                  
> NULL                             Hallo Welt wie gehts?           
> NULL                             ABC                             
> Hello world                      BCD                             
> NULL                             CDE                             
> NULL                             DEF                             
> NULL                             EFG                             
> NULL                             FGH                             
> NULL                             GHI                             
> NULL                             HIJ                             
> NULL                             IJK                             
> NULL                             JKL                             
> NULL                             KLM   
> {code}
> the join condition {{rightOuterJoin('a === 'd && 'b < 'h)}} is not equivalent
to {{rightOuterJoin('a === 'd).where('b < 'h)}}.  
> The problem is rooted in the code-generated {{JoinFunction}} (see {{DataSetJoin.translateToPlan()}},
line 188). If the join condition does not match, we must emit the outer row padded with nulls
instead of returning from the function without emitting anything.
> The code-generated {{JoinFunction}} does also include equality predicates. These should
be removed before generating the code, e.g., in {{DataSetJoinRule}} when generating the {{DataSetJoin}}
with help of {{JoinInfo.getRemaining()}}.
> More details: https://goo.gl/ngekca



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message