drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
Date Wed, 06 Apr 2016 15:40:25 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228484#comment-15228484
] 

ASF GitHub Bot commented on DRILL-4539:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/462#discussion_r58728331
  
    --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
---
    @@ -169,4 +176,223 @@ private static boolean containIdentity(List<? extends RexNode>
exps,
         }
         return true;
       }
    +
    +  /**
    +   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, RexNode, List,
List)}. Modified to rewrite
    --- End diff --
    
    Agree that we ideally should leverage the Calcite code..especially since this method is
pretty heavily used and modified periodically so keeping Drill's version of this method in
sync will be difficult. 


> Add support for Null Equality Joins
> -----------------------------------
>
>                 Key: DRILL-4539
>                 URL: https://issues.apache.org/jira/browse/DRILL-4539
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Jacques Nadeau
>            Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
>     `business`.`state` AS `state`,
>     SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
>     `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
>     `t1`.`city` AS `city`,
>     `t1`.`state` AS `state`
>   FROM (
>     SELECT `business`.`city` AS `city`,
>       `business`.`state` AS `state`,
>       `business`.`business_id` AS `business_id`,
>       SUM(`business`.`stars`) AS `X_measure__A`
>     FROM `mongo.academic`.`business` `business`
>     GROUP BY `business`.`city`,
>       `business`.`state`,
>       `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
>     `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND (`t2`.`city` IS
NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` IS NULL) AND (`t2`.`state` IS
NULL))))
> {code}
> If you look at the join condition, you'll note that the join condition is an equality
condition which also allows null=null. We should add a planning rewrite rule and execution
join option to allow null equality so that we don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message