drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4539) Add support for Null Equality Joins
Date Wed, 06 Apr 2016 15:49:25 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228503#comment-15228503
] 

ASF GitHub Bot commented on DRILL-4539:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/462#discussion_r58730151
  
    --- Diff: exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
    @@ -215,6 +192,36 @@ public void eval() {
         }
       }
     
    +  <#-- IS_DISTINCT_FROM function -->
    +  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
    --- End diff --
    
    @vkorukanti, I want to clarify...if the query only had a join condition with IS_NOT_DISTINCT_FROM,
I would think it should work just with your convertlet changes, since both HashJoin and MergeJoin
handle this type of join condition.  Is the reason you had to implement the full comparator
codegen to handle more general types of comparisons ?  e.g in the SELECT list if I say  'SELECT
 a IS NOT DISTINCT FROM b'  ?    Suppose we had a convertlet that only preserved the IS (NOT)
DISTINCT FROM join condition, and defaulted to the Calcite rewrite using the CASE expression,
then we would not have to implement the full comparator. 


> Add support for Null Equality Joins
> -----------------------------------
>
>                 Key: DRILL-4539
>                 URL: https://issues.apache.org/jira/browse/DRILL-4539
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Jacques Nadeau
>            Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
>     `business`.`state` AS `state`,
>     SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
>     `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
>     `t1`.`city` AS `city`,
>     `t1`.`state` AS `state`
>   FROM (
>     SELECT `business`.`city` AS `city`,
>       `business`.`state` AS `state`,
>       `business`.`business_id` AS `business_id`,
>       SUM(`business`.`stars`) AS `X_measure__A`
>     FROM `mongo.academic`.`business` `business`
>     GROUP BY `business`.`city`,
>       `business`.`state`,
>       `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
>     `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND (`t2`.`city` IS
NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` IS NULL) AND (`t2`.`state` IS
NULL))))
> {code}
> If you look at the join condition, you'll note that the join condition is an equality
condition which also allows null=null. We should add a planning rewrite rule and execution
join option to allow null equality so that we don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message