impala-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Michael Brown (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IMPALA-5725) coalesce() not being fully applied with outer joins on kudu tables
Date Thu, 27 Jul 2017 00:10:00 GMT
Michael Brown created IMPALA-5725:
-------------------------------------

             Summary: coalesce() not being fully applied with outer joins on kudu tables
                 Key: IMPALA-5725
                 URL: https://issues.apache.org/jira/browse/IMPALA-5725
             Project: IMPALA
          Issue Type: Bug
          Components: Frontend
    Affects Versions: Impala 2.10.0
            Reporter: Michael Brown
            Assignee: Matthew Jacobs
            Priority: Blocker


{{SELECT COALESCE()}} on multiple arguments spanning different tables with an {{OUTTER JOIN}}
on Kudu tables is not properly being applied. This behavior is

# different relative to Kudu tables in 2.9
# different relative to the 2.10 behavior with HDFS, seemingly making this Kudu-specific
# different from Postgres, which matches the HDFS behavior, further making this seem Kudu-specific

Consider this query:
{noformat}
USE tpch_kudu;
SELECT
COALESCE(a2.n_nationkey, a1.p_size),
a2.n_nationkey,
a1.p_size
FROM part a1
LEFT JOIN nation a2 ON (a1.p_size) = (a2.n_nationkey);
{noformat}

Some of the rows returned include:

{noformat}
+-------------------------------------+-------------+--------+
| coalesce(a2.n_nationkey, a1.p_size) | n_nationkey | p_size |
+-------------------------------------+-------------+--------+
[snip]
| 21                                  | 21          | 21     |
| 22                                  | 22          | 22     |
| 23                                  | 23          | 23     |
| 24                                  | 24          | 24     |
| NULL                                | NULL        | 25     |
| NULL                                | NULL        | 26     |
| NULL                                | NULL        | 27     |
[snip]
{noformat}

The {{COALESCE()}} column is not returning the value of {{p_size}} when its first argument,
{{n_nationkey}} is {{NULL}}. {{tpch_kudu.nation n_nationkey}} has values between 0 and 24,
hence the {{NULL}} values in that column when {{part.p_size}} is greater.

This goes away if you keep the query above but switch the ordering of the {{COALESCE()}} arguments.

I can see the same sort of problems if I write similar {{RIGHT}} or {{FULL OUTER JOIN}} queries:

{noformat}
USE tpch_kudu;
SELECT
DISTINCT
COALESCE(a2.n_nationkey, a1.p_size),
a2.n_nationkey,
a1.p_size
FROM part a1
FULL OUTER JOIN nation a2 ON (a1.p_size) = (a2.n_nationkey)
ORDER BY 1,2,3;
{noformat}

{noformat}
USE tpch_kudu;
SELECT
DISTINCT
COALESCE(a2.n_nationkey, a1.p_size),
a2.n_nationkey,
a1.p_size
FROM nation a2
RIGHT JOIN part a1 ON (a1.p_size) = (a2.n_nationkey)
ORDER BY 1,2,3;
{noformat}

Explain-level 2 plans and profiles will be attached.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message