impala-issues mailing list archives

From "Michael Sokalski (JIRA)" <j...@apache.org>
Subject [jira] [Created] (IMPALA-5260) Make joined tables distinct to improve performance
Date Fri, 28 Apr 2017 14:11:04 GMT
Michael Sokalski created IMPALA-5260:
----------------------------------------

             Summary: Make joined tables distinct to improve performance
                 Key: IMPALA-5260
                 URL: https://issues.apache.org/jira/browse/IMPALA-5260
             Project: IMPALA
          Issue Type: Improvement
    Affects Versions: Impala 2.6.0
            Reporter: Michael Sokalski
            Priority: Minor


Consider the following select statement:

{{select tB.bField, count(tA.aField) ct
from tableA tA
join tableB tB using (id)
where (...)
group by tB.bField
order by ct}}

If tableB has a large number of rows (but still fewer than tableA), this query can be orders
of magnitude slower than the equivalent query:

{{select tB.bField, count(tA.aField) ct
from tableA tA
join (select distinct bField, id[, ...] from tableB) tB using (id)
where (...)
group by tB.bField
order by ct}}

It appears to me that the slower query gets bogged down with shuttling unnecessary data between
nodes.
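
One caveat on "equivalent": the two forms return the same counts only when tableB holds no duplicate (id, bField) rows; if it does, the distinct subquery collapses them and the counts shrink. A minimal sketch of that check follows, using Python's sqlite3 module as a stand-in for Impala. The table contents and the run_both helper are made up for illustration, the elided where (...) clause is left out, and nothing here says anything about Impala's performance or planner.

```python
import sqlite3

# The two query shapes from the report, minus the elided "where (...)" clause.
ORIGINAL = """
select tB.bField, count(tA.aField) ct
from tableA tA
join tableB tB using (id)
group by tB.bField
order by ct
"""

REWRITTEN = """
select tB.bField, count(tA.aField) ct
from tableA tA
join (select distinct bField, id from tableB) tB using (id)
group by tB.bField
order by ct
"""


def run_both(table_b_rows):
    """Hypothetical helper: load toy data into SQLite and run both queries."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table tableA (id integer, aField text)")
    conn.execute("create table tableB (id integer, bField text)")
    # Made-up rows: two tableA rows for id 1, one for id 2.
    conn.executemany(
        "insert into tableA values (?, ?)",
        [(1, "a1"), (1, "a2"), (2, "a3")],
    )
    conn.executemany("insert into tableB values (?, ?)", table_b_rows)
    original = conn.execute(ORIGINAL).fetchall()
    rewritten = conn.execute(REWRITTEN).fetchall()
    conn.close()
    return original, rewritten


# Unique (id, bField) pairs in tableB: both forms agree.
same_a, same_b = run_both([(1, "x"), (2, "y")])
print(same_a == same_b)

# A duplicated (id, bField) pair in tableB: the counts differ.
diff_a, diff_b = run_both([(1, "x"), (1, "x"), (2, "y")])
print(diff_a, diff_b)
```

If an optimizer were to apply such a rewrite implicitly, it would presumably need to know that the join side is already unique on the selected columns, or accept a change in results.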

Is it possible, and beneficial, to make such a query improvement implicit in Impala's query
optimizer?



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
