drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5371) Large run-time overhead for nested SELECT queries
Date Mon, 20 Mar 2017 21:58:41 GMT
Paul Rogers created DRILL-5371:
----------------------------------

             Summary: Large run-time overhead for nested SELECT queries
                 Key: DRILL-5371
                 URL: https://issues.apache.org/jira/browse/DRILL-5371
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.10.0
            Reporter: Paul Rogers


See DRILL-5370 - a test in which Drill was stress-tested with nested SELECT queries of ever-increasing
size.

Semantically, the query does nothing other than:

SELECT a AS b AS c AS ... AS z FROM foo;

The above is not valid SQL, of course, but it shows that the nested SELECTs do nothing other
than create static aliases for columns, and do so many times via layers of nested SELECTs.

{code}
SELECT y AS z FROM
    (SELECT x AS y FROM
        (SELECT w AS x FROM ...
                           (SELECT a FROM someTable))))...))
{code}

Because the nested SELECTs do no actual processing and only impose aliases, the optimizer should
be able to optimize away the aliasing. That is, there should be no need for any run-time work
simply to change the name of a column.
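
To illustrate the idea (this is only a rough sketch, not Drill's planner code), a chain of alias-only projections can be composed into a single rename before execution. The function name `collapse_alias_chain` and the dict-based representation of a projection layer are assumptions made for this example.

```python
def collapse_alias_chain(layers):
    """Compose alias-only projection layers into a single rename.

    `layers` is ordered innermost first. Each layer is a dict that maps
    an input column name to the alias that layer assigns to it.
    Returns one dict mapping each base-table column to its final alias.
    """
    composed = None
    for layer in layers:
        if composed is None:
            # Innermost aliasing layer: start from its mapping.
            composed = dict(layer)
        else:
            # Follow each base column's current name through the next layer.
            # Columns that the outer layer does not select are dropped.
            composed = {
                base: layer[current]
                for base, current in composed.items()
                if current in layer
            }
    return composed or {}


# Hypothetical three-level chain: a -> b -> c -> z
layers = [{"a": "b"}, {"b": "c"}, {"c": "z"}]
flat = collapse_alias_chain(layers)
```

Under these assumptions, the three nested layers reduce to one mapping from `a` to `z`, which corresponds to a single `SELECT a AS z FROM someTable` with no per-layer run-time work.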

However, when run (with 200 columns, each with a 500-character name, but only 10 rows), the
overhead in a debug build is somewhere between 1/2 and 1 second per nesting level.

That is, for just 10 rows, each layer of nested SELECT adds up to about 1 second to the execution
time.

Queries of this form may be pathological if written by humans, but they are typical of queries
generated by BI tools. Hence, Drill performance for such tools can be improved simply by
avoiding unnecessary work.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
