drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5920) Drill incorrectly projects column aliases to scan operator
Date Wed, 01 Nov 2017 18:40:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234545#comment-16234545
] 

Paul Rogers commented on DRILL-5920:
------------------------------------

Turns out that this problem is due to a misunderstanding of how SQL works.

Consider the original unit test query:

{code}
select max(columns[1]) as col1
from cp.`textinput/input1.csv`
where col1 is not null
{code}

The query is incorrectly using a column alias in the {{WHERE}} clause.

To quote from a Google search:

bq. Standard SQL disallows references to column aliases in a WHERE clause. This restriction
is imposed because when the WHERE clause is evaluated, the columnvalue may not yet have been
determined. column_alias can be used in an ORDER BY clause, but it cannot be used in a WHERE,
GROUP BY, or HAVING clause

Thanks to [~amansinha100] for pointing out this fact.

> Drill incorrectly projects column aliases to scan operator
> ----------------------------------------------------------
>
>                 Key: DRILL-5920
>                 URL: https://issues.apache.org/jira/browse/DRILL-5920
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Major
>
> The {{TestNewTextReader.ensureColumnNameDisplayedinError}} unit test runs this query:
> {code}
> select max(columns[1]) as col1
> from cp.`textinput/input1.csv`
> where col1 is not null
> {code}
> The following appears in the {{SubScan}} for the {{TextFormatPlugin}}:
> {noformat}
> [`col1`, `columns`[1]]
> {noformat}
> This is clearly wrong. The actual table column is {{columns}} (and, specifically, element
1.) {{col1} is an alias that should never have been pushed down to the data source because
the data source does not know about aliases.
> Further, the projection list makes no distinction between the "real" and "alias" columns,
so, to the data source, both look like real table columns.
> The current workaround is to create a nullable int column for {{col1}} which is, presumably,
replaced by a later projection operator.
> Because this behavior is wrong, we must think though all the possible failure cases and
how to handle them in this incorrect design. What if the alias matches an (expensive) table
column? What if the alias is the same as some base column in the same query?
> {code}
> SELECT a as b, b as c FROM ...
> {code}
> Incorrect name handling may work in many cases, but it does lead to problems because
the behavior is not following the accepted SQL standards.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message