hadoop-hive-dev mailing list archives

From "John Sichi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1395) Table aliases are ambiguous
Date Fri, 02 Jul 2010 15:43:50 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12884724#action_12884724 ]

John Sichi commented on HIVE-1395:
----------------------------------

Actually, after thinking about it some more, it's not practical to prevent alias reuse, even
in strict mode.  Here's why.

Suppose I have

CREATE VIEW V AS SELECT * FROM BLAH T1 JOIN FLUB T2 ON T1.J=T2.K;

SELECT * FROM V T1 WHERE T1.X=3;

When we expand the view reference in the query, we'll end up with

SELECT * FROM (
    SELECT * FROM BLAH T1 JOIN FLUB T2 ON T1.J=T2.K
) T1 WHERE T1.X=3;

And now, in the expansion, T1 is legitimately duplicated, even though the person querying the
view didn't know that T1 was used inside the view definition (in general, that use could be
nested very deep).
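
To see that the duplicated alias is harmless once each subquery gets its own alias scope, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in engine. This is an assumption: SQLite follows standard SQL alias scoping, which is the behavior argued for here, not necessarily what Hive 0.6 does. The table contents are invented for illustration. The sketch checks that querying the view and querying its hand-expanded form return the same rows, even though `T1` names `BLAH` inside the subquery and the subquery itself outside.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for Hive.
# The BLAH/FLUB rows below are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BLAH (J INTEGER, X INTEGER);
    CREATE TABLE FLUB (K INTEGER, Y TEXT);
    INSERT INTO BLAH VALUES (1, 3), (2, 3), (3, 4);
    INSERT INTO FLUB VALUES (1, 'one'), (2, 'two'), (3, 'three');
    CREATE VIEW V AS SELECT * FROM BLAH T1 JOIN FLUB T2 ON T1.J = T2.K;
""")

# The query as the user writes it: T1 is an alias for the view.
via_view = sorted(conn.execute("SELECT * FROM V T1 WHERE T1.X = 3").fetchall())

# The hand-expanded form: the outer T1 names the subquery, while the
# inner T1 names BLAH; each alias is visible only in its own scope.
expanded = sorted(conn.execute("""
    SELECT * FROM (
        SELECT * FROM BLAH T1 JOIN FLUB T2 ON T1.J = T2.K
    ) T1 WHERE T1.X = 3
""").fetchall())

print(via_view)
print(via_view == expanded)
```

Because the outer `T1` can only see the subquery's output columns, the reuse of `T1` inside the view never leaks into the user's query.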

Expanding the view in this way is what allows us to do a lot of optimizations such as pushing
predicates (e.g. T1.X=3) all the way down into the view. 
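
The pushdown idea can be illustrated the same way. This is a hedged sketch, again with `sqlite3` standing in for Hive and with invented data: it compares filtering the expanded subquery from outside against a rewritten form in which `T1.X = 3` has been moved inside, where it now refers to `BLAH`'s `X` through the inner `T1`. It only checks that the two forms agree on this data; it does not show how Hive's optimizer actually performs the rewrite.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BLAH (J INTEGER, X INTEGER);
    CREATE TABLE FLUB (K INTEGER, Y TEXT);
    INSERT INTO BLAH VALUES (1, 3), (2, 5), (3, 3);
    INSERT INTO FLUB VALUES (1, 'one'), (3, 'three'), (4, 'four');
""")

# Predicate applied after the join, on the outer alias T1 (the subquery).
outer_filter = """
    SELECT * FROM (
        SELECT * FROM BLAH T1 JOIN FLUB T2 ON T1.J = T2.K
    ) T1 WHERE T1.X = 3
"""

# Same predicate pushed into the subquery: conceptually it now filters
# BLAH (the inner T1) before the join rather than the joined result.
pushed_down = """
    SELECT * FROM (
        SELECT * FROM BLAH T1 JOIN FLUB T2 ON T1.J = T2.K WHERE T1.X = 3
    ) T1
"""

before = sorted(conn.execute(outer_filter).fetchall())
after = sorted(conn.execute(pushed_down).fetchall())
print(before)
print(before == after)
```

The rewrite is only valid because the outer `T1.X` can be traced to a single column of the subquery's output; alias scoping is what makes that mapping unambiguous.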


> Table aliases are ambiguous
> ---------------------------
>
>                 Key: HIVE-1395
>                 URL: https://issues.apache.org/jira/browse/HIVE-1395
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: Adam Kramer
>             Fix For: 0.6.0, 0.7.0
>
>
> Consider this query:
> SELECT a.num FROM (
>   SELECT a.num AS num, b.num AS num2
>   FROM foo a LEFT OUTER JOIN bar b ON a.num=b.num
> ) a
> WHERE a.num2 IS NULL;
> ...in this case, the table alias 'a' is ambiguous. It could be the outer table (i.e., the subquery result), or it could be the inner table (foo).
> In the above case, Hive silently parses the outer reference to a as the inner reference. The result, then, is akin to:
> SELECT foo.num FROM foo WHERE bar.num IS NULL. This is bad.
> The bigger problem, however, is that Hive even lets people use the same table alias at multiple points in the query. We should simply throw an exception during the parse stage if there is any ambiguity in which table is which, just like we do if the column names are ambiguous.
> Or, if for some reason we need people to be able to use 'a' to refer to multiple tables or subqueries, it would be excellent if the exact parsing structure were made clear and added to the wiki. In that case, I will file a separate bug JIRA to complain about how it should be different. :)
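
For comparison, here is how an engine with standard SQL scoping resolves the reporter's query: the outer `a` refers only to the subquery, so the query is an anti-join that returns the `foo` rows with no match in `bar`. This is a hedged sketch using `sqlite3` as a stand-in with invented rows; it shows the scoping the reporter expected, not Hive 0.6's actual behavior.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; foo/bar rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (num INTEGER);
    CREATE TABLE bar (num INTEGER);
    INSERT INTO foo VALUES (1), (2), (3);
    INSERT INTO bar VALUES (2);
""")

# The reporter's query: the outer alias 'a' names the subquery,
# while the inner alias 'a' names foo.
query = """
    SELECT a.num FROM (
      SELECT a.num AS num, b.num AS num2
      FROM foo a LEFT OUTER JOIN bar b ON a.num = b.num
    ) a
    WHERE a.num2 IS NULL
"""

rows = sorted(r[0] for r in conn.execute(query))
print(rows)  # [1, 3]: foo values absent from bar
```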

-- 
This message is automatically generated by JIRA.

