hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-4956) Allow multiple tables in from clause if all them have the same schema, but can be partitioned differently
Date Tue, 30 Jul 2013 17:35:48 GMT

    [ https://issues.apache.org/jira/browse/HIVE-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724133#comment-13724133 ]

Xuefu Zhang commented on HIVE-4956:
-----------------------------------

The syntax "select ... from T1, T2 ..." without a join might cause semantic confusion, because in
some databases it means a cross join (Cartesian product), which is different from what you
intend. From a database point of view, a table is a table, and two tables are two tables.
Treating two tables as one seems to go beyond what SQL defines. It might be conceptually clearer
to allow a single table to have heterogeneous partitions. Of course, this may be more involved.
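
To make the ambiguity concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite stands in for "some databases" and is not Hive; the table contents are made up for illustration. It contrasts the Cartesian-product reading of a comma-separated from clause with the "treat both tables as one" reading, expressed here as UNION ALL.

```python
import sqlite3

# Illustrative data only; SQLite is used as a stand-in engine, not Hive.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (x INTEGER, p1 TEXT);
    CREATE TABLE T2 (x INTEGER, p1 TEXT);
    INSERT INTO T1 VALUES (1, 'a'), (2, 'a');
    INSERT INTO T2 VALUES (3, 'a'), (4, 'b'), (5, 'b');
""")

# Comma-separated FROM: every T1 row is paired with every T2 row (2 * 3 rows).
cross_rows = conn.execute("SELECT T1.x, T2.x FROM T1, T2").fetchall()

# "Two tables as one data set": the rows are concatenated (2 + 3 rows).
union_rows = conn.execute(
    "SELECT x FROM T1 UNION ALL SELECT x FROM T2"
).fetchall()

print(len(cross_rows), len(union_rows))  # prints: 6 5
```

The same from clause text yields six rows under one reading and five under the other, which is the confusion described above.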
                
> Allow multiple tables in from clause if all them have the same schema, but can be partitioned
differently
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-4956
>                 URL: https://issues.apache.org/jira/browse/HIVE-4956
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Amareshwari Sriramadasu
>            Assignee: Amareshwari Sriramadasu
>
> We have a use case where a table's storage partitioning changes over time.
> For example, a table T1 may be partitioned by p1, but over time we want to partition
> the data on both p1 and p2. The new table can be T2. A query on partition p1 then has to be
> a union query across the two tables T1 and T2. With aggregations such as avg in particular,
> this becomes a costly union query because we cannot make use of map-side aggregation
> and other optimizations.
> The proposal is to support queries of the following format:
> select t.x, t.y, .... from T1,T2 t where t.p1='x' OR t.p1='y' ... [groupby-clause] [having-clause]
> [orderby-clause] and so on.
> Here the from clause is a comma-separated list of tables followed by a single alias, and
> that alias is used throughout the query. Partition pruning happens on the actual tables to
> pick up the right paths. This works because the only difference is in picking the input
> paths; the operator tree does not change. If this sounds like a good use case, I can put up
> the changes required to support it.
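
For reference, the union query that the description calls costly can be sketched as follows, again with Python's sqlite3 module as a stand-in engine. Assumptions: SQLite has no partitions, so p1 and p2 are modelled as ordinary columns, and the rows are invented for illustration. This shows only the result the proposed "from T1,T2 t" form is meant to return, not how Hive would execute it.

```python
import sqlite3

# Illustrative stand-in: partition columns p1/p2 are plain columns in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (x REAL, p1 TEXT);            -- old layout, "partitioned" by p1
    CREATE TABLE T2 (x REAL, p1 TEXT, p2 TEXT);   -- new layout, by p1 and p2
    INSERT INTO T1 VALUES (10.0, 'x'), (20.0, 'y'), (99.0, 'z');
    INSERT INTO T2 VALUES (30.0, 'x', 'k'), (40.0, 'y', 'k');
""")

# Today's workaround: union both tables, then filter on p1 and aggregate.
union_avg = conn.execute("""
    SELECT AVG(t.x)
    FROM (
        SELECT x, p1 FROM T1
        UNION ALL
        SELECT x, p1 FROM T2
    ) t
    WHERE t.p1 = 'x' OR t.p1 = 'y'
""").fetchone()[0]

print(union_avg)  # prints: 25.0
```

The row with p1 = 'z' is filtered out, so the average is taken over 10, 20, 30, and 40 from both tables.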

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
