hadoop-hive-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-252) Automatically add CLUSTER BY and set the number of reducers if the target table is declared with "CLUSTERED BY (xxx) INTO yyy BUCKETS"
Date Wed, 28 Jan 2009 07:27:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667954#action_12667954 ]

Joydeep Sen Sarma commented on HIVE-252:
----------------------------------------

suppose i have 

... cluster by a.x insert overwrite table T select a.x+1,...;

where T is declared clustered on its first column. clearly the query does not require any modification
- but it will be hard to detect this in the compiler. or am i missing something?

did someone request this? (i am a little curious, since i think that declaring a table
to be clustered would typically be done by an advanced user. such users can write the correct
query without a lot of compiler smarts. if, on the other hand, we want to help out the average
user, we would be much better served by inferring the clustering property of
the target table/partition from the query and storing it automatically, so that we can leverage
it for future plans.)

> Automatically add CLUSTER BY and set the number of reducers if the target table is declared with "CLUSTERED BY (xxx) INTO yyy BUCKETS"
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-252
>                 URL: https://issues.apache.org/jira/browse/HIVE-252
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Zheng Shao
>
> We should automatically add a "cluster by" clause to the following INSERT query and run it with 64 reducers.
> CREATE TABLE aaa (a BIGINT, b INT)
> PARTITIONED BY(ds STRING)
> CLUSTERED BY(a) INTO 64 BUCKETS 
> STORED AS SEQUENCEFILE;
> INSERT OVERWRITE TABLE aaa PARTITION(ds='2009-01-24')
> SELECT a.a, a.b
> FROM training_set a
> WHERE a.ds = '2009-01-24';
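
For reference, the hand-written form that the issue proposes to generate automatically might look roughly like the sketch below. It assumes the reducer count is set through the mapred.reduce.tasks property and that CLUSTER BY accepts the alias-qualified column a.a; neither detail is stated in the issue, and some Hive versions may require the unqualified column name instead.

```sql
-- Sketch only: what a user would write by hand today.
-- Assumption: one reducer per bucket, set via mapred.reduce.tasks.
SET mapred.reduce.tasks = 64;

INSERT OVERWRITE TABLE aaa PARTITION(ds='2009-01-24')
SELECT a.a, a.b
FROM training_set a
WHERE a.ds = '2009-01-24'
-- Assumption: cluster on the column named in CLUSTERED BY(a).
CLUSTER BY a.a;
```

The reducer count matches the 64 buckets in the table declaration, so each reducer would write one bucket file.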


