hadoop-hive-dev mailing list archives

From "Ashish Thusoo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-252) Automatically add CLUSTER BY and set the number of reducers if the target table is declared with "CLUSTERED BY (xxx) INTO yyy BUCKETS"
Date Tue, 27 Jan 2009 20:28:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667796#action_12667796 ]

Ashish Thusoo commented on HIVE-252:
------------------------------------

I think what Zheng is suggesting is that if the DDL declares a clustering, then we introduce the plan fragment for CLUSTER BY just before the FileSinkOperator.

I guess we could then use the tree walker to eliminate any unneeded CLUSTER BYs from the operator plan, so that we would not incur the cost of a second clustering if the data is already being clustered.
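The rewrite described above can be sketched as a toy plan transformation. This is not Hive's planner: the plan is modelled as a linear list of hypothetical `Op` records (real Hive plans are operator trees traversed by the tree walker), a CLUSTER BY is modelled as a `"ReduceSink"` step carrying its keys and reducer count, and `add_bucketing` is an illustrative name. A clustering step is spliced in before the final `"FileSink"` unless the nearest upstream `"ReduceSink"` already clusters on the bucketing columns with one reducer per bucket.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical stand-in for a plan operator (not Hive's Operator class)."""
    kind: str                                      # e.g. "TableScan", "Filter", "Select", "ReduceSink", "FileSink"
    keys: List[str] = field(default_factory=list)  # clustering keys (ReduceSink only)
    num_reducers: Optional[int] = None             # reducer count (ReduceSink only)


def add_bucketing(plan: List[Op], bucket_cols: List[str], num_buckets: int) -> List[Op]:
    """Return a plan whose output is clustered the way the target table's DDL requires."""
    if not plan or plan[-1].kind != "FileSink":
        raise ValueError("plan must end with a FileSink")
    # Nearest clustering step upstream of the sink, if any.
    # Simplification: assumes no operator in between renames or reorders the key columns.
    upstream_rs = next((op for op in reversed(plan) if op.kind == "ReduceSink"), None)
    if (upstream_rs is not None
            and upstream_rs.keys == bucket_cols
            and upstream_rs.num_reducers == num_buckets):
        return list(plan)  # already clustered as required: skip a second shuffle
    # Otherwise splice the CLUSTER BY fragment in just before the FileSink.
    cluster_by = Op("ReduceSink", list(bucket_cols), num_buckets)
    return plan[:-1] + [cluster_by, plan[-1]]


if __name__ == "__main__":
    # Roughly the shape of the INSERT OVERWRITE ... SELECT ... WHERE query in the issue.
    plan = [Op("TableScan"), Op("Filter"), Op("Select"), Op("FileSink")]
    bucketed = add_bucketing(plan, ["a"], 64)
    print([op.kind for op in bucketed])
```

Running `add_bucketing` a second time on its own output leaves the plan unchanged, which is the "no second clustering" property the comment asks for; a plan already clustered on a different key still gets a new clustering step.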

> Automatically add CLUSTER BY and set the number of reducers if the target table is declared with "CLUSTERED BY (xxx) INTO yyy BUCKETS"
> --------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-252
>                 URL: https://issues.apache.org/jira/browse/HIVE-252
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Zheng Shao
>
> We should automatically add a "cluster by" clause to the following query and run it with 64 reducers, one per declared bucket.
> CREATE TABLE aaa (a BIGINT, b INT)
> PARTITIONED BY(ds STRING)
> CLUSTERED BY(a) INTO 64 BUCKETS 
> STORED AS SEQUENCEFILE;
> INSERT OVERWRITE TABLE aaa PARTITION(ds='2009-01-24')
> SELECT a.a, a.b
> FROM training_set a
> WHERE a.ds = '2009-01-24';
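Why 64 reducers give 64 buckets: with CLUSTER BY on `a`, rows are hash-partitioned on `a` across the reducers, and each reducer writes one output file, so reducer i's file holds bucket i. The sketch below assumes Hadoop's default `HashPartitioner` rule, `(hashCode & Integer.MAX_VALUE) % numReduceTasks`, and further assumes a BIGINT key hashes like Java's `Long.hashCode`; the exact hash Hive applies to each column type is not confirmed here, and `java_long_hash`, `bucket_for` and `assign_buckets` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE


def java_long_hash(value: int) -> int:
    """Mimic Java's Long.hashCode(): (int)(value ^ (value >>> 32))."""
    v = value & 0xFFFFFFFFFFFFFFFF                   # 64-bit two's-complement view
    h = (v ^ (v >> 32)) & 0xFFFFFFFF                 # keep the low 32 bits
    return h - (1 << 32) if h & 0x80000000 else h    # reinterpret as a signed int


def bucket_for(a: int, num_buckets: int) -> int:
    """Reducer (and therefore bucket) index, HashPartitioner style (assumed)."""
    return (java_long_hash(a) & INT_MAX) % num_buckets


def assign_buckets(rows: Iterable[Tuple[int, int]],
                   num_buckets: int) -> Dict[int, List[Tuple[int, int]]]:
    """Group (a, b) rows by the bucket their clustering column `a` maps to."""
    buckets: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for a, b in rows:
        buckets[bucket_for(a, num_buckets)].append((a, b))
    return dict(buckets)


if __name__ == "__main__":
    rows = [(5, 1), (69, 2), (-1, 3), (2**32, 4), (5, 6)]
    for idx, contents in sorted(assign_buckets(rows, 64).items()):
        print(idx, contents)
```

All rows sharing a value of `a` land in the same bucket, which is what makes the table's bucket files usable for sampling and bucketed joins.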


