hadoop-common-dev mailing list archives

From "Venky Iyer (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-4591) Tag columns as partitioning columns
Date Wed, 05 Nov 2008 09:33:44 GMT
Tag columns as partitioning columns
-----------------------------------

                 Key: HADOOP-4591
                 URL: https://issues.apache.org/jira/browse/HADOOP-4591
             Project: Hadoop Core
          Issue Type: Wish
          Components: contrib/hive
            Reporter: Venky Iyer



    CREATE TABLE tname (cname1 INT, pcol INT PARTITIONING)
    COMMENT 'This is a table'
    PARTITIONED BY(dt STRING)
    STORED AS SEQUENCEFILE;

The goal here is to annotate a column as being a "partitioning" column. Consider pcol in the
above example. It is annotated with 'PARTITIONING', which implies that the table is
effectively created with

PARTITIONED BY (dt, pcol)

and every write to this table has implicitly

INSERT OVERWRITE tname PARTITION (pcol='X')
WHERE output.pcol = 'X'

for every distinct value X that pcol takes.
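
The implicit per-value write described above can be sketched in Python as a grouping step. This only illustrates the proposed semantics and is not Hive code; the helper name split_by_partitioning_column and the row-as-dict representation are assumptions made for the example.

```python
from collections import defaultdict


def split_by_partitioning_column(rows, column):
    """Group rows by the distinct values of a partitioning column.

    Hypothetical helper: each key of the result stands for one implicit
    PARTITION (column='X') write from the proposal.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


# Sample output rows; the values are made up for illustration.
rows = [
    {"cname1": 1, "pcol": 10},
    {"cname1": 2, "pcol": 20},
    {"cname1": 3, "pcol": 10},
]

by_pcol = split_by_partitioning_column(rows, "pcol")
for value, group in sorted(by_pcol.items()):
    print(f"PARTITION (pcol='{value}'): {len(group)} row(s)")
```

Each distinct value of pcol ends up with its own group of rows, which corresponds to one partition being overwritten.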

This is ideally an addition on top of the explicit partitioning that is already in the syntax,
so that if I said

INSERT OVERWRITE tname PARTITION (dt='D')

it would still go into the partition (dt='D', pcol='Y') when the value of pcol is Y.
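
A minimal Python sketch of how an explicit partition spec might combine with an annotated column for a single row. The function name resolve_partition_spec and its arguments are hypothetical. The proposal does not say what happens if the explicit spec also names an annotated column, so the sketch simply assumes it does not.

```python
def resolve_partition_spec(explicit_spec, row, partitioning_columns):
    """Return the full partition spec for one row.

    explicit_spec: partition values given in the INSERT, e.g. {"dt": "D"}.
    partitioning_columns: columns annotated PARTITIONING (an assumption
    about how the annotation would be represented).
    """
    spec = dict(explicit_spec)  # copy so the caller's dict is not mutated
    for column in partitioning_columns:
        # The row's own value picks the sub-partition for this column.
        spec[column] = row[column]
    return spec


spec = resolve_partition_spec(
    {"dt": "D"},
    {"cname1": 1, "pcol": "Y"},
    ["pcol"],
)
print(spec)
```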

It would be up to the user to make sure the cardinality of these columns is reasonable, and
that enough data goes into each partition that there is some net benefit (just as it is in
the explicit case).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

