hadoop-hive-dev mailing list archives

From "Ashish Thusoo (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-50) Tag columns as partitioning columns
Date Mon, 01 Dec 2008 17:29:44 GMT

     [ https://issues.apache.org/jira/browse/HIVE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-50:
------------------------------

    Component/s: Query Processor
    Description: 
    CREATE TABLE tname (cname1 INT, pcol INT PARTITIONING)
    COMMENT 'This is a table'
    PARTITIONED BY (dt STRING)
    STORED AS SEQUENCEFILE;

The goal here is to annotate a column as a "partitioning" column. Consider pcol in the
above example. It is annotated with 'PARTITIONING', which implies that the CREATE TABLE
statement effectively has

PARTITIONED BY (dt, pcol)

and that every write to this table implicitly has

INSERT OVERWRITE tname PARTITION (pcol='X')
WHERE output.pcol = 'X'

for every distinct value X that pcol takes.
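
The per-value split described above can be illustrated outside Hive. The following is a minimal Python sketch, not Hive code: the helper name split_by_partition_column and the sample rows are hypothetical, and it only shows how output rows would be grouped into one implied partition per distinct value of pcol.

```python
from collections import defaultdict


def split_by_partition_column(rows, column):
    """Group rows by the distinct values of `column`.

    Each key of the result stands for one implied partition
    (hypothetical helper, for illustration only).
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[column]].append(row)
    return dict(partitions)


# Sample output rows of a query; the values are made up for the example.
rows = [
    {"cname1": 1, "pcol": 10},
    {"cname1": 2, "pcol": 20},
    {"cname1": 3, "pcol": 10},
]

by_pcol = split_by_partition_column(rows, "pcol")
for value, group in sorted(by_pcol.items()):
    # One implicit "PARTITION (pcol='X')" write per distinct value X.
    print(f"PARTITION (pcol='{value}') <- {len(group)} row(s)")
```

Each key of by_pcol corresponds to one of the implicit INSERT OVERWRITE statements, with the WHERE clause expressed as the grouping condition.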

This is ideally an addition on top of the explicit partitioning that is already in the syntax,
so that if I said

INSERT OVERWRITE tname PARTITION (dt='D')

it would still go into the partition (dt='D', pcol='Y') when the value of pcol is Y.
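
How the explicit and tagged partition columns would combine can be sketched as follows. This is an illustrative Python model under the assumptions of this proposal, not an existing Hive API: resolve_partition and format_spec are hypothetical names, and the explicit spec is assumed to come first in the resulting partition.

```python
def resolve_partition(static_spec, row, tagged_columns):
    """Extend the explicit PARTITION spec with the row's values
    for every column tagged as PARTITIONING (hypothetical helper)."""
    spec = dict(static_spec)  # copy so the caller's spec is not modified
    for column in tagged_columns:
        spec[column] = row[column]
    return spec


def format_spec(spec):
    """Render a partition spec in the notation used in this issue."""
    return "(" + ", ".join(f"{key}='{value}'" for key, value in spec.items()) + ")"


# The user writes PARTITION (dt='D'); the row carries pcol = 'Y'.
spec = resolve_partition({"dt": "D"}, {"cname1": 7, "pcol": "Y"}, ["pcol"])
print(format_spec(spec))
```

The explicit part of the spec is left untouched; the tagged column only adds a further level below it.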

It would be up to the user to make sure the cardinality of these columns is reasonable, and
that enough data goes into each partition that there is some net benefit (just as it is in
the explicit case).
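
A user could check the cardinality concern on a data sample before tagging a column. The sketch below is one possible way to do that in Python; the function partition_report and both thresholds (max_partitions, min_rows) are assumptions made for the example and are not part of the proposal.

```python
from collections import Counter


def partition_report(rows, column, max_partitions=1000, min_rows=2):
    """Summarise how many partitions `column` would create and which
    values would produce partitions with fewer than `min_rows` rows.
    Both thresholds are arbitrary defaults chosen for illustration."""
    counts = Counter(row[column] for row in rows)
    return {
        "distinct_values": len(counts),
        "too_many_partitions": len(counts) > max_partitions,
        "sparse_values": sorted(v for v, n in counts.items() if n < min_rows),
    }


# Made-up sample: value 1 appears twice, value 2 only once.
sample = [{"pcol": 1}, {"pcol": 1}, {"pcol": 2}]
report = partition_report(sample, "pcol", max_partitions=10, min_rows=2)
print(report)
```

A large distinct_values count or a long sparse_values list would suggest that the column is a poor candidate for partitioning.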


> Tag columns as partitioning columns
> -----------------------------------
>
>                 Key: HIVE-50
>                 URL: https://issues.apache.org/jira/browse/HIVE-50
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Venky Iyer

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

