hive-dev mailing list archives

From "Mark Grover (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-3077) Insert overwrite table doesn't fail for bucketed tables and breaks bucketing
Date Fri, 01 Jun 2012 04:24:23 GMT

     [ https://issues.apache.org/jira/browse/HIVE-3077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Grover updated HIVE-3077:
------------------------------

    Description: 
If table my_table is bucketed, the command "insert into table my_table ..." is supposed to
give an error stating "Bucketized tables do not support INSERT INTO".
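For reference, this is the behavior that check is meant to enforce, as a minimal sketch (the table names below are illustrative only, not from this report):

create table bucketed_t(x string) clustered by (x) into 4 buckets;
insert into table bucketed_t select * from some_src;
-- expected: the statement is rejected with "Bucketized tables do not support INSERT INTO"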

However, it doesn't seem to do that in all cases.
Consider the following example on Hive 0.9.0:
create table src(x string) clustered by (x) sorted by (x) into 32 buckets;
create table dest(x string) clustered by (x) sorted by (x) into 32 buckets;

Now, put some data into src (after setting hive.enforce.bucketing and hive.enforce.sorting to true).
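For example, something along these lines would do it (a sketch only: the staging table and input path are hypothetical and not part of the original report; the two set commands are the properties mentioned above):

set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;
create table staging(x string);
load data local inpath '/tmp/sample.txt' into table staging;
insert overwrite table src select * from staging;  -- with enforcement on, this writes exactly 32 bucket files under src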

Then, do:
insert into table dest select * from src; 

This should fail since dest is a bucketized table. However, it succeeds, creating a 33rd file inside the table's HDFS directory and thereby breaking its bucketing.
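One quick way to confirm the extra file is to list the table's directory from the Hive CLI (a sketch; the path assumes the default warehouse location):

dfs -ls /user/hive/warehouse/dest;
-- a correct INSERT OVERWRITE leaves 32 bucket files here; after the INSERT INTO above there are 33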

This happens regardless of whether the src table is bucketed or not.

> Insert overwrite table doesn't fail for bucketed tables and breaks bucketing
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-3077
>                 URL: https://issues.apache.org/jira/browse/HIVE-3077
>             Project: Hive
>          Issue Type: Bug
>          Components: CLI
>    Affects Versions: 0.8.0, 0.8.1, 0.9.0, 0.10.0, 0.9.1
>         Environment: java version "1.6.0_30"
> hive version 0.9.0
> hadoop version 0.20.205.0
>            Reporter: Mark Grover
>

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
