hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-14970) repeated insert into is broken for buckets (incorrect results for tablesample)
Date Mon, 17 Oct 2016 19:42:58 GMT

     [ https://issues.apache.org/jira/browse/HIVE-14970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-14970:
------------------------------------
    Summary: repeated insert into is broken for buckets (incorrect results for tablesample) (was: repeated insert into is broken for buckets (incorrect results))

> repeated insert into is broken for buckets (incorrect results for tablesample)
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-14970
>                 URL: https://issues.apache.org/jira/browse/HIVE-14970
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Critical
>
> Running on a regular CLI driver
> {noformat}
> CREATE TABLE src_bucket(key STRING, value STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS;
> insert into table src_bucket select key,value from srcpart limit 10;
> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/src_bucket/;
> select *, INPUT__FILE__NAME from src_bucket;
> select * from src_bucket tablesample (bucket 1 out of 2) s;
> select * from src_bucket tablesample (bucket 2 out of 2) s;
> insert into table src_bucket select key,value from srcpart limit 10;
> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/src_bucket/;
> select *, INPUT__FILE__NAME from src_bucket;
> select * from src_bucket tablesample (bucket 1 out of 2) s;
> select * from src_bucket tablesample (bucket 2 out of 2) s;
> {noformat}
> This produces the following output (with masking disabled and the noise grepped away).
> It looks like the bucket mapping breaks completely because of the extra files, which may have implications for all the optimizations that depend on it.
> This should either work or, if it is not supported, fail with an error.
> {noformat}
> PREHOOK: query: CREATE TABLE src_bucket(key STRING, value STRING) CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS
> PREHOOK: query: insert into table src_bucket select key,value from srcpart limit 10
> Found 2 items
> -rwxr-xr-x   1 sergey staff         46 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> -rwxr-xr-x   1 sergey staff         68 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> PREHOOK: query: select *, INPUT__FILE__NAME from src_bucket
> 165	val_165	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 255	val_255	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 484	val_484	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 86	val_86	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 238	val_238	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 27	val_27	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 278	val_278	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 311	val_311	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 409	val_409	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 98	val_98	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> PREHOOK: query: select * from src_bucket tablesample (bucket 1 out of 2) s
> 165	val_165
> 255	val_255
> 484	val_484
> 86	val_86
> PREHOOK: query: select * from src_bucket tablesample (bucket 2 out of 2) s
> 238	val_238
> 27	val_27
> 278	val_278
> 311	val_311
> 409	val_409
> 98	val_98
> {noformat}
> So far so good: each sample returns exactly the rows of the matching bucket file.
> {noformat}
> PREHOOK: query: insert into table src_bucket select key,value from srcpart limit 10
> Found 4 items
> -rwxr-xr-x   1 sergey staff         46 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> -rwxr-xr-x   1 sergey staff         46 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0_copy_1
> -rwxr-xr-x   1 sergey staff         68 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> -rwxr-xr-x   1 sergey staff         68 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> PREHOOK: query: select *, INPUT__FILE__NAME from src_bucket
> 165	val_165	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 255	val_255	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 484	val_484	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 86	val_86	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 165	val_165	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0_copy_1
> 255	val_255	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0_copy_1
> 484	val_484	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0_copy_1
> 86	val_86	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0_copy_1
> 238	val_238	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 27	val_27	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 278	val_278	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 311	val_311	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 409	val_409	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 98	val_98	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 238	val_238	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> 27	val_27	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> 278	val_278	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> 311	val_311	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> 409	val_409	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> 98	val_98	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> PREHOOK: query: select * from src_bucket tablesample (bucket 1 out of 2) s
> 165	val_165
> 255	val_255
> 484	val_484
> 86	val_86
> PREHOOK: query: select * from src_bucket tablesample (bucket 2 out of 2) s
> {noformat}
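
In the second round of output above, the sample for bucket 1 returns 4 rows and the sample for bucket 2 returns none, although the files prefixed 000000 and 000001 hold 8 and 12 rows respectively. The following is a minimal Python sketch of why the extra `_copy_1` files matter. It is not Hive code. The naming pattern is an assumption inferred only from the `dfs -ls` listing, and `positional_file` is a hypothetical naive lookup used for contrast; the report does not say that Hive resolves buckets this way.

```python
import re
from collections import defaultdict

# Assumed naming scheme, inferred only from the listing in the report:
# <zero-padded bucket id>_<attempt id>, optionally followed by _copy_<n>.
BUCKET_FILE = re.compile(r"^(\d+)_(\d+)(?:_copy_(\d+))?$")


def bucket_id(file_name):
    """Return the bucket id encoded in the file name prefix, or None."""
    match = BUCKET_FILE.match(file_name)
    return int(match.group(1)) if match else None


def group_by_bucket(file_names):
    """Group file names by their prefix-encoded bucket id."""
    groups = defaultdict(list)
    for name in sorted(file_names):
        groups[bucket_id(name)].append(name)
    return dict(groups)


def positional_file(file_names, bucket_index):
    """Hypothetical naive lookup: treat the Nth sorted file as bucket N."""
    return sorted(file_names)[bucket_index]


# File names from the second "dfs -ls" listing in the report.
files = ["000000_0", "000000_0_copy_1", "000001_0", "000001_0_copy_1"]

print(group_by_bucket(files))
picked = positional_file(files, 1)
print(picked, "-> bucket id", bucket_id(picked))
```

Grouping by prefix puts two files in each bucket, so any reader that expects one file per bucket sees only part of the data. Under the hypothetical positional lookup, index 1 lands on `000000_0_copy_1`, a file whose prefix says bucket 0, which would be one way for a bucket 2 sample to come back empty.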



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
