hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-14970) repeated insert into is broken for buckets (incorrect results for tablesample)
Date Mon, 17 Oct 2016 19:43:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-14970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15583230#comment-15583230
] 

Sergey Shelukhin commented on HIVE-14970:
-----------------------------------------

According to [~gopalv] this is specific to table sample.
Perhaps we should disable or remove SamplePruner entirely, and check other places to make
sure that noone else makes this assumption (one file per bucket, in order)

> repeated insert into is broken for buckets (incorrect results for tablesample)
> ------------------------------------------------------------------------------
>
>                 Key: HIVE-14970
>                 URL: https://issues.apache.org/jira/browse/HIVE-14970
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Sergey Shelukhin
>            Priority: Critical
>
> Running on a regular CLI driver
> {noformat}
> CREATE TABLE src_bucket(key STRING, value STRING) CLUSTERED BY (key) SORTED BY (key)
INTO 2 BUCKETS;
> insert into table src_bucket select key,value from srcpart limit 10;
> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/src_bucket/;
> select *, INPUT__FILE__NAME from src_bucket;
> select * from src_bucket tablesample (bucket 1 out of 2) s;
> select * from src_bucket tablesample (bucket 2 out of 2) s;
> insert into table src_bucket select key,value from srcpart limit 10;
> dfs -ls ${hiveconf:hive.metastore.warehouse.dir}/src_bucket/;
> select *, INPUT__FILE__NAME from src_bucket;
> select * from src_bucket tablesample (bucket 1 out of 2) s;
> select * from src_bucket tablesample (bucket 2 out of 2) s;
> {noformat}
> Results in the following (with masking disabled and grepping away the noise).
> Looks like bucket mapping completely breaks due to extra files, which may have implications
for all the optimizations that depend on them.
> This should work or at least fail if this is not supported.
> {noformat}
> PREHOOK: query: CREATE TABLE src_bucket(key STRING, value STRING) CLUSTERED BY (key)
SORTED BY (key) INTO 2 BUCKETS
> PREHOOK: query: insert into table src_bucket select key,value from srcpart limit 10
> Found 2 items
> -rwxr-xr-x   1 sergey staff         46 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> -rwxr-xr-x   1 sergey staff         68 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> PREHOOK: query: select *, INPUT__FILE__NAME from src_bucket
> 165	val_165	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 255	val_255	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 484	val_484	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 86	val_86	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 238	val_238	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 27	val_27	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 278	val_278	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 311	val_311	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 409	val_409	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 98	val_98	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> PREHOOK: query: select * from src_bucket tablesample (bucket 1 out of 2) s
> 165	val_165
> 255	val_255
> 484	val_484
> 86	val_86
> PREHOOK: query: select * from src_bucket tablesample (bucket 2 out of 2) s
> 238	val_238
> 27	val_27
> 278	val_278
> 311	val_311
> 409	val_409
> 98	val_98
> {noformat}
> So far so good.
> {noformat}
> PREHOOK: query: insert into table src_bucket select key,value from srcpart limit 10
> Found 4 items
> -rwxr-xr-x   1 sergey staff         46 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> -rwxr-xr-x   1 sergey staff         46 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0_copy_1
> -rwxr-xr-x   1 sergey staff         68 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> -rwxr-xr-x   1 sergey staff         68 2016-10-14 16:09 pfile:///Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> PREHOOK: query: select *, INPUT__FILE__NAME from src_bucket
> 165	val_165	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 255	val_255	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 484	val_484	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 86	val_86	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0
> 165	val_165	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0_copy_1
> 255	val_255	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0_copy_1
> 484	val_484	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0_copy_1
> 86	val_86	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000000_0_copy_1
> 238	val_238	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 27	val_27	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 278	val_278	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 311	val_311	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 409	val_409	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 98	val_98	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0
> 238	val_238	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> 27	val_27	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> 278	val_278	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> 311	val_311	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> 409	val_409	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> 98	val_98	pfile:/Users/sergey/git/hive/itests/qtest/target/warehouse/src_bucket/000001_0_copy_1
> PREHOOK: query: select * from src_bucket tablesample (bucket 1 out of 2) s
> 165	val_165
> 255	val_255
> 484	val_484
> 86	val_86
> PREHOOK: query: select * from src_bucket tablesample (bucket 2 out of 2) s
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message