hive-issues mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-10283) HIVE-4240 may be causing issue with bucketed tables
Date Fri, 29 May 2015 21:30:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565200#comment-14565200 ]

Xuefu Zhang edited comment on HIVE-10283 at 5/29/15 9:29 PM:
-------------------------------------------------------------

[~xuefuz] && [~szehon], could you find someone who knows this part well to work on the
issue? Currently, in upstream master code, the number of buckets is not respected even with
insert overwrite (insert overwrite creates only 1 bucket file while the table definition
specifies 2).
Reproduce:
{noformat}
create table buckettest (data string) partitioned by (state string) clustered by (data) into 2 buckets;
set hive.enforce.bucketing = true;
insert overwrite table buckettest partition(state='MA') select code from jsmall limit 10;
set hive.auto.convert.sortmerge.join=true;
set hive.optimize.bucketmapjoin = true; 
set hive.optimize.bucketmapjoin.sortedmerge = true;
0: jdbc:hive2://localhost:10000> select * from buckettest a join buckettestoutput2 b on (a.data=b.data);
Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed
table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting
hive.enforce.bucketmapjoin to false. The number of buckets for table buckettest partition
state=MA is 2, whereas the number of files is 1 (state=42000,code=10141)
{noformat}
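
The mismatch reported in the error above can be illustrated with a minimal Python sketch. This is not Hive's implementation: the function name, the exception class, and the file name `000000_0` are hypothetical, chosen only to show the comparison between the declared bucket count and the number of files found in the partition.

```python
# Hypothetical sketch: none of these names come from Hive's source code.

class BucketMetadataError(Exception):
    """Raised when the file count does not match the declared bucket count."""


def check_bucket_files(table, partition, declared_buckets, file_names):
    """Compare the declared bucket count with the number of data files."""
    actual = len(file_names)
    if actual != declared_buckets:
        raise BucketMetadataError(
            f"The number of buckets for table {table} partition {partition} "
            f"is {declared_buckets}, whereas the number of files is {actual}"
        )
    return True


# The reproduction above: 2 declared buckets, but insert overwrite wrote 1 file.
# The file name is an assumed placeholder.
try:
    check_bucket_files("buckettest", "state=MA", 2, ["000000_0"])
except BucketMetadataError as err:
    message = str(err)
    print(message)
```

Under this reading, a bucketed map join can only be planned when every partition holds exactly one file per declared bucket, which is why a single output file for a 2-bucket table is rejected at compile time.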



was (Author: ychena):
(same text as the edited comment above)



> HIVE-4240 may be causing issue with bucketed tables 
> ----------------------------------------------------
>
>                 Key: HIVE-10283
>                 URL: https://issues.apache.org/jira/browse/HIVE-10283
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Ryan P
>
> I suspect that HIVE-4240, by removing the reducer, may be causing issues. Because of
> this, inserts will not consolidate 'buckets' into single files, which is problematic when
> attempting to use bucketmapjoin.
> CREATE TABLE IF NOT EXISTS buckettestinput( 
> data string 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> Then I inserted the following data into the "buckettestinput" table 
> firstinsert1 
> firstinsert2 
> firstinsert3 
> firstinsert4 
> firstinsert5 
> firstinsert6 
> firstinsert7 
> firstinsert8 
> secondinsert1 
> secondinsert2 
> secondinsert3 
> secondinsert4 
> secondinsert5 
> secondinsert6 
> secondinsert7 
> secondinsert8 
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true; 
> insert into table buckettestoutput1 
> select * from buckettestinput where data like 'first%'; 
> SELECT * 
> FROM buckettestoutput1 TABLESAMPLE(BUCKET 1 OUT OF 1 ON data) s; 
> insert into table buckettestoutput1 
> select * from buckettestinput where data like 'second%'; 
> Check the results of the table sample query. 
> For sort merge bucket map join: 
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> set hive.auto.convert.sortmerge.join.noconditionaltask=true; 
> hive> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
> FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix
> the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false.
> The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 4
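
The file count in the quoted report (4 files for a table declared with 2 buckets) can be reproduced with a small Python simulation. It is only a sketch under stated assumptions: that each `insert into` writes one new file per bucket rather than merging into existing files, and that rows are assigned to buckets by a 31-multiplier hash taken modulo the bucket count. The hash function and the file names are hypothetical and are not taken from Hive's code.

```python
def java_style_hash(value):
    """Assumed 31-multiplier hash over the UTF-8 bytes (ASCII input only)."""
    h = 0
    for byte in value.encode("utf-8"):
        h = (h * 31 + byte) & 0xFFFFFFFF  # keep the value within 32 bits
    return h


def bucket_for(value, num_buckets):
    """Map a value to a bucket index in the range [0, num_buckets)."""
    return (java_style_hash(value) & 0x7FFFFFFF) % num_buckets


def insert_into(table_files, rows, num_buckets, insert_id):
    """Assumption: every insert adds one new file per bucket, with no merging."""
    per_bucket = {b: [] for b in range(num_buckets)}
    for row in rows:
        per_bucket[bucket_for(row, num_buckets)].append(row)
    for b, contents in per_bucket.items():
        # Hypothetical file naming, used only to keep files distinct.
        table_files[f"bucket{b}_insert{insert_id}"] = contents


files = {}
first = [f"firstinsert{i}" for i in range(1, 9)]
second = [f"secondinsert{i}" for i in range(1, 9)]
insert_into(files, first, 2, insert_id=1)
insert_into(files, second, 2, insert_id=2)
print(len(files))  # 4 files for a table declared with 2 buckets
```

With two inserts and two buckets the simulated table ends up with four files, matching the count in the quoted error, while the declared bucket count stays at 2.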



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
