hive-issues mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10283) HIVE-4240 may be causing issue with bucketed tables
Date Fri, 29 May 2015 21:41:18 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14565467#comment-14565467 ]

Xuefu Zhang commented on HIVE-10283:
------------------------------------

If this is the case, then it's a serious issue. My guess is that the number of reducers isn't
set correctly. This seems to be a different issue from the one in this JIRA. Could you please
create a new JIRA? If this also happens in the 1.2 release, then we need to mark it as a blocker
for the 1.2.1 release as well.

> HIVE-4240 may be causing issue with bucketed tables 
> ----------------------------------------------------
>
>                 Key: HIVE-10283
>                 URL: https://issues.apache.org/jira/browse/HIVE-10283
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Ryan P
>
> I suspect that HIVE-4240, by removing the reducer, may be causing issues. Because of
> this, inserts will not consolidate 'buckets' into single files, which is problematic when
> attempting to use bucketmapjoin.
> CREATE TABLE IF NOT EXISTS buckettestinput( 
> data string 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ','; 
> Then I inserted the following data into the "buckettestinput" table 
> firstinsert1 
> firstinsert2 
> firstinsert3 
> firstinsert4 
> firstinsert5 
> firstinsert6 
> firstinsert7 
> firstinsert8 
> secondinsert1 
> secondinsert2 
> secondinsert3 
> secondinsert4 
> secondinsert5 
> secondinsert6 
> secondinsert7 
> secondinsert8 
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true; 
> insert into table buckettestoutput1 
> select * from buckettestinput where data like 'first%'; 
> SELECT * 
> FROM buckettestoutput1 TABLESAMPLE(BUCKET 1 OUT OF 1 ON data) s; 
> insert into table buckettestoutput1 
> select * from buckettestinput where data like 'second%'; 
> Check the results of the table sample query. 
> For sort merge bucket map join: 
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> set hive.auto.convert.sortmerge.join.noconditionaltask=true; 
> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data) 
> hive> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
> FAILED: SemanticException [Error 10141]: Bucketed table metadata is not correct. Fix
> the metadata or don't use bucketed mapjoin, by setting hive.enforce.bucketmapjoin to false.
> The number of buckets for table buckettestoutput1 is 2, whereas the number of files is 4
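
The arithmetic behind the error above can be sketched in a few lines of Python. This is only a simplified model, not Hive's actual code: it assumes each INSERT INTO writes one new file per bucket instead of merging into the existing bucket files, and the function names are hypothetical. Under that assumption, two inserts into a 2-bucket table leave 4 files, which matches the counts in the error message.

```python
# Simplified model of the reported behaviour; not Hive's implementation.
# Assumption: every INSERT INTO adds one new file per bucket and never
# merges into the files written by an earlier insert.

def files_after_inserts(num_buckets: int, num_inserts: int) -> int:
    """Number of files in the table directory under the assumption above."""
    return num_buckets * num_inserts


def check_bucket_metadata(table: str, num_buckets: int, num_files: int) -> None:
    """Raise if the file count differs from the declared bucket count.

    Loosely mimics the condition reported as Error 10141; the real check
    lives inside Hive's semantic analyzer.
    """
    if num_files != num_buckets:
        raise ValueError(
            f"The number of buckets for table {table} is {num_buckets}, "
            f"whereas the number of files is {num_files}"
        )


if __name__ == "__main__":
    # Two inserts ('first%' and 'second%') into a table declared with 2 buckets.
    num_files = files_after_inserts(num_buckets=2, num_inserts=2)
    try:
        check_bucket_metadata("buckettestoutput1", 2, num_files)
    except ValueError as err:
        print(err)
```

With a single insert the model yields 2 files and the check passes, which is consistent with the problem appearing only after the second insert.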



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
