hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Yongzhi Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10880) The bucket number is not respected in insert overwrite.
Date Fri, 05 Jun 2015 12:44:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574432#comment-14574432
] 

Yongzhi Chen commented on HIVE-10880:
-------------------------------------

The failures are not related. 
Following two testes failed age more than 10. 
org.apache.hadoop.hive.cli.TestMinimrCliDriver.testCliDriver_ql_rewrite_gbtoidx_cbo_2
org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark.testTempTable

org.apache.hadoop.hive.cli.TestCliDriver.testCliDriver_autogen_colalias failed in build 4179(the
build after this build) too.

For the spark failure, I tested locally, all pass. And my code change only affect when hive.enforce.bucketing
is true, the spark test never set this value. So it is not related. 

-------------------------------------------------------
 T E S T S
-------------------------------------------------------

-------------------------------------------------------
 T E S T S
-------------------------------------------------------
Running org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark
Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 72.272 sec - in org.apache.hive.jdbc.TestJdbcWithLocalClusterSpark

Results :

Tests run: 5, Failures: 0, Errors: 0, Skipped: 0

Could anyone review the code? Thanks

> The bucket number is not respected in insert overwrite.
> -------------------------------------------------------
>
>                 Key: HIVE-10880
>                 URL: https://issues.apache.org/jira/browse/HIVE-10880
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Yongzhi Chen
>            Assignee: Yongzhi Chen
>            Priority: Blocker
>         Attachments: HIVE-10880.1.patch, HIVE-10880.2.patch, HIVE-10880.3.patch
>
>
> When hive.enforce.bucketing is true, the bucket number defined in the table is no longer
respected in current master and 1.2. This is a regression.
> Reproduce:
> {noformat}
> CREATE TABLE IF NOT EXISTS buckettestinput( 
> data string 
> ) 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput1( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> CREATE TABLE IF NOT EXISTS buckettestoutput2( 
> data string 
> )CLUSTERED BY(data) 
> INTO 2 BUCKETS 
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
> Then I inserted the following data into the "buckettestinput" table
> firstinsert1 
> firstinsert2 
> firstinsert3 
> firstinsert4 
> firstinsert5 
> firstinsert6 
> firstinsert7 
> firstinsert8 
> secondinsert1 
> secondinsert2 
> secondinsert3 
> secondinsert4 
> secondinsert5 
> secondinsert6 
> secondinsert7 
> secondinsert8
> set hive.enforce.bucketing = true; 
> set hive.enforce.sorting=true;
> insert overwrite table buckettestoutput1 
> select * from buckettestinput where data like 'first%';
> set hive.auto.convert.sortmerge.join=true; 
> set hive.optimize.bucketmapjoin = true; 
> set hive.optimize.bucketmapjoin.sortedmerge = true; 
> select * from buckettestoutput1 a join buckettestoutput2 b on (a.data=b.data);
> Error: Error while compiling statement: FAILED: SemanticException [Error 10141]: Bucketed
table metadata is not correct. Fix the metadata or don't use bucketed mapjoin, by setting
hive.enforce.bucketmapjoin to false. The number of buckets for table buckettestoutput1 is
2, whereas the number of files is 1 (state=42000,code=10141)
> {noformat}
> The related debug information related to insert overwrite:
> {noformat}
> 0: jdbc:hive2://localhost:10000> insert overwrite table buckettestoutput1 
> select * from buckettestinput where data like 'first%'insert overwrite table buckettestoutput1

> 0: jdbc:hive2://localhost:10000> ;
> select * from buckettestinput where data like ' 
> first%';
> INFO  : Number of reduce tasks determined at compile time: 2
> INFO  : In order to change the average load for a reducer (in bytes):
> INFO  :   set hive.exec.reducers.bytes.per.reducer=<number>
> INFO  : In order to limit the maximum number of reducers:
> INFO  :   set hive.exec.reducers.max=<number>
> INFO  : In order to set a constant number of reducers:
> INFO  :   set mapred.reduce.tasks=<number>
> INFO  : Job running in-process (local Hadoop)
> INFO  : 2015-06-01 11:09:29,650 Stage-1 map = 86%,  reduce = 100%
> INFO  : Ended Job = job_local107155352_0001
> INFO  : Loading data to table default.buckettestoutput1 from file:/user/hive/warehouse/buckettestoutput1/.hive-staging_hive_2015-06-01_11-09-28_166_3109203968904090801-1/-ext-10000
> INFO  : Table default.buckettestoutput1 stats: [numFiles=1, numRows=4, totalSize=52,
rawDataSize=48]
> No rows affected (1.692 seconds)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message