hive-user mailing list archives

From wzc1...@gmail.com
Subject Re: hive 0.11 auto convert join bug report
Date Sun, 11 Aug 2013 07:50:46 GMT
Hi all:
When I change the table alias dim_pay_date to A, the query passes in Hive 0.11 (https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_change_alias_pass):

use test;
create table if not exists src ( `key` int,`val` string);
load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite into table src;
drop table if exists orderpayment_small;
create table orderpayment_small (`dealid` int,`date` string,`time` string, `cityid` int, `userid` int);
insert overwrite table orderpayment_small select 748, '2011-03-24', '2011-03-24', 55 ,5372613 from src limit 1;
drop table if exists user_small;
create table user_small( userid int);
insert overwrite table user_small select key from src limit 100;
set hive.auto.convert.join.noconditionaltask.size = 200;
SELECT
`A`.`date`
, `deal`.`dealid`
FROM `orderpayment_small` `orderpayment`
JOIN `orderpayment_small` `A` ON `A`.`date` = `orderpayment`.`date`
JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
limit 5;



It's quite strange and interesting now. I will keep searching for the answer to this issue.
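
One way to narrow this down, as a sketch only and not a confirmed fix: switch off the automatic map join conversion and compare the EXPLAIN output of the failing query (the one with the dim_pay_date alias) against the plan produced with the conversion on. Both settings below are standard Hive options. That the failure is tied to the conversion is an assumption to check, not a result.

```sql
-- Assumption: the ArrayIndexOutOfBoundsException depends on the automatic
-- map join conversion. Disable it so the joins are planned as common joins.
set hive.auto.convert.join = false;

-- Alternative to try separately: keep the conversion, but do not merge the
-- map joins into a single task without a conditional task.
-- set hive.auto.convert.join = true;
-- set hive.auto.convert.join.noconditionaltask = false;

-- Print the plan for the failing variant and compare it with the plan
-- generated under the original settings.
EXPLAIN
SELECT
`dim_pay_date`.`date`
, `deal`.`dealid`
FROM `orderpayment_small` `orderpayment`
JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` = `orderpayment`.`date`
JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
limit 5;
```

If the query only fails with the conversion enabled, the difference between the two plans should show which map join operators are involved.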




On Friday, August 9, 2013, at 3:32 AM, wzc1989@gmail.com wrote:

> Hi all:  
> I'm currently testing Hive 0.11 and have hit a bug with hive.auto.convert.join. I put together a test case so that anyone can reproduce it (it is also available here: https://gist.github.com/code6/6187569#file-hive11_auto_convert_join_bug):
>  
> use test;
> create table src ( `key` int,`val` string);
> load data local inpath '/Users/code6/git/hive/data/files/kv1.txt' overwrite into table src;
> drop table if exists orderpayment_small;
> create table orderpayment_small (`dealid` int,`date` string,`time` string, `cityid` int, `userid` int);
> insert overwrite table orderpayment_small select 748, '2011-03-24', '2011-03-24', 55 ,5372613 from src limit 1;
> drop table if exists user_small;
> create table user_small( userid int);
> insert overwrite table user_small select key from src limit 100;
> set hive.auto.convert.join.noconditionaltask.size = 200;
> SELECT
> `dim_pay_date`.`date`
> , `deal`.`dealid`
> FROM `orderpayment_small` `orderpayment`
> JOIN `orderpayment_small` `dim_pay_date` ON `dim_pay_date`.`date` = `orderpayment`.`date`
> JOIN `orderpayment_small` `deal` ON `deal`.`dealid` = `orderpayment`.`dealid`
> JOIN `orderpayment_small` `order_city` ON `order_city`.`cityid` = `orderpayment`.`cityid`
> JOIN `user_small` `user` ON `user`.`userid` = `orderpayment`.`userid`
> limit 5;
>  
>  
> You will need to replace the path to kv1.txt with your own. Running the above query in Hive 0.11 fails with an ArrayIndexOutOfBoundsException. You can see the explain output and the console output of the query here: https://gist.github.com/code6/6187569
>  
> I also compiled the trunk code, and the query does not work there either. The same query runs in Hive 0.9 with hive.auto.convert.join turned on.
>  
> I tried to dig into the problem, and I think it may be caused by the map join optimization: some adjacent operators do not agree on their input/output table info (the column positions differ).
>  
> I'm not able to fix this bug myself, and I would appreciate it if someone could look into it.
>  
> Thanks.  

