hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: hive sql on tez run forever
Date Mon, 11 May 2015 17:13:43 GMT
Hi,

> I change the sql where condition to (where t.update_time >= '2015-05-04'),
> the sql can return result for a while. Because t.update_time >= '2015-05-04'
> can filter many row when table scan. But why change where condition to
> (where t.update_time >= '2015-05-04' or length(t8.end_user_id)>0), the
> sql run forever as follows:


The OR clause is probably causing the problems.

We're probably not pushing the OR clause down to the original table
scans.

This is most likely a Hive PPD (predicate pushdown) miss, where you do
something like

select a.*,b.* from a,b where a.x = b.x and (a.y = 1 or b.z = 1);

where it doesn't get planned as

select a1.*, b1.*
from (select a.* from a where a.y = 1) a1,
     (select b.* from b where b.z = 1) b1
where a1.x = b1.x;

but instead gets planned as a full-scan JOIN followed by a filter.

Can you spend some time and try to reduce your case to something
like the above queries?

If that works, then file a JIRA.
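
Before comparing plans, it may be worth checking on a tiny dataset that a
rewrite returns the same rows as the original. The sketch below is only an
illustration under stated assumptions: it runs on Python's built-in sqlite3
module rather than Hive, and the tables a and b and their rows are made up.
On this data, filtering both sides at once keeps only the rows that satisfy
both predicates, so it is not row-for-row equivalent to the OR form. A UNION
of two one-sided rewrites is one candidate that does match here.

```python
import sqlite3

# Toy data (hypothetical): a(x, y) and b(x, z), one row per join key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (x INTEGER, y INTEGER);
    CREATE TABLE b (x INTEGER, z INTEGER);
    INSERT INTO a VALUES (1, 1), (2, 0), (3, 1), (4, 0);
    INSERT INTO b VALUES (1, 0), (2, 1), (3, 1), (4, 0);
""")


def rows(sql):
    """Run a query and return its rows as a set, ignoring order."""
    return set(conn.execute(sql).fetchall())


# Original shape: join first, then apply the OR filter.
or_filter = rows("""
    SELECT a.x, a.y, b.z
    FROM a, b
    WHERE a.x = b.x AND (a.y = 1 OR b.z = 1)
""")

# Both predicates pushed into the scans at once.
# This behaves like (a.y = 1 AND b.z = 1), not like the OR.
both_pushed = rows("""
    SELECT a1.x, a1.y, b1.z
    FROM (SELECT * FROM a WHERE y = 1) a1,
         (SELECT * FROM b WHERE z = 1) b1
    WHERE a1.x = b1.x
""")

# One predicate pushed per branch, branches combined with UNION.
# UNION removes duplicate rows, so this matches the OR form only when
# the join output has no duplicates (true for this toy data).
union_rewrite = rows("""
    SELECT a1.x, a1.y, b.z
    FROM (SELECT * FROM a WHERE y = 1) a1, b
    WHERE a1.x = b.x
    UNION
    SELECT a.x, a.y, b1.z
    FROM a, (SELECT * FROM b WHERE z = 1) b1
    WHERE a.x = b1.x
""")

print("OR filter:    ", sorted(or_filter))
print("both pushed:  ", sorted(both_pushed))
print("UNION rewrite:", sorted(union_rewrite))
print("UNION matches OR:", union_rewrite == or_filter)
```

Comparing result sets like this only says the rewrite is safe on the sample
data. Whether Hive plans the rewritten form with the filters at the table
scans still has to be confirmed with EXPLAIN on the real query.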

Cheers,
Gopal


