hive-user mailing list archives

From Prasanth Jayachandran
Subject Re: Experimental results using TPC-DS (versus Spark and Presto)
Date Tue, 31 Jan 2017 23:56:04 GMT
Hi Dongwon

Thanks for the presentation! Very insightful.
I just filed a bug for query72. Hive’s CBO seems to be selecting the wrong join order.

In the following link you can find a rewrite for the query which gives much better runtime
(in my testing it ran in 130s at 1TB scale on a 6-node LLAP cluster).
I also disabled the date filter that turns into a NULL > NULL + 5 expression in the queries.
Ideally, we want CBO to pick up join order in the rewritten query (which should be fixed with

The above link contains
- original query
- modified query
- the explain output (txt and svg) for original and modified

Thanks for reporting the issue and I hope this helps.


On Jan 30, 2017, at 9:39 PM, 김동원 wrote:

gopal :
In the attached gopal.tar.gz, I put two svg images and two text files after rerunning query72
with and without the following inequality:
and d3.d_date > d1.d_date + 5

FYI, I had already run the Hive experiments with and without the inequality, because Presto
rejects it at query submission time,
but Hive's running times are not that different.
- Dongwon

On Jan 31, 2017, at 12:48 PM, Gopal Vijayaraghavan wrote:

Gopal : (yarn logs -application $APPID) doesn't contain a line
containing HISTORY so it doesn't produce svg file. Should I turn on
some option to get the lines containing HISTORY in yarn application

There's a config option which controls how much data is written to the
log there.
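
As a quick check of whether an application's log has any HISTORY lines at all, something like the following could be run over the saved output of (yarn logs -application $APPID). This is only a sketch: the function name is hypothetical, the sample lines are made up, and nothing is assumed about the real format of a HISTORY line beyond the marker itself.

```python
def history_lines(log_text):
    """Return the lines of log_text that contain the marker 'HISTORY'."""
    return [line for line in log_text.splitlines() if "HISTORY" in line]


# Made-up sample text, not real YARN or Tez output.
sample = (
    "INFO starting container\n"
    "INFO [HISTORY] event one\n"
    "WARN something unrelated\n"
    "INFO [HISTORY] event two\n"
)

found = history_lines(sample)
print(len(found), "line(s) containing HISTORY")
```

If the count is zero for a real log, the graphing step has nothing to work with, which matches the symptom described above.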

I think there's an interval type clause in the 72 query, which might be a problem.

and d3.d_date > d1.d_date + 5

That might be doing UDFToDouble(d_date) > UDFToDouble(d_date) + 5, which will evaluate


Because UDFToDouble("1997-01-01") is NULL.
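
To make the NULL reasoning concrete, here is a toy model in Python. It is not Hive code: to_double is a hypothetical stand-in that returns None (NULL) for any string that is not a plain number, which is the behaviour assumed above for UDFToDouble on a date string, and the helpers mimic SQL's rule that arithmetic and comparison with NULL yield NULL. Whether Hive actually plans the cast this way for a given version is exactly what the explain output would show.

```python
def to_double(value):
    """Toy cast: a float for numeric strings, None (NULL) otherwise."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def null_add(a, b):
    """SQL-style addition: NULL if either operand is NULL."""
    return None if a is None or b is None else a + b


def null_gt(a, b):
    """SQL-style '>': NULL if either operand is NULL."""
    return None if a is None or b is None else a > b


def keeps_row(predicate):
    """A filter keeps a row only when the predicate is TRUE, not NULL."""
    return predicate is True


d1_date = "1997-01-01"
d3_date = "1997-02-01"

# Models: UDFToDouble(d3.d_date) > UDFToDouble(d1.d_date) + 5
predicate = null_gt(to_double(d3_date), null_add(to_double(d1_date), 5))
print(predicate, keeps_row(predicate))
```

Under these assumptions the predicate is NULL for every pair of dates, so the filter would never keep a row, regardless of the actual date values.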

So, seeing your explain would go a long way in finding out what's going on.

The swimlane raw data is also somewhat interesting to me, because I also draw a different
set of graphs from the same HISTORY data.

to locate bottlenecks in the system.

