hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Igor Kuzmenko <f1she...@gmail.com>
Subject Hive LIMIT clause slows query
Date Thu, 02 Nov 2017 12:25:45 GMT
I'm using HDP 2.5.0 with 1.2.1 Hive.
Performing some tests I noticed that my query works better if I don't use
limit clause.

My query is:

insert into table *results_table *partition (task_id=xxx)
select * from *data_table *
where dt=20171102
and .....
limit 1000000


This query runs in about 30 seconds, but without limit clause I can get
about 20 seconds.

Query execution plan with limit <https://pastebin.com/Cmp2rPNr>, and without
<https://pastebin.com/z1ps2EhG>.


I can't remove limit clause because in some cases there's to much results
and I don't whant to store them all in result table.
Why limit affects performance so much?  Intuitively, it seems that with
limit clause it should work faster. What can I do to improve prefomance?

Mime
View raw message