hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: limit clause + fetch optimization
Date Wed, 22 Jul 2015 02:20:54 GMT

> I've been experimenting with 'select *' and 'select * limit X' in
>beeline and watching the hive-server2 log to understand when a M/R job is
>triggered and when not.  It seems like whenever I set a limit, the job is
>avoided, but with no limit, it is run.

https://issues.apache.org/jira/browse/HIVE-10156


It's sitting on my back-burner (I know the fix, but I'm working on the
LLAP branch).

> hive.limit.optimize.fetch.max
>
> That defaults to 50,000 and as I understand it, whenever I set limit to
>above that number, a job should be triggered.  But I can set limit to
>something very high (e.g. 10M) and no job runs.

That config belongs to a different optimization - the global limit case,
which works as follows.

Run the query with a 50k row sample of the input, then if it doesn't produce
enough rows, re-run the query with the full input data-set.
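
A rough sketch of that sample-then-retry behaviour, in Python rather than
anything Hive actually runs. The names run_with_global_limit, query and
sample_size are made up for illustration, and the "sample" here is just the
first N rows of an in-memory list, which is a simplification of how Hive
picks its input sample:

```python
from typing import Callable, List, Tuple


def run_with_global_limit(
    rows: List[int],
    query: Callable[[List[int]], List[int]],
    limit: int,
    sample_size: int = 50_000,
) -> Tuple[List[int], bool]:
    """Return (result, retried). Hypothetical helper, not a Hive API."""
    # First attempt: run the query over a sample of the input only.
    sample = rows[:sample_size]
    result = query(sample)[:limit]

    # Done if the sample produced enough rows, or if the sample already
    # covered the whole input (a retry could not find anything more).
    if len(result) >= limit or len(rows) <= sample_size:
        return result, False

    # Otherwise throw the partial result away and re-run over the full
    # input. This second pass is the "retry" step.
    return query(rows)[:limit], True
```

With a selective filter the first pass can come up short even for a small
limit, and the second pass is what costs you the extra query run.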

You will notice errors on your JDBC connections with that optimization
turned on (like HIVE-9382) and will get the following log line "Retry
query with a different approach..." in the HS2 logs.

So I suggest not turning on the Global Limit optimization, if you're on
JDBC/ODBC.

Cheers,
Gopal
 


