hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: what is the difference between "hive.compute.splits.in.am=true" and "hive.compute.splits.in.am=false"
Date Tue, 19 Jan 2016 03:44:02 GMT

>what is the difference between "hive.compute.splits.in.am=true" and
>"hive.compute.splits.in.am=false"?
>which value is better?

First up, those options are specific to Tez.

The old MapReduce model was to always compute splits before asking for
resources to run, and it uses the gateway host (where the CLI runs) to
do that.

That model runs sequentially and overloads a single gateway machine during
heavy concurrency, particularly when used via ODBC (HiveServer2 mode).
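
As a rough sketch, this is how the option could be toggled per session from
the Hive CLI or Beeline. The hive.execution.engine line is an assumption
about your setup; the split setting only matters when the engine is Tez.

```sql
-- Assumption: the session runs on Tez; the option below is Tez-only.
set hive.execution.engine=tez;

-- Compute splits inside the Tez ApplicationMaster (AM) rather than on the
-- gateway host.
set hive.compute.splits.in.am=true;

-- Alternative: compute splits on the gateway host before the job is
-- submitted, as the old MapReduce model does.
-- set hive.compute.splits.in.am=false;
```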

Here's an old slide explaining how computing the splits in the Tez AM
instead speeds up queries.

http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey/29


This dynamic & pipelined model lays down the foundation for optimizations
like Tez's dynamic partition pruning.

Cheers,
Gopal


