hive-user mailing list archives

From: Jörn Franke <jornfra...@gmail.com>
Subject: Re: Using spark in tandem with Hive
Date: Tue, 01 Dec 2015 16:58:07 GMT
How did you create the tables? Do you have automatic statistics collection enabled in Hive?
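For example (a minimal sketch; the table name tt is taken from your mail below), statistics can be gathered manually with:

hive> ANALYZE TABLE tt COMPUTE STATISTICS;
hive> ANALYZE TABLE tt COMPUTE STATISTICS FOR COLUMNS;

Automatic gathering on writes is controlled by hive.stats.autogather:

hive> set hive.stats.autogather=true;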

By the way, MR is outdated as a Hive execution engine. Use Tez (maybe wait for 0.8 for
sub-second queries) or use Spark as an execution engine in Hive.
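Switching the engine is a per-session setting, assuming Tez (or Hive on Spark) is already installed and configured on the cluster:

hive> set hive.execution.engine=tez;
-- or, with Hive on Spark configured:
hive> set hive.execution.engine=spark;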

> On 01 Dec 2015, at 17:40, Mich Talebzadeh <mich@peridale.co.uk> wrote:
> 
> What if we decide to use Spark with Hive? I would like to hear others' views.
>  
> My test bed comprised:
>  
> 1.    Spark version 1.5.2
> 2.    Hive version 1.2.1
> 3.    Hadoop version 2.6
>  
>  
> I configured Spark to use the Hive metastore, so using spark-sql I can do pretty much
whatever one can do with HiveQL.
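> For reference, the wiring is minimal: placing Hive's hive-site.xml in Spark's conf
directory points spark-sql at the same metastore (paths assume the usual HIVE_HOME and
SPARK_HOME layout):
> 
> cp $HIVE_HOME/conf/hive-site.xml $SPARK_HOME/conf/
> 
> After that, spark-sql sees the same databases (asehadoop below) as the Hive CLI.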
>  
> I created and populated an ORC table in spark-sql. It took 90 seconds to create and
populate the table with 1.7 million rows.
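> A minimal sketch of the kind of DDL used (the real schema is not shown in this thread;
the columns below are illustrative, and staging_tt stands in for whatever source the rows
came from):
> 
> spark-sql> CREATE TABLE tt (object_id INT, data_object_id INT, object_name STRING) STORED AS ORC;
> spark-sql> INSERT INTO TABLE tt SELECT object_id, data_object_id, object_name FROM staging_tt;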
>  
> spark-sql> select count(1) from tt;
> 1767886
> Time taken: 5.169 seconds, Fetched 1 row(s)
>  
> Now let me try the same operation on the same table with the Hive CLI and MR:
>  
> hive> use asehadoop;
> OK
> Time taken: 0.639 seconds
> hive> select count(1) from tt;
> Query ID = hduser_20151201162717_e3102633-f501-413b-b9cb-384ac50880ac
> Total jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks determined at compile time: 1
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapreduce.job.reduces=<number>
> Starting Job = job_1448969636093_0001, Tracking URL = http://rhes564:8088/proxy/application_1448969636093_0001/
> Kill Command = /home/hduser/hadoop-2.6.0/bin/hadoop job  -kill job_1448969636093_0001
> Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
> 2015-12-01 16:27:27,154 Stage-1 map = 0%,  reduce = 0%
> 2015-12-01 16:27:35,427 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.65 sec
> 2015-12-01 16:27:41,611 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.71 sec
> MapReduce Total cumulative CPU time: 3 seconds 710 msec
> Ended Job = job_1448969636093_0001
> MapReduce Jobs Launched:
> Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.71 sec   HDFS Read: 520151 HDFS Write: 8 SUCCESS
> Total MapReduce CPU Time Spent: 3 seconds 710 msec
> OK
> 1767886
> Time taken: 25.635 seconds, Fetched: 1 row(s)
>  
> So: 5 seconds in Spark versus 25 seconds in Hive.
>  
> On a point query, however, Hive does not seem to return the correct timing:
>  
> hive> select * from tt where data_object_id = 10;
> Time taken: 0.063 seconds, Fetched: 72 row(s)
>  
> Whereas in Spark I get
>  
> spark-sql>  select * from tt where data_object_id = 10;
> Time taken: 9.002 seconds, Fetched 72 row(s)
>  
> Nine seconds looks far more plausible to me than 0.063 seconds. Or, in the unlikely
event, could it be that Spark returns elapsed time whereas Hive returns execution time?
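> Another possibility I should check (an assumption on my part, not verified) is
hive.fetch.task.conversion: for a simple filter Hive can skip MapReduce altogether and
answer with a plain fetch task, in which case 0.063 seconds could be genuine. EXPLAIN
would show whether that happened:
> 
> hive> EXPLAIN select * from tt where data_object_id = 10;
> 
> If the plan contains only a Fetch Operator and no Map Reduce stage, no job was launched
for the query.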
>  
> Thanks
>  
> Mich Talebzadeh
>  
> Sybase ASE 15 Gold Medal Award 2008
> A Winning Strategy: Running the most Critical Financial Data on ASE 15
> http://login.sybase.com/files/Product_Overviews/ASE-Winning-Strategy-091908.pdf
> Author of the book "A Practitioner's Guide to Upgrading to Sybase ASE 15", ISBN 978-0-9563693-0-7
> Co-author of "Sybase Transact SQL Guidelines Best Practices", ISBN 978-0-9759693-0-4
> Publications due shortly:
> Complex Event Processing in Heterogeneous Environments, ISBN: 978-0-9563693-3-8
> Oracle and Sybase, Concepts and Contrasts, ISBN: 978-0-9563693-1-4, volume one out shortly
>  
> http://talebzadehmich.wordpress.com
>  
>  
