hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Hive on TEZ + LLAP
Date Mon, 18 Jul 2016 23:29:03 GMT
These look pretty impressive. What execution mode were you running these in?
YARN client, maybe?

                    MR/sec      TEZ/sec     TEZ+LLAP/sec
Query time          203.317     13.681      3.809
Speed-up over MR    -------     15 times    53 times


My own measurements for Hive 2 on Spark 1.3.1 (obviously we are comparing
different baselines, but it is interesting as a sample) show the following:

Table      MR/sec     Spark/sec   Speed-up over MR
Parquet    239.532    14.38       16 times
ORC        202.333    17.77       11 times

So the hybrid engine seems to make a big difference: comparing Tez alone
with Tez + LLAP, the gain is more than 3 times.
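
For anyone who wants to re-derive these ratios, here is a minimal Python sketch. The timings are the ones quoted in this thread; the `speedup` helper and the variable names are only illustrative.

```python
# Timings in seconds, copied from the figures quoted in this thread.
timings = {
    "MR": 203.317,
    "Tez": 13.681,
    "Tez+LLAP": 3.809,
}


def speedup(baseline_s, candidate_s):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_s / candidate_s


tez_vs_mr = speedup(timings["MR"], timings["Tez"])
llap_vs_mr = speedup(timings["MR"], timings["Tez+LLAP"])
llap_vs_tez = speedup(timings["Tez"], timings["Tez+LLAP"])

print(f"Tez vs MR:       {tez_vs_mr:.1f}x")
print(f"Tez+LLAP vs MR:  {llap_vs_mr:.1f}x")
print(f"Tez+LLAP vs Tez: {llap_vs_tez:.1f}x")
```

These are single runs of one query, so the ratios are indicative rather than a benchmark result.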

Cheers,


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 18 July 2016 at 23:53, Gopal Vijayaraghavan <gopalv@apache.org> wrote:

>
> > Also has there been simple benchmarks to compare:
> >
> > 1. Hive on MR
> > 2. Hive on Tez
> > 3. Hive on Tez with LLAP
>
> I ran one today, with a small BI query in my test suite against a 1 TB
> data-set.
>
> TL;DR - MRv2 (203.317 seconds), Tez (13.681s), LLAP (3.809s).
>
> *Warning*: This is not a historical view. All engines are using the same
> new & improved vectorized operators from 2.2.0-SNAPSHOT; only the physical
> planner and the physical scheduling differ between runs.
>
> The difference between pre-Stinger, Stinger and Stinger.next is much much
> larger than this.
>
> <https://github.com/hortonworks/hive-testbench/blob/hive14/sample-queries-tpcds/query55.sql>
>
>
> select  i_brand_id brand_id, i_brand brand,
>         sum(ss_ext_sales_price) ext_price
>  from date_dim, store_sales, item
>  where date_dim.d_date_sk = store_sales.ss_sold_date_sk
>         and store_sales.ss_item_sk = item.i_item_sk
>         and i_manager_id=36
>         and d_moy=12
>         and d_year=2001
>  group by i_brand, i_brand_id
>  order by ext_price desc, i_brand_id
> limit 100 ;
>
>
> =================MRv2==============
>
>
> set hive.execution.engine=mr;
>
> ...
> 2016-07-18 22:22:57     Uploaded 1 File to: file:/tmp/gopal/b58a60d6-ff05-47bc-ad02-428aaa15779d/hive_2016-07-18_22-22-43_389_3112118969207749230-1/-local-10007/HashTable-Stage-3/MapJoin-mapfile131--.hashtable (914 bytes)
>
> 2016-07-18 22:22:57     End of local task; Time Taken: 2.47 sec.
> ...
> Time taken: 203.317 seconds, Fetched: 100 row(s)
>
> =================Tez===============
>
>
>
> set hive.execution.engine=tez;
> set hive.llap.execution.mode=none;
>
> Time taken: 13.681 seconds, Fetched: 100 row(s)
>
> =================LLAP==============
>
>
> set hive.llap.execution.mode=all;
>
>
>
> Task Execution Summary
> -----------------------------------------------------------------------------------
>   VERTICES   DURATION(ms)  CPU_TIME(ms)  GC_TIME(ms)  INPUT_RECORDS  OUTPUT_RECORDS
> -----------------------------------------------------------------------------------
>      Map 1        1016.00             0            0     93,123,704           9,048
>      Map 4           0.00             0            0         10,000              31
>      Map 5           0.00             0            0        296,344           2,675
>  Reducer 2         207.00             0            0          9,048             100
>  Reducer 3           0.00             0            0            100               0
> -----------------------------------------------------------------------------------
>
>
> Query Execution Summary
> -----------------------------------------------------------------------------------
> OPERATION                            DURATION
> -----------------------------------------------------------------------------------
> Compile Query                           1.64s
> Prepare Plan                            0.32s
> Submit Plan                             0.57s
> Start DAG                               0.21s
> Run DAG                                 1.02s
> -----------------------------------------------------------------------------------
>
>
> Time taken: 3.809 seconds, Fetched: 100 row(s)
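
If you want to collect these timings across runs, the "Time taken" summary lines are easy to pull out of the CLI output. A minimal Python sketch follows; the regular expression is written against the lines quoted in this thread only, and the function name is hypothetical.

```python
import re

# Pattern for the summary line as it appears in the logs quoted in this thread.
# Assumption: other Hive versions or clients may format this line differently.
TIME_TAKEN = re.compile(
    r"Time taken: (\d+(?:\.\d+)?) seconds, Fetched: (\d+) row\(s\)"
)


def parse_time_taken(line):
    """Return (seconds, rows) from a 'Time taken' line, or None if no match."""
    match = TIME_TAKEN.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


sample = "Time taken: 3.809 seconds, Fetched: 100 row(s)"
print(parse_time_taken(sample))
```

Because the pattern uses `search`, it also matches lines that carry a quote prefix or other leading text.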
>
>
> Annoyingly now, the 1.64s to compile the query is a huge fraction, since
> it only takes 1.02s to execute the join+aggregate over 93 million rows.
>
> Hopefully in a couple of weeks, we'll cut that 1.64s into nearly nothing
> once we merge HIVE-13995 into master.
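
To put the compile time in proportion, here is a rough Python sketch that expresses each phase as a share of the 3.809 s wall-clock time. The phase durations are copied from the Query Execution Summary; the variable names are illustrative.

```python
# Phase durations in seconds, from the Query Execution Summary in this thread.
phases = {
    "Compile Query": 1.64,
    "Prepare Plan": 0.32,
    "Submit Plan": 0.57,
    "Start DAG": 0.21,
    "Run DAG": 1.02,
}
wall_clock_s = 3.809  # "Time taken" reported for the LLAP run

# The listed phases need not add up exactly to the wall-clock time.
phase_total_s = sum(phases.values())
compile_share = 100.0 * phases["Compile Query"] / wall_clock_s

for name, seconds in phases.items():
    share = 100.0 * seconds / wall_clock_s
    print(f"{name:<15}{seconds:>6.2f}s {share:>5.1f}%")
print(f"{'Phases total':<15}{phase_total_s:>6.2f}s")
```

On these numbers compilation is the largest single phase, ahead of running the DAG itself.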
>
>
> On the historical view: the new vectorization codepaths are a big
> part of this speed-up when you compare historically or against an
> incompletely vectorized format like Parquet (HIVE-8128 looks abandoned).
>
> set hive.vectorized.execution.mapjoin.native.enabled=false;
>
>
> Time taken: 34.372 seconds, Fetched: 100 row(s)
> hive>
>
>
> Cheers,
> Gopal
