hive-user mailing list archives

From Mich Talebzadeh <mich.talebza...@gmail.com>
Subject Re: Hive on TEZ + LLAP
Date Tue, 19 Jul 2016 17:56:28 GMT
If I am correct, that sounds like joining a fact table (store_sales) with two
dimensions?

cool

thanks



Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 19 July 2016 at 18:31, Gopal Vijayaraghavan <gopalv@apache.org> wrote:

> > What was the type (Parquet, text, ORC, etc.) and row count for each of the
> > three tables above?
>
> I always use ORC for flat columnar data.
>
> ORC is designed to be ideal if you have measure/dimensions normalized into
> tables - most SQL workloads don't start with an indefinite depth tree.
>
> hive> select count(1) from store_sales;
> OK
> 2879987999
> Time taken: 2.603 seconds, Fetched: 1 row(s)
> hive> select count(1) from store;
> OK
> 1002
> Time taken: 0.213 seconds, Fetched: 1 row(s)
> hive> select count(1) from date_dim;
> OK
> 73049
> Time taken: 0.186 seconds, Fetched: 1 row(s)
> hive>
>
> The DPP semi-join for date_dim is very fast, so out of the ~2.8 billion
> records only 93 million are read into the cache.
>
> Standard TPC-DS data-set at 1000 scale - same layout you can get from
> hive-testbench && ./tpcds-setup.sh 1000;
>
> Cheers,
> Gopal
>
>
>
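
To make the DPP (dynamic partition pruning) semi-join point in the quoted text concrete, here is a toy Python sketch of the idea, not Hive's implementation. It assumes the fact table is partitioned on its date key (ss_sold_date_sk here, which I believe matches the hive-testbench layout, but treat that as an assumption). The function names and the tiny data set are hypothetical and only for illustration: filter the small dimension first, then read only the fact partitions whose key survived.

```python
from collections import defaultdict


def partition_by(fact_rows, key):
    """Group fact rows by partition key, mimicking a partitioned table."""
    partitions = defaultdict(list)
    for row in fact_rows:
        partitions[row[key]].append(row)
    return dict(partitions)


def pruned_scan(partitions, dim_rows, dim_key, predicate):
    """Semi-join style pruning: filter the dimension, then scan only
    the fact partitions whose key appears in the filtered dimension."""
    # Step 1: evaluate the filter on the small dimension table.
    surviving_keys = {d[dim_key] for d in dim_rows if predicate(d)}
    # Step 2: touch only partitions that exist and whose key survived.
    scanned = sorted(k for k in surviving_keys if k in partitions)
    rows = [row for k in scanned for row in partitions[k]]
    return rows, scanned


# Hypothetical miniature stand-ins for date_dim and store_sales.
date_dim = [
    {"d_date_sk": 1, "d_year": 1999},
    {"d_date_sk": 2, "d_year": 2000},
    {"d_date_sk": 3, "d_year": 2000},
    {"d_date_sk": 4, "d_year": 2001},
]
store_sales = [
    {"ss_sold_date_sk": 1, "ss_net_paid": 10.0},
    {"ss_sold_date_sk": 2, "ss_net_paid": 20.0},
    {"ss_sold_date_sk": 2, "ss_net_paid": 5.0},
    {"ss_sold_date_sk": 4, "ss_net_paid": 7.5},
]

partitions = partition_by(store_sales, "ss_sold_date_sk")
rows, scanned = pruned_scan(
    partitions, date_dim, "d_date_sk", lambda d: d["d_year"] == 2000
)
total_paid = sum(r["ss_net_paid"] for r in rows)
```

In this toy run only one of the three fact partitions is read, which is the same effect, at a much smaller scale, as reading 93 million of the ~2.8 billion store_sales rows.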
