hive-user mailing list archives

From Gopal Vijayaraghavan <gop...@apache.org>
Subject Re: Hive 2 release what is new in this release please
Date Thu, 18 Feb 2016 18:35:01 GMT

> Is there such notes as what is new in Hive 2 say new features etc?
 

Sergey had an in-depth presentation at the last meetup

<https://cwiki.apache.org/confluence/display/Hive/Presentations#Presentations-January2016HiveUserGroupMeetup>

Notable omission - Jason's custom edge for Tez, which vectorized shuffle
joins (in 1.3.x but unreleased).


> Primarily interested on what has been added such as partition pruning
>etc.

LLAP & HBase metastore are the big chunks (hundreds of JIRAs each). CBO/Stats
has had some major changes too to keep up.

Partition pruning has been there since the first release of Hive, IIRC.

Dynamic partition pruning, however, was added in 0.14 (HIVE-7826): shuffle
joins & map-joins on partition keys will filter the big side using the
small side's output (commonly used in de-dup of incoming streams).

    insert into bigtable
    select * from etl
    where NOT EXISTS (
      select txnid from bigtable
      where bigtable.qhour_key = etl.qhour_key
        and bigtable.txnid = etl.txnid);

That plans a shuffle join because the value producing table in a left semi
join cannot be on the hash-join side.

So, even though this is not a map-join, Tez can plan a pruning control
edge to delay split-generation on the big-table to ensure we don't
schedule any scans for any qhour_key partitions which are not in
ETL.qhour_key.
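
A rough sketch of that pruning step, as a toy model in Python rather than
anything from Hive or Tez internals. The partition layout, file names and
etl rows are made up for illustration, and prune_partitions is a
hypothetical helper. The only point is that the small side's distinct
qhour_key values are known before any big-table splits get generated.

```python
# Toy model of dynamic partition pruning; not Hive or Tez code.
from typing import Dict, Iterable, List


def prune_partitions(big_partitions: Dict[int, List[str]],
                     small_side_keys: Iterable[int]) -> Dict[int, List[str]]:
    """Keep only the big-table partitions whose key was seen on the small side."""
    wanted = set(small_side_keys)
    return {key: files for key, files in big_partitions.items() if key in wanted}


# Hypothetical layout: qhour_key -> files under that partition of bigtable.
bigtable_partitions = {
    2016021800: ["000000_0", "000001_0"],
    2016021801: ["000000_0"],
    2016021802: ["000000_0"],
}

# Hypothetical rows of the incoming etl table: (qhour_key, txnid).
etl_rows = [(2016021801, "t1"), (2016021801, "t2"), (2016021802, "t3")]

# The small side is scanned first; its distinct keys stand in for the
# information carried over the pruning control edge.
etl_keys = {qhour_key for qhour_key, _ in etl_rows}

# Split generation for the big table waits for those keys, then skips
# every partition that cannot match.
to_scan = prune_partitions(bigtable_partitions, etl_keys)
print(sorted(to_scan))
```

In this toy run the 2016021800 partition never gets a scan scheduled,
because no etl row carries that qhour_key.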

Are you thinking of bucket pruning, which was added in LLAP+2.0.0? I'm
still working on the remaining bits of bucket pruning for the simple case.

Like

select * from table where bucket_key IN (1,10,100);

is likely to be faster if the table is multiple gigabytes than if it were 900 MB.
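
To make the idea concrete, here is a toy Python model of how an IN list
could be turned into a set of buckets to read. It assumes an int bucket
key hashes to its own value and that the bucket is picked as (hash masked
to non-negative) modulo the bucket count. Treat that rule as an
assumption, not a statement of what any given Hive version does. The
bucket count of 32 and both helper names are hypothetical.

```python
# Toy model of bucket pruning for an IN list; not Hive code.
from typing import Iterable, Set

INT_MAX = 0x7FFFFFFF  # mask that keeps a 32-bit hash non-negative


def bucket_for_int(value: int, num_buckets: int) -> int:
    """Assumed rule: an int key hashes to itself, then is taken modulo the bucket count."""
    return (value & INT_MAX) % num_buckets


def buckets_to_scan(in_list: Iterable[int], num_buckets: int) -> Set[int]:
    """Buckets that can contain any of the literal values in the IN list."""
    return {bucket_for_int(v, num_buckets) for v in in_list}


NUM_BUCKETS = 32  # hypothetical: CLUSTERED BY (bucket_key) INTO 32 BUCKETS

needed = buckets_to_scan([1, 10, 100], NUM_BUCKETS)
print(sorted(needed))  # -> [1, 4, 10]
```

Under those assumptions only 3 of the 32 buckets need to be read for the
query above, and the rest can be skipped before any scan is scheduled.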

Cheers,
Gopal
 


