hive-issues mailing list archives

From "Gopal V (JIRA)" <>
Subject [jira] [Commented] (HIVE-20501) Vectorization: Closed range fast-path for Fast Long hashset
Date Thu, 06 Sep 2018 02:12:00 GMT


Gopal V commented on HIVE-20501:

This is trying to eliminate the hash computation and the bucket probe lookup entirely for the HashSet
and HashMultiSet cases, and to return a result directly from the min-max check.

isSimpleRange returns true if the key range is entirely contiguous, which it determines by comparing
the total number of keys against the min and max longs.

If min=1, max=10 and there are 10 keys assigned, with no cases where newKey=false, then
the hashset can be assumed to contain exactly [1,10], so for any value between 1 and 10
no further lookup is necessary to return a result.
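A minimal sketch of that check, assuming the set tracks the number of keys assigned alongside the min and max longs. The class and its methods (RangeAwareLongSet, add, contains) are hypothetical illustrations, not Hive's fast hashset API; only the isSimpleRange idea and the newKey=false condition come from the comment, and a java.util.HashSet stands in for the real hash table.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: a long hashset that remembers min, max and the number of keys
// assigned, so a contiguous key range can be answered without hashing or probing.
public class RangeAwareLongSet {
    private final Set<Long> table = new HashSet<>(); // stand-in for the real hash table
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private long keyCount = 0;
    private boolean sawDuplicate = false; // set when an insert reports newKey == false

    public void add(long key) {
        boolean newKey = table.add(key); // false when the key was already present
        if (newKey) {
            keyCount++;
        } else {
            sawDuplicate = true; // mirrors the "no cases where newKey=false" condition
        }
        min = Math.min(min, key);
        max = Math.max(max, key);
    }

    // n distinct longs fill [min, max] only when max - min + 1 == n (no gaps).
    public boolean isSimpleRange() {
        if (keyCount == 0 || sawDuplicate) {
            return false;
        }
        try {
            // Overflow means the span is far wider than any realistic key count.
            long span = Math.addExact(Math.subtractExact(max, min), 1L);
            return span == keyCount;
        } catch (ArithmeticException e) {
            return false;
        }
    }

    public boolean contains(long key) {
        if (isSimpleRange()) {
            // Fast path: two compares, no hash computation and no bucket probe.
            return key >= min && key <= max;
        }
        return table.contains(key);
    }

    public static void main(String[] args) {
        RangeAwareLongSet set = new RangeAwareLongSet();
        for (long k = 1; k <= 10; k++) {
            set.add(k);
        }
        System.out.println(set.isSimpleRange()); // true
        System.out.println(set.contains(7));     // true, answered by the range check
        System.out.println(set.contains(11));    // false
    }
}
```

With keys {1, 3} the span is 3 but only 2 keys are assigned, so isSimpleRange is false and contains falls back to the table lookup.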

However, the JIT profiles of the inner loop tell me that I need to move the branches up
into VectorMapJoinLeftSemiLongOperator and VectorMapJoinInnerBigOnlyLongOperator.
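One way to read "moving the branches up" is to decide on the fast path once per vectorized batch instead of once per row, so the hot loop contains only the range compares. The sketch below assumes a left-semi-style filter that writes matching row indices into a selection array; the method name semiJoinFilter and its parameters are hypothetical and do not correspond to the actual operator code.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: the isSimpleRange decision is hoisted out of the per-row loop,
// so each specialised loop body has no per-row check of which path to take.
public class HoistedRangeProbe {

    // Writes the indices of rows whose key is in the set into 'selected'; returns the count.
    public static int semiJoinFilter(long[] keys, int size, int[] selected,
                                     boolean isSimpleRange, long min, long max,
                                     Set<Long> table) {
        int count = 0;
        if (isSimpleRange) {
            // Fast path chosen once per batch: two compares per row, no hashing.
            for (int i = 0; i < size; i++) {
                long k = keys[i];
                if (k >= min && k <= max) {
                    selected[count++] = i;
                }
            }
        } else {
            // General path: one hash lookup per row.
            for (int i = 0; i < size; i++) {
                if (table.contains(keys[i])) {
                    selected[count++] = i;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Set<Long> table = new HashSet<>();
        for (long k = 1; k <= 10; k++) {
            table.add(k);
        }
        long[] batch = {0, 3, 10, 11, 5};
        int[] selected = new int[batch.length];
        int n = semiJoinFilter(batch, batch.length, selected, true, 1, 10, table);
        System.out.println(n); // 3 (rows 1, 2 and 4)
    }
}
```

Both loops select the same rows for a contiguous key set; the split only keeps the per-row path free of the range-versus-probe decision, which is the kind of branch a JIT profile of the inner loop would flag.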

> Vectorization: Closed range fast-path for Fast Long hashset 
> ------------------------------------------------------------
>                 Key: HIVE-20501
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Major
>         Attachments: HIVE-20501.1.patch
> In scenarios where the surrogate keys are entirely contiguous, the cache can offer a fast path for [min,max] without a further lookup in the hashtable.
> {code}
> hive> select min(c_customer_sk), max(c_customer_sk), max(c_customer_sk) - min(c_customer_sk), count(1) from customer;
> 1       65000000        64999999        65000000
> {code}

This message was sent by Atlassian JIRA
