hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-16592) Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes
Date Sun, 07 May 2017 08:29:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15999733#comment-15999733
] 

Gopal V commented on HIVE-16592:
--------------------------------

The best hash function in my tests for the working set I use has been 

{code}
      key ^= (key >>> 20) ^ (key >>> 12);
      key ^= (key >>> 7) ^ (key >>> 4);
      return (int)key;
{code}

Unlike the hash64shift, this isn't reversible.

> Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16592
>                 URL: https://issues.apache.org/jira/browse/HIVE-16592
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>            Priority: Minor
>
> {code}
> public static int calculateLongHashCode(long key) {
>     key = (~key) + (key << 21); // key = (key << 21) - key - 1;
>     key = key ^ (key >>> 24);
>     key = (key + (key << 3)) + (key << 8); // key * 265
>     key = key ^ (key >>> 14);
>     key = (key + (key << 2)) + (key << 4); // key * 21
>     key = key ^ (key >>> 28);
>     key = key + (key << 31);
>     return (int) key;
>   }
> {code}
> Does not mix enough bits into the lower 32 bits, which are used for the bucket probes.
> The 1997 document lists 
> {code}
> public int hash6432shift(long key)
> {
>   key = (~key) + (key << 18); // key = (key << 18) - key - 1;
>   key = key ^ (key >>> 31);
>   key = key * 21; // key = (key + (key << 2)) + (key << 4);
>   key = key ^ (key >>> 11);
>   key = key + (key << 6);
>   key = key ^ (key >>> 22);
>   return (int) key;
> }
> {code}
> as the algorithm for keeping the lower 32 bits well distributed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message