hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-16592) Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes
Date Fri, 05 May 2017 07:06:04 GMT

     [ https://issues.apache.org/jira/browse/HIVE-16592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Gopal V updated HIVE-16592:
---------------------------
    Description: 
{code}
public static int calculateLongHashCode(long key) {

    key = (~key) + (key << 21); // key = (key << 21) - key - 1;
    key = key ^ (key >>> 24);
    key = (key + (key << 3)) + (key << 8); // key * 265
    key = key ^ (key >>> 14);
    key = (key + (key << 2)) + (key << 4); // key * 21
    key = key ^ (key >>> 28);
    key = key + (key << 31);

    return (int) key;
  }
{code}

Does not mix enough bits into the lower 32 bits, which are used for the bucket probes.

The 1997 document lists 

{code}
public int hash6432shift(long key)
{
  key = (~key) + (key << 18); // key = (key << 18) - key - 1;
  key = key ^ (key >>> 31);
  key = key * 21; // key = (key + (key << 2)) + (key << 4);
  key = key ^ (key >>> 11);
  key = key + (key << 6);
  key = key ^ (key >>> 22);
  return (int) key;
}
{code}

as the algorithm for keeping the lower 32 bits well distributed.

> Vectorization: Long hashes use hash64shift and not hash6432shift to generate int hashCodes
> ------------------------------------------------------------------------------------------
>
>                 Key: HIVE-16592
>                 URL: https://issues.apache.org/jira/browse/HIVE-16592
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Gopal V
>            Priority: Minor
>
> {code}
> public static int calculateLongHashCode(long key) {
>     key = (~key) + (key << 21); // key = (key << 21) - key - 1;
>     key = key ^ (key >>> 24);
>     key = (key + (key << 3)) + (key << 8); // key * 265
>     key = key ^ (key >>> 14);
>     key = (key + (key << 2)) + (key << 4); // key * 21
>     key = key ^ (key >>> 28);
>     key = key + (key << 31);
>     return (int) key;
>   }
> {code}
> Does not mix enough bits into the lower 32 bits, which are used for the bucket probes.
> The 1997 document lists 
> {code}
> public int hash6432shift(long key)
> {
>   key = (~key) + (key << 18); // key = (key << 18) - key - 1;
>   key = key ^ (key >>> 31);
>   key = key * 21; // key = (key + (key << 2)) + (key << 4);
>   key = key ^ (key >>> 11);
>   key = key + (key << 6);
>   key = key ^ (key >>> 22);
>   return (int) key;
> }
> {code}
> as the algorithm for keeping the lower 32 bits well distributed.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message