hive-issues mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-20873) Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
Date Mon, 26 Nov 2018 04:52:00 GMT

    [ https://issues.apache.org/jira/browse/HIVE-20873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698475#comment-16698475 ]

Gopal V commented on HIVE-20873:
--------------------------------

[~teddy.choi]: this is good to go into Apache. It has been tested and found to be good.

> Use Murmur hash for VectorHashKeyWrapperTwoLong to reduce hash collision
> ------------------------------------------------------------------------
>
>                 Key: HIVE-20873
>                 URL: https://issues.apache.org/jira/browse/HIVE-20873
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Teddy Choi
>            Assignee: Teddy Choi
>            Priority: Major
>              Labels: pull-request-available
>         Attachments: HIVE-20873.1.patch, HIVE-20873.2.patch, HIVE-20873.3.patch
>
>
> VectorHashKeyWrapperTwoLong computes its hash code with only a few bit-shift and XOR operations.
This keeps computation time short but produces more hash collisions, so group-by operations become
very slow on large data sets. It needs Murmur hash or a better hash function to reduce hash collisions.
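
To illustrate the collision problem described above, here is a minimal sketch that compares a shift/XOR hash of two longs with a Murmur-style hash. It is not the code from Hive or from the attached patches: the class name TwoLongHashSketch, the method shiftXorHash, and the way murmurStyleHash chains the two keys are all assumptions made for illustration. Only fmix64 is taken from a known source, the 64-bit finalizer of MurmurHash3.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only; all names here are hypothetical and not taken from Hive.
final class TwoLongHashSketch {

    // A hypothetical shift/XOR hash: fold each long to 32 bits, then XOR.
    // Cheap, but symmetric in (a, b) and weak for small key values.
    static int shiftXorHash(long a, long b) {
        int ha = (int) (a ^ (a >>> 32));
        int hb = (int) (b ^ (b >>> 32));
        return ha ^ hb;
    }

    // The 64-bit finalizer (fmix64) from MurmurHash3.
    static long fmix64(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    // One possible Murmur-style combination (an assumption, not the patch):
    // mix the first key, fold in the second, mix again, fold to 32 bits.
    static int murmurStyleHash(long a, long b) {
        long h = fmix64(fmix64(a) * 31 + b);
        return (int) (h ^ (h >>> 32));
    }

    // Counts distinct hash codes over all key pairs (a, b) with 0 <= a, b < n.
    static int distinctHashes(boolean murmur, int n) {
        Set<Integer> seen = new HashSet<>();
        for (long a = 0; a < n; a++) {
            for (long b = 0; b < n; b++) {
                seen.add(murmur ? murmurStyleHash(a, b) : shiftXorHash(a, b));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 256;
        System.out.println("key pairs:         " + (n * n));
        // For a, b < 256 the XOR of the two keys fits in 8 bits,
        // so the shift/XOR hash yields only 256 distinct values.
        System.out.println("shift/XOR distinct: " + distinctHashes(false, n));
        // The Murmur-style hash should spread the keys over far more values.
        System.out.println("murmur distinct:    " + distinctHashes(true, n));
    }
}
```

With small, dense key values such as dictionary-encoded or sequential ids, the shift/XOR variant maps many distinct key pairs to the same hash code, which is the kind of behavior that slows down a hash-based group by. The extra multiplications in the Murmur-style variant cost a few more cycles per key.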



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
