hive-dev mailing list archives

From "Gopal V (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7121) Use murmur hash to distribute HiveKey
Date Thu, 29 May 2014 05:03:01 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-7121:
--------------------------

    Attachment: HIVE-7121.2.patch

Fix the bucketed-table insert case by limiting the murmur hash to cases with a variable number
of reducers.

This neatly avoids using the new hash function where there are explicit bucket counts or reducer
counts.
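
The selection rule described above might look roughly like the following Java sketch.
It is not taken from the patch: the class HashChoiceSketch, the fixedLayout flag, and
both hash helpers are hypothetical names used only for illustration. The mixer is the
32-bit finalizer step of MurmurHash3, standing in for a full murmur hash, and the
"legacy" hash is a simplified stand-in for the existing hashCode path.

```java
final class HashChoiceSketch {
    // Hypothetical stand-in for the existing hashCode path on a single int key.
    static int legacyHash(int key) {
        return key;
    }

    // Finalizer step of MurmurHash3 (fmix32), used here as a simple bit mixer.
    static int mixedHash(int key) {
        int h = key;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // fixedLayout is an assumed flag: true when the bucket count or reducer
    // count is explicit, so the key-to-reducer mapping must stay unchanged.
    static int partition(int key, int numReducers, boolean fixedLayout) {
        int h = fixedLayout ? legacyHash(key) : mixedHash(key);
        return (h & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        System.out.println("fixed layout:    " + partition(64, 32, true));
        System.out.println("variable layout: " + partition(64, 32, false));
    }
}
```

Keeping the old hash whenever the layout is fixed means rows keep landing in the same
bucket as before, which is presumably why the bucketed insert case needed this guard.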

> Use murmur hash to distribute HiveKey
> -------------------------------------
>
>                 Key: HIVE-7121
>                 URL: https://issues.apache.org/jira/browse/HIVE-7121
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: HIVE-7121.1.patch, HIVE-7121.2.patch, HIVE-7121.WIP.patch
>
>
> The current hashCode implementation produces poor parallelism when dealing with single
> integers or doubles.
> And for partitioned inserts into a 1 bucket table, there is a significant hotspot on
> Reducer #31.
> Removing the magic number 31 and using a more normal hash algorithm would help fix these
> hotspots.
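
To illustrate the kind of hotspot described in the issue, here is a small Java sketch
under stated assumptions. It does not reproduce Hive's actual HiveKey code: the class
HashSpreadSketch and its methods are hypothetical, the keys are synthetic integers with
a regular stride, and the mixer is only the 32-bit finalizer step of MurmurHash3 rather
than a full murmur hash. It compares how an unmixed integer hash and a mixed one spread
such keys over reducers.

```java
final class HashSpreadSketch {
    // Finalizer step of MurmurHash3 (fmix32), used here as a simple bit mixer.
    static int fmix32(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Count how many synthetic keys land on each reducer.
    static int[] histogram(boolean mix, int numReducers, int numKeys) {
        int[] load = new int[numReducers];
        for (int i = 0; i < numKeys; i++) {
            int key = i * numReducers;        // keys with a regular stride
            int h = mix ? fmix32(key) : key;  // unmixed: the hash is the value itself
            load[(h & Integer.MAX_VALUE) % numReducers]++;
        }
        return load;
    }

    // Largest number of keys assigned to any single reducer.
    static int maxLoad(int[] load) {
        int max = 0;
        for (int n : load) {
            max = Math.max(max, n);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println("unmixed max load: " + maxLoad(histogram(false, 32, 10000)));
        System.out.println("mixed max load:   " + maxLoad(histogram(true, 32, 10000)));
    }
}
```

With the unmixed hash every key in this synthetic set goes to reducer 0, because each
key is a multiple of the reducer count. The mixed hash should spread the same keys
across many reducers.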



--
This message was sent by Atlassian JIRA
(v6.2#6252)
