hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-16151) BytesBytesHashTable allocates large arrays
Date Thu, 06 Apr 2017 23:51:42 GMT

    [ https://issues.apache.org/jira/browse/HIVE-16151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959989#comment-15959989 ]

Sergey Shelukhin edited comment on HIVE-16151 at 4/6/17 11:50 PM:
------------------------------------------------------------------

We need to look at that null check and where it happens; it may be avoidable. 4% seems too
much for something this obscure.
We could even allocate all the memory at once as before, just in small chunks, which would
eliminate the check, since with a good hash function we expect every sub-array to be used anyway.
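
To illustrate the "all at once, but in small chunks" idea, here is a minimal sketch. It is not the code in HIVE-16151.patch: the class name `ChunkedLongArray`, its methods, and the power-of-two chunk size are all assumptions made for this example. Because every sub-array is allocated in the constructor, `get` and `set` never need a null check, and no single allocation is larger than one chunk.

```java
// Hypothetical illustration only; not taken from Hive or from HIVE-16151.patch.
final class ChunkedLongArray {
    private final int chunkShift;   // log2 of the number of slots per chunk (assumed power of two)
    private final int chunkMask;    // slotsPerChunk - 1, used to find the offset inside a chunk
    private final long[][] chunks;  // every entry is allocated eagerly, so none is ever null

    ChunkedLongArray(long capacity, int chunkShift) {
        if (capacity <= 0 || chunkShift < 1 || chunkShift > 30) {
            throw new IllegalArgumentException("bad capacity or chunkShift");
        }
        this.chunkShift = chunkShift;
        this.chunkMask = (1 << chunkShift) - 1;
        // Round up so the chunks cover at least 'capacity' slots.
        long chunkCount = (capacity + chunkMask) >>> chunkShift;
        if (chunkCount > Integer.MAX_VALUE) {
            throw new IllegalArgumentException("too many chunks");
        }
        this.chunks = new long[(int) chunkCount][];
        // Allocate all memory up front, but as many small arrays instead of one large one.
        for (int i = 0; i < chunks.length; i++) {
            chunks[i] = new long[1 << chunkShift];
        }
    }

    long get(long slot) {
        // No null check: the constructor guarantees the chunk exists.
        return chunks[(int) (slot >>> chunkShift)][(int) (slot & chunkMask)];
    }

    void set(long slot, long value) {
        chunks[(int) (slot >>> chunkShift)][(int) (slot & chunkMask)] = value;
    }

    int chunkCount() {
        return chunks.length;
    }
}
```

The trade-off in this sketch is that total memory is committed immediately, as with a single large array; what changes is only the size of each individual allocation.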


was (Author: sershe):
We need to look at that null check and where it happens; it may be avoidable. 4% seems too
much for something this obscure.
We could even allocate all the memory at once as before, just in small chunks, which would
eliminate the check.

> BytesBytesHashTable allocates large arrays
> ------------------------------------------
>
>                 Key: HIVE-16151
>                 URL: https://issues.apache.org/jira/browse/HIVE-16151
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Prasanth Jayachandran
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-16151.patch
>
>
> These arrays cause GC pressure and also impose key count limitations on the table. With regard
> to the latter, we won't be able to get rid of it without a 64-bit hash function, but for now
> we can get rid of the former. If we need to address the latter, we'd add murmur64 and probably
> account for it differently on resize (we don't want to blow up the hashtable by 4 bytes/key in
> the common case where the number of keys is less than ~1.5B :))



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
