hive-dev mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-6430) MapJoin hash table has large memory overhead
Date Thu, 17 Apr 2014 01:07:15 GMT

     [ https://issues.apache.org/jira/browse/HIVE-6430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-6430:
-----------------------------------

    Attachment: HIVE-6430.07.patch

This patch fixes several issues. The main change is that Murmur hash from Guava is now used;
the previous hash code method distributed keys very poorly, and performance suffered a lot.
There was also an issue with the previously used expand method. To make expand fast, the full
hash is now stored. It is not needed for anything else, so this is a tradeoff: more memory (+4
bytes per key) versus an expensive rehash. We may revisit this later.
Fast paths were added to WriteBuffers for the majority of cases, where the whole operation
stays within one buffer. There is still a bug in there that causes some queries to fail; I'll
investigate. I wanted to upload the patch with what is done so far. The queries with large map
joins that do work now run approximately as fast as before (I will measure more precisely later)
in a fraction of the memory.
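
The stored-hash tradeoff described above can be illustrated with a minimal sketch. This is
not the code from the patch: the class name StoredHashTable, the slot layout (hash in the high
32 bits, key offset in the low 32 bits) and the mix() function are assumptions made for
illustration. mix() is the well-known Murmur3 32-bit finalizer, used here as a stand-in for
Guava's Murmur hash so the example has no dependencies. The point is that expand() re-places
entries using only the hash kept in each slot, without reading or rehashing the keys.

```java
import java.util.Arrays;

/** Hypothetical sketch: open-addressing table whose slots keep the full 32-bit hash. */
final class StoredHashTable {
    // Offsets are non-negative ints, so a valid slot never has all 64 bits set.
    private static final long EMPTY = -1L;
    // Each slot: high 32 bits = full hash, low 32 bits = offset of the key in external storage.
    private long[] slots;
    private int size;

    /** capacityPow2 must be a power of two (assumption of this sketch). */
    StoredHashTable(int capacityPow2) {
        slots = new long[capacityPow2];
        Arrays.fill(slots, EMPTY);
    }

    /** Murmur3 32-bit finalizer; stands in for the Guava Murmur hash mentioned above. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Inserts an entry; the caller is assumed to insert each key only once. */
    void insert(int hash, int offset) {
        if ((size + 1) * 4 > slots.length * 3) {
            expand(); // keep the load factor at or below 0.75
        }
        place(slots, ((long) hash << 32) | (offset & 0xFFFFFFFFL));
        size++;
    }

    /** Returns the offset of the first entry with this hash, or -1 if there is none. */
    int find(int hash) {
        int mask = slots.length - 1;
        for (int i = hash & mask; slots[i] != EMPTY; i = (i + 1) & mask) {
            if ((int) (slots[i] >>> 32) == hash) {
                return (int) slots[i]; // a real table would also compare the key bytes
            }
        }
        return -1;
    }

    int capacity() {
        return slots.length;
    }

    /** Doubles the table. Only the stored hash is needed; keys are never touched. */
    private void expand() {
        long[] bigger = new long[slots.length * 2];
        Arrays.fill(bigger, EMPTY);
        for (long slot : slots) {
            if (slot != EMPTY) {
                place(bigger, slot);
            }
        }
        slots = bigger;
    }

    /** Linear probing, starting from the stored hash masked to the table size. */
    private static void place(long[] table, long slot) {
        int mask = table.length - 1;
        int i = (int) (slot >>> 32) & mask;
        while (table[i] != EMPTY) {
            i = (i + 1) & mask;
        }
        table[i] = slot;
    }
}
```

Without the stored hash, expand() would have to read every key back from its buffer and hash
it again, which is the "expensive rehash" side of the tradeoff. With it, each entry costs four
extra bytes.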

> MapJoin hash table has large memory overhead
> --------------------------------------------
>
>                 Key: HIVE-6430
>                 URL: https://issues.apache.org/jira/browse/HIVE-6430
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-6430.01.patch, HIVE-6430.02.patch, HIVE-6430.03.patch, HIVE-6430.04.patch,
> HIVE-6430.05.patch, HIVE-6430.06.patch, HIVE-6430.07.patch, HIVE-6430.patch
>
>
> Right now, in some queries, I see that storing e.g. 4 ints (2 for key and 2 for row)
> can take several hundred bytes, which is ridiculous. I am reducing the size of MJKey and
> MJRowContainer in other jiras, but in general we don't need to have a Java hash table there.
> We can either use a primitive-friendly hash table like the one from HPPC (Apache-licensed),
> or some variation, to map primitive keys to a single row storage structure without an
> object per row (similar to vectorization).
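
To make the idea in the description concrete, here is a minimal sketch of a primitive-friendly
table for the "4 ints" case: two key ints packed into one long, and two row ints stored in a
flat int array, with no object per entry. It is neither HPPC nor the Hive implementation. The
class name PrimitiveJoinTable, the fixed capacity, the multiplicative hash constant and the
one-row-per-key restriction are all assumptions made to keep the example short.

```java
/** Hypothetical sketch: (int, int) key mapped to an (int, int) row, stored in flat arrays. */
final class PrimitiveJoinTable {
    private final long[] keys;     // two key ints packed into one long per slot
    private final int[] rows;      // two row ints per slot, at indexes 2*i and 2*i + 1
    private final boolean[] used;  // marks occupied slots
    private final int mask;
    private int size;

    /** capacityPow2 must be a power of two; this sketch does not grow the table. */
    PrimitiveJoinTable(int capacityPow2) {
        keys = new long[capacityPow2];
        rows = new int[2 * capacityPow2];
        used = new boolean[capacityPow2];
        mask = capacityPow2 - 1;
    }

    private static long packKey(int k1, int k2) {
        return ((long) k1 << 32) | (k2 & 0xFFFFFFFFL);
    }

    /** Linear probing: returns the slot holding the key, or the first empty slot. */
    private int slotFor(long key) {
        long h = key * 0x9E3779B97F4A7C15L; // simple multiplicative mix (an assumption)
        int i = (int) (h >>> 32) & mask;
        while (used[i] && keys[i] != key) {
            i = (i + 1) & mask;
        }
        return i;
    }

    /** Stores one row per key; a second put for the same key overwrites the row. */
    void put(int k1, int k2, int v1, int v2) {
        if (size >= mask) {
            // Always keep one slot empty so that probing terminates.
            throw new IllegalStateException("table full");
        }
        long key = packKey(k1, k2);
        int i = slotFor(key);
        if (!used[i]) {
            used[i] = true;
            keys[i] = key;
            size++;
        }
        rows[2 * i] = v1;
        rows[2 * i + 1] = v2;
    }

    /** Copies the row for the key into out (length 2) and returns true if the key exists. */
    boolean get(int k1, int k2, int[] out) {
        int i = slotFor(packKey(k1, k2));
        if (!used[i]) {
            return false;
        }
        out[0] = rows[2 * i];
        out[1] = rows[2 * i + 1];
        return true;
    }
}
```

In this layout each slot costs a long, two ints and a boolean, with no per-entry object headers
or references. A real map join table would also need growth and multiple rows per key.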



--
This message was sent by Atlassian JIRA
(v6.2#6252)
