hive-dev mailing list archives

From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-7617) optimize bytes mapjoin hash table read path wrt serialization, at least for common cases
Date Wed, 13 Aug 2014 05:01:16 GMT

    [ https://issues.apache.org/jira/browse/HIVE-7617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095153#comment-14095153
] 

Mostafa Mokhtar commented on HIVE-7617:
---------------------------------------

Which query did you run?
Can you try this:
  select sum(ss_ext_sales_price)
  from store_sales JOIN date_dim ON store_sales.ss_sold_date_sk = date_dim.d_date_sk
  where ss_sold_date between '1998-01-01' and '2000-01-01'
    and date_dim.d_year between '1998' and '2000';

> optimize bytes mapjoin hash table read path wrt serialization, at least for common cases
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-7617
>                 URL: https://issues.apache.org/jira/browse/HIVE-7617
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HIVE-7617.01.patch, HIVE-7617.patch, HIVE-7617.prelim.patch
>
>
> The BytesBytes hash table stores keys in a byte array for compact representation. However,
> that means the straightforward implementation of lookups serializes lookup keys to byte
> arrays, which is relatively expensive.
> We can either shortcut hashcode and compare for common types on the read path (integral types,
> which would cover most real-world keys), or specialize the hashtable and from BytesBytes...
> create LongBytes, StringBytes, or whatever. The first option seems simpler for now.
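
For illustration only, here is a minimal Java sketch of the first option (shortcutting hashcode and compare for an integral key). It is not taken from the attached patches. The class name LongKeyProbe, the FNV-1a-style hash, and the fixed-width big-endian key layout are all assumptions made for the example; Hive's actual key serialization and hash function may differ.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch: probe a byte-keyed table with a long key without
// first serializing the key into a temporary byte array.
final class LongKeyProbe {
    // FNV-1a-style hash over a byte range. This stands in for whatever hash
    // the real table uses (an assumption for this example).
    static int hashBytes(byte[] bytes, int offset, int length) {
        int h = 0x811c9dc5;
        for (int i = offset; i < offset + length; i++) {
            h ^= (bytes[i] & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Baseline path: serialize the key to 8 big-endian bytes (assumed layout).
    static byte[] serialize(long key) {
        return ByteBuffer.allocate(8).putLong(key).array();
    }

    // Shortcut: produces the same value as hashBytes(serialize(key), 0, 8),
    // but walks the long's bytes directly and allocates nothing.
    static int hashLong(long key) {
        int h = 0x811c9dc5;
        for (int shift = 56; shift >= 0; shift -= 8) {
            h ^= (int) ((key >>> shift) & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Shortcut compare: decode the 8 stored key bytes in place and compare
    // them with the probe key as a long.
    static boolean equalsStored(long key, byte[] stored, int offset) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (stored[offset + i] & 0xffL);
        }
        return v == key;
    }
}
```

The point of the sketch is that hashLong and equalsStored must agree exactly with the byte-array path, so keys written through serialization can still be found by the shortcut read path.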



--
This message was sent by Atlassian JIRA
(v6.2#6252)
