hive-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10673) Dynamically partitioned hash join for Tez
Date Mon, 29 Jun 2015 18:55:06 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606142#comment-14606142
] 

Jason Dere commented on HIVE-10673:
-----------------------------------

[~mmokhtar] or [~gopalv] can probably give more detail, but they found that during a shuffle
join a large amount of the CPU/IO was spent sorting. 

While this does not work for MR, for other execution engines (such as Tez), it is possible
to create a reduce-side join that uses unsorted inputs, in order to eliminate the sorting.
We use the hash join algorithm to perform the join in the reducer, so this requires the small
tables in the join to fit in the hash table for this to work. Testing with this patch [~mmokhtar]
found some decent time savings.

> Dynamically partitioned hash join for Tez
> -----------------------------------------
>
>                 Key: HIVE-10673
>                 URL: https://issues.apache.org/jira/browse/HIVE-10673
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning, Query Processor
>            Reporter: Jason Dere
>            Assignee: Jason Dere
>         Attachments: HIVE-10673.1.patch, HIVE-10673.2.patch, HIVE-10673.3.patch, HIVE-10673.4.patch,
HIVE-10673.5.patch
>
>
> Reduce-side hash join (using MapJoinOperator), where the Tez inputs to the reducer are
unsorted.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message