hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8202) Support SMB Join for Hive on Spark [Spark Branch]
Date Mon, 13 Oct 2014 21:05:33 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169973#comment-14169973 ]

Xuefu Zhang commented on HIVE-8202:
-----------------------------------

Hi [~szehon], thanks for the excellent writeup. Could you convert the MS Word doc to PDF
and attach it here? For some reason, OpenOffice has problems displaying the charts in the
doc. Thanks.

> Support SMB Join for Hive on Spark [Spark Branch]
> -------------------------------------------------
>
>                 Key: HIVE-8202
>                 URL: https://issues.apache.org/jira/browse/HIVE-8202
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>            Assignee: Szehon Ho
>         Attachments: Hive on Spark SMB Join.docx
>
>
> SMB joins are used wherever the tables are sorted and bucketed. It's a map-side join.
The join boils down to just merging the already sorted tables, allowing this operation to
be faster than an ordinary map-join. However, if the tables are partitioned, there could be
a slowdown, as each mapper would need to get a very small chunk of a partition which has a
single key. Thus, in some scenarios it's beneficial to convert SMB join to SMB map join as
well.
> The task is to research and support the conversion from regular SMB join to SMB map join
for the Spark execution engine.
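
The "just merging the already sorted tables" step in the description above can be sketched
as a plain merge join over one pair of matching buckets. This is only an illustration under
assumptions: the row shape (key, payload) and the name merge_join are hypothetical, it is an
inner join on a single integer key, and it is not Hive's or Spark's actual implementation.

```python
from typing import Iterator, List, Tuple

# Hypothetical row shape for illustration: (join key, payload).
Row = Tuple[int, str]


def merge_join(left: List[Row], right: List[Row]) -> Iterator[Tuple[int, str, str]]:
    """Inner-join two bucket contents that are ALREADY sorted by key.

    Because both inputs are sorted, a single forward pass over each side
    is enough; no hash table and no extra sort are needed.
    """
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1          # left key too small, advance left
        elif lk > rk:
            j += 1          # right key too small, advance right
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit the cross product of the two runs.
            for l_row in left[i:i_end]:
                for r_row in right[j:j_end]:
                    yield (lk, l_row[1], r_row[1])
            i, j = i_end, j_end


if __name__ == "__main__":
    bucket_a = [(1, "a"), (2, "b"), (2, "c"), (4, "d")]
    bucket_b = [(2, "x"), (3, "y"), (4, "z"), (4, "w")]
    for joined in merge_join(bucket_a, bucket_b):
        print(joined)
```

In this sketch each call handles one bucket pair, which mirrors the idea that buckets with
the same number on both tables hold the same key range and can be joined independently.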



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
