hive-dev mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-8621) Dump small table join data into appropriate number of broadcast variables [Spark Branch]
Date Thu, 30 Oct 2014 02:56:34 GMT

    [ https://issues.apache.org/jira/browse/HIVE-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14189547#comment-14189547 ]

Xuefu Zhang commented on HIVE-8621:
-----------------------------------

[~ssatish] Thanks for sharing your thoughts and findings. We have been reevaluating Spark's
broadcast variables as a way to ship small tables. A broadcast variable works well for a
small amount of data, but memory problems mount when a large amount of data is broadcast.
For a bucket join, the table to be broadcast isn't necessarily small. To make things worse,
Spark needs to keep the variable live at the driver even after it has been broadcast. For
this reason, we are considering using MR's approach to broadcasting the small tables. I'm
working on a writeup and will create subtasks for this piece. Hopefully, we can reuse or
clone a fair amount of the existing code.

> Dump small table join data into appropriate number of broadcast variables [Spark Branch]
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-8621
>                 URL: https://issues.apache.org/jira/browse/HIVE-8621
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Suhas Satish
>            Assignee: Suhas Satish
>
> The number of broadcast variables that must be created is m x n, where
> 'm' is the number of small tables in the (m+1)-way join and 'n' is the number of buckets
> of the tables. If the tables are unbucketed, n=1.
> This is a sub-task of map-join for Spark:
> https://issues.apache.org/jira/browse/HIVE-7613
> This can use the baseline patch for map-join
> https://issues.apache.org/jira/browse/HIVE-8616
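
The m x n count in the description above can be illustrated with a small sketch. This is
not code from Hive or Spark; the function names and the (table, bucket) key layout are
hypothetical, chosen only to show how one broadcast variable per small table per bucket
adds up.

```python
from typing import List, Tuple


def broadcast_variable_count(num_small_tables: int, num_buckets: int = 1) -> int:
    """Return m x n, where m is the number of small tables in an (m+1)-way
    join and n is the number of buckets (n = 1 when the tables are unbucketed).

    Hypothetical helper for illustration only; not part of Hive or Spark.
    """
    if num_small_tables < 0:
        raise ValueError("num_small_tables must be non-negative")
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    return num_small_tables * num_buckets


def broadcast_variable_keys(small_tables: List[str],
                            num_buckets: int = 1) -> List[Tuple[str, int]]:
    """Enumerate one (table, bucket) pair per broadcast variable.

    The pairing scheme is an assumption made for this sketch.
    """
    if num_buckets < 1:
        raise ValueError("num_buckets must be at least 1")
    return [(table, bucket)
            for table in small_tables
            for bucket in range(num_buckets)]


if __name__ == "__main__":
    # A 3-way join has m = 2 small tables; with n = 4 buckets that is 8 variables.
    print(broadcast_variable_count(2, 4))
    print(broadcast_variable_keys(["dim_a", "dim_b"], 4))
```

With unbucketed tables the second argument can be left at its default, so the count is
simply the number of small tables.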



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
