hive-issues mailing list archives

From "Xuefu Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10084) Improve common join performance [Spark Branch]
Date Thu, 06 Apr 2017 16:18:41 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15959215#comment-15959215 ]

Xuefu Zhang commented on HIVE-10084:
------------------------------------

Hi [~stakiar], the conclusion came from a benchmark comparing Spark and Tez in Hive. However,
a lot has changed since then, so I'm not sure it still holds true.

You can construct a query that invokes a common join to see how it performs, and profile it
if necessary. I think the difference might come from Spark shuffle. We have recently changed
how we use Spark shuffle, so it's unclear to me whether there is anything to fix until you
actually benchmark it.
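
As a rough sketch of the suggestion above, a small Python helper can generate the same HiveQL script for each engine, so the two runs differ only in hive.execution.engine. The table and column names (store_sales, item, item_sk) are hypothetical, and turning off hive.auto.convert.join is an assumption about how to keep the planner from converting the join into a map join; it is not something stated in this thread.

```python
def common_join_script(engine, left="store_sales", right="item", key="item_sk"):
    """Build a HiveQL script that should exercise a common (shuffle) join.

    The default table and column names are hypothetical placeholders;
    substitute tables from your own benchmark schema.
    """
    join = f"SELECT COUNT(*) FROM {left} l JOIN {right} r ON l.{key} = r.{key};"
    lines = [
        # Pick the execution engine under test, e.g. "spark" or "tez".
        f"SET hive.execution.engine={engine};",
        # Assumption: disabling automatic map-join conversion keeps the common join.
        "SET hive.auto.convert.join=false;",
        # Inspect the plan first to confirm which join operator was chosen.
        f"EXPLAIN {join}",
        # Then run the query itself and time it.
        join,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Emit one script per engine so both runs use an identical query.
    for engine in ("spark", "tez"):
        print(f"-- {engine}")
        print(common_join_script(engine))
```

Each generated script can be saved to a file and run with the Hive CLI or Beeline. Checking the EXPLAIN output before timing the query is a cheap way to confirm that the plan really uses a common join rather than a map join.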

> Improve common join performance [Spark Branch]
> ----------------------------------------------
>
>                 Key: HIVE-10084
>                 URL: https://issues.apache.org/jira/browse/HIVE-10084
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Spark
>            Reporter: Xuefu Zhang
>
> Benchmark numbers for Hive on Spark indicate that common join performance can be improved.
> This task is to investigate and fix the issue.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
