hive-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-10302) Load small tables (for map join) in executor memory only once [Spark Branch]
Date Mon, 01 Jun 2015 22:44:19 GMT

    [ https://issues.apache.org/jira/browse/HIVE-10302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14568170#comment-14568170 ]

Sergey Shelukhin commented on HIVE-10302:
-----------------------------------------

Hi. This appears to have broken the build:
{noformat}
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.1:compile (default-compile) on project hive-exec: Compilation failure: Compilation failure:
[ERROR] /Users/sergey/git/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HivePairFlatMapFunction.java:[51,7] cannot find symbol
[ERROR] symbol:   variable SmallTableCache
[ERROR] location: class org.apache.hadoop.hive.ql.exec.spark.HivePairFlatMapFunction<T,K,V>
[ERROR] /Users/sergey/git/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java:[129,42] cannot find symbol
[ERROR] symbol:   variable SmallTableCache
[ERROR] location: class org.apache.hadoop.hive.ql.exec.spark.HashTableLoader
[ERROR] /Users/sergey/git/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java:[132,24] cannot find symbol
[ERROR] symbol:   variable SmallTableCache
[ERROR] location: class org.apache.hadoop.hive.ql.exec.spark.HashTableLoader
[ERROR] /Users/sergey/git/hive/ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java:[135,11] cannot find symbol
[ERROR] symbol:   variable SmallTableCache
[ERROR] location: class org.apache.hadoop.hive.ql.exec.spark.HashTableLoader
[ERROR] -> [Help 1]
{noformat}

Can you please revert or fix?

> Load small tables (for map join) in executor memory only once [Spark Branch]
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-10302
>                 URL: https://issues.apache.org/jira/browse/HIVE-10302
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Jimmy Xiang
>            Assignee: Jimmy Xiang
>             Fix For: 1.3.0
>
>         Attachments: 10302.patch, HIVE-10302.2-spark.patch, HIVE-10302.3-spark.patch, HIVE-10302.4-spark.patch, HIVE-10302.spark-1.patch
>
>
> A Spark executor usually has multiple cores, so multiple map-join tasks can run in the same executor, either concurrently or sequentially. Currently, each task loads its own copy of the small tables for the map join into memory, which is inefficient. Ideally, the small tables are loaded only once and shared among the tasks running in that executor.
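
The load-once idea in the issue description can be sketched roughly as below. This is only an illustration under stated assumptions: `SharedTableCache`, `getOrLoad`, and the `"small_table"` key are hypothetical names, and this is not the `SmallTableCache` class from the HIVE-10302 patch, whose contents are not shown in this message. The sketch keeps one map per JVM (that is, per executor) and relies on `ConcurrentHashMap.computeIfAbsent` so that the loader runs at most once per key even when several tasks ask for the same table at the same time.

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical per-JVM cache; NOT the SmallTableCache class from the patch.
final class SharedTableCache<K, V> {
    private final ConcurrentHashMap<K, V> tables = new ConcurrentHashMap<>();

    // Returns the cached table for the key, invoking the loader only if the
    // key is absent. computeIfAbsent is atomic, so concurrent callers for the
    // same key share a single load.
    V getOrLoad(K key, Function<? super K, ? extends V> loader) {
        return tables.computeIfAbsent(key, loader);
    }

    // Drops all cached tables, e.g. when a different query starts (assumption).
    void clear() {
        tables.clear();
    }
}

final class SharedTableCacheDemo {
    public static void main(String[] args) throws InterruptedException {
        SharedTableCache<String, Map<Integer, String>> cache = new SharedTableCache<>();
        AtomicInteger loads = new AtomicInteger();

        // Four threads stand in for four map-join tasks in one executor.
        ExecutorService tasks = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            tasks.execute(() -> {
                cache.getOrLoad("small_table", key -> {
                    loads.incrementAndGet(); // counts actual loads
                    return Collections.singletonMap(1, "a");
                });
            });
        }
        tasks.shutdown();
        tasks.awaitTermination(10, TimeUnit.SECONDS);

        System.out.println("loads = " + loads.get()); // prints "loads = 1"
    }
}
```

A design note on the sketch: holding the map in a single shared object trades memory lifetime for load time, so a real implementation would also need a policy for when to evict tables. The `clear()` method above is only a placeholder for that decision.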



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
