hive-dev mailing list archives

From "Chao Sun" <>
Subject Review Request 26181: HIVE-8262 - Create CacheTran that transforms the input RDD by caching it [Spark Branch]
Date Tue, 30 Sep 2014 18:06:31 GMT

This is an automatically generated e-mail. To reply, visit:

Review request for hive and Xuefu Zhang.

Bugs: HIVE-8262

Repository: hive-git


In a few cases we need to cache an RDD to avoid recomputing it, for better performance. However,
caching a map input RDD is different from caching a regular RDD because of SPARK-3693. To cache a
Hadoop RDD, which is the input to MapWork, we instead cache the result RDD obtained by applying a
map function to the original Hadoop RDD, in which the <key, value> pairs are copied. Caching an
intermediate RDD, such as one produced by a shuffle, is just a direct cache call on that RDD.
This task is to create a CacheTran that captures this and can be plugged into the Spark plan
when caching is desirable.
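
The reason for the copy can be illustrated without Spark at all. The sketch below is a plain-Java simulation, not code from the patch: MutableValue, reader, cacheByReference and cacheWithCopy are hypothetical names, and it assumes the underlying cause is that a Hadoop record reader reuses one mutable object for every record, as Spark's hadoopRDD documentation describes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class CacheCopySketch {
    // Hypothetical stand-in for a mutable Hadoop Writable.
    static final class MutableValue {
        String data;

        MutableValue copy() {
            MutableValue c = new MutableValue();
            c.data = data;
            return c;
        }
    }

    // Hypothetical reader that hands back the SAME object for every record,
    // overwriting its contents each time (assumed record-reader behaviour).
    static Iterator<MutableValue> reader(String... records) {
        MutableValue shared = new MutableValue();
        return new Iterator<MutableValue>() {
            int i = 0;

            @Override
            public boolean hasNext() {
                return i < records.length;
            }

            @Override
            public MutableValue next() {
                shared.data = records[i++];
                return shared;
            }
        };
    }

    // "Caching" the input directly stores many references to one object.
    static List<String> cacheByReference(String... records) {
        List<MutableValue> cache = new ArrayList<>();
        Iterator<MutableValue> it = reader(records);
        while (it.hasNext()) {
            cache.add(it.next());
        }
        return contents(cache);
    }

    // Copying each record first (the "map function") keeps the values distinct.
    static List<String> cacheWithCopy(String... records) {
        List<MutableValue> cache = new ArrayList<>();
        Iterator<MutableValue> it = reader(records);
        while (it.hasNext()) {
            cache.add(it.next().copy());
        }
        return contents(cache);
    }

    static List<String> contents(List<MutableValue> cache) {
        List<String> out = new ArrayList<>();
        for (MutableValue v : cache) {
            out.add(v.data);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(cacheByReference("a", "b", "c")); // [c, c, c]
        System.out.println(cacheWithCopy("a", "b", "c"));    // [a, b, c]
    }
}
```

Caching by reference ends up holding the last record three times, while copying before caching preserves every record. An intermediate RDD does not need this extra step, which is why the two cases are treated differently.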


  ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ PRE-CREATION 




Chao Sun
