hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Xuefu Zhang" <xzh...@cloudera.com>
Subject Re: Review Request 34455: HIVE-10550 Dynamic RDD caching optimization for HoS.[Spark Branch]
Date Wed, 20 May 2015 21:12:40 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34455/#review84572
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java
<https://reviews.apache.org/r/34455/#comment135909>

    Currently the storage level is memory+disk. Any reason to change it to memory_only?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java
<https://reviews.apache.org/r/34455/#comment135908>

    Can we keep the old code around. I understand it's not currently used.



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java
<https://reviews.apache.org/r/34455/#comment135911>

    I cannot construct a case where a MapTran would need caching. Do you have an example?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java
<https://reviews.apache.org/r/34455/#comment135910>

    Can we keep the comment around?



ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/LocalSparkJobStatus.java
<https://reviews.apache.org/r/34455/#comment135904>

    We need to consider if it makes to unpersist rdds explicitly.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java
<https://reviews.apache.org/r/34455/#comment135896>

    Nit: variable name.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java
<https://reviews.apache.org/r/34455/#comment135897>

    Nit: class name. Maybe CommonParentWorkMatcher?



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java
<https://reviews.apache.org/r/34455/#comment135903>

    This nested loop doesn't seem efficient. SparkWork is basically a graph. Finding a node
that has multiple children should be fairly easy using graph traversing. An example would
be in SplitSparkWorkResolver.
    
    Getting the value for the threshold should be outside the look, nevertheless.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java
<https://reviews.apache.org/r/34455/#comment135900>

    We need to consider the case where the statistics may not be present.



ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java
<https://reviews.apache.org/r/34455/#comment135894>

    I think we need a better name for this variable, something like cachedWorks or worksToCache.



spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java
<https://reviews.apache.org/r/34455/#comment135893>

    Do you think it makes sense for us to release the cache as soon as the job is completed,
as it's done here?


- Xuefu Zhang


On May 20, 2015, 2:37 a.m., chengxiang li wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34455/
> -----------------------------------------------------------
> 
> (Updated May 20, 2015, 2:37 a.m.)
> 
> 
> Review request for hive, Chao Sun, Jimmy Xiang, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10550
>     https://issues.apache.org/jira/browse/HIVE-10550
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira description
> 
> 
> Diffs
> -----
> 
>   common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 43c53fc 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/CacheTran.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/LocalHiveSparkClient.java 19d3fee

>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapInput.java 26cfebd 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/MapTran.java 2170243 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ReduceTran.java e60dfac 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/RemoteHiveSparkClient.java 8b15099

>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/ShuffleTran.java a774395 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlan.java ee5c78a 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java e6c845c 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/status/impl/LocalSparkJobStatus.java
5d62596 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkMapJoinResolver.java
8e56263 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/SparkRddCachingResolver.java
PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SparkSkewJoinProcFactory.java
5990d17 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/spark/SplitSparkWorkResolver.java fb20080

>   ql/src/java/org/apache/hadoop/hive/ql/parse/spark/SparkCompiler.java 19aae70 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/SparkWork.java bb5dd79 
>   spark-client/src/main/java/org/apache/hive/spark/client/JobContext.java af6332e 
>   spark-client/src/main/java/org/apache/hive/spark/client/JobContextImpl.java beed8a3

>   spark-client/src/main/java/org/apache/hive/spark/client/MonitorCallback.java e1e899e

>   spark-client/src/main/java/org/apache/hive/spark/client/RemoteDriver.java b77c9e8 
>   spark-client/src/test/java/org/apache/hive/spark/client/TestSparkClient.java d33ad7e

> 
> Diff: https://reviews.apache.org/r/34455/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> chengxiang li
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message