hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jimmy Xiang" <jxi...@cloudera.com>
Subject Re: Review Request 33251: HIVE-10302 Cache small tables in memory [Spark Branch]
Date Wed, 22 Apr 2015 16:27:02 GMT


> On April 21, 2015, 2:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java, line 120
> > <https://reviews.apache.org/r/33251/diff/2/?file=937278#file937278line120>
> >
> >     1. For clarity, it might be good to put this in a separate private method.
> >     2. Does it work if we just synchronize on mapJoinTables[pos]?

1. Have put it in a separate method. 2. Probabl it does work to synchronize on mapJoinTables[pos]
since it is not the same object in different tasks.


> On April 21, 2015, 2:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java, line 52
> > <https://reviews.apache.org/r/33251/diff/2/?file=937280#file937280line52>
> >
> >     Method naming, see below.

Fixed.  Changed the method a little so that it doesn't do more than clean up small table caches.


> On April 21, 2015, 2:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 51
> > <https://reviews.apache.org/r/33251/diff/2/?file=937281#file937281line51>
> >
> >     Using thread-local makes me a little nervous, but let's discuss about this offline.

Per our discussion, we don't use thread-local any more.


> On April 21, 2015, 2:17 p.m., Xuefu Zhang wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java, line 60
> > <https://reviews.apache.org/r/33251/diff/2/?file=937281#file937281line60>
> >
> >     The method name suggests no indication of a side effect of setting thread local
value. We'd better put this outside of this method.
> >     
> >     In addition, the method name seems also a little confusing in that it suggests
cleanup is for sure but in fact it's conditional.

Removed the thread local variable usage.  As to the method name, it means we clean up the
cache a little, it doesn't mean we remove all the cahced contents. I added some javadoc to
make it a little clear.


- Jimmy


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/33251/#review80969
-----------------------------------------------------------


On April 21, 2015, 1:37 a.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/33251/
> -----------------------------------------------------------
> 
> (Updated April 21, 2015, 1:37 a.m.)
> 
> 
> Review request for hive, Chao Sun, Szehon Ho, and Xuefu Zhang.
> 
> 
> Bugs: HIVE-10302
>     https://issues.apache.org/jira/browse/HIVE-10302
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> Cached the small table containter so that mapjoin tasks can use it if the task is executed
on the same Spark executor.
> The cache is released right before the next job after the mapjoin job is done.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HashTableLoader.java fe108c4 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkPlanGenerator.java 3f240f5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkRecordHandler.java 97b3471 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/SparkUtilities.java 72ab913 
> 
> Diff: https://reviews.apache.org/r/33251/diff/
> 
> 
> Testing
> -------
> 
> Ran several queries in live cluster. ptest pending.
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message