hive-dev mailing list archives

From "Rui Li" <rui...@intel.com>
Subject Re: Review Request 30739: HIVE-9574 Lazy computing in HiveBaseFunctionResultList may hurt performance [Spark Branch]
Date Tue, 10 Feb 2015 01:41:19 GMT


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> >
> 
> Rui Li wrote:
>     A high-level question: do we still need two buffers? And does it make sense to use something like a queue instead of an array as the buffer?
> 
> Jimmy Xiang wrote:
>     A queue should work too. Using two buffers makes it easier to switch between read and write. Switching itself is cheap here. For RowContainer, it is expensive to switch because of first()/clear(), etc.

Thanks for the explanation, Jimmy. I was just wondering if we can use a single queue as the
buffer and avoid switching between two arrays and managing the cursors. That should make it
less complicated, right?
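
A rough sketch of the single-queue idea, only to show what I have in mind. The class name QueueBuffer, its methods, and the capacity parameter are all made up for illustration; none of this is code from the patch, and spilling to disk is left out.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical single-queue row buffer; not the HiveKVResultCache implementation. */
final class QueueBuffer<T> {
    private final Queue<T> rows = new ArrayDeque<>();
    private final int capacity; // assumed in-memory row limit

    QueueBuffer(int capacity) {
        this.capacity = capacity;
    }

    /** Returns false when the buffer is full, i.e. where a real cache would spill. */
    boolean offer(T row) {
        if (rows.size() >= capacity) {
            return false;
        }
        return rows.add(row); // ArrayDeque rejects null rows with an exception
    }

    /** Removes and returns the oldest row, or null when the buffer is empty. */
    T poll() {
        return rows.poll();
    }

    boolean isEmpty() {
        return rows.isEmpty();
    }
}
```

With a queue, reads and writes can interleave freely, so there are no read/write cursors to track and nothing to switch.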


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, line 54
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line54>
> >
> >     If I understand correctly, this can be renamed to something like IN_MEMORY_NUM_ROWS?
> 
> Jimmy Xiang wrote:
>     Yes, you are right. Both are ok. Any strong reason for renaming it?

No strong reason. To me, "cache size" just reads more like a size in bytes than a number of rows.


> On Feb. 9, 2015, 2:51 a.m., Rui Li wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java, line 236
> > <https://reviews.apache.org/r/30739/diff/4/?file=853475#file853475line236>
> >
> >     I suppose this is to avoid switching buffers frequently? But why the magic number 1?
> 
> Jimmy Xiang wrote:
>     Right. If it is 1, there is no need to switch buffers. For any other number, we need to switch anyway. I assume there are many scenarios in which there is just one row.

I see, thanks.
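
For the record, here is one possible reading of that special case, written as a small sketch. It is a guess at the idea and not the code at line 236: the class TwoBufferSketch, its fields, and the switchCount() helper are hypothetical, and a real cache would spill to disk where add() returns false.

```java
/** Hypothetical two-buffer row cache; names and logic are illustrative only. */
final class TwoBufferSketch<T> {
    private Object[] writeBuf;
    private Object[] readBuf;
    private int writeCount;  // rows currently held in writeBuf
    private int readCount;   // rows loaded into readBuf
    private int readCursor;  // next index to read from readBuf
    private int switches;    // number of buffer switches, kept for illustration

    TwoBufferSketch(int capacity) {
        writeBuf = new Object[capacity];
        readBuf = new Object[capacity];
    }

    /** Returns false when the write buffer is full (a real cache would spill). */
    boolean add(T row) {
        if (writeCount == writeBuf.length) {
            return false;
        }
        writeBuf[writeCount++] = row;
        return true;
    }

    /** Returns the next row in insertion order, or null when nothing is buffered. */
    @SuppressWarnings("unchecked")
    T next() {
        if (readCursor < readCount) {
            return (T) readBuf[readCursor++];
        }
        if (writeCount == 0) {
            return null;
        }
        if (writeCount == 1) {
            // Special case: a single pending row is handed over directly,
            // so the buffers do not need to be switched.
            T row = (T) writeBuf[0];
            writeBuf[0] = null;
            writeCount = 0;
            return row;
        }
        // General case: swap the buffers and reset the cursors.
        Object[] tmp = readBuf;
        readBuf = writeBuf;
        writeBuf = tmp;
        readCount = writeCount;
        readCursor = 0;
        writeCount = 0;
        switches++;
        return (T) readBuf[readCursor++];
    }

    int switchCount() {
        return switches;
    }
}
```

In this sketch, a workload that adds and reads one row at a time never switches buffers, while two or more pending rows cost one switch per batch.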


- Rui


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30739/#review71597
-----------------------------------------------------------


On Feb. 9, 2015, 7:41 p.m., Jimmy Xiang wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/30739/
> -----------------------------------------------------------
> 
> (Updated Feb. 9, 2015, 7:41 p.m.)
> 
> 
> Review request for hive, Rui Li and Xuefu Zhang.
> 
> 
> Bugs: HIVE-9574
>     https://issues.apache.org/jira/browse/HIVE-9574
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> The result KV cache no longer uses RowContainer, since RowContainer has logic we don't need, which adds overhead. We don't do lazy computing right away; instead we wait a little, until the cache is close to spilling.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveBaseFunctionResultList.java 78ab680
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveKVResultCache.java 8ead0cb
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunction.java 7a09b4d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveMapFunctionResultList.java e92e299
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunction.java 070ea4d
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/HiveReduceFunctionResultList.java d4ff37c
>   ql/src/java/org/apache/hadoop/hive/ql/exec/spark/KryoSerializer.java 286816b
>   ql/src/test/org/apache/hadoop/hive/ql/exec/spark/TestHiveKVResultCache.java 0df4598
> 
> Diff: https://reviews.apache.org/r/30739/diff/
> 
> 
> Testing
> -------
> 
> Unit test, test on cluster
> 
> 
> Thanks,
> 
> Jimmy Xiang
> 
>

