hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Remus Rusanu <rem...@microsoft.com>
Subject RE: Review Request 13059: HIVE-4850 Implement vector mode map join
Date Mon, 11 Nov 2013 20:13:41 GMT
Thanks for the review!
I will probably have to add a JIRA to address them, otherwise I don't have a vehicle for submitting
the patch :)

-----Original Message-----
From: Eric Hanson [mailto:noreply@reviews.apache.org] On Behalf Of Eric Hanson
Sent: Monday, November 11, 2013 10:09 PM
To: Jitendra Pandey; Eric Hanson (SQL SERVER)
Cc: Remus Rusanu; hive
Subject: Re: Review Request 13059: HIVE-4850 Implement vector mode map join


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/#review28671
-----------------------------------------------------------



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java
<https://reviews.apache.org/r/13059/#comment55588>

    Nice to see good comments



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java
<https://reviews.apache.org/r/13059/#comment55589>

    A comment that explains at a high level where and how this interface is used would be
helpful.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55593>

    should these fields be marked private?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55594>

    Please add a comment that explains this method.
    
    Should this be a new method on ByteColumnVector or can you use BytesColumnVector.setVal()?
    
    setVal automatically extends the internal buffer if needed.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55612>

    Please add a comment to explain the purpose of this method
    



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55617>

    conventions are to put blanks before and after operators =, < etc.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55616>

    The Sun Java coding style conventions that are used for Hive say to use this style:
    
    } else [if (...)] {
      ...
    }



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55618>

    Please add a descriptive comment for this method



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java
<https://reviews.apache.org/r/13059/#comment55622>

    and the what?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55627>

    remove blank comment?



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55629>

    correct spelling of Vectorization



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55632>

    supper -> super?
    Please explain what out-of-band params are.



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java
<https://reviews.apache.org/r/13059/#comment55635>

    colon should be surounded by blanks



ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java
<https://reviews.apache.org/r/13059/#comment55653>

    Please add a comment that describes what this method does.



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
<https://reviews.apache.org/r/13059/#comment55654>

    Please add a comment



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
<https://reviews.apache.org/r/13059/#comment55649>

    Please add a comment explaining what's done by this method



ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java
<https://reviews.apache.org/r/13059/#comment55650>

    Please add a descriptive comment for this method


This looks good. I made a bunch of stylistic comments. Could you also add a page or so of
design description to the design document for vectorized query execution attached to HIVE-4160?
Thanks Remus. -Eric

- Eric Hanson


On Oct. 12, 2013, 9:51 p.m., Remus Rusanu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13059/
> -----------------------------------------------------------
> 
> (Updated Oct. 12, 2013, 9:51 p.m.)
> 
> 
> Review request for hive, Eric Hanson and Jitendra Pandey.
> 
> 
> Bugs: HIVE-4850
>     https://issues.apache.org/jira/browse/HIVE-4850
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This is a working implementation based on current trunk. It is simpler than the .1 patch
in as it delegates the JOIN entirely to the row-mode MapJoinOperator. The vectorized operator
is literally calling the row-mode implementaiton for each row in the input batch and collects
the row-mode forward into the output batch. This is not as bad as it seems because the JOIN
operators has to resort to row-mode operations anyway, due to the small tables (hashtables)
being row-mode (objects and object-inspectors). By delegating the entire join logic to the
row mode we piggyback on the correctness of exiting implementation. I do plan to come up with
a full-vectorized mode implementation but that would require changes to the hash table creation-serialization.
Note that the filtering and key evaluation of the big table does use vectorized operators.
the row mode applies only to the key HT lookup and to the JOIN logic
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/JoinUtil.java fa9ee35 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 54f2644 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09

>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551

>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorSelectOperator.java 0fb763a

>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 8f10644

>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java
9e189c9 
>   ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java 02c32cb 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b 
>   ql/src/test/queries/clientpositive/vectorized_mapjoin.q PRE-CREATION 
>   ql/src/test/results/clientpositive/vectorized_mapjoin.q.out PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/13059/diff/
> 
> 
> Testing
> -------
> 
> Manually run some join queries on alltypes_orc table.
> 
> 
> Thanks,
> 
> Remus Rusanu
> 
>

Mime
View raw message