hive-dev mailing list archives

From "Remus Rusanu" <rem...@microsoft.com>
Subject Re: Review Request 13059: HIVE-4850 Implement vector mode map join
Date Thu, 03 Oct 2013 14:20:45 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13059/
-----------------------------------------------------------

(Updated Oct. 3, 2013, 2:20 p.m.)


Review request for hive, Eric Hanson and Jitendra Pandey.


Bugs: HIVE-4850
    https://issues.apache.org/jira/browse/HIVE-4850


Repository: hive-git


Description (updated)
-------

This is a working implementation based on current trunk. It is simpler than the .1 patch, as it
delegates the JOIN entirely to the row-mode MapJoinOperator. The vectorized operator literally
calls the row-mode implementation for each row in the input batch and collects the rows forwarded
by the row-mode operator into the output batch. This is not as bad as it seems, because the JOIN
operators have to resort to row-mode operations anyway, since the small tables (hash tables) are
row-mode (objects and object inspectors). By delegating the entire join logic to row mode, we
piggyback on the correctness of the existing implementation. I do plan to come up with a fully
vectorized implementation, but that would require changes to hash table creation and serialization.
Note that the filtering and key evaluation of the big table do use vectorized operators; row mode
applies only to the hash table key lookup and to the JOIN logic.
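The delegation described above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the code in the patch: the `Batch` and `RowModeJoin` classes, the `processBatch` method and the fixed `long` columns are assumptions made for brevity (the `selected`, `selectedInUse` and `size` fields loosely mirror those of Hive's `VectorizedRowBatch`). The vectorized side honors the selection vector left behind by vectorized filtering; each surviving row is handed to a row-mode probe of the small-table hash table, and every forwarded joined row is copied back into a columnar output batch, which this sketch flushes downstream when it fills up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class VectorMapJoinSketch {

    // Hypothetical columnar batch, loosely modeled on VectorizedRowBatch:
    // long columns plus an optional selection vector from an upstream filter.
    static final class Batch {
        final long[][] cols;
        final int[] selected;
        boolean selectedInUse;
        int size; // number of live rows (selected rows when selectedInUse)

        Batch(int numCols, int capacity) {
            cols = new long[numCols][capacity];
            selected = new int[capacity];
        }
    }

    // Hypothetical row-mode join: probes the small-table hash table with one
    // big-table row and returns one joined row {key, bigValue, smallValue}
    // per match (inner-join semantics: no match, nothing forwarded).
    static final class RowModeJoin {
        private final Map<Long, List<Long>> smallTable;

        RowModeJoin(Map<Long, List<Long>> smallTable) {
            this.smallTable = smallTable;
        }

        List<long[]> process(long key, long value) {
            List<long[]> forwarded = new ArrayList<>();
            for (long match : smallTable.getOrDefault(key, List.of())) {
                forwarded.add(new long[] {key, value, match});
            }
            return forwarded;
        }
    }

    // Vectorized wrapper: walks the (possibly filtered) input batch row by row,
    // delegates each row to the row-mode join, and copies every forwarded row
    // into a 3-column output batch, flushing full batches downstream.
    static List<Batch> processBatch(Batch in, RowModeJoin join, int outCapacity) {
        List<Batch> downstream = new ArrayList<>();
        Batch out = new Batch(3, outCapacity);
        for (int j = 0; j < in.size; j++) {
            // Honor the selection vector produced by a vectorized filter.
            int i = in.selectedInUse ? in.selected[j] : j;
            for (long[] row : join.process(in.cols[0][i], in.cols[1][i])) {
                if (out.size == outCapacity) { // output batch full: flush it
                    downstream.add(out);
                    out = new Batch(3, outCapacity);
                }
                for (int c = 0; c < row.length; c++) {
                    out.cols[c][out.size] = row[c];
                }
                out.size++;
            }
        }
        if (out.size > 0) {
            downstream.add(out); // flush the partially filled last batch
        }
        return downstream;
    }
}
```

Calling `process` once per row is what keeps the join semantics identical to the row-mode operator; the price is per-row object handling, which a fully vectorized hash table probe would avoid.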


Diffs
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/CommonJoinOperator.java d320b47 
  ql/src/java/org/apache/hadoop/hive/ql/exec/JoinOperator.java 86db044 
  ql/src/java/org/apache/hadoop/hive/ql/exec/MapJoinOperator.java 153b8ea 
  ql/src/java/org/apache/hadoop/hive/ql/exec/OperatorFactory.java 8ab5395 
  ql/src/java/org/apache/hadoop/hive/ql/exec/SMBMapJoinOperator.java cde1a59 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ColumnVector.java 8b4c615 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssign.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorColumnAssignFactory.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorHashKeyWrapperBatch.java 9955d09 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapJoinOperator.java PRE-CREATION 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorReduceSinkOperator.java 6df3551 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 02ebe14 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizedRowBatch.java ff13f89 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/expressions/VectorExpressionWriterFactory.java 9e189c9 
  ql/src/java/org/apache/hadoop/hive/ql/optimizer/physical/Vectorizer.java df1c5a6 
  ql/src/java/org/apache/hadoop/hive/ql/plan/MapWork.java a72ec8b 

Diff: https://reviews.apache.org/r/13059/diff/


Testing
-------

Manually ran several join queries on the alltypes_orc table.


Thanks,

Remus Rusanu

