hive-dev mailing list archives

From "Remus Rusanu" <rem...@microsoft.com>
Subject Re: Review Request: HIVE-4595 Add support for string type keys in vectorized GROUP BY
Date Fri, 24 May 2013 09:37:59 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/11345/
-----------------------------------------------------------

(Updated May 24, 2013, 9:37 a.m.)


Review request for hive, Jitendra Pandey, Eric Hanson, and Sarvesh Sakalanaga.


Changes
-------

Fix keyHash loop


Description
-------

Extend VectorHashKeyWrapper and VectorHashKeyWrapperBatch to support ByteColumnVector
(i.e. string) keys. The addition fits the existing VectorHashKeyWrapper behavior: the
string key support is 'compiled' once per query into a VectorHashKeyWrapperBatch instance.
VectorHashKeyWrapper is extended to support byte[] keys. It stores the key values just
like the ByteColumnVector class, using a byte[][], a start int[] and a length int[]. During
batch processing there is no value copy: the key wrappers take a reference to the data from
the batch (i.e. they refer to the same byte[] and copy the start/length). This avoids potentially
expensive size-of-key copy operations *before* the hash probe. The VectorHashKeyWrapper cloning
that occurs when a probe reveals a missing key in the hash does copy the key (it must), and
this is the only time we copy the key values.
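
As an illustration of the reference-then-copy-on-miss idea described above, here is a minimal
sketch. It is not the code under review: ByteKeySketch, KeyRef, setRef, copyKey and countGroups
are hypothetical names made up for this example, and a plain java.util.HashMap stands in for
the operator's hash table. It assumes only the byte[][] / start int[] / length int[] layout
mentioned in the description.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class ByteKeySketch {

    /** Hypothetical key wrapper: refers to a slice of a byte[] it may not own. */
    public static final class KeyRef {
        byte[] bytes;
        int start;
        int length;

        /** Point at the batch's buffer; only start/length are copied, not the bytes. */
        public void setRef(byte[] bytes, int start, int length) {
            this.bytes = bytes;
            this.start = start;
            this.length = length;
        }

        /** Deep copy of the slice, so the stored key outlives the batch buffer. */
        public KeyRef copyKey() {
            KeyRef copy = new KeyRef();
            copy.bytes = Arrays.copyOfRange(bytes, start, start + length);
            copy.start = 0;
            copy.length = length;
            return copy;
        }

        @Override
        public int hashCode() {
            // Hash only the referenced slice, so a reference and its copy hash alike.
            int h = 1;
            for (int i = start; i < start + length; i++) {
                h = 31 * h + bytes[i];
            }
            return h;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof KeyRef)) {
                return false;
            }
            KeyRef other = (KeyRef) o;
            if (other.length != length) {
                return false;
            }
            for (int i = 0; i < length; i++) {
                if (bytes[start + i] != other.bytes[other.start + i]) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Counts rows per string key; key bytes are copied only when a probe misses. */
    public static Map<KeyRef, long[]> countGroups(
            byte[][] vector, int[] start, int[] length, int size) {
        Map<KeyRef, long[]> groups = new HashMap<>();
        KeyRef probe = new KeyRef(); // reused for every row
        for (int row = 0; row < size; row++) {
            probe.setRef(vector[row], start[row], length[row]); // no value copy
            long[] count = groups.get(probe);                   // hash probe
            if (count == null) {
                count = new long[1];
                groups.put(probe.copyKey(), count);             // the only key copy
            }
            count[0]++;
        }
        return groups;
    }
}
```

In this sketch the probe object is reused across rows, so a batch in which every key has
already been seen performs no byte[] allocation at all.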


This addresses bug HIVE-4595.
    https://issues.apache.org/jira/browse/HIVE-4595


Diffs (updated)
-----

  ql/src/java/org/apache/hadoop/hive/ql/exec/VectorHashKeyWrapper.java 35712d0
  ql/src/java/org/apache/hadoop/hive/ql/exec/VectorHashKeyWrapperBatch.java c23614c
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorizationContext.java 1ef4955
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/FakeVectorRowBatchFromObjectIterables.java PRE-CREATION
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/TestVectorGroupByOperator.java b3b5cd2
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorRowBatchFromIterables.java cf3399d
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/util/FakeVectorRowBatchFromLongIterables.java PRE-CREATION

Diff: https://reviews.apache.org/r/11345/diff/


Testing
-------

Extended vectorized GROUP BY unit test to cover String keys for some cases.


Thanks,

Remus Rusanu

