hive-dev mailing list archives

From "Remus Rusanu" <>
Subject Re: Review Request: HIVE-4595 Add support for string type keys in vectorized GROUP BY
Date Fri, 24 May 2013 09:37:59 GMT

This is an automatically generated e-mail. To reply, visit:

(Updated May 24, 2013, 9:37 a.m.)

Review request for hive, Jitendra Pandey, Eric Hanson, and Sarvesh Sakalanaga.


Fix keyHash loop


Extend VectorHashKeyWrapper and VectorHashKeyWrapperBatch to support ByteColumnVector
(i.e. string) keys. The addition fits the existing VectorHashKeyWrapper behavior: the
string key support is 'compiled' once per query into a VectorHashKeyWrapperBatch instance.
VectorHashKeyWrapper is extended to support byte[] keys. It stores the key values just
like the ByteColumnVector class, using a byte[][], a start int[] and a length int[]. During
batch processing there is no value copy: the key wrappers take a reference to the data in
the batch (i.e. they refer to the same byte[] and copy only the start/length). This avoids
potentially expensive size-of-key copy operations *before* the hash probe. The
VectorHashKeyWrapper cloning that occurs when a probe reveals a missing key in the hash
table copies the key (it must), and this is the only time we copy the key values.
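
The reference-then-copy-on-miss idea described above can be illustrated with a minimal
Java sketch. It is not the code from the patch: BytesKeySketch, GroupCountSketch, setRef,
copyKey and countKeys are hypothetical names invented for illustration, a single shared
byte[] stands in for the batch data, and a plain java.util.HashMap stands in for the
aggregation hash table.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for a string key wrapper (not Hive's class).
final class BytesKeySketch {
    private byte[] bytes; // may point into a batch-owned buffer (no copy)
    private int start;
    private int length;
    private int hash;

    // Point at a slice of the batch buffer: only start/length are copied.
    void setRef(byte[] source, int start, int length) {
        this.bytes = source;
        this.start = start;
        this.length = length;
        int h = 1;
        for (int i = start; i < start + length; i++) {
            h = 31 * h + source[i];
        }
        this.hash = h;
    }

    // Deep copy, needed when the key must outlive the batch it came from.
    BytesKeySketch copyKey() {
        BytesKeySketch copy = new BytesKeySketch();
        copy.bytes = Arrays.copyOfRange(bytes, start, start + length);
        copy.start = 0;
        copy.length = length;
        copy.hash = hash;
        return copy;
    }

    // True if this key still references the given buffer rather than a copy.
    boolean sharesBufferWith(byte[] buffer) {
        return bytes == buffer;
    }

    @Override
    public int hashCode() {
        return hash;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof BytesKeySketch)) {
            return false;
        }
        BytesKeySketch other = (BytesKeySketch) o;
        if (length != other.length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (bytes[start + i] != other.bytes[other.start + i]) {
                return false;
            }
        }
        return true;
    }
}

// Hypothetical driver: counts rows per string key, one probe per row.
final class GroupCountSketch {
    static Map<BytesKeySketch, long[]> countKeys(byte[] buffer, int[] starts, int[] lengths) {
        Map<BytesKeySketch, long[]> counts = new HashMap<>();
        BytesKeySketch probe = new BytesKeySketch(); // reused for every row
        for (int row = 0; row < starts.length; row++) {
            probe.setRef(buffer, starts[row], lengths[row]); // no byte copy before the probe
            long[] slot = counts.get(probe);
            if (slot == null) {
                // Probe missed: this is the only place the key bytes are copied.
                counts.put(probe.copyKey(), new long[] {1L});
            } else {
                slot[0]++;
            }
        }
        return counts;
    }
}
```

In this sketch the number of key copies equals the number of distinct keys, not the number
of rows, which is the property the description relies on.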

This addresses bug HIVE-4595.

Diffs (updated)

  ql/src/java/org/apache/hadoop/hive/ql/exec/ 35712d0 
  ql/src/java/org/apache/hadoop/hive/ql/exec/ c23614c 
  ql/src/java/org/apache/hadoop/hive/ql/exec/vector/ 1ef4955 
  ql/src/test/org/apache/hadoop/hive/ql/exec/vector/ b3b5cd2




Extended vectorized GROUP BY unit test to cover String keys for some cases.


Remus Rusanu
