hive-dev mailing list archives

From "Eric Hanson (JIRA)" <>
Subject [jira] [Commented] (HIVE-5762) Implement vectorized support for the DECIMAL data type
Date Mon, 09 Dec 2013 22:04:07 GMT


Eric Hanson commented on HIVE-5762:

I'm thinking about using the basic structure below as the column vector for limited-precision
decimals. A utility package of static functions can then be implemented to do decimal arithmetic
on individual values. This should be a lot faster than code that relies on
java.math.BigDecimal, because it is less general, and because object allocation (new) and
garbage collection will be reduced.

public class DecimalColumnVector extends ColumnVector {
  public int precision; // precision of all elements in vector (max 38)
  public int scale;     // scale of all elements in vector (max 38)
  public static final int WORDS_PER_VALUE = 4;

  /**
   * Logically a vector of 128-bit unsigned ints, stored "little-endian." This
   * means that for a value v, v[0] is the least significant word. The four
   * 32-bit words are treated as unsigned. However, the high-order bit
   * of the highest word (word 3) must be 0.
   */
  public int[][] vector;
  public byte[] sign;  // -1 if negative, 0 if zero, 1 if positive

  public DecimalColumnVector() {
    final int len = VectorizedRowBatch.DEFAULT_SIZE;
    vector = new int[len][];
    for (int i = 0; i < len; i++) {
      vector[i] = new int[WORDS_PER_VALUE];
    }
    sign = new byte[len];
  }
}
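
To illustrate the kind of static utility function mentioned above, here is a minimal sketch of
adding two magnitudes stored as four little-endian 32-bit words. The class name DecimalWords and
the method addMagnitude are hypothetical names chosen for this example and are not part of Hive.
Sign handling, scale alignment, and precision checks are left out.

```java
// Hypothetical helper, not part of Hive: arithmetic on one 128-bit magnitude
// stored as four 32-bit words, least significant word first.
final class DecimalWords {
  static final int WORDS_PER_VALUE = 4;
  private static final long MASK = 0xFFFFFFFFL; // reads an int as unsigned

  private DecimalWords() {}

  /**
   * Adds the magnitudes a and b and stores the sum in result, which may be
   * the same array as a or b. No objects are allocated.
   *
   * @return true if the sum does not fit, meaning there was a carry out of
   *         word 3 or the high-order bit of word 3 ended up set
   */
  static boolean addMagnitude(int[] a, int[] b, int[] result) {
    long carry = 0;
    for (int i = 0; i < WORDS_PER_VALUE; i++) {
      long sum = (a[i] & MASK) + (b[i] & MASK) + carry;
      result[i] = (int) sum;   // keep the low 32 bits
      carry = sum >>> 32;      // 0 or 1
    }
    return carry != 0 || result[WORDS_PER_VALUE - 1] < 0;
  }
}
```

Because the function writes into a caller-supplied array, a vectorized expression could call it
in a loop over vector[i] without allocating per row, which is the main saving over BigDecimal.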

> Implement vectorized support for the DECIMAL data type
> ------------------------------------------------------
>                 Key: HIVE-5762
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Eric Hanson
> Add support to allow queries referencing DECIMAL columns and expression results to run
> efficiently in vectorized mode.  Include unit tests and end-to-end tests.
> Before starting, or at least before going very far, please write a design specification (a new
> section for the design spec attached to HIVE-4160) for how support for the different DECIMAL
> types should work in vectorized mode, and the roadmap, and have it reviewed.
> It may be feasible to re-use LongColumnVector and related VectorExpression classes for
> fixed-point decimal in certain data ranges. That should be at least considered to get faster
> performance and save code. For unlimited precision DECIMAL, a new column vector subtype may
> be needed, or a BytesColumnVector could be re-used.
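
The fixed-point idea in the description can be sketched as follows, assuming both inputs share
one scale and their unscaled values fit in a long (roughly 18 decimal digits). Plain long[]
arrays stand in for the column vectors here, and the class ScaledLongDecimal and its methods are
hypothetical names for illustration, not existing Hive classes.

```java
import java.math.BigDecimal;

// Hypothetical sketch: decimals with a common scale stored as unscaled longs,
// e.g. 123.45 at scale 2 is stored as 12345.
final class ScaledLongDecimal {
  private ScaledLongDecimal() {}

  /** Adds the first n entries of a and b into out; all three share one scale. */
  static void addColumns(long[] a, long[] b, long[] out, int n) {
    for (int i = 0; i < n; i++) {
      // Math.addExact throws ArithmeticException if the long range is exceeded.
      out[i] = Math.addExact(a[i], b[i]);
    }
  }

  /** Renders an unscaled value for display; BigDecimal is used only here. */
  static String format(long unscaled, int scale) {
    return BigDecimal.valueOf(unscaled, scale).toPlainString();
  }
}
```

Addition at a common scale is ordinary long addition, which is why existing long-based
expressions might be reusable for it. Multiplication and division change the scale and would
need extra handling.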

This message was sent by Atlassian JIRA
