hive-issues mailing list archives

From "Michael Howard (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-6017) Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive
Date Tue, 24 Feb 2015 23:13:04 GMT

    [ https://issues.apache.org/jira/browse/HIVE-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335635#comment-14335635 ]

Michael Howard commented on HIVE-6017:
--------------------------------------

These are observations about the problems with "native" multiplication and division in Decimal128.


Hive 0.13 source code is visible at:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hive/hive-common/0.13.0/org/apache/hadoop/hive/common/type/Decimal128.java

The "native" Decimal128 multiply and divide code is not in use. 

Instead, there is a comment which says:
> As of 2/11/2014 this has a known bug in multiplication. ...
> The fix employed now is to replace this code with code that uses Java BigDecimal multiply.


The fundamental problem with the multiply code is that UnsignedInt128.multiplyArrays4And4To4NoOverflow()
is broken. 

There are multiple stages where the intermediate/temp variable 'product' is calculated.
Unfortunately, carry information is lost (out the top) whenever the results of two or more
multiplications are summed together.

It is perfectly fine to write:
  uint64 = uint32 * uint32
or
  uint64 = uint32 * uint32 + carry

However, the following is invalid because it can lose carry information:
  uint64 = (uint32 * uint32) + (uint32 * uint32)

An example case is:
0xFFFFFFFFL * 0xFFFFFFFFL + 0xFFFFFFFFL * 0xFFFFFFFFL

This sum cannot be held in a uint64 ... a carry out the top is lost. 

This is easier to envision if you do the same operation with 8=>16 bits instead of 32=>64 bits:
0xFF * 0xFF = 0xFE01 // ok
0xFF * 0xFF + 0xFF = 0xFE01 + 0xFF = 0xFF00 // ok
0xFF * 0xFF + 0xFF * 0xFF = 0xFE01 + 0xFE01 = 0x1FC02 // not ok ... overflows 16 bits ... carry lost
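
The same thing can be checked directly in Java (throwaway demo code, not from the Hive source): the sum of two 32x32 products wraps a 64-bit long, while a single product plus a 32-bit addend does not.

import java.math.BigInteger;

public class CarryLossDemo {
  public static void main(String[] args) {
    long a = 0xFFFFFFFFL;        // max uint32

    // Safe: one uint32*uint32 product plus one uint32 always fits in 64 bits.
    long ok = a * a + a;         // 0xffffffff00000000

    // Unsafe: the sum of two uint32*uint32 products can exceed 64 bits.
    long bad = a * a + a * a;    // wraps; the carry out of bit 63 is lost

    BigInteger exact = BigInteger.valueOf(a).pow(2).multiply(BigInteger.valueOf(2));

    System.out.println("ok   = " + Long.toUnsignedString(ok, 16));
    System.out.println("bad  = " + Long.toUnsignedString(bad, 16));  // fffffffc00000002
    System.out.println("true = " + exact.toString(16));              // 1fffffffc00000002 ... one digit wider
  }
}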

The UnsignedInt128.multiplyArrays4And4To4NoOverflow() and UnsignedInt128.multiplyArrays4And4To8()
methods use this pattern repeatedly, and therefore are broken. 

These routines need to be implemented in such a way that the right-hand side contains no more
than a single multiply (uint32 * uint32) and a single addition (+ uint32).

This is fine:
    product = (right[0] & SqlMathUtil.LONG_MASK)
        * (left[0] & SqlMathUtil.LONG_MASK);
    int z0 = (int) product;

This is not ... the sum can overflow, and 'product' will have lost the carry:
    product = (right[0] & SqlMathUtil.LONG_MASK)
        * (left[1] & SqlMathUtil.LONG_MASK)
        + (right[1] & SqlMathUtil.LONG_MASK)
        * (left[0] & SqlMathUtil.LONG_MASK) + (product >>> 32);
    int z1 = (int) product;

This code needs to be rewritten as a sum of terms, where each term is a uint32[4] * uint32
partial product that is left-shifted into position.

UnsignedInt128.multiplyDestructive(int) and UnsignedInt128.addDestructive(int[]) are just
fine. SqlMathUtil.multiplyMultiprecision(int[], int) is also fine. 

UnsignedInt128.multiplyArrays4And4To4NoOverflow() and UnsignedInt128.multiplyArrays4And4To8()
need to be implemented using these as building blocks.
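
For illustration only, here is a minimal sketch of that structure in plain Java (the class name, method name, and little-endian word order are made up here; this is not the Hive code): the 4x4-word product is accumulated one left-shifted (uint32[4] * uint32) partial product at a time, with the carry propagated explicitly. Each inner step is one 32x32 product plus two 32-bit addends (the digit already in place and the running carry), and (2^32-1)^2 + 2*(2^32-1) = 2^64-1, so the step itself can never lose a carry.

public final class MultiplySketch {
  private static final long MASK = 0xFFFFFFFFL;   // same role as SqlMathUtil.LONG_MASK

  // left and right are uint32[4], assumed little-endian; the full product is uint32[8].
  public static int[] multiply4x4To8(int[] left, int[] right) {
    int[] result = new int[8];
    for (int j = 0; j < 4; j++) {                 // one partial product per word of right
      long rj = right[j] & MASK;
      long carry = 0;
      for (int i = 0; i < 4; i++) {
        long t = (left[i] & MASK) * rj            // one 32x32 multiply ...
               + (result[i + j] & MASK)           // ... plus the digit already there ...
               + carry;                           // ... plus the carry: fits exactly in 64 bits
        result[i + j] = (int) t;                  // low 32 bits stay in place
        carry = t >>> 32;                         // high 32 bits move up
      }
      result[j + 4] = (int) carry;                // this word has not been written yet, so assignment is safe
    }
    return result;
  }
}

A 4x4=>4 variant like multiplyArrays4And4To4NoOverflow() can presumably reuse the same loop and keep only the low four words (checking that the high words are zero if overflow must be detected).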

The goal of this was to provide "high performance". The potential benefit comes from the fact
that these objects are mutable, which avoids the object creation/GC associated with BigDecimal.
Therefore, it is a little painful to see:

  public void multiplyDestructive(Decimal128 right, short newScale) {
    HiveDecimal rightHD = HiveDecimal.create(right.toBigDecimal());
    HiveDecimal thisHD = HiveDecimal.create(this.toBigDecimal());
    HiveDecimal result = thisHD.multiply(rightHD);
...
    this.update(result.bigDecimalValue().toPlainString(), newScale);


Enough on multiplication. Regarding division ...

Decimal128.divideDestructiveNativeDecimal128() has a comment:
  As of 1/20/2014 this has a known bug in division. See TestDecimal128.testKnownPriorErrors().


TestDecimal128.testKnownPriorErrors() exercises both Decimal128 multiplication and Decimal128 division.
We know that there is a problem with multiplication; it is unclear whether there is also a problem
with Decimal128 division.

I have not investigated Decimal128.division in detail. 
Based upon a quick glance, I did not see any fundamental problems with SqlMathUtil.divideMultiPrecision().
So the fundamental underpinnings might be OK. 

However, I did not see any reference to RoundingMode behavior in Decimal128.divideDestructiveNativeDecimal128().

Decimal128 should produce the same results as BigDecimal ... that is at least desired, and
probably required. 
In order to accomplish this, Decimal128 will need to deal with RoundingModes ... or at least
round consistently with RoundingMode.HALF_UP.
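
As a throwaway illustration of why this matters (demo code, not from Hive), the same exact quotient rounds differently at the last retained digit depending on the RoundingMode, so a native divide has to commit to one mode (the comment above suggests HALF_UP):

import java.math.BigDecimal;
import java.math.RoundingMode;

public class RoundingDemo {
  public static void main(String[] args) {
    BigDecimal a = new BigDecimal("1.005");
    BigDecimal b = new BigDecimal("0.1");                        // exact quotient is 10.05

    System.out.println(a.divide(b, 2, RoundingMode.HALF_UP));    // 10.05
    System.out.println(a.divide(b, 1, RoundingMode.HALF_UP));    // 10.1  (tie rounds away from zero)
    System.out.println(a.divide(b, 1, RoundingMode.HALF_EVEN));  // 10.0  (different mode, different answer)
  }
}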


> Contribute Decimal128 high-performance decimal(p, s) package from Microsoft to Hive
> -----------------------------------------------------------------------------------
>
>                 Key: HIVE-6017
>                 URL: https://issues.apache.org/jira/browse/HIVE-6017
>             Project: Hive
>          Issue Type: Sub-task
>    Affects Versions: 0.13.0
>            Reporter: Eric Hanson
>            Assignee: Eric Hanson
>             Fix For: 0.13.0
>
>         Attachments: HIVE-6017.01.patch, HIVE-6017.02.patch, HIVE-6017.03.patch, HIVE-6017.04.patch
>
>
> Contribute the Decimal128 high-performance decimal package developed by Microsoft to Hive. This was originally written for Microsoft PolyBase by Hideaki Kimura.
> This code is about 8X more efficient than Java BigDecimal for typical operations. It uses a finite (128 bit) precision and can handle up to decimal(38, X). It is also "mutable" so you can change the contents of an existing object. This helps reduce the cost of new() and garbage collection.



