hive-issues mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-15335) Fast Decimal
Date Wed, 21 Dec 2016 20:22:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768058#comment-15768058 ]

Matt McCline edited comment on HIVE-15335 at 12/21/16 8:22 PM:
---------------------------------------------------------------

A query benchmark on V1 showed very, very high cost in HiveDecimalWritable (serialization/deserialization
and creation of a HiveDecimal for getHiveDecimal) and in ORC decimal deserialization (BigInteger).

The cost of V1 decimal add turns out not to be the add itself but the cost of
HiveDecimalWritable.getHiveDecimal() and then serializing the result back into BigInteger bytes
for HiveDecimalWritable.set.  Code everywhere was calling getHiveDecimal to pass decimals
around between components.
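To make that cost pattern concrete, here is a minimal sketch. `BytesDecimalWritable`, `toBigDecimal`, and `V1AddPattern` are hypothetical names, not Hive classes; the sketch only assumes, as the comment describes, that the V1 writable holds the value as serialized BigInteger bytes. Every add forces a deserialize (new BigInteger plus new BigDecimal) and a re-serialize (new byte[]), while the add itself is trivial.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical V1-style writable: the value is kept as the serialized bytes of a
// BigInteger (the unscaled value) plus a scale. Not Hive's actual class.
final class BytesDecimalWritable {
    private byte[] internalBytes = BigInteger.ZERO.toByteArray();
    private int scale = 0;

    // Deserialize: allocates a BigInteger and a BigDecimal on every call.
    BigDecimal toBigDecimal() {
        return new BigDecimal(new BigInteger(internalBytes), scale);
    }

    // Serialize: allocates a new byte[] on every call.
    void set(BigDecimal value) {
        internalBytes = value.unscaledValue().toByteArray();
        scale = value.scale();
    }
}

public class V1AddPattern {
    public static void main(String[] args) {
        BytesDecimalWritable sum = new BytesDecimalWritable();
        BigDecimal[] rows = {
            new BigDecimal("1.25"), new BigDecimal("2.50"), new BigDecimal("0.05")
        };
        for (BigDecimal row : rows) {
            // The add itself is cheap; the deserialize/serialize around it is not.
            sum.set(sum.toBigDecimal().add(row));
        }
        System.out.println(sum.toBigDecimal()); // 3.80
    }
}
```

Per row, this loop allocates at least three objects that exist only to carry the value between two representations.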

Making HiveDecimalWritable a fast, first-class citizen was a major part of this change.  That
included making HiveDecimalWritable the object of choice to pass around or operate on directly.
For example, vectorized SUM aggregation eliminated almost all calls to
HiveDecimalWritable.getHiveDecimal() for its summing.
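The SUM change can be sketched as an accumulator that is updated in place over a batch of values. `MutableDecimalSum`, `VectorSumSketch`, and `sumBatch` are hypothetical names. For brevity the sketch assumes that every value shares one scale and that the sum fits in a single long (`Math.addExact` throws on overflow). It illustrates the mutable-accumulator idea behind `HiveDecimalWritable.mutableAdd`, not Hive's real representation.

```java
// Hypothetical mutable accumulator in the spirit of HiveDecimalWritable.mutableAdd:
// the running sum is updated in place, so summing a batch creates no per-row objects.
// Simplification: one shared scale, and the unscaled sum must fit in a long.
final class MutableDecimalSum {
    private final int scale;
    private long unscaled;

    MutableDecimalSum(int scale) {
        this.scale = scale;
    }

    void mutableAdd(long otherUnscaled) {
        unscaled = Math.addExact(unscaled, otherUnscaled); // throws on long overflow
    }

    long unscaledValue() {
        return unscaled;
    }

    int scale() {
        return scale;
    }

    // BigDecimal is used here only to format the result for display.
    @Override
    public String toString() {
        return java.math.BigDecimal.valueOf(unscaled, scale).toPlainString();
    }
}

public class VectorSumSketch {
    // Sums one "column vector" of decimals already stored as unscaled longs.
    static void sumBatch(long[] column, int size, MutableDecimalSum acc) {
        for (int i = 0; i < size; i++) {
            acc.mutableAdd(column[i]); // no getHiveDecimal()-style object per row
        }
    }

    public static void main(String[] args) {
        long[] prices = { 125, 250, 5 }; // 1.25, 2.50, 0.05 at scale 2
        MutableDecimalSum acc = new MutableDecimalSum(2);
        sumBatch(prices, prices.length, acc);
        System.out.println(acc); // 3.80
    }
}
```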

One query benchmark on the new code showed a 3x improvement, and the cost of the add method
was in the noise.  So storing decimals in 1 long instead of 3 (i.e. the so-called fast path)
isn't the place to look.  Microbenchmarks that measure only the cost of add miss the point.
The real fast path is using HiveDecimalWritable.mutableAdd and the fast V2
serialization/deserialization methods, including the HiveDecimal.create and
HiveDecimalWritable.set families.  Another way of thinking about the fast path is not using
BigInteger / BigDecimal.
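The sketch below illustrates a decimal held in three longs and added without BigInteger. It assumes a layout in which each long holds 16 decimal digits (base 10^16), which is enough for 38-digit precision. `ThreeLongDecimal` and its fields are hypothetical. Negative values and scale alignment are left out to keep the carry logic visible.

```java
// Hypothetical decimal whose unscaled magnitude is split across three longs,
// each holding 16 decimal digits (base 10^16, w0 = least significant).
// Simplifications: non-negative values only, each word < 10^16, one shared scale.
final class ThreeLongDecimal {
    static final long WORD = 10_000_000_000_000_000L; // 10^16

    long w0, w1, w2;
    final int scale;

    ThreeLongDecimal(long w0, long w1, long w2, int scale) {
        this.w0 = w0;
        this.w1 = w1;
        this.w2 = w2;
        this.scale = scale;
    }

    // In-place add with a decimal carry between words; no BigInteger is created.
    void mutableAdd(ThreeLongDecimal other) {
        if (other.scale != scale) {
            throw new IllegalArgumentException("scale alignment is not handled in this sketch");
        }
        long r0 = w0 + other.w0;        // < 2 * 10^16, fits in a long
        long carry = r0 / WORD;         // 0 or 1
        w0 = r0 % WORD;
        long r1 = w1 + other.w1 + carry;
        carry = r1 / WORD;
        w1 = r1 % WORD;
        long r2 = w2 + other.w2 + carry;
        if (r2 >= WORD) {
            throw new ArithmeticException("result exceeds 48 digits");
        }
        w2 = r2;
    }

    // Decimal text for display only.
    @Override
    public String toString() {
        String digits;
        if (w2 != 0) {
            digits = w2 + String.format("%016d%016d", w1, w0);
        } else if (w1 != 0) {
            digits = w1 + String.format("%016d", w0);
        } else {
            digits = Long.toString(w0);
        }
        if (scale == 0) {
            return digits;
        }
        StringBuilder sb = new StringBuilder(digits);
        while (sb.length() <= scale) {
            sb.insert(0, '0'); // ensure at least one integer digit
        }
        sb.insert(sb.length() - scale, '.');
        return sb.toString();
    }
}

public class ThreeLongAddSketch {
    public static void main(String[] args) {
        // 99999999999999.99 + 0.01, both at scale 2: forces a carry into w1.
        ThreeLongDecimal a = new ThreeLongDecimal(9_999_999_999_999_999L, 0, 0, 2);
        a.mutableAdd(new ThreeLongDecimal(1L, 0, 0, 2));
        System.out.println(a); // 100000000000000.00
    }
}
```

The add is a handful of long operations, which is consistent with the comment's observation that the add itself is not where the time goes.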


> Fast Decimal
> ------------
>
>                 Key: HIVE-15335
>                 URL: https://issues.apache.org/jira/browse/HIVE-15335
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, HIVE-15335.03.patch, HIVE-15335.04.patch,
>                      HIVE-15335.05.patch, HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch,
>                      HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch, HIVE-15335.093.patch,
>                      HIVE-15335.094.patch, HIVE-15335.095.patch, HIVE-15335.096.patch, HIVE-15335.097.patch,
>                      HIVE-15335.098.patch, HIVE-15335.099.patch, HIVE-15335.0991.patch
>
>
> Replace the HiveDecimal implementation, which currently represents the decimal internally as
> a BigDecimal, with a faster version that does not allocate extra objects.
> Replace the HiveDecimalWritable implementation with a faster version that has new mutable*
> calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc.) and stores the result as a fast
> decimal instead of a slow byte array containing a serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
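A mutable* call such as mutableEnforcePrecisionScale can be sketched on a decimal held as an unscaled long plus a scale. `ScaledLongWritable` and `EnforceSketch` are hypothetical names. The sketch assumes the usual SQL DECIMAL(p,s) semantics: round HALF_UP to the target scale in one step, so the value is not rounded twice, then treat the value as overflow (NULL, modeled here as `isSet = false`) if it needs more than p - s integer digits. It is not Hive's code.

```java
// Hypothetical in-place precision/scale enforcement on an unscaled long plus scale.
// Assumes HALF_UP rounding and "NULL on overflow" semantics; Long.MIN_VALUE is not handled.
final class ScaledLongWritable {
    long unscaled;
    int scale;
    boolean isSet = true; // false stands in for SQL NULL after an overflow

    ScaledLongWritable(long unscaled, int scale) {
        this.unscaled = unscaled;
        this.scale = scale;
    }

    void mutableEnforcePrecisionScale(int precision, int maxScale) {
        if (!isSet) {
            return;
        }
        if (scale > maxScale) {
            // Round once by the full power of ten; digit-by-digit rounding would round twice.
            long divisor = pow10(scale - maxScale);
            long quotient = unscaled / divisor;
            long remainder = Math.abs(unscaled % divisor);
            if (remainder * 2 >= divisor) {
                quotient += Long.signum(unscaled); // HALF_UP: ties go away from zero
            }
            unscaled = quotient;
            scale = maxScale;
        }
        // The integer part must fit in precision - maxScale digits.
        int totalDigits = unscaled == 0 ? 0 : Long.toString(Math.abs(unscaled)).length();
        int integerDigits = Math.max(0, totalDigits - scale);
        if (integerDigits > precision - maxScale) {
            isSet = false;
        }
    }

    private static long pow10(int n) {
        long p = 1;
        for (int i = 0; i < n; i++) {
            p *= 10;
        }
        return p;
    }
}

public class EnforceSketch {
    public static void main(String[] args) {
        ScaledLongWritable w = new ScaledLongWritable(123456L, 3); // 123.456
        w.mutableEnforcePrecisionScale(5, 2);                     // DECIMAL(5,2)
        System.out.println(w.isSet + " " + w.unscaled + " " + w.scale); // true 12346 2
    }
}
```

Because the object is modified in place, a pipeline can enforce a column's type on every row without creating a new decimal per value.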



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
