hive-issues mailing list archives

From "Matt McCline (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HIVE-15335) Fast Decimal
Date Thu, 15 Dec 2016 01:08:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749966#comment-15749966 ]

Matt McCline edited comment on HIVE-15335 at 12/15/16 1:08 AM:
---------------------------------------------------------------

I have great difficulty accepting that DecimalColumnVector is now a public API.  I haven't
even begun to think of all the problems this will create.

I made quite a number of changes to HiveDecimal and HiveDecimalWritable, not just to the internals
but to the interfaces.  For example, the current HiveDecimalWritable is very slow because it
internally represents decimals as BigInteger binary bytes, and it exposes those bytes through
getInternalStorage().  I zapped that immediately.  The compatibility I designed for was
serialization/deserialization of binary bits and text, and decimal execution behavior -- not code
compatibility.  Binary bit compatibility ensures ORC will be able to read/write the same
information.  The TestHiveDecimal class verifies binary bit compatibility with SerializationUtils
(ORC’s serialization), binary bit compatibility with BigInteger (LazyBinary, Avro, Parquet), and
the same behavior as OldHiveDecimal/OldHiveDecimalWritable (the original
HiveDecimal/HiveDecimalWritable renamed).  I needed to be able to make major code changes (the core
fast decimal implementation class is 9,000 lines) to get good performance with ORC
serialization/deserialization of decimals and with all other decimal operations (except
division/remainder).  Matching the semantics of Hive decimals and BigDecimal while still executing
quickly is quite challenging.
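
To illustrate what "BigInteger binary bit compatibility" can mean in practice, here is a minimal
sketch that uses only java.math from the JDK.  It is not Hive code: the class and method names
(DecimalBytesRoundTrip, toUnscaledBytes, fromUnscaledBytes) are hypothetical, and it assumes the
on-disk form is the unscaled value as big-endian two's-complement bytes with the scale carried
separately.  A faster decimal would have to produce and consume exactly these bytes without
necessarily building a BigInteger along the way.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical helper, not part of Hive: round-trips a decimal through the
// BigInteger byte form (big-endian two's complement of the unscaled value).
class DecimalBytesRoundTrip {

    // Serialize only the unscaled value; the scale is stored separately (assumption).
    static byte[] toUnscaledBytes(BigDecimal d) {
        return d.unscaledValue().toByteArray();
    }

    // Rebuild the decimal from the serialized unscaled bytes and the scale.
    static BigDecimal fromUnscaledBytes(byte[] bytes, int scale) {
        return new BigDecimal(new BigInteger(bytes), scale);
    }

    public static void main(String[] args) {
        BigDecimal original = new BigDecimal("-123.45");
        byte[] bytes = toUnscaledBytes(original);
        BigDecimal restored = fromUnscaledBytes(bytes, original.scale());

        System.out.println("bytes = " + Arrays.toString(bytes) + ", scale = " + original.scale());
        System.out.println("round trip equal: " + restored.equals(original));
    }
}
```

Any replacement implementation that reads and writes the same bytes for the same value stays
compatible at this level, whatever its internal representation is.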

I need to be able to take a hammer to the code in the future to get good performance.  I've
done some experimenting with improving the performance of HiveChar/HiveVarchar and their
writables.  Very little of the original code will survive -- just like with fast decimals.


was (Author: mmccline):
I have great difficulty accepting that DecimalColumnVector is now a public API.  Gunther will
need to take that up with you.

I made quite a number of changes to HiveDecimal and HiveDecimalWritable, not just to the internals
but to the interfaces.  For example, the current HiveDecimalWritable is very slow because it
internally represents decimals as BigInteger binary bytes, and it exposes those bytes through
getInternalStorage().  I zapped that immediately.  The compatibility I designed for was
serialization/deserialization of binary bits and text, and decimal execution behavior -- not code
compatibility.  Binary bit compatibility ensures ORC will be able to read/write the same
information.  The TestHiveDecimal class verifies binary bit compatibility with SerializationUtils
(ORC’s serialization), binary bit compatibility with BigInteger (LazyBinary, Avro, Parquet), and
the same behavior as OldHiveDecimal/OldHiveDecimalWritable (the original
HiveDecimal/HiveDecimalWritable renamed).  I needed to be able to make major code changes (the core
fast decimal implementation class is 9,000 lines) to get good performance with ORC
serialization/deserialization of decimals and with all other decimal operations (except
division/remainder).  Matching the semantics of Hive decimals and BigDecimal while still executing
quickly is quite challenging.

I need to be able to take a hammer to the code in the future to get good performance.  I've
done some experimenting with improving the performance of HiveChar/HiveVarchar and their
writables.  Very little of the original code will survive -- just like with fast decimals.

> Fast Decimal
> ------------
>
>                 Key: HIVE-15335
>                 URL: https://issues.apache.org/jira/browse/HIVE-15335
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Matt McCline
>            Assignee: Matt McCline
>            Priority: Critical
>         Attachments: HIVE-15335.01.patch, HIVE-15335.02.patch, HIVE-15335.03.patch, HIVE-15335.04.patch,
>                      HIVE-15335.05.patch, HIVE-15335.06.patch, HIVE-15335.07.patch, HIVE-15335.08.patch,
>                      HIVE-15335.09.patch, HIVE-15335.091.patch, HIVE-15335.092.patch
>
>
> Replace HiveDecimal implementation that currently represents the decimal internally as
> a BigDecimal with a faster version that does not allocate extra objects
> Replace HiveDecimalWritable implementation with a faster version that has new mutable*
> calls (e.g. mutableAdd, mutableEnforcePrecisionScale, etc) and stores the result as a fast
> decimal instead of a slow byte array containing a serialized BigInteger.
> Provide faster ways to serialize/deserialize decimals.
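
The idea behind the mutable* calls in the description can be sketched as follows.  This is only
an illustration under stated assumptions, not the HIVE-15335 implementation: MutableDecimalSketch
is a hypothetical class, it holds the unscaled value in a single long (so it covers far less
precision than a real Hive decimal), and its mutableAdd borrows only the name of the call
mentioned above.  The point it shows is that an in-place add updates primitive fields and
allocates no new object per operation, unlike BigDecimal.add, which returns a new instance.

```java
import java.math.BigDecimal;

// Hypothetical sketch, not Hive code: a decimal kept as primitive fields
// (unscaled long + scale) that is updated in place instead of reallocated.
class MutableDecimalSketch {
    private long unscaled;
    private int scale;

    MutableDecimalSketch(long unscaled, int scale) {
        if (scale < 0) {
            throw new IllegalArgumentException("scale must be non-negative");
        }
        this.unscaled = unscaled;
        this.scale = scale;
    }

    // Adds other into this object; no new decimal object is created.
    // Throws ArithmeticException if the result does not fit in a long.
    void mutableAdd(MutableDecimalSketch other) {
        int target = Math.max(scale, other.scale);
        long a = rescale(unscaled, scale, target);
        long b = rescale(other.unscaled, other.scale, target);
        unscaled = Math.addExact(a, b);
        scale = target;
    }

    // Multiplies by 10 once per extra fractional digit, checking for overflow.
    private static long rescale(long value, int from, int to) {
        for (int i = from; i < to; i++) {
            value = Math.multiplyExact(value, 10L);
        }
        return value;
    }

    // Conversion for inspection only; this step does allocate.
    BigDecimal toBigDecimal() {
        return BigDecimal.valueOf(unscaled, scale);
    }

    public static void main(String[] args) {
        MutableDecimalSketch sum = new MutableDecimalSketch(1250, 2);   // 12.50
        sum.mutableAdd(new MutableDecimalSketch(75, 3));                // + 0.075
        System.out.println(sum.toBigDecimal());
    }
}
```

In a tight loop such as a vectorized aggregation, reusing one accumulator this way avoids
creating a fresh object for every row, which is the kind of saving the description refers to.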



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
