drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Volodymyr Vysotskyi (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex
Date Thu, 11 Jan 2018 11:49:00 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Volodymyr Vysotskyi reassigned DRILL-4834:

    Assignee: Volodymyr Vysotskyi

> decimal implementation is vulnerable to overflow errors, and extremely complex
> ------------------------------------------------------------------------------
>                 Key: DRILL-4834
>                 URL: https://issues.apache.org/jira/browse/DRILL-4834
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.6.0
>         Environment: Drill 1.7 on any platform
>            Reporter: Dave Oshinsky
>            Assignee: Volodymyr Vysotskyi
>             Fix For: 1.13.0
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java template
to handle the situation where a precision is not supplied (i.e., the supplied precision is
zero) for an integer value that is to be casted to a decimal.  The Drill decimal implementation
uses a limited selection of fixed decimal precision data types (the total number of decimal
digits, i.e., Decimal9, 18, 28, 38) to represent decimal values.  If the destination precision
is too small to represent the input integer that is being casted, there is no clean way to
deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to more efficient
use of memory, it often will actually lead to less efficient use of memory (when the fixed
precision is specified significantly larger than is actually needed to represent the numbers),
and it results in a tremendous mushrooming of the complexity of the code.  For each fixed
precision (and there are only a limited set of selections, 9, 18, 28, 38, which itself leads
to memory inefficiency), there is a separate set of code generated from templates.  For each
pairwise combination of decimal or non-decimal numeric types, there are multiple places in
the code where conversions must be handled, or conditions must be included to handle the difference
in precision between the two types.  A one-size-fits-all approach (using a variable width
vector to represent any decimal precision) would usually be more memory-efficient (since precisions
are often over-specified), and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.

This message was sent by Atlassian JIRA

View raw message