drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex
Date Thu, 11 Jan 2018 17:30:01 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16322592#comment-16322592

ASF GitHub Bot commented on DRILL-4834:

Github user vvysotskyi commented on a diff in the pull request:

    --- Diff: exec/vector/src/main/codegen/templates/ComplexWriters.java ---
    @@ -99,7 +99,7 @@ public void write(Nullable${minor.class?cap_first}Holder h) {
       <#if !(minor.class == "Decimal9" || minor.class == "Decimal18" || minor.class ==
"Decimal28Sparse" || minor.class == "Decimal38Sparse" || minor.class == "Decimal28Dense" ||
minor.class == "Decimal38Dense")>
       public void write${minor.class}(<#list fields as field>${field.type} ${field.name}<#if
field_has_next>, </#if></#list>) {
    -    mutator.addSafe(idx(), <#list fields as field>${field.name}<#if field_has_next>,
    +    mutator.addSafe(idx(), <#list fields as field><#if field.name == "scale"><#break></#if>${field.name}<#if
field_has_next && fields[field_index+1].name != "scale" >, </#if></#list>);
    --- End diff --
    I meant to replace this line by something like this: 
    <#if minor.class == "VarDecimal">
    mutator.addSafe(idx()<#list fields as field><#if field.include!true>, ${field.name}</#if></#list>);
// or something better
    mutator.addSafe(idx(), <#list fields as field>${field.name}<#if field_has_next>,
    Since `field.include` is used only for Decimal types, we can replace all this code by
this line:
    mutator.addSafe(idx()<#list fields as field><#if field.include!true>, ${field.name}</#if></#list>);
    But do we use this method for VarDecimal somewhere? If not, it would be better to add
VarDecimal `minor.class` to the if condition with other decimal types above.
    BTW, please modify similar changes below in this class and in `RepeatedValueVectors.java`

> decimal implementation is vulnerable to overflow errors, and extremely complex
> ------------------------------------------------------------------------------
>                 Key: DRILL-4834
>                 URL: https://issues.apache.org/jira/browse/DRILL-4834
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.6.0
>         Environment: Drill 1.7 on any platform
>            Reporter: Dave Oshinsky
>            Assignee: Dave Oshinsky
>             Fix For: 1.13.0
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java template
to handle the situation where a precision is not supplied (i.e., the supplied precision is
zero) for an integer value that is to be casted to a decimal.  The Drill decimal implementation
uses a limited selection of fixed decimal precision data types (the total number of decimal
digits, i.e., Decimal9, 18, 28, 38) to represent decimal values.  If the destination precision
is too small to represent the input integer that is being casted, there is no clean way to
deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to more efficient
use of memory, it often will actually lead to less efficient use of memory (when the fixed
precision is specified significantly larger than is actually needed to represent the numbers),
and it results in a tremendous mushrooming of the complexity of the code.  For each fixed
precision (and there are only a limited set of selections, 9, 18, 28, 38, which itself leads
to memory inefficiency), there is a separate set of code generated from templates.  For each
pairwise combination of decimal or non-decimal numeric types, there are multiple places in
the code where conversions must be handled, or conditions must be included to handle the difference
in precision between the two types.  A one-size-fits-all approach (using a variable width
vector to represent any decimal precision) would usually be more memory-efficient (since precisions
are often over-specified), and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.

This message was sent by Atlassian JIRA

View raw message