drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4834) decimal implementation is vulnerable to overflow errors, and extremely complex
Date Tue, 09 Jan 2018 18:53:02 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16318931#comment-16318931

ASF GitHub Bot commented on DRILL-4834:

Github user vvysotskyi commented on a diff in the pull request:

    --- Diff: exec/java-exec/src/main/codegen/templates/Decimal/DecimalFunctions.java ---
    @@ -102,7 +111,578 @@
     <#-- For each DECIMAL... type (in DecimalTypes.tdd) ... -->
     <#list comparisonTypesDecimal.decimalTypes as type>
    -<#if type.name.endsWith("Sparse")>
    +<#if type.name.endsWith("VarDecimal")>
    +<@pp.changeOutputFile name="/org/apache/drill/exec/expr/fn/impl/${type.name}Functions.java"
    +<#include "/@includes/license.ftl" />
    +package org.apache.drill.exec.expr.fn.impl;
    +<#include "/@includes/vv_imports.ftl" />
    +import org.apache.drill.exec.expr.DrillSimpleFunc;
    +import org.apache.drill.exec.expr.annotations.FunctionTemplate;
    +import org.apache.drill.exec.expr.annotations.FunctionTemplate.NullHandling;
    +import org.apache.drill.exec.expr.annotations.Output;
    +import org.apache.drill.exec.expr.annotations.Param;
    +import org.apache.drill.exec.expr.annotations.Workspace;
    +import org.apache.drill.exec.expr.fn.FunctionGenerationHelper;
    +import org.apache.drill.exec.expr.holders.*;
    +import org.apache.drill.exec.record.RecordBatch;
    +import io.netty.buffer.ByteBuf;
    +import io.netty.buffer.DrillBuf;
    +import java.nio.ByteBuffer;
    +public class ${type.name}Functions {
    +    private static void initBuffer(DrillBuf buffer) {
    +        // for VarDecimal, this method of setting initial size is actually only a very
rough heuristic.
    +        int size = (${type.storage} * (org.apache.drill.exec.util.DecimalUtility.INTEGER_SIZE));
    +        buffer = buffer.reallocIfNeeded(size);
    +     }
    +    @FunctionTemplate(name = "subtract", scope = FunctionTemplate.FunctionScope.DECIMAL_ADD_SCALE,
nulls = NullHandling.NULL_IF_NULL)
    --- End diff --
    Please specify `FunctionTemplate` correct scope, returnType and checkPrecisionRange for
all functions in this class.

> decimal implementation is vulnerable to overflow errors, and extremely complex
> ------------------------------------------------------------------------------
>                 Key: DRILL-4834
>                 URL: https://issues.apache.org/jira/browse/DRILL-4834
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.6.0
>         Environment: Drill 1.7 on any platform
>            Reporter: Dave Oshinsky
>             Fix For: Future
> While working on a fix for DRILL-4704, logic was added to CastIntDecimal.java template
to handle the situation where a precision is not supplied (i.e., the supplied precision is
zero) for an integer value that is to be casted to a decimal.  The Drill decimal implementation
uses a limited selection of fixed decimal precision data types (the total number of decimal
digits, i.e., Decimal9, 18, 28, 38) to represent decimal values.  If the destination precision
is too small to represent the input integer that is being casted, there is no clean way to
deal with the overflow error properly.
> While using fixed decimal precisions as is being done currently can lead to more efficient
use of memory, it often will actually lead to less efficient use of memory (when the fixed
precision is specified significantly larger than is actually needed to represent the numbers),
and it results in a tremendous mushrooming of the complexity of the code.  For each fixed
precision (and there are only a limited set of selections, 9, 18, 28, 38, which itself leads
to memory inefficiency), there is a separate set of code generated from templates.  For each
pairwise combination of decimal or non-decimal numeric types, there are multiple places in
the code where conversions must be handled, or conditions must be included to handle the difference
in precision between the two types.  A one-size-fits-all approach (using a variable width
vector to represent any decimal precision) would usually be more memory-efficient (since precisions
are often over-specified), and would greatly simplify the code.
> Also see the DRILL-4184 issue, which is related.

This message was sent by Atlassian JIRA

View raw message