drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5419) Calculate return string length for literals & some string functions
Date Thu, 27 Apr 2017 16:18:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986925#comment-15986925 ]

ASF GitHub Bot commented on DRILL-5419:

Github user paul-rogers commented on a diff in the pull request:

    --- Diff: common/src/main/java/org/apache/drill/common/types/Types.java ---
    @@ -636,43 +658,63 @@ public static String toString(final MajorType type) {
        * Get the <code>precision</code> of given type.
    -   * @param majorType
    -   * @return
    +   *
    +   * @param majorType major type
    +   * @return precision value
       public static int getPrecision(MajorType majorType) {
    -    MinorType type = majorType.getMinorType();
    -    if (type == MinorType.VARBINARY || type == MinorType.VARCHAR) {
    -      return 65536;
    -    }
         if (majorType.hasPrecision()) {
           return majorType.getPrecision();
    -    return 0;
    +    return isScalarStringType(majorType) ? MAX_VARCHAR_LENGTH : UNDEFINED;
    --- End diff --
    Drill loves to compute things, such as widths, based on metadata. But Drill is... schemaless!
That means that we don't generally have metadata. As a result, we have to compute things based
on made-up numbers, as we are doing here.
    If we want to tell the client the actual sizes, we should sample the first batch of data.
See `RecordBatchSizer`, which computes average column width from data. Of course, you want maximum
width, which is more costly to compute.
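
To illustrate the trade-off between average and maximum width, here is a minimal sketch. It is not Drill code and does not use the `RecordBatchSizer` API. It assumes a variable-width column is stored as an offsets array with one more entry than there are rows; the class and method names are hypothetical. Under that assumption, the average width needs only the first and last offsets, while the maximum has to visit every row.

```java
/** Hypothetical illustration only; not part of Drill. */
final class VarWidthColumnSample {
  private VarWidthColumnSample() {}

  /**
   * Average value width in bytes. Assumes offsets[i] is the start of row i
   * and offsets[rows] is the end of the last row, so only two reads are needed.
   */
  static double averageWidth(int[] offsets) {
    int rows = offsets.length - 1;
    if (rows <= 0) {
      return 0.0;
    }
    return (double) (offsets[rows] - offsets[0]) / rows;
  }

  /** Maximum value width in bytes. Requires a full pass over the offsets. */
  static int maxWidth(int[] offsets) {
    int max = 0;
    for (int i = 1; i < offsets.length; i++) {
      max = Math.max(max, offsets[i] - offsets[i - 1]);
    }
    return max;
  }
}
```

Note that a maximum taken from a sampled batch is only a lower bound for the whole result set, since later batches may hold wider values.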

> Calculate return string length for literals & some string functions
> -------------------------------------------------------------------
>                 Key: DRILL-5419
>                 URL: https://issues.apache.org/jira/browse/DRILL-5419
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>         Attachments: version_with_cast.JPG
> Though Drill is schema-less and cannot determine in advance what the length of a column
should be, if a query has an explicit type/length specified, Drill should return the correct
column length.
> For example, the JDBC / ODBC driver ALWAYS returns 64K as the length of a varchar or
char, even if casts are applied.
> Changes:
> String literal length is the same as the actual literal length.
> Example: for 'aaa' return length is 3.
> *CAST*
> Return length is the one indicated in the cast expression. This also applies when the user has
created a view where each string column was cast to varchar with some specific length.
> This length will be returned to the user without the need to apply the cast one more time. The
functions mentioned below can leverage the underlying varchar length to calculate the return length.
> Return length is the underlying column length, i.e. if the column length is known, the same length will
be returned.
> Example:
> lower(cast(col as varchar(30))) will return 30.
> lower(col) will return max varchar length, since we don't know actual column length.
> Return length is underlying column length but column type will be nullable.
> Pads the string to the length specified. Return length is this specified length.
> Return length is the sum of the underlying column lengths. If the length is greater than the varchar
max length, the varchar max length is returned.
> Calculates return length according to each function's substring rules, for example, taking
into account how many chars should be subtracted.
> When combining string columns with different length, return length is max from source
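
The return-length rules quoted above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from the pull request: the class, the method names and the `UNKNOWN` sentinel are hypothetical, and `MAX_VARCHAR_LENGTH` is set to 65,535 here only as a stand-in, since the diff does not show the constant's value.

```java
/** Hypothetical illustration of the rules in DRILL-5419; not Drill's implementation. */
final class ReturnLengthSketch {
  /** Assumed value; the actual constant is not shown in the diff. */
  static final int MAX_VARCHAR_LENGTH = 65_535;

  /** Hypothetical sentinel for "column length is not known". */
  static final int UNKNOWN = -1;

  private ReturnLengthSketch() {}

  /** String literal: the length of the literal itself. */
  static int literal(String literal) {
    return literal.length();
  }

  /** CAST: the length given in the cast expression. */
  static int cast(int declaredLength) {
    return declaredLength;
  }

  /** Functions such as lower(): the input length, or the max if it is unknown. */
  static int sameAsInput(int inputLength) {
    return inputLength == UNKNOWN ? MAX_VARCHAR_LENGTH : inputLength;
  }

  /** Padding: the target length passed to the function. */
  static int pad(int targetLength) {
    return targetLength;
  }

  /** Concatenation: the sum of the input lengths, capped at the max. */
  static int concat(int left, int right) {
    long sum = (long) sameAsInput(left) + sameAsInput(right);
    return (int) Math.min(sum, MAX_VARCHAR_LENGTH);
  }
}
```

With these definitions, `sameAsInput(cast(30))` mirrors `lower(cast(col as varchar(30)))` and yields 30, while `sameAsInput(UNKNOWN)` mirrors `lower(col)` and falls back to the assumed maximum.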

This message was sent by Atlassian JIRA
