drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5419) Calculate return string length for literals & some string functions
Date Thu, 27 Apr 2017 16:18:04 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15986927#comment-15986927 ]

ASF GitHub Bot commented on DRILL-5419:
---------------------------------------

Github user paul-rogers commented on a diff in the pull request:

    https://github.com/apache/drill/pull/819#discussion_r113605192
  
    --- Diff: common/src/main/java/org/apache/drill/common/types/Types.java ---
    @@ -27,10 +27,14 @@
     import org.apache.drill.common.types.TypeProtos.MinorType;
     
     import com.google.protobuf.TextFormat;
    +import org.apache.drill.common.util.CoreDecimalUtility;
     
     public class Types {
       static final org.slf4j.Logger logger = org.slf4j.LoggerFactory.getLogger(Types.class);
     
    +  public static final int MAX_VARCHAR_LENGTH = 65536;
    --- End diff --
    
    65535? Largest that will fit in 16 bits?
    
    Actually, why is this the limit? Is there any technical reason we can't handle, say, 1 MB strings?
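
The arithmetic behind the 65535 remark can be sketched as follows. This is only an illustration: the class name `VarcharLimitSketch` is hypothetical, and the only value taken from the diff is the proposed constant 65536.

```java
public class VarcharLimitSketch {
    // Value proposed in the diff under review.
    public static final int MAX_VARCHAR_LENGTH = 65536;

    // Largest value an unsigned 16-bit field can hold: 2^16 - 1.
    public static final int MAX_UNSIGNED_16_BIT = (1 << 16) - 1;

    public static void main(String[] args) {
        System.out.println(MAX_UNSIGNED_16_BIT);                      // 65535
        // The proposed constant is one past the 16-bit maximum.
        System.out.println(MAX_VARCHAR_LENGTH > MAX_UNSIGNED_16_BIT); // true
        // char is Java's unsigned 16-bit primitive; its maximum is the same value.
        System.out.println((int) Character.MAX_VALUE);                // 65535
    }
}
```

If the length has to fit in 16 bits, 65535 is the largest usable value. If it does not, the question of why the limit is 64K at all still stands.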


> Calculate return string length for literals & some string functions
> -------------------------------------------------------------------
>
>                 Key: DRILL-5419
>                 URL: https://issues.apache.org/jira/browse/DRILL-5419
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Arina Ielchiieva
>            Assignee: Arina Ielchiieva
>         Attachments: version_with_cast.JPG
>
>
> Drill is schema-less and cannot determine column lengths in advance, but if a query specifies an explicit type and length, Drill should return the correct column length.
> For example, the JDBC / ODBC driver ALWAYS returns 64K as the length of a varchar or char, even if casts are applied.
> Changes:
> *LITERALS*
> The return length of a string literal is the actual length of the literal.
> Example: for 'aaa' return length is 3.
> *CAST*
> Return length is the one indicated in the cast expression. This also applies when the user has created a view in which each string column was cast to varchar with a specific length.
> This length will be returned to the user without the need to apply the cast again. The functions listed below can leverage the underlying varchar length to calculate the return length.
> *LOWER, UPPER, INITCAP, REVERSE, FIRST_VALUE, LAST_VALUE* 
> Return length is the underlying column length, i.e. if the column length is known, the same length will be returned.
> Example:
> lower(cast(col as varchar(30))) will return 30.
> lower(col) will return the max varchar length, since we don't know the actual column length.
> *LAG, LEAD*
> Return length is underlying column length but column type will be nullable.
> *LPAD, RPAD*
> Pads the string to the length specified. Return length is this specified length. 
> *CONCAT, CONCAT OPERATOR (||)*
> Return length is the sum of the underlying column lengths. If the sum is greater than the varchar max length, the varchar max length is returned.
> *SUBSTR, SUBSTRING, LEFT, RIGHT*
> Calculates return length according to each function's substring rules, for example, taking into account how many chars should be subtracted.
> *IF EXPRESSIONS (CASE STATEMENT, COALESCE), UNION OPERATOR*
> When combining string columns with different lengths, the return length is the max of the source column lengths.
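
Several of the rules in the description above can be sketched as small Java functions. This is a minimal illustration and not Drill's actual implementation: the class and method names are hypothetical, and the maximum is assumed to be the 65536 proposed in the diff, a value the review comment questions.

```java
public class ReturnLengthSketch {
    // Assumption: the limit proposed in the diff; the review asks whether it should be 65535.
    public static final int MAX_VARCHAR_LENGTH = 65536;

    // LITERALS: the return length is the literal's own length.
    public static int literalLength(String literal) {
        return literal.length();
    }

    // LOWER, UPPER, INITCAP, REVERSE, ...: the input length passes through unchanged.
    public static int sameAsInput(int inputLength) {
        return inputLength;
    }

    // CONCAT, ||: sum of the input lengths, capped at the varchar maximum.
    public static int concatLength(int left, int right) {
        long sum = (long) left + right; // long avoids int overflow before capping
        return (int) Math.min(sum, MAX_VARCHAR_LENGTH);
    }

    // CASE, COALESCE, UNION: the largest of the source lengths.
    public static int combinedLength(int... lengths) {
        int max = 0;
        for (int length : lengths) {
            max = Math.max(max, length);
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(literalLength("aaa"));                 // 3
        System.out.println(sameAsInput(30));                      // 30
        System.out.println(concatLength(30, 40));                 // 70
        System.out.println(concatLength(MAX_VARCHAR_LENGTH, 10)); // 65536
        System.out.println(combinedLength(10, 30, 20));           // 30
    }
}
```

The SUBSTR family and LPAD / RPAD are left out because their exact rules depend on argument handling that the description does not spell out.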



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
