drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dave Oshinsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4704) select statement behavior is inconsistent for decimal values in parquet
Date Thu, 04 Aug 2016 15:13:20 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15407913#comment-15407913
] 

Dave Oshinsky commented on DRILL-4704:
--------------------------------------

Thanks Jinfeng for the very clear explanation, and pointer to ExpressionTreeMaterializer.java.
 Rather than change ExpressionTreeMaterializer.java, with which I am completely unfamiliar,
I added another change to CastIntDecimal.java to only calculate the precision when the supplied
one is zero.  I hope the amended fix is sufficient for now.  There might be other cases
where the supplied precision is invalid (zero), and this change will handle those. 

    On Wednesday, August 3, 2016 7:18 PM, ASF GitHub Bot (JIRA) <jira@apache.org> wrote:
 

 
    [ https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406800#comment-15406800
] 

ASF GitHub Bot commented on DRILL-4704:
---------------------------------------

Github user jinfengni commented on the issue:

    https://github.com/apache/drill/pull/517
  
    I agreed with Aman that the eval() of CastIntDecimal28Sparse() should not ignore the
specified scale/precision, if they are passed as parameters. 
    
    As you analyzed, the problem is the scale/precision is 0 for a int constant value, which
is not right. We should fix place where scale/precision is specified wrongly.  Ignoring scale/precision
and re-calcualting them in eval() means Drill has to apply that logic per input row.
    
    The reason that you saw eval() is called only once:
    Filter employee_id = 170, 
      is  employee_id = castIntDecimal28Sparse(170, scale, precision). // scale and precision
is wrongly 0.
    
    Since RHS is a constant, Drill will evaluate only once, even you have multiple rows.
However, this is not true, if it's not a constant expression. 
    
    To me, you point to the right cause of the problem. But we need get better place to
apply the fix ( not in eval() ) method. 
    
    The place where we get scale/precision as 0 
    
    https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/expr/ExpressionTreeMaterializer.java#L315-L316
    






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)




> select statement behavior is inconsistent for decimal values in parquet
> -----------------------------------------------------------------------
>
>                 Key: DRILL-4704
>                 URL: https://issues.apache.org/jira/browse/DRILL-4704
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.6.0
>         Environment: Windows 7 Pro, Java 1.8.0_91
>            Reporter: Dave Oshinsky
>             Fix For: Future
>
>
> A select statement that searches a parquet file for a decimal value matching a specific
value behaves inconsistently.  The query expressed most simply finds nothing:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id
= 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  | HIRE_DATE |
> +--------------+-------------+------------+--------+---------------+-----------+
> +--------------+-------------+------------+--------+---------------+-----------+
> No rows selected (0.348 seconds)
> The query can be modified to find the matching row in a few ways, such as the following
(using between instead of '=', changing 100 to 100.0, or casting as decimal:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id
between 100 and 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.226 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id
= 100.0;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.259 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where cast(employee_id
AS DECIMAL) = 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.232 seconds)
> 0: jdbc:drill:zk=local>
> The schema of the parquet data that is being searched is as follows:
> $ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet
> file:           file:/c:/archiveHR/HR.EMPLOYEES/1.parquet
> creator:        parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
> .....
> file schema:    HR.EMPLOYEES
> --------------------------------------------------------------------------------
> EMPLOYEE_ID:    REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0
> FIRST_NAME:     OPTIONAL BINARY O:UTF8 R:0 D:1
> LAST_NAME:      REQUIRED BINARY O:UTF8 R:0 D:0
> EMAIL:          REQUIRED BINARY O:UTF8 R:0 D:0
> PHONE_NUMBER:   OPTIONAL BINARY O:UTF8 R:0 D:1
> HIRE_DATE:      REQUIRED BINARY O:UTF8 R:0 D:0
> JOB_ID:         REQUIRED BINARY O:UTF8 R:0 D:0
> SALARY:         OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> MANAGER_ID:     OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> DEPARTMENT_ID:  OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> row group 1:    RC:107 TS:9943 OFFSET:4
> --------------------------------------------------------------------------------
> EMPLOYEE_ID:     FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 VC:107 ENC:PLAIN,BIT_PACKED
> FIRST_NAME:      BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> LAST_NAME:       BINARY SNAPPY DO:0 FPO:1266 SZ:913/1111/1.22 VC:107 ENC:PLAIN,BIT_PACKED
> EMAIL:           BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 ENC:PLAIN,BIT_PACKED
> PHONE_NUMBER:    BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> HIRE_DATE:       BINARY SNAPPY DO:0 FPO:3906 SZ:874/2636/3.02 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
> JOB_ID:          BINARY SNAPPY DO:0 FPO:4780 SZ:254/302/1.19 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
> SALARY:          FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5034 SZ:419/580/1.38 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> COMMISSION_PCT:  FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5453 SZ:97/113/1.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> MANAGER_ID:      FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5550 SZ:168/363/2.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> DEPARTMENT_ID:   FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5718 SZ:94/254/2.70 VC:107 ENC:PLAIN,RLE,BIT_PACKED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message