drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dave Oshinsky (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-4704) select statement behavior is inconsistent for decimal values in parquet
Date Thu, 02 Jun 2016 15:05:59 GMT
Dave Oshinsky created DRILL-4704:
------------------------------------

             Summary: select statement behavior is inconsistent for decimal values in parquet
                 Key: DRILL-4704
                 URL: https://issues.apache.org/jira/browse/DRILL-4704
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.6.0
         Environment: Windows 7 Pro, Java 1.8.0_91
            Reporter: Dave Oshinsky
             Fix For: 1.7.0


A select statement that searches a parquet file for a decimal value matching a specific value
behaves inconsistently.  The query expressed most simply finds nothing:

0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id
= 100;
+--------------+-------------+------------+--------+---------------+-----------+

| EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  | HIRE_DATE |

+--------------+-------------+------------+--------+---------------+-----------+

+--------------+-------------+------------+--------+---------------+-----------+

No rows selected (0.348 seconds)


The query can be modified to find the matching row in a few ways, such as the following (using
between instead of '=', changing 100 to 100.0, or casting as decimal:

0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id
between 100 and 100;
+--------------+-------------+------------+--------+---------------+-----------+

| EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       HIR |

+--------------+-------------+------------+--------+---------------+-----------+

| 100          | Steven      | King       | SKING  | 515.123.4567  | 2003-06-1 |

+--------------+-------------+------------+--------+---------------+-----------+

1 row selected (0.226 seconds)
0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id
= 100.0;
+--------------+-------------+------------+--------+---------------+-----------+

| EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       HIR |

+--------------+-------------+------------+--------+---------------+-----------+

| 100          | Steven      | King       | SKING  | 515.123.4567  | 2003-06-1 |

+--------------+-------------+------------+--------+---------------+-----------+

1 row selected (0.259 seconds)
0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where cast(employee_id
AS DECIMAL) = 100;
+--------------+-------------+------------+--------+---------------+-----------+

| EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       HIR |

+--------------+-------------+------------+--------+---------------+-----------+

| 100          | Steven      | King       | SKING  | 515.123.4567  | 2003-06-1 |

+--------------+-------------+------------+--------+---------------+-----------+

1 row selected (0.232 seconds)
0: jdbc:drill:zk=local>

The schema of the parquet data that is being searched is as follows:

$ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet
file:           file:/c:/archiveHR/HR.EMPLOYEES/1.parquet
creator:        parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
.....
file schema:    HR.EMPLOYEES
--------------------------------------------------------------------------------
EMPLOYEE_ID:    REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0
FIRST_NAME:     OPTIONAL BINARY O:UTF8 R:0 D:1
LAST_NAME:      REQUIRED BINARY O:UTF8 R:0 D:0
EMAIL:          REQUIRED BINARY O:UTF8 R:0 D:0
PHONE_NUMBER:   OPTIONAL BINARY O:UTF8 R:0 D:1
HIRE_DATE:      REQUIRED BINARY O:UTF8 R:0 D:0
JOB_ID:         REQUIRED BINARY O:UTF8 R:0 D:0
SALARY:         OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
MANAGER_ID:     OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
DEPARTMENT_ID:  OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1

row group 1:    RC:107 TS:9943 OFFSET:4
--------------------------------------------------------------------------------
EMPLOYEE_ID:     FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 VC:107 ENC:PLAIN,BIT_PACKED
FIRST_NAME:      BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
LAST_NAME:       BINARY SNAPPY DO:0 FPO:1266 SZ:913/1111/1.22 VC:107 ENC:PLAIN,BIT_PACKED
EMAIL:           BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 ENC:PLAIN,BIT_PACKED
PHONE_NUMBER:    BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 ENC:PLAIN,RLE,BIT_PACKED
HIRE_DATE:       BINARY SNAPPY DO:0 FPO:3906 SZ:874/2636/3.02 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
JOB_ID:          BINARY SNAPPY DO:0 FPO:4780 SZ:254/302/1.19 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
SALARY:          FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5034 SZ:419/580/1.38 VC:107 ENC:PLAIN,RLE,BIT_PACKED
COMMISSION_PCT:  FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5453 SZ:97/113/1.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
MANAGER_ID:      FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5550 SZ:168/363/2.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
DEPARTMENT_ID:   FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5718 SZ:94/254/2.70 VC:107 ENC:PLAIN,RLE,BIT_PACKED









--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message