drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Dave Oshinsky (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4704) select statement behavior is inconsistent for decimal values in parquet
Date Mon, 20 Jun 2016 17:41:06 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340007#comment-15340007
] 

Dave Oshinsky commented on DRILL-4704:
--------------------------------------

Aman Sinha,I have added a new unit test that reproduces the DRILL-4704 bug when CastIntDecimal.java
changes are removed, and demonstrates the fix with those changes.  I gave up on trying to
work with TestDecimal.java - just too many things went wrong.  Please review.Dave Oshinsky


    On Sunday, June 19, 2016 11:42 PM, ASF GitHub Bot (JIRA) <jira@apache.org> wrote:
 

 
    [ https://issues.apache.org/jira/browse/DRILL-4704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15338952#comment-15338952
] 

ASF GitHub Bot commented on DRILL-4704:
---------------------------------------

Github user daveoshinsky commented on the issue:

    https://github.com/apache/drill/pull/517
  
    In the mean time, I switched over to a Mac, where all 6 existing TestDecimal unit tests
worked.  I added a new unit test testCastFromInt patterned after testCastFromFloat, with
some code to print the query that it's sending to the DrillBit.  It's failing as shown below.
 What did I do wrong?
    Query: {  "head" : {    "version" : 1,    "generator" : {      "type" : "org.apache.drill.exec.planner.logical.DrillImplementor", 
    "info" : ""    },    "type" : "APACHE_DRILL_PHYSICAL",    "resultMode" : "EXEC" 
},  graph:[  {      @id:1,      pop:"fs-scan",      format: {type: "json"},   
  storage:{type: "file", connection: "classpath:///"},      files:["/input_simple_decimal.json"] 
}, {    "pop" : "project",    "@id" : 2,    "exprs" : [ {      "ref" : "I4",   
  "expr" : " (cast(DEC9 as INTEGER)) "    },    { "ref" : "I8", "expr": "(cast(DEC18
as INTEGER))" }    ],
        "child" : 1  },{    "pop" : "project",    "@id" : 4,    "exprs" : [ { 
    "ref" : "DECIMAL_9",      "expr" : " cast(I4 as decimal9(9, 0))  "    },   
{"ref": "DECIMAL38", "expr" : "cast(I8 as decimal38sparse(38, 0))"}    ],
        "child" : 2  },{    "pop" : "screen",    "@id" : 5,    "child" : 4  }
]}
    org.apache.drill.exec.rpc.RpcException: org.apache.drill.common.exceptions.UserRemoteException:
SYSTEM ERROR: ExpressionParsingException: Expression has syntax error! line 1:15:no viable
alternative at input 'INTEGER'
    
    [Error Id: a7d6acc9-70ee-4e6d-9002-90715b8de351 on 10.0.0.211:31010] at org.apache.drill.exec.rpc.RpcException.mapException(RpcException.java:60)
at org.apache.drill.exec.client.DrillClient$ListHoldingResultsListener.getResults(DrillClient.java:479)
at org.apache.drill.exec.client.DrillClient.runQuery(DrillClient.java:329) at org.apache.drill.exec.physical.impl.TestDecimal.runQuery(TestDecimal.java:58)
at org.apache.drill.exec.physical.impl.TestDecimal.testCastFromInt(TestDecimal.java:180)
    Here's the new function in TestDecimal.java:
        @Test    public void testCastFromInt() throws Exception {
            // Function checks for casting from Float, Double to Decimal data types 
      try (RemoteServiceSet serviceSet = RemoteServiceSet.getLocalServiceSet();    
        Drillbit bit = new Drillbit(CONFIG, serviceSet);             DrillClient
client = new DrillClient(CONFIG, serviceSet.getCoordinator())) {
                // run query.            bit.run();            client.connect(); 
          enableDecimalDataType(client);            List<QueryDataBatch>
results = runQuery(client, "/decimal/cast_int_decimal.json", "/input_simple_decimal.json");
                RecordBatchLoader batchLoader = new RecordBatchLoader(bit.getContext().getAllocator());
                QueryDataBatch batch = results.get(0);            assertTrue(batchLoader.load(batch.getHeader().getDef(),
batch.getData()));
    
                // the first pair of values in the input consists of integers.  ignore
the rest.            String decimal9Output[] = {"99.0000"};            String
decimal38Output[] = {"123456789.0000"};
                Iterator<VectorWrapper<?>> itr = batchLoader.iterator();
                // Check the output of decimal9            ValueVector.Accessor
dec9Accessor = itr.next().getValueVector().getAccessor();            ValueVector.Accessor
dec38Accessor = itr.next().getValueVector().getAccessor();
    
                for (int i = 0; i < decimal9Output.length; i++) {         
      assertEquals(dec9Accessor.getObject(i).toString(), decimal9Output[i]);       
        assertEquals(dec38Accessor.getObject(i).toString(), decimal38Output[i]);   
        }            assertEquals(6, dec9Accessor.getValueCount());         
  assertEquals(6, dec38Accessor.getValueCount());
                batchLoader.clear();            for (QueryDataBatch result :
results) {              result.release();            }        }    }
        public List<QueryDataBatch> runQuery(DrillClient client, String cast, String
input) throws Exception {        String s1 = Files.toString(FileUtils.getResourceAsFile(cast),
Charsets.UTF_8);        String s2 = s1.replace("#{TEST_FILE}", input);        if (isWindows())
{            s2 = s2.replace("\n", "\r\n");        }System.out.println("Query: "
+ s2);  // DAO DEBUG        List<QueryDataBatch> results;        try {   
        results = client.runQuery(org.apache.drill.exec.proto.UserBitShared.QueryType.PHYSICAL,
s2);        }        catch (Exception ex) {            ex.printStackTrace(); 
          Throwable cause = ex.getCause();            if (cause != null) {   
            System.out.println("Caused by: " + cause.toString());             
  cause.printStackTrace();             }             throw ex;        } 
      return results;    }
    
    Here's the accompanying new json file cast_int_decimal.json:
    {  "head" : {    "version" : 1,    "generator" : {      "type" : "org.apache.drill.exec.planner.logical.DrillImplementor", 
    "info" : ""    },    "type" : "APACHE_DRILL_PHYSICAL",    "resultMode" : "EXEC" 
},  graph:[  {      @id:1,      pop:"fs-scan",      format: {type: "json"},   
  storage:{type: "file", connection: "classpath:///"},      files:["#{TEST_FILE}"]  },
{    "pop" : "project",    "@id" : 2,    "exprs" : [ {      "ref" : "I4",     
"expr" : " (cast(DEC9 as INTEGER)) "    },    { "ref" : "I8", "expr": "(cast(DEC18 as
INTEGER))" }    ],
        "child" : 1  },{    "pop" : "project",    "@id" : 4,    "exprs" : [ { 
    "ref" : "DECIMAL_9",      "expr" : " cast(I4 as decimal9(9, 0))  "    },   
{"ref": "DECIMAL38", "expr" : "cast(I8 as decimal38sparse(38, 0))"}    ],
        "child" : 2  },{    "pop" : "screen",    "@id" : 5,    "child" : 4  }
]}
    
    
        On Sunday, June 19, 2016 11:08 PM, Aman Sinha <notifications@github.com>
wrote:
    
    
    Sorry for the delay. Can you force the following flag to TRUE to get the more detailed
stack trace ? 
    https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/ExecConstants.java#L262
    (normally, one would do an 'alter session ...' but TestDecimal is doing a plan submission
instead of sql string). —
    You are receiving this because you authored the thread.
    Reply to this email directly, view it on GitHub, or mute the thread.  
    
      





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)



> select statement behavior is inconsistent for decimal values in parquet
> -----------------------------------------------------------------------
>
>                 Key: DRILL-4704
>                 URL: https://issues.apache.org/jira/browse/DRILL-4704
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>    Affects Versions: 1.6.0
>         Environment: Windows 7 Pro, Java 1.8.0_91
>            Reporter: Dave Oshinsky
>             Fix For: 1.7.0
>
>
> A select statement that searches a parquet file for a decimal value matching a specific
value behaves inconsistently.  The query expressed most simply finds nothing:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id
= 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  | HIRE_DATE |
> +--------------+-------------+------------+--------+---------------+-----------+
> +--------------+-------------+------------+--------+---------------+-----------+
> No rows selected (0.348 seconds)
> The query can be modified to find the matching row in a few ways, such as the following
(using between instead of '=', changing 100 to 100.0, or casting as decimal:
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id
between 100 and 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.226 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where employee_id
= 100.0;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.259 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.`c:/archiveHR/HR.EMPLOYEES` where cast(employee_id
AS DECIMAL) = 100;
> +--------------+-------------+------------+--------+---------------+-----------+
> | EMPLOYEE_ID  | FIRST_NAME  | LAST_NAME  | EMAIL  | PHONE_NUMBER  |       HIR |
> +--------------+-------------+------------+--------+---------------+-----------+
> | 100          | Steven      | King       | SKING  | 515.123.4567  | 2003-06-1 |
> +--------------+-------------+------------+--------+---------------+-----------+
> 1 row selected (0.232 seconds)
> 0: jdbc:drill:zk=local>
> The schema of the parquet data that is being searched is as follows:
> $ java -jar parquet-tools*1.jar meta c:/archiveHR/HR.EMPLOYEES/1.parquet
> file:           file:/c:/archiveHR/HR.EMPLOYEES/1.parquet
> creator:        parquet-mr version 1.8.1 (build 4aba4dae7bb0d4edbcf7923ae1339f28fd3f7fcf)
> .....
> file schema:    HR.EMPLOYEES
> --------------------------------------------------------------------------------
> EMPLOYEE_ID:    REQUIRED FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:0
> FIRST_NAME:     OPTIONAL BINARY O:UTF8 R:0 D:1
> LAST_NAME:      REQUIRED BINARY O:UTF8 R:0 D:0
> EMAIL:          REQUIRED BINARY O:UTF8 R:0 D:0
> PHONE_NUMBER:   OPTIONAL BINARY O:UTF8 R:0 D:1
> HIRE_DATE:      REQUIRED BINARY O:UTF8 R:0 D:0
> JOB_ID:         REQUIRED BINARY O:UTF8 R:0 D:0
> SALARY:         OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> COMMISSION_PCT: OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> MANAGER_ID:     OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> DEPARTMENT_ID:  OPTIONAL FIXED_LEN_BYTE_ARRAY O:DECIMAL R:0 D:1
> row group 1:    RC:107 TS:9943 OFFSET:4
> --------------------------------------------------------------------------------
> EMPLOYEE_ID:     FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:4 SZ:360/355/0.99 VC:107 ENC:PLAIN,BIT_PACKED
> FIRST_NAME:      BINARY SNAPPY DO:0 FPO:364 SZ:902/1058/1.17 VC:107 ENC:PLAIN_DICTIONARY,RLE,BIT_PACKED
> LAST_NAME:       BINARY SNAPPY DO:0 FPO:1266 SZ:913/1111/1.22 VC:107 ENC:PLAIN,BIT_PACKED
> EMAIL:           BINARY SNAPPY DO:0 FPO:2179 SZ:977/1184/1.21 VC:107 ENC:PLAIN,BIT_PACKED
> PHONE_NUMBER:    BINARY SNAPPY DO:0 FPO:3156 SZ:750/1987/2.65 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> HIRE_DATE:       BINARY SNAPPY DO:0 FPO:3906 SZ:874/2636/3.02 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
> JOB_ID:          BINARY SNAPPY DO:0 FPO:4780 SZ:254/302/1.19 VC:107 ENC:PLAIN_DICTIONARY,BIT_PACKED
> SALARY:          FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5034 SZ:419/580/1.38 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> COMMISSION_PCT:  FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5453 SZ:97/113/1.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> MANAGER_ID:      FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5550 SZ:168/363/2.16 VC:107 ENC:PLAIN,RLE,BIT_PACKED
> DEPARTMENT_ID:   FIXED_LEN_BYTE_ARRAY SNAPPY DO:0 FPO:5718 SZ:94/254/2.70 VC:107 ENC:PLAIN,RLE,BIT_PACKED



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message