drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victoria Markman (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-3961) CONVERT_FROM should return INTEGER when converting from TINYINT/SMALLINT types
Date Tue, 20 Oct 2015 23:16:27 GMT
Victoria Markman created DRILL-3961:
---------------------------------------

             Summary: CONVERT_FROM should return INTEGER when converting from TINYINT/SMALLINT
types
                 Key: DRILL-3961
                 URL: https://issues.apache.org/jira/browse/DRILL-3961
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
    Affects Versions: 1.1.0, 1.2.0
            Reporter: Victoria Markman


We've disabled SMALLINT, TINYINT SQL types because we had bugs that we could not fix before
1.0.0 Some things were left in sort of half implemented state and we should either fix up
these things or re-implement TINYINT/SMALLINT types.

* We can't cast to TINYINT/SMALLINT:

{code}
0: jdbc:drill:schema=dfs> select cast(1 as tinyint) from sys.options limit 1;
Error: UNSUPPORTED_OPERATION ERROR: TINYINT is not supported
See Apache Drill JIRA: DRILL-1959
[Error Id: c86bb4f5-4ea9-47cc-856e-ce1e96463477 on atsqa4-133.qa.lab:31010] (state=,code=0)

0: jdbc:drill:schema=dfs> select cast(1 as smallint) from sys.options limit 1;
Error: UNSUPPORTED_OPERATION ERROR: SMALLINT is not supported
See Apache Drill JIRA: DRILL-1959
[Error Id: 14cc008a-532b-4f02-a9ad-6a54cf5dc6bc on atsqa4-133.qa.lab:31010] (state=,code=0)
{code}

* We hacked our implementation to read TINTINT/SMALLINT in hive as INTEGER: https://issues.apache.org/jira/browse/DRILL-3263
   Which works. Now, if you try querying hive table with TINYINT/SMALLINT attribute, you get
INTEGER in Drill.

Hive table, notice c9/c10 columns are TINYINT/SMALLINT respectively:
{code}
hive> describe alltypes;
OK
c1                      int
c2                      boolean
c3                      double
c4                      string
c5                      array<int>
c6                      map<int,string>
c7                      map<string,string>
c8                      struct<r:string,s:int,t:double>
c9                      tinyint
c10                     smallint
c11                     float
c12                     bigint
c13                     array<array<string>>
c14                     map<int,map<int,int>>
c15                     struct<r:int,s:struct<a:int,b:string>>
c16                     array<struct<m:map<string,string>,n:int>>
c17                     timestamp
c18                     decimal(10,0)
c19                     binary
c20                     date
Time taken: 0.094 seconds, Fetched: 20 row(s)
{code}

Same table accessed from Drill:
{code}
0: jdbc:drill:schema=dfs> use hive;
+-------+-----------------------------------+
|  ok   |              summary              |
+-------+-----------------------------------+
| true  | Default schema changed to [hive]  |
+-------+-----------------------------------+
1 row selected (0.337 seconds)
0: jdbc:drill:schema=dfs> describe alltypes;
+--------------+--------------------+--------------+
| COLUMN_NAME  |     DATA_TYPE      | IS_NULLABLE  |
+--------------+--------------------+--------------+
| c1           | INTEGER            | YES          |
| c2           | BOOLEAN            | YES          |
| c3           | DOUBLE             | YES          |
| c4           | CHARACTER VARYING  | YES          |
| c5           | ARRAY              | NO           |
| c6           | MAP                | NO           |
| c7           | MAP                | NO           |
| c8           | ROW                | NO           |
| c9           | INTEGER            | YES          |
| c10          | INTEGER            | YES          |
| c11          | FLOAT              | YES          |
| c12          | BIGINT             | YES          |
| c13          | ARRAY              | NO           |
| c14          | MAP                | NO           |
| c15          | ROW                | NO           |
| c16          | ARRAY              | NO           |
| c17          | TIMESTAMP          | YES          |
| c18          | DECIMAL            | YES          |
| c19          | BINARY VARYING     | YES          |
| c20          | DATE               | YES          |
+--------------+--------------------+--------------+
20 rows selected (1.379 seconds)
{code}

* However, if you try reading parquet file with TINYINT/SMALLINT types (generated by impala
for instance), CONVERT_FROM function still returns SMALLINT/TINYINT types.

I can successfully read SMALLINT field from impala generated parquet file:
{code}
0: jdbc:drill:schema=dfs> select asset_id from `impala.parquet` limit 1;
+--------------+
|   asset_id   |
+--------------+
| [B@22a3aec1  |
+--------------+
1 row selected (0.298 seconds)
{code}
Using CONVERT_FROM function:
{code}
0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'SMALLINT') from `impala.parquet`
limit 1;
+---------+
| EXPR$0  |
+---------+
| 19535   |
+---------+
1 row selected (0.283 seconds)
{code}

Use of this function in an arithmetic expression results in an error:
{code}
0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'SMALLINT') + 1 from `impala.parquet`
limit 1;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.
 Errors:
Error in expression at index -1.  Error: Missing function implementation: [castINT(SMALLINT-OPTIONAL)].
 Full expression: --UNKNOWN EXPRESSION--..
Fragment 0:0
[Error Id: b5e063ee-85e0-48f6-9ca2-09dd9d6a84e6 on atsqa4-133.qa.lab:31010] (state=,code=0)
{code}

Attempt to cast to integer fails for the same reason:
{code}
0: jdbc:drill:schema=dfs> select CAST(CONVERT_FROM(asset_id, 'SMALLINT') as integer) from
`impala.parquet` limit 1;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.
 Errors:
Error in expression at index -1.  Error: Missing function implementation: [castBIGINT(SMALLINT-OPTIONAL)].
 Full expression: --UNKNOWN EXPRESSION--..
Fragment 0:0
[Error Id: 70820c35-f5fe-4229-8897-cafe0db5bdef on atsqa4-133.qa.lab:31010] (state=,code=0)
{code}

Trying to 'cheat it' and read as INT:
{code}
0: jdbc:drill:schema=dfs> select CONVERT_FROM(asset_id, 'INT') from `impala.parquet` limit
1;
Error: SYSTEM ERROR: IllegalArgumentException: Wrong length 2(2-0) in the buffer 'OL', expected
4.
Fragment 0:0
[Error Id: 47e0a0d7-fc40-4f3f-8d68-4bed63ac0a60 on atsqa4-133.qa.lab:31010] (state=,code=0)
{code}

The only thing I can do in order to make use of this column in a query is to write it to a
table:
{code}
0: jdbc:drill:schema=dfs> create table x(c1) as select CONVERT_FROM(asset_id, 'SMALLINT')
from `impala.parquet` limit 1;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 1                          |
+-----------+----------------------------+
1 row selected (0.568 seconds)

[Tue Oct 20 15:57:14] # ~/parquet-tools/parquet-schema 0_0_0.parquet
message root {
  optional int32 c1;
}
{code}

*  Probably the same with Hive UDFs




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message