drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victoria Markman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3895) Can not join on int96 column coming from two different sources: hive and impala
Date Tue, 06 Oct 2015 00:41:26 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944333#comment-14944333
] 

Victoria Markman commented on DRILL-3895:
-----------------------------------------

This is not a bug, but my favorite issue where column c_timestamp does not exist in one of
the tables and is treated as null:int !
I used wrong table from my data set. 

> Can not join on int96 column coming from two different sources: hive and impala
> -------------------------------------------------------------------------------
>
>                 Key: DRILL-3895
>                 URL: https://issues.apache.org/jira/browse/DRILL-3895
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types
>    Affects Versions: 1.2.0
>            Reporter: Victoria Markman
>
> I would think that cross source join on a column of int96 type should work in drill,
since we convert it to varbinary internally. It is very possible that I'm hitting a bug related
to varbinary data type ...
> {code}
> Hive generated parquet: dfs.`test/type_test`
> [Mon Oct 05 09:37:37] # ~/parquet-tools/parquet-schema 000000_0
> message hive_schema {
>   optional int32 num;
>   optional binary word (UTF8);
>   optional int96 dtg;
>   optional double dollar;
> }
> Implala generated parquet: dfs.`drill/testdata/subqueries/imp_t1`
> [Mon Oct 05 09:38:40 ] # ~/parquet-tools/parquet-schema 243293260064ba0-808af32a4ab4e487_393209663_data.0.parq
> message schema {
>   optional binary c_varchar (UTF8);
>   optional int32 c_integer;
>   optional int64 c_bigint;
>   optional float c_float;
>   optional double c_double;
>   optional binary c_date (UTF8);
>   optional binary c_time (UTF8);
>   optional int96 c_timestamp;
>   optional boolean c_boolean;
>   optional double d9;
>   optional double d18;
>   optional double d28;
>   optional double d38;
> }
> 0: jdbc:drill:schema=dfs> select count(*) from `test/type_test`;
> +---------+
> | EXPR$0  |
> +---------+
> | 2       |
> +---------+
> 1 row selected (0.3 seconds)
> 0: jdbc:drill:schema=dfs> select count(*) from `drill/testdata/subqueries/imp_t1`;
> +---------+
> | EXPR$0  |
> +---------+
> | 10000   |
> +---------+
> 1 row selected (0.259 seconds)
> {code}
> *Join 'AS IS' on int96 column results in an error*
> -- IN clause
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from `drill/testdata/subqueries/imp_t1`
where c_timestamp IN ( select c_timestamp from `test/type_test`);
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between
1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, Right type:
INT. Add explicit casts to avoid this error
> Fragment 0:0
> [Error Id: 8bce65e1-0e8f-45fe-9990-eb980aeae53e on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> -- NOT IN clause
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from `drill/testdata/subqueries/imp_t1`
where c_timestamp NOT IN ( select c_timestamp from `test/type_test`);
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between
1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, Right type:
INT. Add explicit casts to avoid this error
> Fragment 0:0
> [Error Id: 4307937f-fbc0-40c7-b2d4-8e4835e79ae8 on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> -- JOIN
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from dfs.`test/type_test` a, dfs.`drill/testdata/subqueries/imp_t1`
b where a.c_timestamp = b.c_timestamp;
> Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between
1. Numeric data
>  2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, Right type:
INT. Add explicit casts to avoid this error
> Fragment 0:0
> [Error Id: e80225a3-eeb6-4c5b-bda1-a1f0d13d7edf on atsqa4-133.qa.lab:31010] (state=,code=0)
> {code}
> *Attempt to explicitly cast to varbinary type*(one of these queries should have returned
non zero row count)
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1`
where cast(c_timestamp as varbinary(10)) NOT IN ( select cast(c_timestamp as varbinary(10))
from dfs.`test/type_test`);
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> 1 row selected (0.422 seconds)
> 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1`
where cast(c_timestamp as varbinary(10)) IN ( select cast(c_timestamp as varbinary(10)) from
dfs.`test/type_test`);
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> {code}
> *Use CONVERT_FROM function* (one of these queries should have returned non zero row count)
> {code}
> 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1`
where CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') IN ( select CONVERT_FROM(c_timestamp,
'TIMESTAMP_IMPALA') from dfs.`test/type_test`);
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> 1 row selected (0.683 seconds)
> 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1`
where CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') NOT IN ( select CONVERT_FROM(c_timestamp,
'TIMESTAMP_IMPALA') from dfs.`test/type_test`);
> +---------+
> | EXPR$0  |
> +---------+
> | 0       |
> +---------+
> 1 row selected (0.858 seconds)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message