Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id B14801884F for ; Tue, 6 Oct 2015 00:41:26 +0000 (UTC) Received: (qmail 13510 invoked by uid 500); 6 Oct 2015 00:41:26 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 13479 invoked by uid 500); 6 Oct 2015 00:41:26 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 13469 invoked by uid 99); 6 Oct 2015 00:41:26 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 06 Oct 2015 00:41:26 +0000 Date: Tue, 6 Oct 2015 00:41:26 +0000 (UTC) From: "Victoria Markman (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (DRILL-3895) Can not join on int96 column coming from two different sources: hive and impala MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DRILL-3895?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14944333#comment-14944333 ] Victoria Markman commented on DRILL-3895: ----------------------------------------- This is not a bug, but my favorite issue where column c_timestamp does not exist in one of the tables and is treated as null:int ! I used wrong table from my data set. > Can not join on int96 column coming from two different sources: hive and impala > ------------------------------------------------------------------------------- > > Key: DRILL-3895 > URL: https://issues.apache.org/jira/browse/DRILL-3895 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Data Types > Affects Versions: 1.2.0 > Reporter: Victoria Markman > > I would think that cross source join on a column of int96 type should work in drill, since we convert it to varbinary internally. It is very possible that I'm hitting a bug related to varbinary data type ... > {code} > Hive generated parquet: dfs.`test/type_test` > [Mon Oct 05 09:37:37] # ~/parquet-tools/parquet-schema 000000_0 > message hive_schema { > optional int32 num; > optional binary word (UTF8); > optional int96 dtg; > optional double dollar; > } > Implala generated parquet: dfs.`drill/testdata/subqueries/imp_t1` > [Mon Oct 05 09:38:40 ] # ~/parquet-tools/parquet-schema 243293260064ba0-808af32a4ab4e487_393209663_data.0.parq > message schema { > optional binary c_varchar (UTF8); > optional int32 c_integer; > optional int64 c_bigint; > optional float c_float; > optional double c_double; > optional binary c_date (UTF8); > optional binary c_time (UTF8); > optional int96 c_timestamp; > optional boolean c_boolean; > optional double d9; > optional double d18; > optional double d28; > optional double d38; > } > 0: jdbc:drill:schema=dfs> select count(*) from `test/type_test`; > +---------+ > | EXPR$0 | > +---------+ > | 2 | > +---------+ > 1 row selected (0.3 seconds) > 0: jdbc:drill:schema=dfs> select count(*) from `drill/testdata/subqueries/imp_t1`; > +---------+ > | EXPR$0 | > +---------+ > | 10000 | > +---------+ > 1 row selected (0.259 seconds) > {code} > *Join 'AS IS' on int96 column results in an error* > -- IN clause > {code} > 0: jdbc:drill:schema=dfs> select count(*) from `drill/testdata/subqueries/imp_t1` where c_timestamp IN ( select c_timestamp from `test/type_test`); > Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data > 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, Right type: INT. Add explicit casts to avoid this error > Fragment 0:0 > [Error Id: 8bce65e1-0e8f-45fe-9990-eb980aeae53e on atsqa4-133.qa.lab:31010] (state=,code=0) > {code} > -- NOT IN clause > {code} > 0: jdbc:drill:schema=dfs> select count(*) from `drill/testdata/subqueries/imp_t1` where c_timestamp NOT IN ( select c_timestamp from `test/type_test`); > Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data > 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, Right type: INT. Add explicit casts to avoid this error > Fragment 0:0 > [Error Id: 4307937f-fbc0-40c7-b2d4-8e4835e79ae8 on atsqa4-133.qa.lab:31010] (state=,code=0) > {code} > -- JOIN > {code} > 0: jdbc:drill:schema=dfs> select count(*) from dfs.`test/type_test` a, dfs.`drill/testdata/subqueries/imp_t1` b where a.c_timestamp = b.c_timestamp; > Error: SYSTEM ERROR: DrillRuntimeException: Join only supports implicit casts between 1. Numeric data > 2. Varchar, Varbinary data 3. Date, Timestamp data Left type: VARBINARY, Right type: INT. Add explicit casts to avoid this error > Fragment 0:0 > [Error Id: e80225a3-eeb6-4c5b-bda1-a1f0d13d7edf on atsqa4-133.qa.lab:31010] (state=,code=0) > {code} > *Attempt to explicitly cast to varbinary type*(one of these queries should have returned non zero row count) > {code} > 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1` where cast(c_timestamp as varbinary(10)) NOT IN ( select cast(c_timestamp as varbinary(10)) from dfs.`test/type_test`); > +---------+ > | EXPR$0 | > +---------+ > | 0 | > +---------+ > 1 row selected (0.422 seconds) > 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1` where cast(c_timestamp as varbinary(10)) IN ( select cast(c_timestamp as varbinary(10)) from dfs.`test/type_test`); > +---------+ > | EXPR$0 | > +---------+ > | 0 | > +---------+ > {code} > *Use CONVERT_FROM function* (one of these queries should have returned non zero row count) > {code} > 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1` where CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') IN ( select CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') from dfs.`test/type_test`); > +---------+ > | EXPR$0 | > +---------+ > | 0 | > +---------+ > 1 row selected (0.683 seconds) > 0: jdbc:drill:schema=dfs> select count(*) from dfs.`drill/testdata/subqueries/imp_t1` where CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') NOT IN ( select CONVERT_FROM(c_timestamp, 'TIMESTAMP_IMPALA') from dfs.`test/type_test`); > +---------+ > | EXPR$0 | > +---------+ > | 0 | > +---------+ > 1 row selected (0.858 seconds) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)