From: "Deneche A. Hakim (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Date: Tue, 20 Oct 2015 18:17:27 +0000 (UTC)
Subject: [jira] [Commented] (DRILL-2967) Incompatible types error reported in a "not in" query with compatible data types

    [ https://issues.apache.org/jira/browse/DRILL-2967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965499#comment-14965499 ]

Deneche A. Hakim commented on DRILL-2967:
-----------------------------------------

The number of cores matters in this case: I set {{planner.width.max_per_node}} to 23 and was able to reproduce the issue using the j2/j6 query.

> Incompatible types error reported in a "not in" query with compatible data types
> --------------------------------------------------------------------------------
>
>                 Key: DRILL-2967
>                 URL: https://issues.apache.org/jira/browse/DRILL-2967
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0
>            Reporter: Victoria Markman
>            Assignee: Deneche A. Hakim
>             Fix For: 1.3.0
>
>         Attachments: j2_j6_tables.tar, t1.parquet, t2.parquet
>
>
> Two tables, parquet files (attached in the bug):
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1;
> +------------+------------+------------+
> | a1         | b1         | c1         |
> +------------+------------+------------+
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | null       | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | null       | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | null       |
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 10 rows selected (0.119 seconds)
> 0: jdbc:drill:schema=dfs> select * from t2;
> +------------+------------+------------+
> | a2         | b2         | c2         |
> +------------+------------+------------+
> | 0          | zzz        | 2014-12-31 |
> | 1          | aaaaa      | 2015-01-01 |
> | 2          | bbbbb      | 2015-01-02 |
> | 2          | bbbbb      | 2015-01-02 |
> | 2          | bbbbb      | 2015-01-02 |
> | 3          | ccccc      | 2015-01-03 |
> | 4          | ddddd      | 2015-01-04 |
> | 5          | eeeee      | 2015-01-05 |
> | 6          | fffff      | 2015-01-06 |
> | 7          | ggggg      | 2015-01-07 |
> | 7          | ggggg      | 2015-01-07 |
> | 8          | hhhhh      | 2015-01-08 |
> | 9          | iiiii      | 2015-01-09 |
> +------------+------------+------------+
> 13 rows selected (0.116 seconds)
> {code}
> Disable hash join and set slice_target = 1:
> alter session set
`planner.enable_hashjoin` = false;
> alter session set `planner.slice_target` = 1;
> Correct result:
> {code}
> 0: jdbc:drill:schema=dfs> select * from t1 where b1 not in (select b2 from t2);
> +------------+------------+------------+
> | a1         | b1         | c1         |
> +------------+------------+------------+
> | 10         | jjjjj      | 2015-01-10 |
> +------------+------------+------------+
> 1 row selected (0.625 seconds)
> {code}
> Swap tables and you get an error:
> {code}
> 0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1);
> +------------+------------+------------+
> | a1         | b1         | c1         |
> +------------+------------+------------+
> Query failed: SYSTEM ERROR: Join only supports implicit casts between 1. Numeric data
> 2. Varchar, Varbinary data Left type: INT, Right type: VARCHAR. Add explicit casts to avoid this error
> Fragment 1:0
> [1a83aa50-39aa-452c-91dd-970bf4a8f03d on atsqa4-133.qa.lab:31010]
> java.lang.RuntimeException: java.sql.SQLException: Failure while executing query.
> at sqlline.SqlLine$IncrementalRows.hasNext(SqlLine.java:2514)
> at sqlline.SqlLine$TableOutputFormat.print(SqlLine.java:2148)
> at sqlline.SqlLine.print(SqlLine.java:1809)
> at sqlline.SqlLine$Commands.execute(SqlLine.java:3766)
> at sqlline.SqlLine$Commands.sql(SqlLine.java:3663)
> at sqlline.SqlLine.dispatch(SqlLine.java:889)
> at sqlline.SqlLine.begin(SqlLine.java:763)
> at sqlline.SqlLine.start(SqlLine.java:498)
> at sqlline.SqlLine.main(SqlLine.java:460)
> {code}
> Explain plan for the query with an error:
> {code}
> 0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1);
> +------------+------------+
> | text       | json       |
> +------------+------------+
> | 00-00 Screen
> 00-01 Project(*=[$0])
> 00-02 UnionExchange
> 01-01 Project(T27¦¦*=[$0])
> 01-02 SelectionVectorRemover
> 01-03 Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null, false))])
> 01-04 MergeJoin(condition=[=($4, $5)], joinType=[left])
> 01-06 SelectionVectorRemover
> 01-08 Sort(sort0=[$4], dir0=[ASC])
> 01-10 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4])
> 01-12 HashToRandomExchange(dist0=[[$4]])
> 02-01 UnorderedMuxExchange
> 04-01 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$4], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($4))])
> 04-02 Project(T27¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1])
> 04-03 NestedLoopJoin(condition=[true], joinType=[inner])
> 04-05 Project(T27¦¦*=[$0], b2=[$1])
> 04-06 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]])
> 04-04 BroadcastExchange
> 06-01 StreamAgg(group=[{}], agg#0=[$SUM0($0)],
agg#1=[$SUM0($1)])
> 06-02 UnionExchange
> 07-01 StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> 07-02 Project(b1=[$0], $f1=[true])
> 07-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
> 01-05 Project(b1=[$0], $f10=[$1])
> 01-07 SelectionVectorRemover
> 01-09 Sort(sort0=[$0], dir0=[ASC])
> 01-11 HashAgg(group=[{0}], agg#0=[MIN($1)])
> 01-13 Project(b1=[$0], $f1=[$1])
> 01-14 HashToRandomExchange(dist0=[[$0]])
> 03-01 UnorderedMuxExchange
> 05-01 Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 05-02 HashAgg(group=[{0}], agg#0=[MIN($1)])
> 05-03 Project(b1=[$0], $f1=[true])
> 05-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
> {code}
> Correct result and correct plan with the hash join distributed plan (planner.slice_target = 1):
> alter session set `planner.enable_hashjoin` = true;
> alter session set `planner.slice_target` = 1;
> {code}
> 0: jdbc:drill:schema=dfs> select * from t2 where b2 not in (select b1 from t1);
> +------------+------------+------------+
> | a2         | b2         | c2         |
> +------------+------------+------------+
> +------------+------------+------------+
> No rows selected (0.458 seconds)
> 0: jdbc:drill:schema=dfs> explain plan for select * from t2 where b2 not in (select b1 from t1);
> +------------+------------+
> | text       | json       |
> +------------+------------+
> | 00-00 Screen
> 00-01 Project(*=[$0])
> 00-02 Project(T25¦¦*=[$0])
> 00-03 SelectionVectorRemover
> 00-04 Filter(condition=[NOT(CASE(=($2, 0), false, IS NOT NULL($6), true, IS NULL($4), null, <($3, $2), null,
false))])
> 00-05 HashJoin(condition=[=($4, $5)], joinType=[left])
> 00-07 Project(T25¦¦*=[$0], b2=[$1], $f0=[$2], $f1=[$3], b20=[$1])
> 00-09 NestedLoopJoin(condition=[true], joinType=[inner])
> 00-11 Project(T25¦¦*=[$0], b2=[$1])
> 00-12 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t2]], selectionRoot=/drill/testdata/aggregation/t2, numFiles=1, columns=[`*`]]])
> 00-10 BroadcastExchange
> 01-01 StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[$SUM0($1)])
> 01-02 UnionExchange
> 03-01 StreamAgg(group=[{}], agg#0=[COUNT()], agg#1=[COUNT($0)])
> 03-02 Project(b1=[$0], $f1=[true])
> 03-03 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
> 00-06 Project(b1=[$0], $f10=[$1])
> 00-08 BroadcastExchange
> 02-01 HashAgg(group=[{0}], agg#0=[MIN($1)])
> 02-02 Project(b1=[$0], $f1=[$1])
> 02-03 HashToRandomExchange(dist0=[[$0]])
> 04-01 UnorderedMuxExchange
> 05-01 Project(b1=[$0], $f1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[castInt(hash64AsDouble($0))])
> 05-02 HashAgg(group=[{0}], agg#0=[MIN($1)])
> 05-03 Project(b1=[$0], $f1=[true])
> 05-04 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/aggregation/t1]], selectionRoot=/drill/testdata/aggregation/t1, numFiles=1, columns=[`b1`]]])
> {code}
> Same error occurs with columns of date, time and timestamp types.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
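The SYSTEM ERROR quoted above ends with the advice "Add explicit casts to avoid this error". A speculative sketch of that workaround for the failing query, assuming b1 and b2 are varchar columns as the sample output indicates (this is only a mitigation; the planner bug itself is what this issue tracks, with the fix targeted at 1.3.0):

```sql
-- Hedged workaround sketch, not the fix for this issue: cast both sides
-- of the "not in" comparison explicitly so the join key types agree.
-- varchar(10) is an assumed length, chosen to cover the sample values.
select *
from t2
where cast(b2 as varchar(10)) not in (select cast(b1 as varchar(10)) from t1);
```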