drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5200) Nested query fails to push filter down near scan
Date Thu, 19 Jan 2017 01:22:26 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15829115#comment-15829115 ]

Jinfeng Ni commented on DRILL-5200:
-----------------------------------

The filter is not pushed down because it refers to a column expanded from the * column,
and that expansion happens dynamically at execution time. This is a known restriction in the
optimizer rule Drill uses (extended from Calcite).
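
A possible workaround, sketched only from the restriction described above and not verified against Drill 1.9.0: either apply the filter by hand below the sort, or name `columns` explicitly in the inner query so the filter no longer refers to a column expanded from `*`. That the second form lets the planner push the filter down is an assumption; checking the plan with EXPLAIN PLAN FOR would confirm or refute it.

{code}
-- Option 1: write the filter below the sort manually (same rows, same order).
select * from dfs.`/big-csv-file.csv`
where columns[0] = 'bogus value'
order by columns[0];

-- Option 2 (assumption): project `columns` explicitly instead of `*`,
-- so the outer filter refers to a column known at planning time.
select d.columns
from (select columns from dfs.`/big-csv-file.csv` order by columns[0]) d
where d.columns[0] = 'bogus value';
{code}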
 

> Nested query fails to push filter down near scan
> ------------------------------------------------
>
>                 Key: DRILL-5200
>                 URL: https://issues.apache.org/jira/browse/DRILL-5200
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.9.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider the query described in DRILL-5198. The query was deliberately designed to do a full sort and discard results. Unfortunately, the query succeeded when it should not have been able to do so. The query:
> {code}
> select * from (select * from dfs.`/big-csv-file.csv` order by columns[0]) d where d.columns[0] = 'bogus value';
> {code}
> The resulting plan follows. Note that the filter (which removes all rows) is above the sort; it should be below.
> {code}
> 00-00    Screen : rowType = RecordType(ANY *): rowcount = 2.691360795E7, cumulative cost = {1.6444214457450001E9 rows, 2.6992589029593388E10 cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 459
> 00-01      Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 2.691360795E7, cumulative cost = {1.64173008495E9 rows, 2.698989766879839E10 cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 458
> 00-02        SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*): rowcount = 2.691360795E7, cumulative cost = {1.64173008495E9 rows, 2.698989766879839E10 cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 457
> 00-03          Filter(condition=[=(ITEM(ITEM($0, 'columns'), 0), 'ljdfhwuehnoiueyf')]) : rowType = RecordType(ANY T0¦¦*): rowcount = 2.691360795E7, cumulative cost = {1.614816477E9 rows, 2.696298406084839E10 cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 456
> 00-04            Project(T0¦¦*=[$0]) : rowType = RecordType(ANY T0¦¦*): rowcount = 1.79424053E8, cumulative cost = {1.435392424E9 rows, 2.613763341704839E10 cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 455
> 00-05              SingleMergeExchange(sort0=[1 ASC]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {1.435392424E9 rows, 2.613763341704839E10 cpu, 0.0 io, 3.67460460544E12 network, 2.870784848E9 memory}, id = 454
> 01-01                SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {1.255968371E9 rows, 2.470224099304839E10 cpu, 0.0 io, 2.204762763264E12 network, 2.870784848E9 memory}, id = 453
> 01-02                  Sort(sort0=[$1], dir0=[ASC]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {1.076544318E9 rows, 2.452281694004839E10 cpu, 0.0 io, 2.204762763264E12 network, 2.870784848E9 memory}, id = 452
> 01-03                    Project(T0¦¦*=[$0], EXPR$1=[$1]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {8.97120265E8 rows, 4.844449431E9 cpu, 0.0 io, 2.204762763264E12 network, 0.0 memory}, id = 451
> 01-04                      HashToRandomExchange(dist0=[[$1]]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {8.97120265E8 rows, 4.844449431E9 cpu, 0.0 io, 2.204762763264E12 network, 0.0 memory}, id = 450
> 02-01                        UnorderedMuxExchange : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {7.17696212E8 rows, 1.973664583E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 449
> 03-01                          Project(T0¦¦*=[$0], EXPR$1=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1)]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.79424053E8, cumulative cost = {5.38272159E8 rows, 1.79424053E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 448
> 03-02                            Project(T0¦¦*=[$0], EXPR$1=[ITEM($1, 0)]) : rowType = RecordType(ANY T0¦¦*, ANY EXPR$1): rowcount = 1.79424053E8, cumulative cost = {3.58848106E8 rows, 1.076544318E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 447
> 03-03                              Project(T0¦¦*=[$0], columns=[$1]) : rowType = RecordType(ANY T0¦¦*, ANY columns): rowcount = 1.79424053E8, cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 446
> 03-04                                Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/descending-col-length-8k.tbl, numFiles=1, columns=[`*`], files=[maprfs:///drill/testdata/resource-manager/descending-col-length-8k.tbl]]]) : rowType = (DrillRecordRow[*, columns]): rowcount = 1.79424053E8, cumulative cost = {1.79424053E8 rows, 3.58848106E8 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 445
> {code}
> The filter should have been pushed down near the scan. It is likely that the clever nested query structure used in the query tricks the planner into missing an optimization opportunity.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
