drill-issues mailing list archives

From "Arina Ielchiieva (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-5524) Remove no-op projects from query plan
Date Fri, 19 May 2017 10:22:05 GMT

    [ https://issues.apache.org/jira/browse/DRILL-5524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16017193#comment-16017193 ]

Arina Ielchiieva commented on DRILL-5524:
-----------------------------------------

Calcite has a ProjectRemoveRule that can be added to Drill's rule set so that the project
stage is removed when it is not needed.
But there is a problem with implicit columns. For example, take a star query that also names
an implicit column:  select *, fqn from t.
At the scan stage Drill passes the list of columns to retrieve. But when the query contains a
star, Drill assumes that the other columns named in the query will be retrieved anyway, so it
simplifies the list of columns to "columns=[`*`]".

At this point we don't know whether the implicit columns will be needed, so we add them anyway.
https://github.com/apache/drill/blob/3e8b01d5b0d3013e3811913f0fd6028b22c1ac3f/exec/java-exec/src/main/java/org/apache/drill/exec/store/ImplicitColumnExplorer.java#L143

And if they are not needed, we filter them out during the project stage.
https://github.com/apache/drill/blob/0dc237e3161cf284212cc63f740b229d4fee8fdf/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java#L357
https://github.com/apache/drill/blob/0dc237e3161cf284212cc63f740b229d4fee8fdf/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/project/ProjectRecordBatch.java#L379

But when ProjectRemoveRule removes the project stage, the implicit columns are shown. This rule
is already used in the JDBC plugin, and there is a corresponding bug (DRILL-4903).
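
The interaction described above can be modelled with a small, self-contained sketch. This is
not Drill code: the class, the method names, the table columns (id, name) and the exact list
of implicit columns are assumptions chosen for illustration. It only shows why dropping the
project stage exposes implicit columns that the query never asked for.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of "select *, fqn from t"; all names here are hypothetical, not Drill APIs.
final class ImplicitColumnLeakSketch {
    // Assumed implicit columns and table columns, for illustration only.
    static final List<String> IMPLICIT = Arrays.asList("fqn", "filepath", "filename", "suffix");
    static final List<String> TABLE = Arrays.asList("id", "name");

    // Scan: with columns=[*] we cannot tell which implicit columns are wanted,
    // so the table columns and every implicit column are returned.
    static List<String> scan(List<String> columns) {
        List<String> out = new ArrayList<>();
        if (columns.contains("*")) {
            out.addAll(TABLE);
            out.addAll(IMPLICIT);
        } else {
            out.addAll(columns);
        }
        return out;
    }

    // Project: drop implicit columns that the select list did not name.
    static List<String> project(List<String> scanned, List<String> selectList) {
        List<String> out = new ArrayList<>();
        for (String column : scanned) {
            if (!IMPLICIT.contains(column) || selectList.contains(column)) {
                out.add(column);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> selectList = Arrays.asList("*", "fqn");
        List<String> scanned = scan(Arrays.asList("*"));
        // With the project stage: [id, name, fqn]
        System.out.println(project(scanned, selectList));
        // Project stage removed: [id, name, fqn, filepath, filename, suffix]
        System.out.println(scanned);
    }
}
```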

So before applying this rule, we need to make sure that the problem with implicit columns is resolved.
For example, we could forbid using implicit columns with star queries, or include the implicit
column in the column list even if a star is present -> columns=[`*`, `fqn`].
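
A rough sketch of the second option, keeping explicitly named implicit columns next to the
star when the column list is simplified, might look like the following. The class and method
names and the set of implicit columns are hypothetical; this is one possible shape of the
idea, not how Drill implements or would implement it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper contrasting the two ways of simplifying a scan column list.
final class StarColumnListSketch {
    // Assumed set of implicit column names, for illustration only.
    static final Set<String> IMPLICIT =
        new HashSet<>(Arrays.asList("fqn", "filepath", "filename", "suffix"));

    // Behaviour as described in this comment: any star collapses the list to [*].
    static List<String> simplifyCurrent(List<String> selectList) {
        if (selectList.contains("*")) {
            return Collections.singletonList("*");
        }
        return new ArrayList<>(selectList);
    }

    // Proposed alternative: keep the star plus every explicitly named implicit column.
    static List<String> simplifyProposed(List<String> selectList) {
        if (!selectList.contains("*")) {
            return new ArrayList<>(selectList);
        }
        List<String> out = new ArrayList<>();
        out.add("*");
        for (String column : selectList) {
            if (IMPLICIT.contains(column) && !out.contains(column)) {
                out.add(column);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> selectList = Arrays.asList("*", "fqn");
        System.out.println(simplifyCurrent(selectList));  // [*]
        System.out.println(simplifyProposed(selectList)); // [*, fqn]
    }
}
```

With a list like this, the scan would know exactly which implicit columns to add, so nothing
would depend on a later project stage to filter them out.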

> Remove no-op projects from query plan
> -------------------------------------
>
>                 Key: DRILL-5524
>                 URL: https://issues.apache.org/jira/browse/DRILL-5524
>             Project: Apache Drill
>          Issue Type: Improvement
>    Affects Versions: 1.10.0
>            Reporter: Paul Rogers
>            Priority: Minor
>
> Consider a very simple query using the mock data source:
> {code}
> SELECT id_i, name_s10 FROM `mock`.`employees_10K`
> {code}
> This just says to create two columns: one int, one varchar of length 10, and fill them with random data to create 10,000 records.
> The query simply passes the columns directly from the input to the client.
> Yet, the query plan includes a "no-op" project:
> {code}
>   "graph" : [ {
>     "pop" : "mock-scan",
>     "@id" : 2, ...
>   }, {
>     "pop" : "project",
>     "@id" : 1,
>     "exprs" : [ {
>       "ref" : "`id_i`",
>       "expr" : "`id_i`"
>     }, {
>       "ref" : "`name_s10`",
>       "expr" : "`name_s10`"
>     } ], ...
>   }, {
>     "pop" : "screen",
>     "@id" : 0, ...
>   } ]
> }
> {code}
> When executed, the project operator generates code that does nothing:
> {code}
> public class ProjectorGen0 extends ProjectorTemplate {
>     public void doEval(int inIndex, int outIndex)
>         throws SchemaChangeException
>     { }
>     public void doSetup(FragmentContext context, RecordBatch incoming, RecordBatch outgoing)
>         throws SchemaChangeException
>     { }
> }
> {code}
> Yet, the project code still insists on stepping through each row, despite the fact that the code does nothing per record:
> {code}
>       for (i = startIndex; i < startIndex + recordCount; i++, firstOutputIndex++) {
>         try {
>           doEval(i, firstOutputIndex);
>         } ...
>       }
> {code}
> The request is to both:
> 1. Skip the per-record loop if all transfers are at the vector level, and
> 2. Omit the entire project step if nothing changes.
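
As an illustration of the second request in the quoted description, a no-op project could be
detected with a check along these lines. This is a minimal sketch with hypothetical types
(NamedExpr, isNoOp); it assumes a project is a no-op only when every expression is a plain
column reference with the same name as its output and the columns keep their input order.
It does not use Drill or Calcite classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the "exprs" entries in the plan above; not a Drill class.
final class NoOpProjectSketch {
    static final class NamedExpr {
        final String ref;   // output column name
        final String expr;  // expression text; a bare column name means a simple reference
        NamedExpr(String ref, String expr) {
            this.ref = ref;
            this.expr = expr;
        }
    }

    // True when the project neither renames, reorders, drops, adds, nor computes anything.
    static boolean isNoOp(List<NamedExpr> exprs, List<String> inputColumns) {
        if (exprs.size() != inputColumns.size()) {
            return false;
        }
        for (int i = 0; i < exprs.size(); i++) {
            NamedExpr e = exprs.get(i);
            if (!e.ref.equals(e.expr) || !e.expr.equals(inputColumns.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("`id_i`", "`name_s10`");
        List<NamedExpr> identity = Arrays.asList(
            new NamedExpr("`id_i`", "`id_i`"),
            new NamedExpr("`name_s10`", "`name_s10`"));
        System.out.println(isNoOp(identity, input)); // true
    }
}
```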



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
