drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4279) Improve query plan when no column is required from SCAN
Date Thu, 28 Jan 2016 04:51:39 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15120753#comment-15120753 ]

ASF GitHub Bot commented on DRILL-4279:
---------------------------------------

GitHub user jinfengni opened a pull request:

    https://github.com/apache/drill/pull/342

    DRILL-4279: Improve query performance when no column is required from scan operator.
    
    Use different approaches when no column is required from the scan operator.
    1) If the data source has a schema, use the first column in the schema.
    2) Use 'columns[0]' for text reader.
    3) Use the current skip_all reader for JSON input.
    4) Use a default column name for other schema-less input.
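
The four cases above can be read as a small decision procedure. The following is a minimal sketch of that reading in Python, not code from the pull request: the function name `pick_scan_column`, the `source_kind` labels, and the `DEFAULT_COLUMN` value are all hypothetical, since the description does not say what the default column name is.

```python
from typing import List, Optional

# Hypothetical placeholder: the pull request does not state the real default name.
DEFAULT_COLUMN = "default_col"


def pick_scan_column(source_kind: str,
                     schema: Optional[List[str]] = None) -> Optional[str]:
    """Pick the one column to read when the query needs no column at all.

    Returns None when the reader should keep skipping every column.
    """
    if schema:
        # Case 1: a source with a known schema reads its first column.
        return schema[0]
    if source_kind == "text":
        # Case 2: the text reader exposes fields through the `columns` array.
        return "columns[0]"
    if source_kind == "json":
        # Case 3: JSON input keeps using the existing skip_all reader.
        return None
    # Case 4: any other schema-less input falls back to a default column name.
    return DEFAULT_COLUMN
```

For example, under these assumptions `pick_scan_column("text")` yields `columns[0]`, while `pick_scan_column("json")` yields `None` to signal that no column is projected.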

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinfengni/incubator-drill DRILL-4279

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/342.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #342
    
----
commit 137f593bc70b484e97ab6838fc5a639267c2bf40
Author: Jinfeng Ni <jni@apache.org>
Date:   2016-01-20T00:47:08Z

    DRILL-4279: Improve query performance when no column is required from scan operator.
    
    Use different approaches when no column is required from the scan operator.
    1) If the data source has a schema, use the first column in the schema.
    2) Use 'columns[0]' for text reader.
    3) Use the current skip_all reader for JSON input.
    4) Use a default column name for other schema-less input.

----


> Improve query plan when no column is required from SCAN
> -------------------------------------------------------
>
>                 Key: DRILL-4279
>                 URL: https://issues.apache.org/jira/browse/DRILL-4279
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>
> When a query does not require any specific column to be returned from SCAN, for instance,
> {code}
> Q1:  select count(*) from T1;
> Q2:  select 1 + 100 from T1;
> Q3:  select  1.0 + random() from T1; 
> {code}
> Drill's planner would use a ColumnList with the * column, plus a SKIP_ALL mode. However, the MODE is not serialized / deserialized. This leads to two problems.
> 1) The EXPLAIN plan is confusing, since there is no way to distinguish a "SELECT *" query from this SKIP_ALL mode.
> For instance, 
> {code}
> explain plan for select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
> 00-03          Project($f0=[0])
> 00-04            Scan(groupscan=[EasyGroupScan [selectionRoot=file:/Users/jni/work/data/yelp/t1, numFiles=2, columns=[`*`], files= ... 
> {code} 
> 2) If the query is executed in distributed / parallel mode, the missing serialization of the mode means that some fragments fetch all the columns while other fragments skip all the columns. That causes an execution error.
> For instance, changing slice_target to force the query to execute in multiple fragments produces an execution error.
> {code}
> select count(*) from dfs.`/Users/jni/work/data/yelp/t1`;
> org.apache.drill.common.exceptions.UserRemoteException: DATA_READ ERROR: Error parsing JSON - You tried to start when you are using a ValueWriter of type NullableBitWriterImpl.
> {code}
> Directory "t1" just contains two yelp JSON files. 
> Ideally, I think when no column is required from SCAN, the explain plan should show an empty column list. The SKIP_ALL mode together with the star (*) column seems confusing and error-prone.
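
The serialization gap described in the issue can be illustrated with a toy model. This is a sketch under stated assumptions, not Drill's implementation: `ScanSpec`, `skip_all`, `to_json`, and `from_json` are hypothetical names, and JSON is used only as a stand-in for whatever format carries the plan between fragments. The point it shows is that a flag left out of the serialized form comes back with its default value on the receiving side.

```python
import json
from dataclasses import dataclass


@dataclass
class ScanSpec:
    """Toy stand-in for a scan's column list plus its read mode."""
    columns: list
    skip_all: bool = False  # stands in for the SKIP_ALL mode

    def to_json(self) -> str:
        # Only the column list is written; the mode is dropped,
        # mirroring the behaviour reported in the issue.
        return json.dumps({"columns": self.columns})

    @classmethod
    def from_json(cls, text: str) -> "ScanSpec":
        # The mode is absent from the payload, so it falls back to the default.
        return cls(columns=json.loads(text)["columns"])


# The planning side intends "skip every column" ...
local = ScanSpec(columns=["*"], skip_all=True)
# ... but a fragment that rebuilds the spec from its serialized form
# sees a plain star query and would read every column instead.
remote = ScanSpec.from_json(local.to_json())
```

In this model `local` and `remote` carry the same `columns=["*"]` but disagree on `skip_all`, which corresponds to the situation where fragments of one query read different sets of columns.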



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
