drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-1500) Partition filtering might lead to an unnecessary column in the result set.
Date Mon, 12 Jan 2015 22:53:39 GMT

    [ https://issues.apache.org/jira/browse/DRILL-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14274339#comment-14274339 ]

Aman Sinha commented on DRILL-1500:
-----------------------------------

Agreed that the constructor could be made private, or at least protected. As discussed with
Jinfeng, this is a more general issue that applies to all Prels. There are also other considerations:
should we even have a separate ProjectAllowDupPrel, or should we modify ProjectPrel to conditionally
allow duplicates? I will open a separate JIRA for these if necessary.
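To make the second option concrete, here is a minimal Java sketch of a single projection class that has a private constructor and a flag deciding whether duplicate output names are allowed. `ProjectSketch`, `allowingDuplicates` and `uniqueNames` are hypothetical names used only for illustration. This is not Drill's ProjectPrel or ProjectAllowDupPrel API, and it models only output column names, not planner row types.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a projection operator; NOT Drill's ProjectPrel.
final class ProjectSketch {
    private final List<String> outputNames;
    private final boolean allowDuplicates;

    // Private constructor: callers must choose one of the named factories below.
    private ProjectSketch(List<String> outputNames, boolean allowDuplicates) {
        this.outputNames = Collections.unmodifiableList(new ArrayList<>(outputNames));
        this.allowDuplicates = allowDuplicates;
    }

    // Keeps the names exactly as given, duplicates included.
    static ProjectSketch allowingDuplicates(List<String> names) {
        return new ProjectSketch(names, true);
    }

    // Rejects duplicate output names up front instead of silently renaming them.
    static ProjectSketch uniqueNames(List<String> names) {
        Set<String> seen = new HashSet<>();
        for (String name : names) {
            if (!seen.add(name)) {
                throw new IllegalArgumentException("duplicate output column: " + name);
            }
        }
        return new ProjectSketch(names, false);
    }

    List<String> outputNames() { return outputNames; }

    boolean allowsDuplicates() { return allowDuplicates; }

    public static void main(String[] args) {
        List<String> cols = List.of("dir0", "dir0", "dir1");
        ProjectSketch p = allowingDuplicates(cols);
        System.out.println(p.outputNames() + " " + p.allowsDuplicates());
        try {
            uniqueNames(cols);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Because the constructor is private, every call site has to state explicitly whether duplicates are acceptable. A flag like this is one way to fold the "allow duplicates" behaviour into one class rather than keeping two separate operator classes.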

> Partition filtering might lead to an unnecessary column in the result set. 
> ---------------------------------------------------------------------------
>
>                 Key: DRILL-1500
>                 URL: https://issues.apache.org/jira/browse/DRILL-1500
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Aman Sinha
>            Priority: Critical
>             Fix For: 0.8.0
>
>         Attachments: 0001-DRILL-1500-Partial-fix-Don-t-overwrite-top-level-Pro.patch
>
>
> When partition filtering is used together with a select * query, Drill might return the partitioning column twice.
> Q1:
> {code}
> select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` where dir0=1994 and dir1='Q1' order by dir0 limit 1;
> +-------+------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> | dir00 | dir0 | dir1 | o_clerk         | o_comment                    | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice |
> +-------+------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> | 1994  | 1994 | Q1   | Clerk#000000743 | y pending requests integrate | 1292      | 1994-01-20  | 66         | 5-LOW           | F             | 0              | 104190.66    |
> +-------+------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> 1 row selected (2.097 seconds)
> {code}
> Column "dir0" appears twice in the result set (one copy is labelled "dir00"). For comparison, here is the same query without partition filtering, and its result:
> Q2:
> {code}
> select * from dfs.`/Users/jni/work/incubator-drill/exec/java-exec/src/test/resources/multilevel/parquet` order by dir0 limit 1;
> +------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> | dir0 | dir1 | o_clerk         | o_comment                    | o_custkey | o_orderdate | o_orderkey | o_orderpriority | o_orderstatus | o_shippriority | o_totalprice |
> +------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> | 1994 | Q1   | Clerk#000000743 | y pending requests integrate | 1292      | 1994-01-20  | 66         | 5-LOW           | F             | 0              | 104190.66    |
> +------+------+-----------------+------------------------------+-----------+-------------+------------+-----------------+---------------+----------------+--------------+
> 1 row selected (0.761 seconds)
> {code}
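The Q1/Q2 comparison can be turned into a simple regression check. Compare the column labels of the two result sets as multisets. With and without the partition filter, a select * query should report the same columns. The following is a minimal Java sketch. The helper name `extraColumns` is hypothetical, and the label lists are copied by hand from the two outputs above. In a real JDBC-based test, the labels could instead be read with `java.sql.ResultSetMetaData.getColumnLabel(int)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ColumnDiff {
    // Hypothetical helper: returns labels that occur in `actual` more often than
    // in `expected` (a multiset difference), preserving their order in `actual`.
    static List<String> extraColumns(List<String> actual, List<String> expected) {
        Map<String, Integer> remaining = new HashMap<>();
        for (String label : expected) {
            remaining.merge(label, 1, Integer::sum);
        }
        List<String> extras = new ArrayList<>();
        for (String label : actual) {
            Integer count = remaining.get(label);
            if (count == null || count == 0) {
                extras.add(label);               // not accounted for by `expected`
            } else {
                remaining.put(label, count - 1); // consume one matching occurrence
            }
        }
        return extras;
    }

    public static void main(String[] args) {
        // Column labels copied from the Q1 (with partition filter) and Q2 outputs.
        List<String> q1 = List.of("dir00", "dir0", "dir1", "o_clerk", "o_comment", "o_custkey",
                "o_orderdate", "o_orderkey", "o_orderpriority", "o_orderstatus",
                "o_shippriority", "o_totalprice");
        List<String> q2 = List.of("dir0", "dir1", "o_clerk", "o_comment", "o_custkey",
                "o_orderdate", "o_orderkey", "o_orderpriority", "o_orderstatus",
                "o_shippriority", "o_totalprice");
        System.out.println(extraColumns(q1, q2)); // prints [dir00]
    }
}
```

A test built this way would fail as long as the partition-filtered query reports a column that the unfiltered query does not.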



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
