drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (DRILL-4250) File system directory-based partition pruning does not work when a directory contains both subdirectories and files.
Date Fri, 08 Jan 2016 00:28:40 GMT

     [ https://issues.apache.org/jira/browse/DRILL-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinfeng Ni resolved DRILL-4250.
-------------------------------
       Resolution: Fixed
    Fix Version/s: 1.5.0

Fixed in commit: b9bc35a89208d2dd03f1ed751f71a0cd23651c9a

> File system directory-based partition pruning does not work when a directory contains both subdirectories and files.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-4250
>                 URL: https://issues.apache.org/jira/browse/DRILL-4250
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Jinfeng Ni
>            Assignee: Jinfeng Ni
>             Fix For: 1.5.0
>
>
> When a directory contains both subdirectories and files, directory-based partition pruning does not work.
> For example, I have the following directory structure with nation.parquet (copied from the TPC-H sample dataset).
> .//2001/Q1/nation.parquet
> .//2001/Q2/nation.parquet
> With this layout, directory-based partition pruning works correctly for the following query:
>  
> {code}
> explain plan for select * from dfs.tmp.fileAndDir where dir0 = 2001 and dir1 = 'Q1';
> 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(*=[$0])
> 00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/Q1/nation.parquet]],
selectionRoot=file:/tmp/fileAndDir, numFiles=1, usedMetadataFile=false, columns=[`*`]]])
> {code}
> However, if I add a nation.parquet file to the 2001 directory, like the following:
> .//2001/nation.parquet
> .//2001/Q1/nation.parquet
> .//2001/Q2/nation.parquet
> Then the same query does not have partition pruning applied.
> {code}
> explain plan for select * from dfs.tmp.fileAndDir where dir0 = 2001 and dir1 = 'Q1';
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(*=[$0])
> 00-02        Project(T0¦¦*=[$0])
> 00-03          SelectionVectorRemover
> 00-04            Filter(condition=[AND(=($1, 2001), =($2, 'Q1'))])
> 00-05              Project(T0¦¦*=[$0], dir0=[$1], dir1=[$2])
> 00-06                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/nation.parquet],
ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/Q1/nation.parquet], ReadEntryWithPath [path=file:/tmp/fileAndDir/2001/Q2/nation.parquet]],
selectionRoot=file:/tmp/fileAndDir, numFiles=3, usedMetadataFile=false, columns=[`*`]]])
> {code}
> I should note that in the second case, where partition pruning did not work, the query still returned the correct result. Therefore, this issue only impacts query performance, not the query result.
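
To illustrate the behavior the report expects, here is a minimal Python sketch of directory-based pruning over the three-file layout above. It is not Drill's implementation: the helper names (`dir_columns`, `prune`) are hypothetical, and treating a missing directory level as None (so the file directly under 2001 has no dir1 value) is an assumption made for the illustration.

```python
from pathlib import PurePosixPath


def dir_columns(selection_root, file_path, depth):
    """Return [dir0, dir1, ...] for file_path relative to selection_root.

    Levels the file does not have are padded with None (an assumption of
    this sketch, not a statement about Drill internals).
    """
    rel = PurePosixPath(file_path).relative_to(selection_root)
    dirs = list(rel.parts[:-1])  # drop the file name, keep directory levels
    return dirs + [None] * (depth - len(dirs))


def prune(selection_root, files, predicate):
    """Keep only the files whose directory columns satisfy the predicate."""
    # The deepest file determines how many dirN columns exist.
    depth = max(
        len(PurePosixPath(f).relative_to(selection_root).parts) - 1
        for f in files
    )
    kept = []
    for f in files:
        cols = dir_columns(selection_root, f, depth)
        row = {f"dir{i}": value for i, value in enumerate(cols)}
        if predicate(row):
            kept.append(f)
    return kept


# The mixed layout from the report: one file directly under 2001,
# plus one file in each of the Q1 and Q2 subdirectories.
files = [
    "/tmp/fileAndDir/2001/nation.parquet",
    "/tmp/fileAndDir/2001/Q1/nation.parquet",
    "/tmp/fileAndDir/2001/Q2/nation.parquet",
]

# Equivalent of: where dir0 = 2001 and dir1 = 'Q1'
kept = prune(
    "/tmp/fileAndDir",
    files,
    lambda row: row["dir0"] == "2001" and row["dir1"] == "Q1",
)
print(kept)
```

Under these assumptions only the Q1 file survives, which matches the single-entry scan (numFiles=1) in the first plan. The second plan instead scans all three files and applies the filter afterwards.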



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
