drill-issues mailing list archives

From "Aman Sinha (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3892) Metadata cache not being leveraged when partition pruning is taking place
Date Sun, 04 Oct 2015 23:18:26 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942855#comment-14942855 ]

Aman Sinha commented on DRILL-3892:
-----------------------------------

I took a look at this.  The metadata file does get used, and at first usedMetadataFile
is set to true.  Subsequently, after partition pruning, we call ParquetGroupScan.clone() to
modify the file selection.  During this we call init() again, and this time, if there is a
*single* file in the file selection, we set usedMetadataFile = false, overwriting the previous
setting.  I think that once the flag has been set to true, it should not be changed.
[~sphillips], does that sound OK?  I can submit a patch for this.

I don't think this is a critical issue, for two reasons:
 - The metadata file does get used with partition pruning; the bug is only in updating the flag.

 - It only occurs if the file selection contains exactly one file.  [~rkins], you can confirm
   this by adding a few more files to the partition folders.  If two or more files remain
   selected after partition pruning, we go through a different code path and the flag is
   set correctly.
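
To make the proposal concrete, here is a minimal, self-contained sketch of the difference between overwriting the flag on re-initialization and keeping it once it is true.  This is not the actual ParquetGroupScan code: the class StickyFlagSketch, its methods, and the file names are hypothetical, and the rule in recomputedOnInit() is only an assumption that mirrors the behavior described above (a single-file selection yields false, any other selection yields true).

```java
import java.util.List;

// Hypothetical model of the flag handling described above.
// Only the flag name usedMetadataFile comes from the plan output;
// everything else is illustrative and not Drill's real code.
public class StickyFlagSketch {
    private final List<String> files;
    private final boolean usedMetadataFile;

    public StickyFlagSketch(List<String> files, boolean usedMetadataFile) {
        this.files = files;
        this.usedMetadataFile = usedMetadataFile;
    }

    public boolean usedMetadataFile() {
        return usedMetadataFile;
    }

    public int numFiles() {
        return files.size();
    }

    // Assumption: what a second init() pass would report for a selection.
    // A single-file selection reports false; anything else reports true.
    static boolean recomputedOnInit(List<String> selection) {
        return selection.size() != 1;
    }

    // Behavior described in the comment: the clone re-runs init() and the
    // recomputed value replaces whatever the original scan had recorded.
    public StickyFlagSketch cloneOverwriting(List<String> pruned) {
        return new StickyFlagSketch(pruned, recomputedOnInit(pruned));
    }

    // Proposed behavior: once the flag is true it stays true across clones.
    public StickyFlagSketch cloneSticky(List<String> pruned) {
        return new StickyFlagSketch(pruned, usedMetadataFile || recomputedOnInit(pruned));
    }

    public static void main(String[] args) {
        // The original scan used the metadata file over two hypothetical files.
        StickyFlagSketch original = new StickyFlagSketch(
                List.of("2006/1/a.parquet", "2007/1/b.parquet"), true);

        // Partition pruning leaves exactly one file in the selection.
        List<String> pruned = List.of("2006/1/a.parquet");

        System.out.println("overwriting: " + original.cloneOverwriting(pruned).usedMetadataFile());
        System.out.println("sticky:      " + original.cloneSticky(pruned).usedMetadataFile());
    }
}
```

In this sketch the overwriting clone reports false for the single-file selection even though the metadata file was used, while the sticky clone keeps reporting true.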



> Metadata cache not being leveraged when partition pruning is taking place
> -------------------------------------------------------------------------
>
>                 Key: DRILL-3892
>                 URL: https://issues.apache.org/jira/browse/DRILL-3892
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Metadata
>    Affects Versions: 1.2.0
>            Reporter: Rahul Challapalli
>            Priority: Critical
>         Attachments: lineitem_deletecache.tgz
>
>
> git.commit.id.abbrev=92638dc
> As we can see from the plan below, the metadata cache is not being leveraged even when the
> cache file is present
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> refresh table metadata dfs.`/drill/testdata/metadata_caching/lineitem_deletecache`;
> +-------+-------------------------------------------------------------------------------------------------+
> |  ok   |                                             summary                                             |
> +-------+-------------------------------------------------------------------------------------------------+
> | true  | Successfully updated metadata for table /drill/testdata/metadata_caching/lineitem_deletecache.  |
> +-------+-------------------------------------------------------------------------------------------------+
> 1 row selected (0.402 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> explain plan for select count(*) from dfs.`/drill/testdata/metadata_caching/lineitem_deletecache` where dir0=2006 group by l_linestatus;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$1])
> 00-02        HashAgg(group=[{0}], EXPR$0=[COUNT()])
> 00-03          Project(l_linestatus=[$0])
> 00-04            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/drill/testdata/metadata_caching/lineitem_deletecache/2006/1/lineitem_999.parquet]], selectionRoot=maprfs:/drill/testdata/metadata_caching/lineitem_deletecache, numFiles=1, usedMetadataFile=false, columns=[`l_linestatus`, `dir0`]]])
> {code}
> I attached the data set used. Let me know if you need anything more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
