drill-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-4287) Do lazy reading of parquet metadata cache file
Date Fri, 29 Jan 2016 01:18:39 GMT

    [ https://issues.apache.org/jira/browse/DRILL-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15122687#comment-15122687 ]

ASF GitHub Bot commented on DRILL-4287:
---------------------------------------

GitHub user amansinha100 opened a pull request:

    https://github.com/apache/drill/pull/345

    DRILL-4287: During initial DrillTable creation don't read the metadata cache file; instead do it during ParquetGroupScan.
    
    Maintain state in FileSelection to keep track of whether certain operations have been done on that selection.
    
    Remove ParquetFileSelection since its only purpose was to carry the metadata cache information which is not needed anymore.
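As a rough illustration of the state tracking described above, the sketch below shows a selection object that records which one-time operations (directory expansion, partition pruning, metadata cache reading) have already been applied to it, so that a later planning step can skip repeating them. The names TrackedFileSelection, Op, withDone and PruneOnceDemo are hypothetical and are not Drill's FileSelection API; the sketch also assumes, for simplicity, that a selection is immutable and each operation returns a new one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.EnumSet;
import java.util.List;

// Hypothetical sketch, not Drill's FileSelection: a selection that
// remembers which one-time operations have already been applied to it.
class TrackedFileSelection {
    // Illustrative operation names (assumptions, not Drill identifiers).
    enum Op { DIRECTORIES_EXPANDED, PARTITIONS_PRUNED, METADATA_CACHE_READ }

    private final List<String> files;
    private final EnumSet<Op> done;

    TrackedFileSelection(List<String> files) {
        this(files, EnumSet.noneOf(Op.class));
    }

    private TrackedFileSelection(List<String> files, EnumSet<Op> done) {
        this.files = Collections.unmodifiableList(new ArrayList<>(files));
        this.done = EnumSet.copyOf(done); // defensive copy of the state
    }

    boolean isDone(Op op) {
        return done.contains(op);
    }

    List<String> files() {
        return files;
    }

    // Returns a new selection that also records `op` as done;
    // this selection is left unchanged.
    TrackedFileSelection withDone(Op op, List<String> newFiles) {
        EnumSet<Op> next = EnumSet.copyOf(done);
        next.add(op);
        return new TrackedFileSelection(newFiles, next);
    }
}

class PruneOnceDemo {
    // Keeps only files under `dirPrefix`, but skips the work entirely
    // if this selection has already been pruned.
    static TrackedFileSelection pruneOnce(TrackedFileSelection sel, String dirPrefix) {
        if (sel.isDone(TrackedFileSelection.Op.PARTITIONS_PRUNED)) {
            return sel;
        }
        List<String> kept = new ArrayList<>();
        for (String f : sel.files()) {
            if (f.startsWith(dirPrefix)) {
                kept.add(f);
            }
        }
        return sel.withDone(TrackedFileSelection.Op.PARTITIONS_PRUNED, kept);
    }

    public static void main(String[] args) {
        TrackedFileSelection all = new TrackedFileSelection(
            Arrays.asList("/t/2015/a.parquet", "/t/2016/b.parquet"));
        TrackedFileSelection pruned = pruneOnce(all, "/t/2016/");
        System.out.println(pruned.files()); // [/t/2016/b.parquet]
        // A second pruning attempt returns the same object unchanged.
        System.out.println(pruneOnce(pruned, "/t/2015/") == pruned); // true
    }
}
```

Returning a new selection instead of mutating the old one keeps earlier planning states intact; this message does not say whether Drill's FileSelection is designed that way.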

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/amansinha100/incubator-drill DRILL-4287

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #345
    
----
commit a479137911bc6d4821a138789db254d6eee43316
Author: Aman Sinha <asinha@maprtech.com>
Date:   2016-01-18T18:26:59Z

    DRILL-4287: During initial DrillTable creation don't read the metadata cache file; instead do it during ParquetGroupScan.
    
    Maintain state in FileSelection to keep track of whether certain operations have been done on that selection.
    
    Remove ParquetFileSelection since its only purpose was to carry the metadata cache information which is not needed anymore.

----


> Do lazy reading of parquet metadata cache file
> ----------------------------------------------
>
>                 Key: DRILL-4287
>                 URL: https://issues.apache.org/jira/browse/DRILL-4287
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.4.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> Currently, the parquet metadata cache file is read eagerly during creation of the DrillTable (as part of ParquetFormatMatcher.isReadable()). This is undesirable from a performance standpoint: in some scenarios we want to apply up-front optimizations, e.g. directory-based partition pruning (see DRILL-2517) or a potential LIMIT 0 optimization, and in such situations it is better to defer reading the metadata cache file until it is actually needed.
> This issue is a placeholder for implementing such delayed reading, which the aforementioned optimizations require.
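A minimal sketch of the lazy reading requested in this issue, assuming the cache contents can be wrapped in a memoizing holder: creating the table object stores only the location of the cache file, and the first caller that needs the metadata (per the pull request above, this moves to ParquetGroupScan) triggers the read. LazyMetadata, SketchParquetTable and the path /data/t/metadata_cache are hypothetical names for illustration only, and the "parse" is a string stand-in rather than real file I/O.

```java
import java.util.function.Supplier;

// Hypothetical sketch, not Drill code: a memoizing holder that defers an
// expensive read (here, parsing a metadata cache file) until first use.
class LazyMetadata<T> {
    private final Supplier<T> loader;
    private T value;       // null until first get(); assumes loader never returns null
    private int loadCount; // number of times the loader actually ran

    LazyMetadata(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        if (value == null) {
            value = loader.get();
            loadCount++;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return value != null;
    }

    synchronized int loadCount() {
        return loadCount;
    }
}

// Hypothetical table object: creating it only records where the cache
// file lives; nothing is parsed until a scan calls metadata.get().
class SketchParquetTable {
    final String cachePath;
    final LazyMetadata<String> metadata;

    SketchParquetTable(String cachePath) {
        this.cachePath = cachePath;
        // Stand-in for real file I/O and parsing.
        this.metadata = new LazyMetadata<>(() -> "parsed " + cachePath);
    }

    public static void main(String[] args) {
        SketchParquetTable table = new SketchParquetTable("/data/t/metadata_cache");
        // Planning steps such as directory-based pruning could run here
        // without touching the cache file.
        System.out.println(table.metadata.isLoaded()); // false
        table.metadata.get(); // first use by the scan triggers the read
        table.metadata.get(); // later uses reuse the parsed result
        System.out.println(table.metadata.loadCount()); // 1
    }
}
```

If an up-front optimization resolves the query without a scan, get() is never called and the cache file is never parsed, which is the saving this issue is after.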



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
