drill-dev mailing list archives

From "Damien Profeta (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5795) Filter pushdown for parquet handles multi rowgroup file
Date Fri, 15 Sep 2017 23:09:01 GMT
Damien Profeta created DRILL-5795:
-------------------------------------

             Summary: Filter pushdown for parquet handles multi rowgroup file
                 Key: DRILL-5795
                 URL: https://issues.apache.org/jira/browse/DRILL-5795
             Project: Apache Drill
          Issue Type: Improvement
          Components: Storage - Parquet
            Reporter: Damien Profeta


DRILL-1950 implemented filter pushdown for parquet files, but only for the case of one rowgroup
per parquet file. When a file contains multiple rowgroups, Drill detects that a rowgroup
can be pruned but then tells the drillbit to read the whole file, which leads to a performance
issue.
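
To illustrate the difference, here is a minimal sketch in Python. It is not Drill's code: the RowGroup class, the integer min/max statistics and both planning functions are hypothetical names made up for this example, and the "whole file" plan is only an assumed model of the behaviour described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RowGroup:
    """Hypothetical stand-in for one rowgroup's min/max column statistics."""
    index: int
    min_value: int
    max_value: int


def may_match(rg: RowGroup, lo: int, hi: int) -> bool:
    # A rowgroup can hold matching rows only if its [min, max] range
    # overlaps the filter range [lo, hi].
    return rg.max_value >= lo and rg.min_value <= hi


def plan_whole_file(rowgroups: List[RowGroup], lo: int, hi: int) -> List[int]:
    # Assumed model of the reported behaviour: pruning is decided per
    # rowgroup, but if anything survives the whole file is still read.
    if any(may_match(rg, lo, hi) for rg in rowgroups):
        return [rg.index for rg in rowgroups]
    return []


def plan_per_rowgroup(rowgroups: List[RowGroup], lo: int, hi: int) -> List[int]:
    # Behaviour the issue asks for: read only the surviving rowgroups.
    return [rg.index for rg in rowgroups if may_match(rg, lo, hi)]


# One file with three rowgroups and the filter 120 <= col <= 150.
file_rowgroups = [RowGroup(0, 0, 99), RowGroup(1, 100, 199), RowGroup(2, 200, 299)]
print(plan_whole_file(file_rowgroups, 120, 150))    # [0, 1, 2]
print(plan_per_rowgroup(file_rowgroups, 120, 150))  # [1]
```

In this toy case only rowgroup 1 can contain matching rows, so a per-rowgroup plan reads a third of the file.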

Having multiple rowgroups per file helps to handle partitioned datasets and still read only
the relevant subset of data without ending up with more files than really needed.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
