drill-dev mailing list archives

From Mehant Baid <baid.meh...@gmail.com>
Subject Moving directory based pruning to fire earlier
Date Mon, 23 Nov 2015 18:33:28 GMT
As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996> 
Jinfeng mentioned that he plans to make the directory-based pruning rule 
fire earlier than column-based pruning. I want to expand on that a little, 
provide the motivation, and gather thoughts and feedback.

Currently, both directory-based pruning and column-based pruning are 
fired in the same planning phase, and both are based on Drill logical 
rels. This is not optimal when the data is organized so that both kinds 
of pruning can be applied (that is, when the data is laid out in a 
nested directory structure and the individual files also contain 
partition columns). As part of creating the Drill logical scan, we read 
the footers of all the files involved. If the directory-based pruning 
rule fires earlier (as a rule that operates on Calcite logical rels), 
we will be able to prune out unnecessary directories and save the work 
of reading the footers of the files in them.
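
To make the cost difference concrete, here is a minimal toy sketch of the 
two orderings. It is not Drill or Calcite code: DataFile, FooterReader, 
plan_prune_late and plan_prune_early are hypothetical names invented for 
illustration, and the "footer" is a stand-in dictionary. The only point 
is to count how many footers each ordering reads.

```python
# Hypothetical sketch; none of these names are Drill or Calcite APIs.
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    path: str  # e.g. "sales/2015/c.parquet"


class FooterReader:
    """Counts footer reads so the cost of each ordering is visible."""

    def __init__(self):
        self.reads = 0

    def read_footer(self, f):
        self.reads += 1
        return {"path": f.path}  # stand-in for real footer metadata


def dir_values(f, root):
    """Directory levels below root (analogous to dir0, dir1, ...)."""
    rel = f.path[len(root):].strip("/")
    return rel.split("/")[:-1]


def in_dir0(f, root, dir0):
    # True when the first directory level under root equals dir0.
    return dir_values(f, root)[:1] == [dir0]


def plan_prune_late(files, root, dir0, reader):
    # Current order: build the scan (read every footer), then prune.
    footers = [reader.read_footer(f) for f in files]
    return [m for f, m in zip(files, footers) if in_dir0(f, root, dir0)]


def plan_prune_early(files, root, dir0, reader):
    # Proposed order: prune on the path alone, then read the
    # footers of the surviving files only.
    kept = [f for f in files if in_dir0(f, root, dir0)]
    return [reader.read_footer(f) for f in kept]
```

Both functions return the same surviving files; they differ only in how 
many times read_footer is called. With three files of which one sits in 
the selected directory, the late ordering reads three footers and the 
early ordering reads one.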

