drill-dev mailing list archives

From Sean Hsuan-Yi Chu <hsua...@usc.edu>
Subject Re: Moving directory based pruning to fire earlier
Date Mon, 23 Nov 2015 22:37:00 GMT
Does that mean we would use the Hep planner to do directory pruning as the
first stage of logical planning?

I think it makes sense to let rules that can definitely reduce the cost
fire before Volcano. What about expression reduction?

I believe pruning sometimes needs the simplified expressions to proceed.
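A toy Python sketch of the point about expression reduction, assuming a hypothetical pruning rule that only matches a predicate of the form `dir0 = <literal>` (`dir0` being Drill's name for the first directory level). The expression classes, `reduce_expr`, and `prune_dirs` are illustrative stand-ins, not Drill or Calcite APIs. The rule cannot fire on `dir0 = '20' || '15'` until constant folding turns the right-hand side into `'2015'`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    value: str


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class Concat:
    # String concatenation of two sub-expressions, e.g. '20' || '15'.
    left: object
    right: object


@dataclass(frozen=True)
class Equals:
    # Filter predicate of the form <column> = <expression>.
    column: Column
    operand: object


def reduce_expr(expr):
    """Constant-fold Concat nodes whose inputs reduce to literals."""
    if isinstance(expr, Concat):
        left, right = reduce_expr(expr.left), reduce_expr(expr.right)
        if isinstance(left, Literal) and isinstance(right, Literal):
            return Literal(left.value + right.value)
        return Concat(left, right)
    return expr


def prune_dirs(pred, dirs):
    """Hypothetical pruning rule: fires only on `dir0 = <literal>`.

    Returns the surviving directories, or None when the rule does not match.
    """
    if pred.column.name == "dir0" and isinstance(pred.operand, Literal):
        return [d for d in dirs if d == pred.operand.value]
    return None


dirs = ["2014", "2015", "2016"]
pred = Equals(Column("dir0"), Concat(Literal("20"), Literal("15")))

before = prune_dirs(pred, dirs)  # rule cannot match the unreduced predicate
after = prune_dirs(Equals(pred.column, reduce_expr(pred.operand)), dirs)
print(before, after)  # None ['2015']
```

In this model the ordering matters: if reduction only runs after the pruning phase, the rule never sees a literal and nothing is pruned.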

On Mon, Nov 23, 2015 at 1:57 PM, Mehant Baid <baid.mehant@gmail.com> wrote:

> Currently all rules based on Calcite logical rels and Drill logical rels
> are grouped and fired together. As part of DRILL-3996, Jinfeng will break
> them into separate phases. I should be able to take advantage of this and
> move the directory based partition pruning to fire on Calcite rels.
> Thanks
> Mehant
> On 11/23/15 10:58 AM, Hanifi GUNES wrote:
>> The general idea of multi-phase pruning makes sense to me. I am wondering,
>> though: are we talking about introducing a new planning phase before the
>> logical one, or about separating out the logic so that directory pruning
>> kicks off ahead of column partitioning?
>> 2015-11-23 10:33 GMT-08:00 Mehant Baid <baid.mehant@gmail.com>:
>>> As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996>
>>> Jinfeng mentioned that he plans to move the directory based pruning rule
>>> earlier than column based pruning. I want to expand on that a little,
>>> provide the motivation and gather thoughts/feedback.
>>> Currently both the directory based pruning and the column based pruning
>>> are fired in the same planning phase and are based on Drill logical rels.
>>> This is not optimal when the data is organized so that both directory
>>> based pruning and column based pruning can be applied (the data has a
>>> nested directory structure and the individual files also contain
>>> partition columns). As part of creating the Drill logical scan we read
>>> the footers of all the files involved. If the directory based pruning
>>> rule fires earlier (on Calcite logical rels), we can prune out
>>> unnecessary directories and save the work of reading the footers of the
>>> files in them.
>>> Thanks
>>> Mehant
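A back-of-the-envelope Python sketch of the footer-read saving described above, assuming a hypothetical two-level year/month layout with four Parquet files per month. `dir_prune` and `footer_reads` are illustrative stand-ins, not Drill code; the sketch only shows that pruning directories before the scan is built reduces footer reads from one per file in the table to one per file in the surviving directories.

```python
# Hypothetical nested layout: <year>/<month>/part-<n>.parquet, four files per month.
files = [
    f"{year}/{month:02d}/part-{n}.parquet"
    for year in (2014, 2015)
    for month in range(1, 13)
    for n in range(4)
]


def dir_prune(paths, year, month):
    """Keep only files under the requested year/month directories (dir0/dir1)."""
    prefix = f"{year}/{month:02d}/"
    return [p for p in paths if p.startswith(prefix)]


def footer_reads(paths):
    """Stand-in for building the Drill logical scan: one footer read per file."""
    return len(paths)


# Current ordering: the Drill logical scan is built over every file, so all
# footers are read before the directory pruning rule gets a chance to fire.
current = footer_reads(files)

# Proposed ordering: directory pruning fires first (on Calcite logical rels),
# so only files in the surviving directories have their footers read.
proposed = footer_reads(dir_prune(files, 2015, 11))

print(current, proposed)  # 96 4
```

In this toy layout a filter selecting one month drops the footer reads from 96 to 4; the saving grows with the number of directories the filter excludes.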
