drill-dev mailing list archives

From Hsuan Yi Chu <hyi...@maprtech.com>
Subject Re: Moving directory based pruning to fire earlier
Date Tue, 24 Nov 2015 00:11:07 GMT
My understanding is:
In logical planning, we determine the "structure" of the tree (e.g., join
order), and then in physical planning we determine the implementation
(e.g., merge vs. hash join).

This staging seems clean to me. So what is the motivation to merge them all
together?


On Mon, Nov 23, 2015 at 2:51 PM, Jacques Nadeau <jacques@dremio.com> wrote:

> Anybody think we should just get rid of Drels (Rel > Drel > Prel) and use
> Calcite's logical representation directly (Rel > Prel)?
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Nov 23, 2015 at 1:57 PM, Mehant Baid <baid.mehant@gmail.com> wrote:
>
> > Currently all rules based on Calcite logical rels and Drill logical rels
> > are put together and are fired together. As part of DRILL-3996, Jinfeng
> > will break it down into different phases. I should be able to take
> > advantage of this and move the directory based partition pruning to fire
> > based on Calcite rels.
> >
> > Thanks
> > Mehant
> >
> >
> > On 11/23/15 10:58 AM, Hanifi GUNES wrote:
> >
> >> The general idea of multi-phase pruning makes sense to me. I am
> >> wondering, though, are we referring to introducing a new planning phase
> >> before the logical or separating out the logic so as to make directory
> >> pruning kick off ahead of column partitioning?
> >>
> >> 2015-11-23 10:33 GMT-08:00 Mehant Baid <baid.mehant@gmail.com>:
> >>
> >>> As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996>,
> >>> Jinfeng mentioned that he plans to move the directory based pruning rule
> >>> earlier than column based pruning. I want to expand on that a little,
> >>> provide the motivation, and gather thoughts/feedback.
> >>>
> >>> Currently both directory based pruning and column based pruning are
> >>> fired in the same planning phase and are based on Drill logical rels.
> >>> This is not optimal when both kinds of pruning can be applied, that is,
> >>> when the data is organized in a nested directory structure and the
> >>> individual files also contain partition columns. As part of creating the
> >>> Drill logical scan we read the footers of all the files involved. If the
> >>> directory based pruning rule is fired earlier (as a rule based on Calcite
> >>> logical rels), then we will be able to prune out unnecessary directories
> >>> and save the work of reading the footers of the files in them.
> >>>
> >>> Thanks
> >>> Mehant
> >>>
> >>>
> >>>
> >
>

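To make the motivation in Mehant's original message concrete, here is a minimal sketch of why firing directory based pruning first saves footer reads. It is not Drill code: `DataFile`, `FooterReader`, `prune_together`, and `prune_directories_first` are hypothetical names invented for illustration, and the "footer" is reduced to a single partition column value. The only behavior taken from the thread is that footers are read for every file in the scan before pruning happens today.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    directory: str  # nested directory path, e.g. "2015/01" (hypothetical layout)
    part_col: int   # partition column value that would live in the file footer


class FooterReader:
    """Stand-in for the expensive footer I/O; it only counts reads."""

    def __init__(self):
        self.reads = 0

    def read(self, data_file):
        self.reads += 1
        return data_file.part_col


def prune_together(files, dir_prefix, wanted, reader):
    # Behavior described in the thread: footers of all files are read when
    # the scan is created, and both kinds of pruning are applied afterwards.
    footers = {f: reader.read(f) for f in files}
    return [f for f in files
            if f.directory.startswith(dir_prefix) and footers[f] == wanted]


def prune_directories_first(files, dir_prefix, wanted, reader):
    # Proposed ordering: drop whole directories using only the path, then
    # read footers for the files that survive.
    survivors = [f for f in files if f.directory.startswith(dir_prefix)]
    return [f for f in survivors if reader.read(f) == wanted]


files = [
    DataFile("2014/01", 1), DataFile("2014/02", 2),
    DataFile("2015/01", 1), DataFile("2015/02", 2),
]
eager, early = FooterReader(), FooterReader()
same_result = (prune_together(files, "2015", 1, eager)
               == prune_directories_first(files, "2015", 1, early))
print(same_result, eager.reads, early.reads)  # True 4 2
```

Both orderings select the same file, but the second one reads footers only for files under the directories that survive the path filter.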