drill-dev mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Re: Moving directory based pruning to fire earlier
Date Tue, 24 Nov 2015 00:55:12 GMT
I’m not sure what properties / behavior you want to override, but remember that Calcite specifies
a lot of things as traits or metadata.

For example, “double RelNode.getRows()” is deprecated; these days you would use RelMetadataQuery.getRowCount().
You would not need to sub-class a RelNode to override its row-count estimate; you would just supply
a different metadata provider.
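
To illustrate the idea, here is a minimal, self-contained sketch of the pattern. It is a toy model and does not use Calcite's actual classes: Rel, RowCountProvider, MetadataQuery and the row-count numbers are all hypothetical names and values invented for this example. It only shows that an estimate can be changed by putting a different provider at the front of a chain, while the operator class itself stays untouched.

```java
import java.util.List;
import java.util.OptionalDouble;

// Toy model only: these types are hypothetical and are NOT Calcite's API.
class RowCountMetadataSketch {

    /** Stand-in for a relational operator; it carries no row-count logic of its own. */
    static final class Rel {
        final String kind;
        final List<Rel> inputs;

        Rel(String kind, Rel... inputs) {
            this.kind = kind;
            this.inputs = List.of(inputs);
        }
    }

    /** A provider answers for an operator, or declines by returning empty. */
    interface RowCountProvider {
        OptionalDouble rowCount(Rel rel, MetadataQuery mq);
    }

    /** Asks each provider in order; the first one that answers wins. */
    static final class MetadataQuery {
        private final List<RowCountProvider> providers;

        MetadataQuery(List<RowCountProvider> providers) {
            this.providers = providers;
        }

        double getRowCount(Rel rel) {
            for (RowCountProvider p : providers) {
                OptionalDouble r = p.rowCount(rel, this);
                if (r.isPresent()) {
                    return r.getAsDouble();
                }
            }
            throw new IllegalStateException("no row-count provider for " + rel.kind);
        }
    }

    /** Generic defaults (made-up numbers): scans return 100 rows, filters keep 25%. */
    static final RowCountProvider DEFAULT = (rel, mq) -> {
        switch (rel.kind) {
            case "Scan":
                return OptionalDouble.of(100.0);
            case "Filter":
                return OptionalDouble.of(mq.getRowCount(rel.inputs.get(0)) * 0.25);
            default:
                return OptionalDouble.empty();
        }
    };

    /** Engine-specific override for scans only; every other operator falls through. */
    static final RowCountProvider ENGINE_SCAN = (rel, mq) ->
        rel.kind.equals("Scan") ? OptionalDouble.of(1_000_000.0) : OptionalDouble.empty();

    public static void main(String[] args) {
        Rel plan = new Rel("Filter", new Rel("Scan"));
        MetadataQuery defaults = new MetadataQuery(List.of(DEFAULT));
        MetadataQuery overridden = new MetadataQuery(List.of(ENGINE_SCAN, DEFAULT));
        System.out.println(defaults.getRowCount(plan));   // 25.0
        System.out.println(overridden.getRowCount(plan)); // 250000.0
    }
}
```

The same Rel objects are used in both queries; only the provider list differs, which is the point of keeping estimates out of the operator classes.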

Julian


> On Nov 23, 2015, at 4:50 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> 
> Yes, my suggestion is removal of DRILL_LOGICAL. @Hsuan, this is independent
> from the number of phases and I'm not suggesting changing that.
> 
> My main thought was: if we only need to override one or two rels, do only
> that rather than having a wholesale copy of every operator with a bunch of
> basic noop rules.
> 
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> 
> On Mon, Nov 23, 2015 at 4:37 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
> 
>> @Jacques, are you talking about removing the convention DRILL_LOGICAL?
>> 
>> DrillRel extends Calcite's LogicalRel. It overrides some of LogicalRel's
>> methods, and adds new methods.  Therefore, even if we remove the
>> DRILL_LOGICAL convention, we still have to maintain a set of classes
>> extended from Calcite Logical. I'm not clear what benefit we would get by
>> removing the DRILL_LOGICAL convention.
>> 
>> If we want to remove the complete set of DrillLogical classes, then
>> I'm not sure where we would put the Drill specific logic; for instance,
>> Drill Join has certain restrictions that differ from Calcite Join.
>> 
>> 
>> 
>> 
>> On Mon, Nov 23, 2015 at 4:11 PM, Hsuan Yi Chu <hyichu@maprtech.com> wrote:
>>> My understanding is:
>>> In logical planning, we determine the "structure" of the tree (e.g., join
>>> order).
>>> And then in physical, we determine the implementation (e.g., merge vs hash
>>> join).
>>> 
>>> This staging seems clean to me. So what is the motivation to merge them all
>>> together?
>>> 
>>> 
>>> On Mon, Nov 23, 2015 at 2:51 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>>> 
>>>> Anybody think we should just get rid of Drels (Rel > Drel > Prel) and use
>>>> Calcite's logical representation directly (Rel > Prel)?
>>>> 
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>> 
>>>> On Mon, Nov 23, 2015 at 1:57 PM, Mehant Baid <baid.mehant@gmail.com> wrote:
>>>> 
>>>>> Currently all rules based on Calcite logical rels and Drill logical rels
>>>>> are put together and are fired together. As part of DRILL-3996, Jinfeng
>>>>> will break it down into different phases. I should be able to take
>>>>> advantage of this and move the directory based partition pruning to fire
>>>>> based on Calcite rels.
>>>>> 
>>>>> Thanks
>>>>> Mehant
>>>>> 
>>>>> 
>>>>> On 11/23/15 10:58 AM, Hanifi GUNES wrote:
>>>>> 
>>>>>> The general idea of multi-phase pruning makes sense to me. I am wondering,
>>>>>> though, are we referring to introducing a new planning phase before the
>>>>>> logical or separating out the logic so as to make directory pruning kick
>>>>>> off ahead of column partitioning?
>>>>>> 
>>>>>> 2015-11-23 10:33 GMT-08:00 Mehant Baid <baid.mehant@gmail.com>:
>>>>>> 
>>>>>>> As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996>
>>>>>>> Jinfeng mentioned that he plans to move the directory based pruning rule
>>>>>>> earlier than column based pruning. I want to expand on that a little,
>>>>>>> provide the motivation and gather thoughts/feedback.
>>>>>>> 
>>>>>>> Currently both the directory based pruning and the column based pruning
>>>>>>> are fired in the same planning phase and are based on Drill logical rels.
>>>>>>> This is not optimal in the case where data is organized in such a way that
>>>>>>> both directory based pruning and column based pruning can be applied (when
>>>>>>> the data is organized with a nested directory structure and the individual
>>>>>>> files contain partition columns). As part of creating the Drill logical
>>>>>>> scan we read the footers of all the files involved. If the directory based
>>>>>>> pruning rule is fired earlier (rule to fire based on Calcite logical rels)
>>>>>>> then we will be able to prune out unnecessary directories and save the
>>>>>>> work of reading the footers of these files.
>>>>>>> 
>>>>>>> Thanks
>>>>>>> Mehant
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>> 
>>>> 
>> 
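
The saving described in Mehant's original message (quoted above) can be sketched roughly as follows. This is a toy model, not Drill's planner or file-reader code: the class name, the file paths, the `readFooter` stand-in and the counter are all hypothetical, chosen only to show that filtering on the directory part of the path happens before any file is opened.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Toy model only: hypothetical names, not Drill's planner or reader code.
class DirectoryPruningSketch {

    /** Counts simulated footer reads so the saving is observable. */
    static int footerReads = 0;

    /** Stand-in for the expensive step of opening a file and reading its footer. */
    static String readFooter(String path) {
        footerReads++;
        return "footer:" + path;
    }

    /** Keeps files whose first directory level passes the filter; looks at the path only. */
    static List<String> pruneByDirectory(List<String> paths, Predicate<String> dir0Filter) {
        return paths.stream()
            .filter(p -> dir0Filter.test(p.split("/")[0]))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up nested layout: <year>/<quarter>/<file>
        List<String> files = List.of(
            "2014/q1/a.parquet", "2014/q2/b.parquet",
            "2015/q1/c.parquet", "2015/q2/d.parquet");

        // Prune on the directory first, then read footers only for the survivors.
        List<String> kept = pruneByDirectory(files, dir -> dir.equals("2015"));
        kept.forEach(DirectoryPruningSketch::readFooter);

        System.out.println(footerReads + " of " + files.size() + " footers read");
    }
}
```

If footers were read while building the scan, before the directory filter is applied, all four files would be opened; pruning first opens only the two under the matching directory.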

