drill-dev mailing list archives

From Julian Hyde <jh...@apache.org>
Subject Re: Moving directory based pruning to fire earlier
Date Tue, 24 Nov 2015 01:35:15 GMT
Yes. You don’t need an “implement” method (or yours can just throw). 

You could use your own serialization to/from JSON or you could use RelJsonWriter/RelJsonReader.

Julian


> On Nov 23, 2015, at 5:31 PM, Jacques Nadeau <jacques@dremio.com> wrote:
> 
> We could create serializers and deserializers for the logical plan stuff.
> It looks like we can resolve the costing through metadata providers unless
> I misunderstood what Julian was suggesting.
> 
> 
> 
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
> 
> On Mon, Nov 23, 2015 at 5:12 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
> 
>> @Jacques,
>> 
>> Every DrillLogicalRel has to override computeSelfCost() and implement
>> the implement() method. The latter is to get the Logical Plan, which is
>> one of the three input types Drill should accept (SQL, Logical Plan,
>> Physical Plan).
>> 
>> So, for now, we do have to override/extend all DrillLogicalRel.
>> 
>> 
>> On Mon, Nov 23, 2015 at 4:55 PM, Julian Hyde <jhyde@apache.org> wrote:
>>> I’m not sure what properties / behavior you want to override, but
>>> remember that Calcite specifies a lot of things as traits or metadata.
>>> 
>>> For example, “double RelNode.getRows()” is deprecated and these days
>>> you would use RelMetadataQuery.getRowCount(). You would not need to
>>> sub-class a RelNode to override its row-count estimate, just supply a
>>> different metadata provider.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Nov 23, 2015, at 4:50 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>>>> 
>>>> Yes, my suggestion is removal of DRILL_LOGICAL. @Hsuan, this is
>>>> independent from the number of phases and I'm not suggesting changing
>>>> that.
>>>> 
>>>> My main thought was: if we only need to override one or two rels, do
>>>> only that rather than having a wholesale copy of every operator with a
>>>> bunch of basic noop rules.
>>>> 
>>>> --
>>>> Jacques Nadeau
>>>> CTO and Co-Founder, Dremio
>>>> 
>>>> On Mon, Nov 23, 2015 at 4:37 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>>>> 
>>>>> @Jacques, are you talking about removing the convention DRILL_LOGICAL?
>>>>> 
>>>>> DrillRel extends Calcite's LogicalRel. It overrides some of LogicalRel's
>>>>> methods and adds new methods. Therefore, even if we remove the
>>>>> DRILL_LOGICAL convention, we still have to maintain a set of classes
>>>>> extended from Calcite Logical. I'm not clear what benefit we would get
>>>>> by removing the DRILL_LOGICAL convention.
>>>>> 
>>>>> If we want to remove the complete set of DrillLogical classes, then
>>>>> I'm not sure where we would put the Drill-specific logic; for instance,
>>>>> Drill Join has certain restrictions that Calcite Join does not.
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Mon, Nov 23, 2015 at 4:11 PM, Hsuan Yi Chu <hyichu@maprtech.com> wrote:
>>>>>> My understanding is:
>>>>>> In logical planning, we determine the "structure" of the tree (e.g.,
>>>>>> join order).
>>>>>> And then in physical, we determine the implementation (e.g., merge vs
>>>>>> hash join).
>>>>>> 
>>>>>> This staging seems clean to me. So what is the motivation to merge
>>>>>> them all together?
>>>>>> 
>>>>>> 
>>>>>> On Mon, Nov 23, 2015 at 2:51 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>>>>>> 
>>>>>>> Anybody think we should just get rid of Drels (Rel > Drel > Prel) and
>>>>>>> use Calcite's logical representation directly (Rel > Prel)?
>>>>>>> 
>>>>>>> --
>>>>>>> Jacques Nadeau
>>>>>>> CTO and Co-Founder, Dremio
>>>>>>> 
>>>>>>> On Mon, Nov 23, 2015 at 1:57 PM, Mehant Baid <baid.mehant@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Currently all rules based on Calcite logical rels and Drill logical
>>>>>>>> rels are put together and are fired together. As part of DRILL-3996,
>>>>>>>> Jinfeng will break it down into different phases. I should be able to
>>>>>>>> take advantage of this and move the directory based partition pruning
>>>>>>>> to fire based on Calcite rels.
>>>>>>>> 
>>>>>>>> Thanks
>>>>>>>> Mehant
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 11/23/15 10:58 AM, Hanifi GUNES wrote:
>>>>>>>> 
>>>>>>>>> The general idea of multi-phase pruning makes sense to me. I am
>>>>>>>>> wondering, though, are we referring to introducing a new planning
>>>>>>>>> phase before the logical or separating out the logic so as to make
>>>>>>>>> directory pruning kick off ahead of column partitioning?
>>>>>>>>> 
>>>>>>>>> 2015-11-23 10:33 GMT-08:00 Mehant Baid <baid.mehant@gmail.com>:
>>>>>>>>> 
>>>>>>>>>> As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996>
>>>>>>>>>> Jinfeng mentioned that he plans to move the directory based pruning
>>>>>>>>>> rule earlier than column based pruning. I want to expand on that a
>>>>>>>>>> little, provide the motivation and gather thoughts/feedback.
>>>>>>>>>> 
>>>>>>>>>> Currently both the directory based pruning and the column based
>>>>>>>>>> pruning are fired in the same planning phase and are based on Drill
>>>>>>>>>> logical rels. This is not optimal in the case where data is organized
>>>>>>>>>> in such a way that both directory based pruning and column based
>>>>>>>>>> pruning can be applied (when the data is organized with a nested
>>>>>>>>>> directory structure and the individual files contain partition
>>>>>>>>>> columns). As part of creating the Drill logical scan we read the
>>>>>>>>>> footers of all the files involved. If the directory based pruning
>>>>>>>>>> rule is fired earlier (rule to fire based on Calcite logical rels)
>>>>>>>>>> then we will be able to prune out unnecessary directories and save
>>>>>>>>>> the work of reading the footers of these files.
>>>>>>>>>> 
>>>>>>>>>> Thanks
>>>>>>>>>> Mehant
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>> 
>>> 
>> 
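
For readers following the thread, the motivation in the original message (prune directories first, read footers only for what survives) can be sketched roughly as below. This is a minimal illustration, not Drill's planner code: PruningSketch, readFooter, pruneByDirectory and planScan are hypothetical names, the footer read is simulated with a counter, and skipping files that sit directly under the root is an assumption made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: directory pruning runs before any footer is read.
class PruningSketch {
    // Counts how many (simulated) footer reads were performed.
    int footerReads = 0;

    // Stand-in for the expensive step; a real reader would open the file.
    String readFooter(String path) {
        footerReads++;
        return "footer:" + path;
    }

    // Keep only files whose first directory level under root passes the filter.
    static List<String> pruneByDirectory(List<String> files, String root,
                                         Predicate<String> dir0Filter) {
        List<String> kept = new ArrayList<>();
        for (String f : files) {
            if (!f.startsWith(root + "/")) {
                continue; // not under the scanned root
            }
            String rel = f.substring(root.length() + 1);
            int slash = rel.indexOf('/');
            if (slash < 0) {
                continue; // assumption: files directly under root have no dir0 value
            }
            if (dir0Filter.test(rel.substring(0, slash))) {
                kept.add(f);
            }
        }
        return kept;
    }

    // Prune first, then read footers only for the surviving files.
    List<String> planScan(List<String> files, String root,
                          Predicate<String> dir0Filter) {
        List<String> footers = new ArrayList<>();
        for (String f : pruneByDirectory(files, root, dir0Filter)) {
            footers.add(readFooter(f));
        }
        return footers;
    }
}
```

With three files under /data/2014 and /data/2015 and a filter on the 2015 directory, only the 2015 files reach readFooter; the 2014 footers are never touched.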

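Julian's point about metadata providers (override an estimate by supplying a different provider rather than sub-classing the rel) can be illustrated with a small self-contained sketch. It deliberately does not use Calcite's classes: ScanNode, RowCountProvider, MetadataQuery and MetadataDemo are hypothetical stand-ins that only mirror the shape of the idea, and the default estimate of 100 rows is an arbitrary value chosen for the example.

```java
import java.util.Map;

// Hypothetical stand-in for a relational operator; NOT Calcite's RelNode.
class ScanNode {
    final String table;

    ScanNode(String table) {
        this.table = table;
    }
}

// Hypothetical pluggable estimator, playing the role of a metadata provider.
interface RowCountProvider {
    double rowCount(ScanNode node);
}

// The planner asks this query object, not the node, for the estimate.
class MetadataQuery {
    private final RowCountProvider provider;

    MetadataQuery(RowCountProvider provider) {
        this.provider = provider;
    }

    double getRowCount(ScanNode node) {
        return provider.rowCount(node);
    }
}

class MetadataDemo {
    // Default estimate applied to every scan (arbitrary value for the sketch).
    static final RowCountProvider DEFAULT = node -> 100.0;

    // Engine-specific provider: use known statistics, else fall back to DEFAULT.
    static RowCountProvider withStats(Map<String, Double> stats) {
        return node -> stats.getOrDefault(node.table, DEFAULT.rowCount(node));
    }
}
```

The node class stays untouched; swapping DEFAULT for withStats(...) when constructing MetadataQuery is the only change needed to alter the estimate.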
