drill-dev mailing list archives

From Jinfeng Ni <jinfengn...@gmail.com>
Subject Re: Moving directory based pruning to fire earlier
Date Tue, 24 Nov 2015 01:56:24 GMT
My understanding is that RelMetadataProvider gives the estimates of row
count, distinct row count, etc. But it's still up to each Rel node to
decide how to estimate its own cost, given the row count, distinct row
count, etc. from the MetadataProvider. Are you suggesting we completely
remove Drill's cost estimation methods and use Calcite's default ones?
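A toy Python model of the split described above: a metadata lookup supplies the row count, and each node applies its own cost formula to it. The names `ToyRel`, `ToySort`, `Cost`, and `RowCountLookup`, and both cost formulas, are hypothetical; this is not Calcite's RelMetadataProvider API or Drill's computeSelfCost().

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical metadata lookup: node id -> estimated row count.
# A toy stand-in for a metadata provider, not Calcite's API.
RowCountLookup = Callable[[str], float]


@dataclass(frozen=True)
class Cost:
    rows: float
    cpu: float


class ToyRel:
    """Toy node: the row count comes from outside, the cost formula is its own."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id

    def compute_self_cost(self, row_count: RowCountLookup) -> Cost:
        rows = row_count(self.node_id)
        return Cost(rows=rows, cpu=rows)  # hypothetical: one CPU unit per row


class ToySort(ToyRel):
    def compute_self_cost(self, row_count: RowCountLookup) -> Cost:
        rows = row_count(self.node_id)
        # Hypothetical policy: sorting costs about n * log2(n) comparisons.
        return Cost(rows=rows, cpu=rows * math.log2(max(rows, 2.0)))


estimates: Dict[str, float] = {"scan": 1024.0, "sort": 1024.0}
scan_cost = ToyRel("scan").compute_self_cost(estimates.__getitem__)
sort_cost = ToySort("sort").compute_self_cost(estimates.__getitem__)
```

Both nodes receive the same row count from the lookup; only the formula that turns it into a cost differs, which is the part that stays with each node in this model.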



On Mon, Nov 23, 2015 at 5:35 PM, Julian Hyde <jhyde@apache.org> wrote:
> Yes. You don’t need an “implement” method (or yours can just throw).
>
> You could use your own serialization to/from JSON or you could use RelJsonWriter/RelJsonReader.
>
> Julian
>
>
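Julian's suggestion replaces the implement() path with plain serialization. A minimal sketch of a JSON round trip for a plan tree, assuming a hypothetical `PlanNode` shape (operator name, arguments, inputs); the table name `dfs.tmp.logs` and the filter are made up, and the format is illustrative only, not what RelJsonWriter/RelJsonReader produce.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class PlanNode:
    """Hypothetical logical-plan node: operator name, arguments, child inputs."""
    op: str
    args: Dict[str, Any] = field(default_factory=dict)
    inputs: List["PlanNode"] = field(default_factory=list)


def to_json(node: PlanNode) -> str:
    def encode(n: PlanNode) -> Dict[str, Any]:
        return {"op": n.op, "args": n.args, "inputs": [encode(c) for c in n.inputs]}
    return json.dumps(encode(node), sort_keys=True)


def from_json(text: str) -> PlanNode:
    def decode(d: Dict[str, Any]) -> PlanNode:
        return PlanNode(d["op"], dict(d.get("args", {})),
                        [decode(c) for c in d.get("inputs", [])])
    return decode(json.loads(text))


plan = PlanNode("filter", {"condition": "dir0 = '2015'"},
                [PlanNode("scan", {"table": "dfs.tmp.logs"})])
text = to_json(plan)
restored = from_json(text)
```

Because the tree round-trips through plain JSON, this path needs no per-operator implement() method; the trade-off is that the encoder and decoder have to be kept in sync with the operators by hand.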
>> On Nov 23, 2015, at 5:31 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>>
>> We could create serializers and deserializers for the logical plan stuff.
>> It looks like we can resolve the costing through metadata providers unless
>> I misunderstood what Julian was suggesting.
>>
>>
>>
>> --
>> Jacques Nadeau
>> CTO and Co-Founder, Dremio
>>
>> On Mon, Nov 23, 2015 at 5:12 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>>
>>> @Jacques,
>>>
>>> Every DrillLogicalRel has to override computeSelfCost() and implement
>>> the implement() method. The latter produces the Logical Plan, which is
>>> one of the three input types Drill should accept (SQL, Logical Plan,
>>> Physical Plan).
>>>
>>> So, for now, we do have to override/extend all DrillLogicalRels.
>>>
>>>
>>> On Mon, Nov 23, 2015 at 4:55 PM, Julian Hyde <jhyde@apache.org> wrote:
>>>> I’m not sure what properties / behavior you want to override, but
>>>> remember that Calcite specifies a lot of things as traits or metadata.
>>>>
>>>> For example, “double RelNode.getRows()” is deprecated, and these days
>>>> you would use RelMetadataQuery.getRowCount(). You would not need to
>>>> sub-class a RelNode to override its row-count estimate, just supply a
>>>> different metadata provider.
>>>>
>>>> Julian
>>>>
>>>>
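A toy illustration of supplying a different metadata provider instead of sub-classing the node. Each provider either returns an estimate or None to defer to the next one, loosely modeled on the idea of chaining metadata providers; `ToyScan`, both providers, the table names, and the 100-rows-per-file default are hypothetical and are not Calcite classes.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class ToyScan:
    table: str
    file_count: int


# A provider returns a row-count estimate, or None to defer to the next one.
Provider = Callable[[ToyScan], Optional[float]]


def default_rows(node: ToyScan) -> Optional[float]:
    # Hypothetical fallback heuristic: assume 100 rows per file.
    return 100.0 * node.file_count


def known_table_rows(node: ToyScan) -> Optional[float]:
    # Hypothetical override: exact counts for tables we have statistics for.
    stats = {"orders": 42.0}
    return stats.get(node.table)


def chain(providers: Sequence[Provider]) -> Provider:
    """Ask each provider in order; the first non-None answer wins."""
    def lookup(node: ToyScan) -> Optional[float]:
        for provider in providers:
            estimate = provider(node)
            if estimate is not None:
                return estimate
        return None
    return lookup


default_lookup = chain([default_rows])
custom_lookup = chain([known_table_rows, default_rows])
```

The `ToyScan` class never changes here; only the lookup handed to whoever asks for row counts does.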
>>>>> On Nov 23, 2015, at 4:50 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>>>>>
>>>>> Yes, my suggestion is removal of DRILL_LOGICAL. @Hsuan, this is
>>>>> independent of the number of phases, and I'm not suggesting changing that.
>>>>>
>>>>> My main thought was: if we only need to override one or two rels, do only
>>>>> that, rather than having a wholesale copy of every operator with a bunch
>>>>> of basic no-op rules.
>>>>>
>>>>> --
>>>>> Jacques Nadeau
>>>>> CTO and Co-Founder, Dremio
>>>>>
>>>>> On Mon, Nov 23, 2015 at 4:37 PM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>>>>>
>>>>>> @Jacques, are you talking about removing the convention DRILL_LOGICAL?
>>>>>>
>>>>>> DrillRel extends Calcite's LogicalRel. It overrides some of LogicalRel's
>>>>>> methods and adds new methods. Therefore, even if we remove the
>>>>>> DRILL_LOGICAL convention, we still have to maintain a set of classes
>>>>>> extended from Calcite's logical rels. I'm not clear what benefit we
>>>>>> would get by removing the DRILL_LOGICAL convention.
>>>>>>
>>>>>> If we want to remove the complete set of DrillLogical classes, then
>>>>>> I'm not sure where we would put the Drill-specific logic; for instance,
>>>>>> Drill's Join has certain restrictions that differ from Calcite's Join.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Nov 23, 2015 at 4:11 PM, Hsuan Yi Chu <hyichu@maprtech.com> wrote:
>>>>>>> My understanding is:
>>>>>>> In logical planning, we determine the "structure" of the tree (e.g.,
>>>>>>> join order).
>>>>>>> Then, in physical planning, we determine the implementation (e.g.,
>>>>>>> merge vs. hash join).
>>>>>>>
>>>>>>> This staging seems clean to me. So what is the motivation for merging
>>>>>>> them all together?
>>>>>>>
>>>>>>>
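Hsuan's logical/physical split can be sketched as a two-stage toy planner. The first stage fixes only the structure (a left-deep join order chosen to minimize total intermediate rows under one uniform selectivity); the second keeps that order and picks an algorithm per join. The table cardinalities, the selectivity, and the rule "merge join only when both inputs are already sorted on the join key" are hypothetical, and the sketch assumes every join uses the same key.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

Order = Tuple[str, ...]


def logical_join_order(cards: Dict[str, int], selectivity: float) -> Order:
    """Stage 1 (structure): pick the left-deep join order whose intermediate
    results have the smallest total estimated size."""
    def total_intermediate_rows(order: Order) -> float:
        size, total = float(cards[order[0]]), 0.0
        for table in order[1:]:
            size = size * cards[table] * selectivity
            total += size
        return total

    return min(permutations(cards), key=total_intermediate_rows)


def physical_joins(order: Order, sorted_tables: Set[str]) -> List[Tuple[str, str]]:
    """Stage 2 (implementation): keep the order and choose merge vs. hash per join."""
    steps: List[Tuple[str, str]] = []
    left_sorted = order[0] in sorted_tables
    for table in order[1:]:
        algo = "merge" if left_sorted and table in sorted_tables else "hash"
        steps.append((table, algo))
        # Toy assumption: a merge join's output stays sorted on the shared key,
        # while a hash join's output does not.
        left_sorted = algo == "merge"
    return steps


order = logical_join_order({"a": 1000, "b": 10, "c": 100}, selectivity=0.01)
plan = physical_joins(order, sorted_tables={"b", "c"})
```

The physical stage never revisits the join order; it only annotates each join, which mirrors the staging described in the message.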
>>>>>>> On Mon, Nov 23, 2015 at 2:51 PM, Jacques Nadeau <jacques@dremio.com> wrote:
>>>>>>>
>>>>>>>> Does anybody think we should just get rid of Drels (Rel > Drel > Prel)
>>>>>>>> and use Calcite's logical representation directly (Rel > Prel)?
>>>>>>>>
>>>>>>>> --
>>>>>>>> Jacques Nadeau
>>>>>>>> CTO and Co-Founder, Dremio
>>>>>>>>
>>>>>>>> On Mon, Nov 23, 2015 at 1:57 PM, Mehant Baid <baid.mehant@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Currently, all rules based on Calcite logical rels and Drill logical
>>>>>>>>> rels are put together and fired together. As part of DRILL-3996,
>>>>>>>>> Jinfeng will break them down into different phases. I should be able
>>>>>>>>> to take advantage of this and move the directory-based partition
>>>>>>>>> pruning to fire on Calcite rels.
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>> Mehant
>>>>>>>>>
>>>>>>>>>
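The phase idea can be sketched with a toy rule driver: each phase fires its own rule set, in list order, until nothing changes, and phases run one after another. Plans are tuples of hypothetical "convention:operator" strings, and `prune_directories` / `to_drill_logical` are illustrative stand-ins, not Drill's actual rules.

```python
from typing import Callable, List, Optional, Tuple

# Toy plan: tuple of "convention:operator" strings, root first (hypothetical).
Plan = Tuple[str, ...]
# A rule returns a rewritten plan, or None when it does not match.
Rule = Callable[[Plan], Optional[Plan]]


def run_phase(plan: Plan, rules: List[Rule], max_passes: int = 50) -> Plan:
    """Fire this phase's rules, in list order, until a full pass changes nothing."""
    for _ in range(max_passes):
        changed = False
        for rule in rules:
            rewritten = rule(plan)
            if rewritten is not None and rewritten != plan:
                plan, changed = rewritten, True
        if not changed:
            break
    return plan


def run_phases(plan: Plan, phases: List[List[Rule]]) -> Plan:
    """Run each phase to completion before starting the next."""
    for rules in phases:
        plan = run_phase(plan, rules)
    return plan


def prune_directories(plan: Plan) -> Optional[Plan]:
    # Hypothetical rule that only matches a scan still in the Calcite convention.
    if "calcite:scan" not in plan:
        return None
    return tuple("calcite:scan[pruned]" if op == "calcite:scan" else op for op in plan)


def to_drill_logical(plan: Plan) -> Optional[Plan]:
    # Hypothetical conversion rule: Calcite logical ops become Drill logical ops.
    if not any(op.startswith("calcite:") for op in plan):
        return None
    return tuple(op.replace("calcite:", "drill:", 1) for op in plan)


start: Plan = ("calcite:filter", "calcite:scan")
phased = run_phases(start, [[prune_directories], [to_drill_logical]])
single = run_phases(start, [[to_drill_logical, prune_directories]])
```

In the single-phase run of this toy driver, the conversion fires first, so the Calcite-only pruning rule never matches; giving pruning its own earlier phase removes that dependence on rule order.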
>>>>>>>>> On 11/23/15 10:58 AM, Hanifi GUNES wrote:
>>>>>>>>>
>>>>>>>>>> The general idea of multi-phase pruning makes sense to me. I am
>>>>>>>>>> wondering, though: are we referring to introducing a new planning
>>>>>>>>>> phase before the logical one, or to separating out the logic so as to
>>>>>>>>>> make directory pruning kick off ahead of column partitioning?
>>>>>>>>>>
>>>>>>>>>> 2015-11-23 10:33 GMT-08:00 Mehant Baid <baid.mehant@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>> As part of DRILL-3996 <https://issues.apache.org/jira/browse/DRILL-3996>,
>>>>>>>>>>> Jinfeng mentioned that he plans to move the directory-based pruning
>>>>>>>>>>> rule earlier than column-based pruning. I want to expand on that a
>>>>>>>>>>> little, provide the motivation, and gather thoughts/feedback.
>>>>>>>>>>>
>>>>>>>>>>> Currently, both the directory-based pruning and the column-based
>>>>>>>>>>> pruning are fired in the same planning phase and are based on Drill
>>>>>>>>>>> logical rels. This is not optimal when the data is organized in such
>>>>>>>>>>> a way that both directory-based pruning and column-based pruning can
>>>>>>>>>>> be applied (when the data is organized in a nested directory
>>>>>>>>>>> structure and the individual files also contain partition columns).
>>>>>>>>>>> As part of creating the Drill logical scan, we read the footers of
>>>>>>>>>>> all the files involved. If the directory-based pruning rule is fired
>>>>>>>>>>> earlier (i.e., on Calcite logical rels), then we will be able to
>>>>>>>>>>> prune out unnecessary directories and save the work of reading the
>>>>>>>>>>> footers of those files.
>>>>>>>>>>>
>>>>>>>>>>> Thanks
>>>>>>>>>>> Mehant
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>
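The saving described in the original proposal comes down to how many file footers are read. A toy count under stated assumptions: the hypothetical `layout` maps each directory to its files, building the scan reads one footer per file it covers, and directory pruning runs either before the scan is built (early) or only afterwards (late). The directory names, file names, and predicate are made up for illustration.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical table layout: directory -> files in that directory.
Layout = Dict[str, List[str]]


def plan_scan(layout: Layout, keep_dir: Callable[[str], bool],
              prune_dirs_early: bool) -> Tuple[List[str], int]:
    """Return (files left after directory pruning, number of footers read)."""
    # Early pruning narrows the directories before the scan is built;
    # late pruning builds the scan over every directory first.
    scanned_dirs = [d for d in layout if keep_dir(d)] if prune_dirs_early else list(layout)
    footers_read = 0
    for d in scanned_dirs:
        for _path in layout[d]:
            footers_read += 1  # stand-in for opening a file and parsing its footer
    # Either way, the same files survive directory pruning in the end.
    survivors = [f for d in layout if keep_dir(d) for f in layout[d]]
    return survivors, footers_read


def recent(directory: str) -> bool:
    # Hypothetical predicate, e.g. from a query filter such as dir0 >= '2014'.
    return directory >= "2014"


layout: Layout = {
    "2013": ["part-0", "part-1"],
    "2014": ["part-2"],
    "2015": ["part-3", "part-4", "part-5"],
}
early = plan_scan(layout, recent, prune_dirs_early=True)
late = plan_scan(layout, recent, prune_dirs_early=False)
```

Both orderings keep the same files; only the number of footers read before pruning differs, and that gap grows with the number of pruned directories.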
