drill-dev mailing list archives

From Julian Hyde <julianh...@gmail.com>
Subject Re: Approach for local ordering in planning
Date Mon, 26 Jan 2015 04:25:21 GMT
You can definitely define new traits (RelTraitDef instances). I
believe you can also add instances of these user-defined traits to
your RelNodes. If not, you should be able to.

Given that, the framework is flexible enough to allow people to choose
whether they want to combine ordering and partitioning or keep them
separate. You could have an ordering trait, a partitioning trait, and
an ordering+partitioning trait, and a particular run of the planner
could have any subset of these enabled.
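
A minimal sketch of that idea, written in plain Python rather than against
the real planner API. The classes Ordering, Partitioning and
OrderedPartitioning and the helper enabled_traits are hypothetical names
made up for illustration; they are not Calcite's RelTraitDef or Drill's
trait classes. It only shows how one set of candidate traits could be
narrowed to whichever subset a given planner run has enabled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Ordering:
    """Hypothetical ordering-only trait: rows sorted by these keys."""
    keys: Tuple[str, ...]


@dataclass(frozen=True)
class Partitioning:
    """Hypothetical partitioning-only trait: rows distributed by these keys."""
    keys: Tuple[str, ...]


@dataclass(frozen=True)
class OrderedPartitioning:
    """Hypothetical combined trait: partitioned, then sorted in each partition."""
    partition_keys: Tuple[str, ...]
    sort_keys: Tuple[str, ...]


def enabled_traits(traits, enabled_kinds):
    """Keep only the traits whose kind is enabled for this planner run."""
    return frozenset(t for t in traits if type(t) in enabled_kinds)


separate = frozenset({Partitioning(("x",)), Ordering(("y",))})
combined = frozenset({OrderedPartitioning(("x",), ("y",))})

# A run that keeps the traits separate sees both of them.
run_a = enabled_traits(separate | combined, {Ordering, Partitioning})
# A run that only understands the combined trait sees just that one.
run_b = enabled_traits(separate | combined, {OrderedPartitioning})
```

Whether a real planner would carry both forms on the same node or pick one
form per run is a design choice this sketch does not settle.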


On Tue, Jan 20, 2015 at 8:48 AM, Aman Sinha <asinha@maprtech.com> wrote:
> I believe keeping ordering and partitioning as separate traits gives more
> flexibility.  Combining them might preclude certain types of plans.  For
> instance, in many systems the assumption is any type of distribution
> destroys sortedness of the data, so a re-sort is needed after distribution
> (i.e., just doing a merge is not enough, although Drill does actually
> preserve sortedness, so it does a merge).   Without knowing what the
> combined trait would look like, I have a feeling that it will be
> constraining for certain plans.
>
> Separately, I think the optimizer should allow for adding new traits, for
> instance compression.  Input streams may be hash/roundrobin distributed
> and/or ordered and/or compressed.
>
> Aman
>
> On Mon, Jan 19, 2015 at 12:55 PM, Julian Hyde <julianhyde@gmail.com> wrote:
>
>> We have discussed before whether ordering and partitioning should be
>> distinct traits or the same trait. I was (still am) ambivalent about it.
>> I’ve been having some discussions with the Hive team, and it looks as if
>> they will make ordering & partitioning the same trait.
>>
>> Julian
>>
>> On Jan 19, 2015, at 9:51 AM, Jinfeng Ni <jinfengni99@gmail.com> wrote:
>>
>> > For the case of "partition by x sort by y", I think the planner currently
>> > keeps the partition and sort in separate traits: "partition by x" as a
>> > distribution trait, "sort by y" as a collation. The distribution trait has
>> > higher priority than the sort collation. Drill's physical operators will
>> > have both of those traits when doing planning work.
>> >
>> >
>> > On Sun, Jan 18, 2015 at 9:33 PM, Jacques Nadeau <jacques@apache.org> wrote:
>> >
>> >> In planning we currently state collation as total ordering. In some
>> >> cases it would be useful to create a concept of local ordering. For
>> >> example, partition by x then sort by y. Does anyone have any thoughts on
>> >> how we should define this in terms of traits/physical properties? The
>> >> syntax would realistically only apply to CTAS or as a description of
>> >> existing files, so I think we shouldn't need to enhance the language
>> >> beyond those locations.
>> >>
>> >> J
>> >>
>>
>>
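
To make the distinction in the thread concrete, here is a small Python
sketch of local versus total ordering for "partition by x then sort by y".
The row data and the helpers partition_then_sort and is_sorted_by_y are
invented for illustration and do not reflect Drill's operators. Each
partition is sorted on y, concatenating the partitions does not give a
total order on y, and a merge of the sorted partitions does.

```python
from collections import defaultdict
import heapq

# Illustrative rows of (x, y); not taken from the thread.
rows = [(2, 5), (1, 9), (2, 1), (1, 3), (1, 7)]


def partition_then_sort(rows):
    """Partition rows by x, then sort each partition by y (local ordering)."""
    parts = defaultdict(list)
    for x, y in rows:
        parts[x].append((x, y))
    return {x: sorted(p, key=lambda r: r[1]) for x, p in parts.items()}


def is_sorted_by_y(rs):
    """True if the sequence is non-decreasing in y."""
    return all(a[1] <= b[1] for a, b in zip(rs, rs[1:]))


parts = partition_then_sort(rows)

# Reading the partitions one after another keeps only the local ordering.
concatenated = [r for x in sorted(parts) for r in parts[x]]

# Merging the already-sorted partitions yields a total ordering on y
# without a full re-sort.
merged = list(heapq.merge(*parts.values(), key=lambda r: r[1]))
```

A trait that only says "sorted by y" cannot tell these two outputs apart,
which is the gap the local-ordering question is about.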
